Try Gemini 1.5 models, the latest multimodal models in Vertex AI, and see what you can build with up to a 2M token context window

Speech-to-Text

Turn speech into text using Google AI

Convert audio into text transcriptions and integrate speech recognition into applications with easy-to-use APIs.

New customers also get up to $300 in free credits to try Speech-to-Text and other Google Cloud products.

Start transcribing Contact sales

Features

Advanced speech AI

Speech-to-Text can utilize Chirp, Google Cloud’s foundation model for speech trained on millions of hours of audio data and billions of text sentences. This contrasts with traditional speech recognition techniques that focus on large amounts of language-specific supervised data. These techniques give users improved recognition and transcription for more spoken languages and accents.

Support for 125 languages and variants

Build for a global user base with extensive language support. Transcribe short, long, and even streaming audio data. Speech-to-Text also offers users more accurate and globe-spanning translation and recognition with Chirp, the next generation of universal speech models. Chirp was built using self-supervised training on millions of hours of audio and 28 billion sentences of text spanning 100+ languages.

Transcribe short, long, or streaming audio

View guide

Pretrained or customizable models for transcription

Choose from a selection of trained models for voice control, phone call, and video transcription optimized for domain-specific quality requirements. Easily customize, experiment with, create, and manage custom resources with the Speech-to-Text UI.

Out-of-the-box regulatory and security compliance

Speech-to-Text API v2 gives enterprise and business customers added security and regulatory requirements out of the box. Data residency enables the invocation of transcription models through a fully regionalized service that taps into Google Cloud regions like Singapore and Belgium. Recognizer resourcefulness eliminates the need for dedicated service accounts for authentication and authorization. Logs for resource generation and transcription are made easily available in the Google Cloud console. And Speech-to-Text API v2 offers enterprise-grade encryption with customer-managed encryption keys for all resources as well as batch transcription.

AI-powered speech recognition and transcription

Speech-to-Text uses model adaptation to improve the accuracy of frequently used words, expand the vocabulary available for transcription, and improve transcription from noisy audio. Model adaptation lets users customize Speech-to-Text to recognize specific words or phrases more frequently than other options that might otherwise be suggested. For example, you could bias Speech-to-Text towards transcribing "weather" over "whether."

Streaming speech recognition

Receive real-time speech recognition results as the API processes the audio input streamed from your application’s microphone or sent from a prerecorded audio file (inline or through Cloud Storage).

Speech adaptation

Customize speech recognition to transcribe domain-specific terms and rare words by providing hints and boost your transcription accuracy of specific words or phrases. Automatically convert spoken numbers into addresses, years, currencies, and more using classes.

Speech-to-Text On-Prem

Have full control over your infrastructure and protected speech data while leveraging Google’s speech recognition technology on-premises, right in your own private data centers. Contact sales to get started.

Multichannel recognition

Speech-to-Text can recognize distinct channels in multichannel situations (for example, video conference) and annotate the transcripts to preserve the order.

Noise robustness

Speech-to-Text can handle noisy audio from many environments without requiring additional noise cancellation.

Domain-specific models

Choose from a selection of trained models for voice control and phone call and video transcription optimized for domain-specific quality requirements. For example, our enhanced phone call model is tuned for audio originated from telephony, such as phone calls recorded at an 8khz sampling rate.

Content filtering

Profanity filter helps you detect inappropriate or unprofessional content in your audio data and filter out profane words in text results.

Transcription evaluation

Upload your own voice data and have it transcribed with no code. Evaluate quality by iterating on your configuration.

Automatic punctuation (beta)

Speech-to-Text accurately punctuates transcriptions, such as by providing commas, question marks, and periods.

Speaker diarization

Know who said what by receiving automatic predictions about which of the speakers in a conversation spoke each utterance.

How It Works

Speech-to-Text has three main methods to perform speech recognition: synchronous, asynchronous, and streaming. Each method returns text results based on if transcription is needed in post processing, periodically, or in real time. Simply put, you'll input audio data and then receive a text-based response.

View documentation

Learn how to add Speech-to-Text to your apps

Demo

Test out the Speech-to-Text API

Quickly create audio transcription from a file upload or directly speaking into a mic.

Common Uses

Transcribe audio

Create an audio transcription

Learn how to use the Speech-to-Text API from within the Cloud Console by creating an audio transcription in just a few steps. You can also transcribe short, long, and streaming audio.

Start using Speech-to-Text

Tutorials, quickstarts, & labs

Create an audio transcription

Learn how to use the Speech-to-Text API from within the Cloud Console by creating an audio transcription in just a few steps. You can also transcribe short, long, and streaming audio.

Start using Speech-to-Text

Caption videos using AI

Create subtitles for videos using AI

Transcribe your audio and video to include captions. Add subtitles to existing content or in real time to streaming content. Our video transcription model is ideal for indexing or subtitling video and/or multispeaker content and uses similar machine learning technology as YouTube does for video captioning. This tutorial shows you how to use the Google Cloud AI services Speech-to-Text API and Translation API to add subtitles to videos and to provide localized subtitles in other languages.

Watch automated subtitles tutorial

Tutorials, quickstarts, & labs

Create subtitles for videos using AI

Transcribe your audio and video to include captions. Add subtitles to existing content or in real time to streaming content. Our video transcription model is ideal for indexing or subtitling video and/or multispeaker content and uses similar machine learning technology as YouTube does for video captioning. This tutorial shows you how to use the Google Cloud AI services Speech-to-Text API and Translation API to add subtitles to videos and to provide localized subtitles in other languages.

Watch automated subtitles tutorial

Add Speech-to-Text to apps

How to add Speech-to-Text to apps

Learn how you can quickly and easily enable Speech-to-Text for your application with Google Cloud. This video covers how to add AI to your application without extensive machine learning model experience. Using the pretrained Speech-to-Text API you'll quickly and easily enable AI for your application.

Watch example video

Advanced transcription powered by Google AI and API UI

Add voice control to apps

Tutorials, quickstarts, & labs

How to add Speech-to-Text to apps

Learn how you can quickly and easily enable Speech-to-Text for your application with Google Cloud. This video covers how to add AI to your application without extensive machine learning model experience. Using the pretrained Speech-to-Text API you'll quickly and easily enable AI for your application.

Watch example video

Add voice control to apps

Translate audio into text

Language, speech, text, and translation with Google Cloud APIs

In this course, you'll use the Speech-to-Text API to transcribe an audio file into a text file, translate with the Google Cloud Translation API, and create synthetic speech with Natural Language AI.

Start course

Tutorials, quickstarts, & labs

Language, speech, text, and translation with Google Cloud APIs

In this course, you'll use the Speech-to-Text API to transcribe an audio file into a text file, translate with the Google Cloud Translation API, and create synthetic speech with Natural Language AI.

Start course

Pricing

How Speech-to-Text pricing works	Speech-to-Text pricing is based on the API version, channels, batch methods, and any additional Google Cloud service costs like storage.
API version	Service and capability	Pricing
Speech-to-Text V1 API	V1 offers data residency for multi region only. Models include short, long, phone call, and video. V1 does not include audit logging. New customers get $300 in free credits and 60 minutes for transcribing and analyzing audio free per month, not charged against your credits.	$0.024 per min
Speech-to-Text V2 API	V2 offers data residency for multi and single region. Models include short, long, telephony, video, and Chirp. V2 does include audit logging and support for customer managed encryption keys.	$0.016 per min

How Speech-to-Text pricing works

Speech-to-Text pricing is based on the API version, channels, batch methods, and any additional Google Cloud service costs like storage.

API version

Service and capability

Pricing

Speech-to-Text V1 API

V1 offers data residency for multi region only. Models include short, long, phone call, and video. V1 does not include audit logging. New customers get $300 in free credits and 60 minutes for transcribing and analyzing audio free per month, not charged against your credits.

$0.024

per min

Speech-to-Text V2 API

V2 offers data residency for multi and single region. Models include short, long, telephony, video, and Chirp. V2 does include audit logging and support for customer managed encryption keys.

$0.016

per min

View pricing details for Speech-to-Text.

How Speech-to-Text pricing works

Speech-to-Text pricing is based on the API version, channels, batch methods, and any additional Google Cloud service costs like storage.

Speech-to-Text V1 API

Service and capability

V1 offers data residency for multi region only. Models include short, long, phone call, and video. V1 does not include audit logging. New customers get $300 in free credits and 60 minutes for transcribing and analyzing audio free per month, not charged against your credits.

Pricing

$0.024

per min

Speech-to-Text V2 API

Service and capability

V2 offers data residency for multi and single region. Models include short, long, telephony, video, and Chirp. V2 does include audit logging and support for customer managed encryption keys.

Pricing

$0.016

per min

View pricing details for Speech-to-Text.

Pricing calculator

Estimate your monthly Speech-To-Text costs, including region specific pricing and fees.

Estimate costs

Custom quote

Connect with our sales team to get a custom quote for your organization.

Request a quote

Start your proof of concept

New customers get up to $300 in free credits to try Speech-to-Text and other Google Cloud products

Get started for free

Speech-to-Text

Turn speech into text using Google AI

Product highlights

Advanced speech AI

Support for 125 languages and variants

Pretrained or customizable models for transcription

Out-of-the-box regulatory and security compliance

AI-powered speech recognition and transcription

Streaming speech recognition

Speech adaptation

Speech-to-Text On-Prem

Multichannel recognition

Noise robustness

Domain-specific models

Content filtering

Transcription evaluation

Automatic punctuation (beta)

Speaker diarization

Test out the Speech-to-Text API

Transcribe audio

Create an audio transcription

Tutorials, quickstarts, & labs

Create an audio transcription

Caption videos using AI

Create subtitles for videos using AI

Tutorials, quickstarts, & labs

Create subtitles for videos using AI

Add Speech-to-Text to apps

How to add Speech-to-Text to apps

Tutorials, quickstarts, & labs

How to add Speech-to-Text to apps

Translate audio into text

Language, speech, text, and translation with Google Cloud APIs

Tutorials, quickstarts, & labs

Language, speech, text, and translation with Google Cloud APIs

Pricing calculator

Custom quote

Start your proof of concept

New customers get up to $300 in free credits to try Speech-to-Text and other Google Cloud products

Have a large project?

Speech-to-Text On-Prem

Speech-to-Text basics

Speech-to-Text code samples