This repository contains a Python CLI that transcribes long audio files using the Speech-to-Text V2 API from GCP. The sections below describe the required configuration.
This script requires the Google Cloud CLI (gcloud) to be installed and configured so that the Python client can interact with GCP services. Refer to the installation documentation (docs).
If no other SDK configuration exists, run the following command:

gcloud init

Set up local user credentials for the Python client (no JSON keys):

gcloud auth application-default login

The Speech-to-Text API must be enabled in the GCP project, as well as Cloud Storage. Enable both services from the GCP console or run the following command:

gcloud services enable speech.googleapis.com storage.googleapis.com

A Cloud Storage bucket is required for long-running transcriptions. If no other bucket is available, the following command creates a single bucket:

gcloud storage buckets create "gs://{bucket-name}" --project="{project-id}" --location="us-central1"

This script uses dynamic batch speech recognition from the Speech-to-Text V2 API for transcription (docs). Pricing for this API is 0.003 USD per minute (docs).
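
As a rough illustration of the per-minute rate quoted above, the following sketch estimates the transcription cost for a given audio duration. The function name is hypothetical, and the sketch assumes the cost scales linearly with duration with no minimum charge or rounding to whole minutes; check the pricing documentation for the actual billing granularity.

```python
from decimal import Decimal

# Rate quoted in this README for dynamic batch recognition.
USD_PER_MINUTE = Decimal("0.003")


def estimate_transcription_cost(duration_seconds: float) -> Decimal:
    """Estimate the Speech-to-Text cost in USD for one audio file.

    Assumption: cost is prorated linearly by duration (no minimums,
    no rounding up to whole minutes).
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    # Round to four decimal places for display.
    return (minutes * USD_PER_MINUTE).quantize(Decimal("0.0001"))


# A 90-minute recording: 90 * 0.003 USD
print(estimate_transcription_cost(90 * 60))  # 0.2700
```

This covers the Speech-to-Text charge only; Cloud Storage costs for the uploaded audio and the results are billed separately.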
Cloud Storage pricing depends on the size of the stored data (USD/GB per month), the bucket configuration, and operations. Refer to the pricing documentation (docs).
By default, the command suggested above creates a Standard Storage bucket in a single region (us-central1).
mkdir -p ./assets/downloads ./assets/temp ./assets/tracking/requests ./assets/tracking/validation

The following commands create a .env file that stores the GCP project ID, the GCS bucket name, and the language code. Replace the placeholders with the corresponding values.

echo "GCP_PROJECT_ID=\"{project-id}\"" >> ./.env
echo "GCP_BUCKET_NAME=\"{bucket-name}\"" >> ./.env
echo "GCP_LANGUAGE_CODE=\"{language-code}\"" >> ./.env

For the GCP_LANGUAGE_CODE variable, refer to the codes available for GCP's Chirp model in region us-central1 (docs). The code should match the language of the audio you want to transcribe, or the closest available one.
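
To show how the three variables written above could be read back and checked before a transcription starts, here is a minimal sketch using only the standard library. The helper names `parse_env_text` and `load_settings` are hypothetical and are not taken from this repository, which may load the file differently (for example with a dotenv package).

```python
from pathlib import Path

# Variable names written to .env by the echo commands above.
REQUIRED_KEYS = ("GCP_PROJECT_ID", "GCP_BUCKET_NAME", "GCP_LANGUAGE_CODE")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, skipping blanks and comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop the surrounding double quotes added by the echo commands.
        values[key.strip()] = value.strip().strip('"')
    return values


def load_settings(path: str = ".env") -> dict[str, str]:
    """Read the .env file and fail early if a required value is missing."""
    values = parse_env_text(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError(f"Missing values in {path}: {', '.join(missing)}")
    return values
```

Calling `load_settings()` from the repository root returns a dictionary with the three values, or raises `ValueError` naming whichever variable is absent or empty.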
Windows
python -m venv .venv
.venv\Scripts\activate.bat
pip3 install -r requirements.txt

Linux
python -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt

See the available commands with:

python src/main.py --help

See the help for an individual command with:

python src/main.py {command} --help

Refer to the file tutorial.md for a transcription example.
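
For readers unfamiliar with the `{command} --help` pattern, the sketch below shows one common way a Python CLI exposes a top-level help plus per-command help, using `argparse` subcommands. It is an illustration only: the `transcribe` command and its `audio_path` argument are hypothetical, and the real `src/main.py` may use a different library and different command names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where each subcommand gets its own --help output."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Illustrative CLI layout (not the repository's actual one).",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Hypothetical subcommand, used only to demonstrate the pattern.
    transcribe = subparsers.add_parser(
        "transcribe", help="transcribe one audio file"
    )
    transcribe.add_argument("audio_path", help="path to a local audio file")
    return parser


# Parsing an explicit argument list instead of sys.argv:
args = build_parser().parse_args(["transcribe", "interview.mp3"])
print(args.command, args.audio_path)
```

With this layout, `--help` after the program name lists the commands, while `--help` after a command name lists that command's own arguments.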
- Cloud Speech-to-Text V2 API: https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-speech-to-text-v2-api
- Google transcription models: https://cloud.google.com/speech-to-text/v2/docs/transcription-model
- Chirp language model: https://cloud.google.com/speech-to-text/v2/docs/chirp-model
- Speech-to-Text pricing: https://cloud.google.com/speech-to-text/pricing
- GCP Operations Python client: https://googleapis.dev/python/google-api-core/latest/operation.html
- GCP Speech-to-Text Python client: https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v2.services.speech.SpeechClient
- GCP Speech-to-Text API recognizers: https://cloud.google.com/speech-to-text/v2/docs/reference/rest/v2/projects.locations.recognizers/batchRecognize
- Language support for Speech-to-Text V2 API: https://cloud.google.com/speech-to-text/v2/docs/speech-to-text-supported-languages
- API Quotas: https://cloud.google.com/speech-to-text/v2/quotas