drodrigo7/e22-gcp-stt-transcriptor

Transcription CLI with GCP's Speech-to-Text V2 (Chirp Model)

This repository holds a Python CLI that transcribes long audio files using the Speech-to-Text V2 API from GCP. Read this README for the required configuration.

1. Google Cloud

1.1. GCP SDK

This script requires the Google Cloud CLI to be installed and configured so that the Python client can interact with GCP services. Refer to the installation documentation (docs).


1.2. GCP Configuration

1.2.1. CLI Initialization

If the SDK has no existing configuration, execute the following command:

gcloud init

1.2.2. Default Credentials

Set up local user credentials (Application Default Credentials) for the Python client, so no JSON keys are needed:

gcloud auth application-default login

1.2.3. GCP Speech-to-Text API

Both the Speech-to-Text API and Cloud Storage must be enabled in the GCP project. Enable them from the GCP console or execute the following command:

gcloud services enable speech.googleapis.com storage.googleapis.com

1.2.4. Storage Bucket

This resource is required for long-running transcriptions. If no other bucket is available, the following command creates a single bucket:

gcloud storage buckets create "gs://{bucket-name}" --project="{project-id}" --location="us-central1"

2. GCP Pricing

2.1. Speech-to-Text V2

This script uses dynamic batch speech recognition from the Speech-to-Text V2 API for transcription (docs). Pricing for this mode is 0.003 USD per minute of audio (docs).
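
As a rough planning aid, the per-minute rate quoted above can be turned into a cost estimate. The following is a minimal sketch, not part of this repository: the function name `estimate_transcription_cost` is hypothetical, and it assumes a purely linear charge, ignoring any billing granularity, rounding rules or free tier that the pricing documentation may define.

```python
from decimal import Decimal

# Rate quoted in this README for dynamic batch recognition (USD per minute).
USD_PER_MINUTE = Decimal("0.003")


def estimate_transcription_cost(audio_seconds: float) -> Decimal:
    """Linear estimate: audio duration in minutes times the per-minute rate.

    Assumption: no rounding to billing increments and no free tier.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    minutes = Decimal(str(audio_seconds)) / Decimal(60)
    # Keep four decimal places so short files do not round down to zero.
    return (minutes * USD_PER_MINUTE).quantize(Decimal("0.0001"))
```

Under these assumptions, a two-hour recording (7200 seconds, 120 minutes) comes to about 0.36 USD.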

2.2. Google Cloud Storage

Pricing for this service depends on the size of the stored data (USD/GB per month), the bucket configuration and the operations performed. Refer to the pricing documentation (docs). By default, the command suggested above creates a Standard Storage bucket in a single region (us-central1).


3. Environment Configuration

3.1. Directories

mkdir -p ./assets/downloads ./assets/temp ./assets/tracking/requests ./assets/tracking/validation
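
On systems where `mkdir -p` is not available (for example the Windows command prompt), the same layout can be created from Python. This is a sketch using only the standard library; the helper name `ensure_asset_dirs` is hypothetical and not part of this repository, while the directory list is copied from the command above.

```python
from pathlib import Path

# Directory layout taken from the mkdir command above.
ASSET_DIRS = (
    "assets/downloads",
    "assets/temp",
    "assets/tracking/requests",
    "assets/tracking/validation",
)


def ensure_asset_dirs(root: Path = Path(".")) -> list[Path]:
    """Create every asset directory under root, including missing parents."""
    created = []
    for relative in ASSET_DIRS:
        path = root / relative
        # parents=True mirrors `mkdir -p`; exist_ok=True makes reruns harmless.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_asset_dirs()` from the repository root should leave the same tree as the shell command, and it can be run repeatedly without error.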

3.2. Environment variables

The following commands create (or append to) a .env file that stores the GCP project ID, the GCS bucket name and the language code. Replace the placeholders with the corresponding values.

echo "GCP_PROJECT_ID=\"{project-id}\"" >> ./.env
echo "GCP_BUCKET_NAME=\"{bucket-name}\"" >> ./.env
echo "GCP_LANGUAGE_CODE=\"{language-code}\"" >> ./.env

For the GCP_LANGUAGE_CODE variable, refer to the codes available for GCP's Chirp model in region us-central1 (docs). The code should match the language of the audio you want to transcribe, or the closest available one.
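
To check that the .env file contains all three variables before running the CLI, a small standard-library parser is enough. This is a minimal sketch under stated assumptions: how the CLI itself loads the file is not described here, the names `parse_env` and `missing_keys` are hypothetical, and only the simple `KEY="value"` lines produced by the commands above are handled.

```python
# Variable names taken from the echo commands above.
REQUIRED_KEYS = ("GCP_PROJECT_ID", "GCP_BUCKET_NAME", "GCP_LANGUAGE_CODE")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines; surrounding single or double quotes are removed."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        # A later line overrides an earlier one with the same key.
        values[key.strip()] = value
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Because the echo commands append with `>>`, running them twice leaves duplicate lines in the file; in this sketch the last occurrence wins. A typical check would be `missing_keys(parse_env(open(".env").read()))`, which returns an empty list when the file is complete.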

3.3. Virtual environment

Windows

python -m venv .venv
.venv\Scripts\activate.bat
pip3 install -r requirements.txt

Linux

python -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt

4. CLI usage

List the available commands with:

python src/main.py --help

See help for individual commands:

python src/main.py {command} --help

5. Tutorial

Refer to the file tutorial.md for a transcription example.


6. Additional resources
