A 'simple' subtitle generator created as my final project for CS50 2024, using the Google Cloud Speech-to-Text API, Flask, HTMX and Celery with RabbitMQ. CAUTION: this project is purely a prototype. Use at your own discretion! Because of the scope of the final project I decided not to implement authentication. The Flask app can be deployed behind a WSGI server for production, but for now it runs fine as a self-hosted tool on a private network. This project uses a paid API that has a free tier; please consult the Google Cloud documentation on quotas and billing.
Video Demo CS50 3 minute version: youtube
(assuming you are running this on Linux or WSL)
-
Make sure you create a project and set up billing in Google Cloud
-
Make sure you can authenticate, either through a service account JSON file (not recommended) or, better, by using the gcloud CLI to set up Application Default Credentials
-
Clone this repository
-
Install ffmpeg and make sure the ffmpeg and ffprobe commands are on your PATH:
sudo apt-get install ffmpeg
-
Make sure RabbitMQ is running. Running it in a Docker container is surprisingly easy:
docker run -d -p 5672:5672 rabbitmq
-
Create and activate a virtual environment in your repo folder:
python -m venv .venv
source .venv/bin/activate
-
install all dependencies
pip install -r requirements.txt
-
Run the Celery worker, which accepts all the tasks from our app. This command runs the worker in the foreground; in production you probably want to daemonize it.
celery -A subtitler.celery_worker worker --loglevel INFO
-
Init the database
flask --app subtitler init-db
-
Configure your application through the config. You can override the values in app.config with a config.py file in a folder called instance in the root folder of the cloned repo. The folder is created automatically the first time you start the application. These are the most important config keys, shown as an example config.py:
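import os

# Assumes this file is loaded with app.config.from_pyfile, which sets __file__.
# config.py lives in the instance folder, so this is the instance path.
_instance_dir = os.path.dirname(os.path.abspath(__file__))

# Please use a random and private key for production; don't commit this key into a VCS.
SECRET_KEY = "dev"
# Configure these paths for file uploads in a containerized read-only env.
# The app defaults are os.path.join(app.instance_path, 'subtitler.sqlite') and
# os.path.join(app.root_path, 'uploads'); the uploads location below is only an example.
DATABASE = os.path.join(_instance_dir, "subtitler.sqlite")
UPLOAD_FOLDER = os.path.join(_instance_dir, "uploads")
# Change this to the storage bucket you created in Google Cloud Storage.
STORAGE_BUCKET = "your_gc_bucket"
ALLOWED_EXTENSIONS = {"m4v", "mp4", "h264", "mov"}
MAX_CONTENT_LENGTH = 32 * 1000 * 1000
PAGE_SIZE = 5
CELERY = dict(
    broker_url="pyamqp://guest@localhost",
    result_backend="rpc://",
    task_ignore_result=True,
)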
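As a rough sketch of how such an override typically works in a Flask application factory: defaults are set with from_mapping, and instance/config.py is loaded on top of them if it exists. This follows the pattern from the Flask documentation and is an assumption about this project, not a copy of its code. The create_app signature and the instance_path argument are hypothetical and only there to make the demo self-contained.

```python
import os
import tempfile

from flask import Flask


def create_app(instance_path=None):
    """Build an app with defaults, then apply instance/config.py if present."""
    app = Flask(__name__, instance_path=instance_path, instance_relative_config=True)

    # Defaults; only a few keys from the example config are shown here.
    app.config.from_mapping(
        SECRET_KEY="dev",
        STORAGE_BUCKET="your_gc_bucket",
        PAGE_SIZE=5,
    )

    # Create the instance folder on first start.
    os.makedirs(app.instance_path, exist_ok=True)

    # silent=True means a missing config.py is not an error.
    app.config.from_pyfile("config.py", silent=True)
    return app


# Demonstration with a throwaway instance folder.
demo_instance = tempfile.mkdtemp()
with open(os.path.join(demo_instance, "config.py"), "w") as f:
    f.write("PAGE_SIZE = 10\n")

demo_app = create_app(instance_path=demo_instance)
print(demo_app.config["PAGE_SIZE"])       # value from instance/config.py
print(demo_app.config["STORAGE_BUCKET"])  # untouched default
```

Only uppercase names in config.py end up in app.config, so helper variables in that file can stay lowercase.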
-
Run the subtitler flask app in debug mode
flask --app subtitler/ run --debug
-
Go to http://localhost:5000 and start creating subtitles!
-
subtitler/__init__.py
This file turns the subtitler application into a regular package. I chose the application factory pattern with blueprints so that multiple instances of the application can be created and each group of routes lives in its own file. This also makes deploying the application, and testing it in the future, easier. In the root __init__.py file all the extensions are registered with the app; they get their config from the application when the config object is created, for later use throughout the application. The same application can later be used in your Celery worker.
-
subtitler/static/
The static folder will contain all the static files for the frontend like CSS, javascript, images, favicons and other static files.
-
subtitler/tasks/video.py
The tasks folder contains all the long-running Celery tasks. In this case it contains one task, video.process(), in the module video.py. This task does all the work once a video is received, using the utility module speech_interface.py. This function communicates with the Celery worker and handles resubmission and result processing. At the moment I am not keeping track of task results. Instead, the task updates the database and I poll the project.status column to check the status of the processing.
-
subtitler/utils/speech_interface.py
This is the utility package that takes care of all the processing of the video, in short it contains functions that:
- will use FFPROBE to check for a valid video.
- will extract a poster image from the video using FFMPEG
- will extract the audio to a *.flac file through FFMPEG
- will upload this *.flac to a gcloud storage bucket
- will start transcribing this *.flac file, turning the speech into a LongRunningRecognizeResponse object
- will process these results, take all the offsets from the WordInfo objects and cut the transcripts up into subtitles of two lines, each at most 42 characters wide.
- will store these lines in the database under the project id with text, start and end times for later use.
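The line-cutting step in the list above can be sketched roughly as follows. This is a minimal greedy version written for illustration, not the code from speech_interface.py: the Word class is a hypothetical stand-in for the recognized words with their offsets, and words_to_cues is an assumed name.

```python
from dataclasses import dataclass

MAX_CHARS = 42  # maximum characters per subtitle line
MAX_LINES = 2   # maximum lines per cue


@dataclass
class Word:
    """Hypothetical stand-in for a recognized word with time offsets in seconds."""
    text: str
    start: float
    end: float


def words_to_cues(words):
    """Greedily pack words into cues of at most MAX_LINES lines of MAX_CHARS characters.

    Returns a list of (text, start, end) tuples; lines within a cue are
    separated by a newline character.
    """
    cues = []
    lines = []     # completed lines of the cue being built
    current = ""   # the line being built
    cue_start = None
    cue_end = None

    def flush():
        nonlocal lines, current, cue_start, cue_end
        if current:
            lines.append(current)
        if lines:
            cues.append(("\n".join(lines), cue_start, cue_end))
        lines, current, cue_start, cue_end = [], "", None, None

    for word in words:
        candidate = word.text if not current else f"{current} {word.text}"
        if len(candidate) > MAX_CHARS and current:
            # The word does not fit on the current line: close the line.
            lines.append(current)
            current = ""
            if len(lines) == MAX_LINES:
                flush()  # the cue is full, emit it and start a new one
            candidate = word.text
        if cue_start is None:
            cue_start = word.start
        current = candidate
        cue_end = word.end
    flush()
    return cues


sentence = "the quick brown fox jumps over the lazy dog " * 4
demo_words = [Word(w, i * 0.5, i * 0.5 + 0.4) for i, w in enumerate(sentence.split())]
for text, start, end in words_to_cues(demo_words):
    print(f"{start:.1f} -> {end:.1f}\n{text}\n")
```

A greedy split like this ignores sentence boundaries and pauses; a real implementation may want to break on punctuation or long gaps between words as well.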
-
subtitler/utils/VTT.py
This package contains functions that create a VTT file from our database entries. It also contains functions that transform floats to proper VTT time labels. For example, create_time_label(140.123) will output 00:02:20.123, conforming to the VTT standard. It also contains functions to load a VTT file as a StringIO in-memory file-like object for creating a *.VTT file on the fly.
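A minimal sketch of what such helpers could look like. create_time_label is the name used in this README, but the body below is my own reimplementation for illustration; build_vtt and the (text, start, end) cue tuples are assumptions.

```python
import io


def create_time_label(seconds):
    """Format a number of seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    # Work in whole milliseconds to avoid float rounding surprises.
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_vtt(cues):
    """Write (text, start, end) cues into an in-memory WebVTT file."""
    buffer = io.StringIO()
    buffer.write("WEBVTT\n\n")
    for text, start, end in cues:
        buffer.write(f"{create_time_label(start)} --> {create_time_label(end)}\n")
        buffer.write(f"{text}\n\n")
    buffer.seek(0)  # rewind so the caller can read from the start
    return buffer


print(create_time_label(140.123))
print(build_vtt([("Hello world", 0.0, 1.5)]).read())
```

Because the result is a file-like object, it can be handed to a download response without ever writing a file to disk.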
-
subtitler/celery_worker.py
This is the file that gets run by Celery. While running, it holds an entire instance of our application, allowing tasks to interact with the database and other functions.
-
subtitler/celery.py
This file is used to init Celery.
-
subtitler/db.py
This file creates, registers and initializes our database. It also registers some Click command-line functions and contains some functions for retrieving rows from the database.
-
subtitler/htmx.py
Together with the HTMX JavaScript file, I'm using the flask-htmx package to simplify generating htmx-specific responses and processing htmx-specific requests. HTMX is built on the HATEOAS (Hypermedia as the Engine of Application State) concept. Instead of sending large JSON objects to a large client library that has to maintain a representation of the application state client side, HTMX keeps the state on the server and updates the page by replacing HTML in the DOM with HTML components rendered server-side.
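To illustrate the idea without any framework: htmx marks its requests with an HX-Request: true header, and the server answers those with an HTML fragment instead of a full page. The sketch below is only an illustration of that principle; flask-htmx does the detection for you, and the function names and markup here are hypothetical.

```python
import html


def is_htmx_request(headers):
    """Return True when the request was issued by htmx (it sends 'HX-Request: true')."""
    normalized = {key.lower(): value for key, value in headers.items()}
    return normalized.get("hx-request") == "true"


def render_projects(headers, projects):
    """Return only the table body for htmx, or the full page for a normal request."""
    rows = "".join(f"<tr><td>{html.escape(name)}</td></tr>" for name in projects)
    fragment = f'<tbody id="projects">{rows}</tbody>'
    if is_htmx_request(headers):
        # htmx swaps this fragment into the existing page.
        return fragment
    # A normal browser request gets the whole document.
    return f"<html><body><table>{fragment}</table></body></html>"


print(render_projects({"HX-Request": "true"}, ["holiday.mp4"]))
print(render_projects({}, ["holiday.mp4"]))
```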
-
subtitler/templates
This folder contains all the Jinja templates: a base layout, pages, partials and other includes. Jinja works really well together with htmx. The jinja2-fragments package makes it possible to render a single block from your template when a request comes from htmx, while keeping all your blocks in the same file as the entire page. Example:
if htmx:
    # this is an htmx request, render only the project table
    return render_block('pages/projects.html', 'projects_block', projects=projects, pagination=pagination)
# it is a normal request from the browser, send back the entire project page
return render_template('pages/projects.html', projects=projects, pagination=pagination)
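The projects and pagination values passed to the template could come from a query along these lines. This is a sketch under assumptions: the table layout, the column names other than status, the get_projects_page function and the shape of the pagination dict are all hypothetical; only PAGE_SIZE comes from the example config.

```python
import sqlite3

PAGE_SIZE = 5  # same value as in the example config


def get_projects_page(db, page, page_size=PAGE_SIZE):
    """Return (rows, pagination) for a 1-based page number."""
    total = db.execute("SELECT COUNT(*) FROM project").fetchone()[0]
    pages = max(1, -(-total // page_size))  # ceiling division, at least one page
    page = min(max(1, page), pages)         # clamp to the valid range
    rows = db.execute(
        "SELECT id, name, status FROM project ORDER BY id LIMIT ? OFFSET ?",
        (page_size, (page - 1) * page_size),
    ).fetchall()
    pagination = {
        "page": page,
        "pages": pages,
        "has_prev": page > 1,
        "has_next": page < pages,
    }
    return rows, pagination


# Demonstration with an in-memory database and a hypothetical schema.
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
db.executemany(
    "INSERT INTO project (name, status) VALUES (?, ?)",
    [(f"video {i}", "done") for i in range(12)],
)
projects, pagination = get_projects_page(db, page=3)
print([row["name"] for row in projects], pagination)
```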
-
subtitler/subtitles.py
This is the blueprint for all the subtitle routes. It contains all the endpoints for dealing with subtitle lines. It also contains all the functions that modify subtitles and the filters for our templates.
-
subtitler/projects.py
This is the blueprint for all the project routes. It contains all the endpoints for dealing with your project. It also contains all the functions that modify your project and the function that starts a task in a Celery worker.
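Since task results are not tracked, a status column is the only channel between the worker and the routes. A minimal sketch of that handshake, with a plain function standing in for the Celery task (in the real app the route would enqueue the task, for example with Celery's .delay(), instead of calling it directly). The status values, the schema and all function names here are assumptions.

```python
import sqlite3


def set_status(db, project_id, status):
    """Persist the processing status of a project."""
    db.execute("UPDATE project SET status = ? WHERE id = ?", (status, project_id))
    db.commit()


def get_status(db, project_id):
    """What a polling endpoint would read; None if the project does not exist."""
    row = db.execute("SELECT status FROM project WHERE id = ?", (project_id,)).fetchone()
    return row[0] if row else None


def process_video(db, project_id, work):
    """Stand-in for the task: it reports progress only through the status column."""
    set_status(db, project_id, "processing")
    try:
        work()  # the real work (ffmpeg, upload, transcription) would happen here
    except Exception:
        set_status(db, project_id, "error")
        return False
    set_status(db, project_id, "done")
    return True


# Demonstration with an in-memory database and a hypothetical schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("INSERT INTO project (id, status) VALUES (1, 'queued')")
process_video(db, 1, work=lambda: None)
print(get_status(db, 1))
```

The browser can then poll a small endpoint that returns the current status until it is no longer "processing".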
-
requirements.txt
The requirements file tells pip to install all the right packages in your venv folder (when activated)!
- Proper error handling everywhere
- Authentication
- VTT parsing for generating editable subtitles from a VTT file.
- During the development of this app it became apparent to me how well all the Google Cloud APIs work together. While I really like self-hosting everything, creating and managing tasks, running a message queue and implementing authentication between services and users is a lot of overhead. This project would benefit a lot from moving entirely to the cloud, with App Engine, Cloud Functions, cloud triggers, Cloud Storage and a database all in the Google ecosystem.
- It currently uses API v1 but should switch to API v2 soon to benefit from nice features like batched long-running tasks and higher-quality transcriptions. These are also cheaper.
- Move to SCSS or Tailwind, or move towards using a pre-built frontend library. I created all the CSS and components myself, which I like doing, but it does not feel very scalable at the moment.
Thanks to all the maintainers for these packages. You can read more about the technologies used below:
Check out CS50: https://cs50.harvard.edu/x/2024/
Please open an issue for support.
Please contribute using GitHub Flow. Create a branch, add commits, and open a pull request.