This project allows you to download, process, and transcribe YouTube videos using AssemblyAI. The entire process is containerized using Docker, making it easy to set up and run on any machine.
Before you can run this project, ensure you have the following:
- Docker: Installed on your machine. You can download it from the official Docker website.
- AssemblyAI Account: Sign up for a free account at AssemblyAI to obtain an API key.
- Sign Up for AssemblyAI:
  - Go to AssemblyAI's website and sign up for a free account.
  - After signing up, navigate to your dashboard, where you'll find your API key. Copy this key, as you'll need it in the next step.
- Create a `.env` File:
  - In your project directory (the directory where you will run the Docker container), create a `.env` file. This file will store your AssemblyAI API key.
  - Add the following line to the `.env` file:

        ASSEMBLYAI_API_KEY=<YOUR_API_KEY>

  - Replace `<YOUR_API_KEY>` with the API key you copied from the AssemblyAI dashboard.
  - Example `.env` file:

        ASSEMBLYAI_API_KEY=1234567890abcdef1234567890abcdef

- Save the `.env` File:
  - Make sure the `.env` file is saved in the directory where you will be running the Docker commands.
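
The `.env` steps above can be sketched as two small shell helpers. The function names `write_env_file` and `check_env_file` are hypothetical and not part of this project; the only thing taken from the instructions is the `ASSEMBLYAI_API_KEY=<value>` line format. The sketch assumes a POSIX shell and creates a new file readable only by its owner, since it holds a secret.

```shell
# Hypothetical helper: write the API key to .env in the given directory
# (default: the current directory).
write_env_file() {
    key="$1"
    dir="${2:-.}"
    # umask 077 inside a subshell: a newly created file is owner-only,
    # and the caller's umask is left unchanged.
    ( umask 077 && printf 'ASSEMBLYAI_API_KEY=%s\n' "$key" > "$dir/.env" )
}

# Hypothetical helper: succeed only if .env has a key line whose value is
# non-empty, contains no whitespace, and is not the <YOUR_API_KEY> placeholder.
check_env_file() {
    dir="${1:-.}"
    grep -Eq '^ASSEMBLYAI_API_KEY=[^<[:space:]]+$' "$dir/.env" 2>/dev/null
}

# Example usage (commented out so that sourcing this file has no side effects):
#   write_env_file "1234567890abcdef1234567890abcdef"
#   check_env_file && echo ".env looks usable"
```

Writing the value without quotes is deliberate: `docker run --env-file` reads plain `VAR=value` lines, so quotes around the key would likely end up inside the value.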
Follow these steps to pull the Docker image and run the transcription process:
- Pull the Docker Image:
  - Pull the pre-built Docker image from Docker Hub using the following command:

        docker pull agentmaddy/yt-transcriptor

- Run the Docker Container:
  - Use the following command to run the Docker container. It mounts your current directory to the `/data` directory inside the container and runs the transcription script.
  - Replace `/data/output` with your desired output name (keep the `/data/` prefix so the result lands in the mounted directory) and the YouTube URL with the URL of the video you want to transcribe.
  - Make sure the `.env` file containing your API key is in the current directory.

        docker run --env-file .env -v "$(pwd)":/data agentmaddy/yt-transcriptor python /app/app.py -o /data/output --urls "https://www.youtube.com/watch?v=KUECJHlV1LE"

  - Explanation:
    - `--env-file .env`: Passes the variables defined in `.env` (here, `ASSEMBLYAI_API_KEY`) into the container's environment.
    - `-v "$(pwd)":/data`: Mounts the current directory on your local machine to the `/data` directory inside the Docker container. This allows the container to save output files directly to your local directory.
    - `agentmaddy/yt-transcriptor`: The Docker image you pulled from Docker Hub.
    - `python /app/app.py -o /data/output --urls "https://www.youtube.com/watch?v=KUECJHlV1LE"`: The command run inside the container, where `app.py` processes the YouTube video, extracts the audio, and generates a transcription.
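
For repeated use, the run command can be wrapped in a small shell function. This is only a sketch: `transcribe` and the `DRY_RUN` variable are hypothetical names, the `--rm` flag (which removes the container after it exits) is an addition not present in the command above, and passing `/data/<name>` to `-o` assumes the option accepts any output name under the mounted directory.

```shell
# Hypothetical wrapper around the docker run command from this README.
# Usage: transcribe URL [OUTPUT_NAME]
# Set DRY_RUN=1 to print the command instead of running it.
transcribe() {
    url="$1"
    out="${2:-output}"
    if [ -z "$url" ]; then
        echo "usage: transcribe URL [OUTPUT_NAME]" >&2
        return 2
    fi
    # The container reads the API key from .env, so fail early without it.
    if [ ! -f .env ]; then
        echo "error: no .env file in $(pwd)" >&2
        return 1
    fi
    # Build the argument list once; quoting keeps paths with spaces and
    # the '?' and '=' in the URL intact.
    set -- docker run --rm --env-file .env -v "$(pwd):/data" \
        agentmaddy/yt-transcriptor python /app/app.py \
        -o "/data/$out" --urls "$url"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example usage (commented out so that sourcing this file runs nothing):
#   transcribe "https://www.youtube.com/watch?v=KUECJHlV1LE" output
```

Running it once with `DRY_RUN=1` shows the exact command before any container is started.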
- Accessing the Output:
  - The output files, including the `.mp3` audio file and the transcription `.txt` file, will be saved in the directory where you ran the Docker command.
  - For example, if you run the command from `/home/user/projects`, the output files will be saved in `/home/user/projects`.
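
A quick way to confirm that the run produced files is to list the `.mp3` and `.txt` files in the working directory. The helper name `list_outputs` is hypothetical. Because it is not stated here whether `-o` names a file or a folder, the sketch looks in the directory itself and one level below it.

```shell
# Hypothetical helper: list .mp3 and .txt files in a directory (default: the
# current one) and its immediate subdirectories, sorted by path.
list_outputs() {
    dir="${1:-.}"
    find "$dir" -maxdepth 2 -type f \( -name '*.mp3' -o -name '*.txt' \) | sort
}

# Example usage (commented out so that sourcing this file runs nothing):
#   list_outputs /home/user/projects
```

If the list is empty after a run that reported success, check that the command was started from the directory you expected, since `$(pwd)` is what gets mounted at `/data`.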
To transcribe a YouTube video with the title "How to Dockerize Your Python Applications", you would run the following:

    docker run --env-file .env -v "$(pwd)":/data agentmaddy/yt-transcriptor python /app/app.py -o /data/output --urls "https://www.youtube.com/watch?v=KUECJHlV1LE"

- This command will download the video, extract the audio, and save the transcription (for example, `docker_tutorial_transcript.txt`) in your current directory.
- Docker Not Found: Make sure Docker is installed and running on your machine.
- Invalid API Key: Ensure your `.env` file is correctly set up with a valid AssemblyAI API key.
- Permission Issues: If you encounter permission issues when running the Docker container, try running the command with `sudo`.
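
The first two checks in this list can be automated with a short preflight function. This is a sketch under assumptions: `preflight` is a hypothetical name, and it cannot tell whether the key is actually valid for AssemblyAI, only that a key line exists. It uses `docker info`, which fails when the Docker daemon is not running or when the current user lacks permission to reach it.

```shell
# Hypothetical preflight check: reports on Docker and the .env file in the
# current directory. Returns 0 only if every check passes.
preflight() {
    status=0
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker: not found on PATH"
        status=1
    elif ! docker info >/dev/null 2>&1; then
        echo "docker: installed, but the daemon is not reachable (not running, or permission denied)"
        status=1
    else
        echo "docker: ok"
    fi
    if grep -Eq '^ASSEMBLYAI_API_KEY=.+' .env 2>/dev/null; then
        echo ".env: ok"
    else
        echo ".env: missing or has no ASSEMBLYAI_API_KEY line"
        status=1
    fi
    return "$status"
}

# Example usage (commented out so that sourcing this file runs nothing):
#   preflight && echo "ready to run the container"
```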
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Set Up AssemblyAI: Sign up, obtain an API key, and create a `.env` file with the API key.
- Pull Docker Image: Use `docker pull agentmaddy/yt-transcriptor` to get the image.
- Run the Docker Container: Execute the Docker command with your desired output file name and YouTube URL.