A Python script that converts PDF files to speech (MP3) using the Google Cloud Text-to-Speech API, effectively creating audiobooks from text documents.
- Extract text from PDF files
- Convert text to natural-sounding speech
- Output as MP3 audio files
- Handle large PDFs by splitting text into manageable chunks
- Customizable voice options (language, gender)
- Python 3.7+
- Google Cloud account (free tier available)
- VS Code (recommended) or any Python IDE
- Clone the repository:
git clone https://github.com/garschke/pdf-to-speech.git
cd pdf-to-speech
- Create and activate a virtual environment:
python -m venv venv # Windows
.\venv\Scripts\activate # Windows
python3 -m venv .venv # Mac/Linux
source .venv/bin/activate # Mac/Linux
- Install dependencies:
pip3 install -r requirements.txt
- Go to Google Cloud Console
- Create a new project
- Enable the Text-to-Speech API
- Create a service account and download the JSON key file
- Save the key file in the project folder as google_credentials.json
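Google's client libraries locate a service account key through the GOOGLE_APPLICATION_CREDENTIALS environment variable. The following is a minimal sketch of setting it from Python before any API client is created. The helper name `use_credentials` is hypothetical, and pdf_to_speech.py may load the key in a different way.

```python
import os
from pathlib import Path


def use_credentials(key_file="google_credentials.json"):
    """Point Google's client libraries at a service account key file.

    Hypothetical helper: the actual script may load the key differently.
    """
    key_path = Path(key_file).resolve()
    if not key_path.is_file():
        raise FileNotFoundError(f"Service account key not found: {key_path}")
    # Application Default Credentials look up this environment variable.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(key_path)
    return key_path
```

Calling `use_credentials()` from the project folder fails early with a clear message if the key file is missing, instead of failing later inside the API call.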
Run the script with:
python pdf_to_speech.py # Windows
python3 pdf_to_speech.py # Mac/Linux
When prompted:
- Enter the path to your PDF file (default: input/test.pdf)
- Enter the desired output MP3 filename (default: output/output.mp3)
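The prompt handling can be sketched roughly as follows: a blank answer falls back to the default, and the input path is checked before conversion starts. The names `choose`, `is_pdf`, and `ask_paths` are assumptions made for illustration and are not taken from pdf_to_speech.py.

```python
from pathlib import Path

DEFAULT_PDF = "input/test.pdf"
DEFAULT_MP3 = "output/output.mp3"


def choose(answer, default):
    """Return the user's answer, or the default when the answer is blank."""
    answer = answer.strip()
    return answer if answer else default


def is_pdf(path):
    """Check that the path is an existing file with a .pdf extension."""
    path = Path(path)
    return path.is_file() and path.suffix.lower() == ".pdf"


def ask_paths():
    """Prompt for both paths and validate the input file (illustrative only)."""
    pdf_path = choose(
        input(f"Enter path to PDF file (default: {DEFAULT_PDF}): "), DEFAULT_PDF
    )
    if not is_pdf(pdf_path):
        raise SystemExit(f"'{pdf_path}' is not an existing PDF file.")
    mp3_path = choose(
        input(f"Enter the output MP3 file name (default: {DEFAULT_MP3}): "),
        DEFAULT_MP3,
    )
    return pdf_path, mp3_path
```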
The script will:
- Extract text from the PDF
- Convert the text to speech using Google's API
- Save the audio as an MP3 file
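The three steps above map onto code roughly like this. It is a sketch under stated assumptions, not the script's actual implementation. It assumes a recent PyPDF2 release (one that provides `PdfReader`) and the google-cloud-texttospeech package. The function names, the `en-US` language code, and the neutral voice gender are illustrative choices.

```python
from pathlib import Path


def extract_text(pdf_path):
    """Join the text of every page in the PDF."""
    from PyPDF2 import PdfReader  # third-party: pip install PyPDF2

    reader = PdfReader(pdf_path)
    # Image-only pages may yield no text, so fall back to an empty string.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def synthesize(text, language_code="en-US", client=None):
    """Send one chunk of text to the API and return the MP3 bytes."""
    # third-party: pip install google-cloud-texttospeech
    from google.cloud import texttospeech

    client = client or texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code=language_code,
            ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    return response.audio_content


def save_audio(audio_chunks, mp3_path):
    """Write the MP3 bytes of each chunk to one file, in order."""
    out = Path(mp3_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("wb") as f:
        for chunk in audio_chunks:
            f.write(chunk)
    return out.stat().st_size  # total bytes written
```

The third-party imports sit inside the functions so that the file-writing step can be tried without either library installed. Appending the MP3 bytes of successive chunks to a single file is a simple way to join them. Whether the script joins chunks this way is an assumption.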
pdf-to-speech/
├── pdf_to_speech.py # Main conversion script
├── google_credentials.json # Google Cloud credentials
├── requirements.txt # Dependencies
├── .gitignore # Files to ignore in version control
├── static/
│ └── image/ # Images (pdf to text audiobook logo)
├── modules/
│ └── logger.py # Logging module
├── input/
│ └── test.pdf # Example PDF file (used if no filepath provided)
├── output/
│ └── output.mp3 # MP3 output (default output filepath if none provided)
└── README.md # This file
You can modify these aspects in the code:
- Voice parameters (language, gender, specific voice model)
- Audio format (currently MP3)
- Text chunk size (default 4950 characters per API request)
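- Note: The API accepts at most 5000 bytes of input per request, so raising the chunk size above the 4950 default may cause errors, especially with non-ASCII text, where a single character takes more than one byte.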
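Because the request limit is counted in bytes while the chunk size is given in characters, one way to stay under the limit is to measure each chunk by its UTF-8 size. The sketch below splits on whitespace and keeps every chunk at or below `max_bytes`. The function name `split_text` is hypothetical, and the script's own splitting logic may work differently (its log output reports chunk sizes in characters).

```python
def split_text(text, max_bytes=4950):
    """Split text on whitespace into chunks of at most max_bytes UTF-8 bytes."""
    chunks, current, current_size = [], [], 0
    for word in text.split():
        size = len(word.encode("utf-8"))
        if size > max_bytes:
            raise ValueError("A single word exceeds the chunk size")
        # The extra byte accounts for the space that joins this word
        # to the words already in the chunk.
        extra = size + (1 if current else 0)
        if current_size + extra > max_bytes:
            chunks.append(" ".join(current))
            current, current_size = [word], size
        else:
            current.append(word)
            current_size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on whitespace never cuts a word in half, but it does collapse line breaks into single spaces. Splitting at sentence boundaries instead may give more natural pauses between chunks.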
Without DEBUG logging
python3 pdf_to_speech.py
Enter path to PDF file (default: input/test.pdf):
Enter the output MP3 file name (default: output/output.mp3):
| Processing chunk 1/1...
| Audio content written to file 'output/output.mp3'
With DEBUG logging
python3 pdf_to_speech.py
| Starting PDF to Speech conversion app
Enter path to PDF file (default: input/test.pdf):
| No PDF file path provided so default input/test.pdf used
| Valid PDF file found!
| File 'input/test.pdf' exists and is a PDF file.
Enter the output MP3 file name (default: output/output.mp3):
| Output file name: output/output.mp3
| Starting conversion...
| Extracting text from input/test.pdf...
|
Extraxted text:
PDF to Speech :PDF to speech dot PY , is a Python script that converts PDF files to speech in the form of MP3 files, using the Google Cloud Text-to-Speech API, effectively creating audiobooks from text documents.
| Text length: 212 characters, 37 words
| Converting text to Speech...
| Splitting text into chuncks of 4950 charaters
| Processing chunk 1/1...
| Chunk length: 212 characters, 37 words
| Chunk 1 processed.
| All chnunks processed.
| Total audio length: 133248 bytes
| Writing audio content to file 'output/output.mp3'...
| Audio content written to file 'output/output.mp3'
| PDF to Speech conversion app finished
- 🔲 - Progress Tracking: Add a progress bar for large PDFs
- 🔲 - SSML Support: Implement Speech Synthesis Markup Language for better pronunciation
- 🔲 - GUI: Create a simple Flask web interface or Tkinter desktop app
- 🔲 - Configuration: Allow users to select different voices and languages
Contributions are welcome! Please open an issue or submit a pull request for any improvements.
- Create your feature branch: git checkout -b feature/NewFeature
- Commit your changes: git commit -m 'Add new feature'
- Push to the branch: git push origin feature/NewFeature
- Open a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Google Cloud Text-to-Speech API
- PyPDF2 library for PDF text extraction
- Python community for excellent tooling