PDF to Speech Converter

A Python script that converts PDF files to speech (MP3) using the Google Cloud Text-to-Speech API, effectively creating audiobooks from text documents.

Table of Contents

PDF to Speech Converter
Features
Installation
User Guide
Project Structure
Configuration
Example
Future Enhancements
Contributing
Fork the repository
License
Acknowledgments

Features

Extract text from PDF files
Convert text to natural-sounding speech
Output as MP3 audio files
Handle large PDFs by splitting text into manageable chunks
Customizable voice options (language, gender)

Installation

Prerequisites

Python 3.7+
Google Cloud account (free tier available)
VS Code (recommended) or any Python IDE

Steps

Clone the repository:

git clone https://github.com/garschke/pdf-to-speech.git
cd pdf-to-speech

Create and activate a virtual environment:

python -m venv venv           # Windows
.\venv\Scripts\activate       # Windows

python3 -m venv .venv         # Mac/Linux
source .venv/bin/activate     # Mac/Linux

Install dependencies:

pip3 install -r requirements.txt

Setup Google Cloud Text-to-Speech API

Go to Google Cloud Console
Create a new project
Enable the Text-to-Speech API
Create a service account and download the JSON key file
Save the key file in the project folder as google_credentials.json
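Once the key file is saved, a quick check can confirm it is usable. The sketch below is illustrative only: the helper name `use_service_account_key` is hypothetical, and whether pdf_to_speech.py loads the key this way is an assumption. It relies on two things: service account keys are JSON with a `type` of `service_account`, and Google client libraries look up credentials through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable.

```python
import json
import os
from pathlib import Path


def use_service_account_key(key_path: str = "google_credentials.json") -> str:
    """Validate a service account key file and expose it to Google client libraries.

    Hypothetical helper, not taken from pdf_to_speech.py.
    Returns the service account e-mail stored in the key, or "" if absent.
    """
    path = Path(key_path)
    if not path.is_file():
        raise FileNotFoundError(f"Key file not found: {path}")

    # Service account keys are JSON documents with "type": "service_account".
    with path.open(encoding="utf-8") as handle:
        key = json.load(handle)
    if key.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")

    # Google client libraries read this variable to locate credentials.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path.resolve())
    return key.get("client_email", "")
```

Because the key grants access to your Google Cloud project, it is worth confirming that google_credentials.json is covered by .gitignore before committing.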

User Guide

Run the script with:

python pdf_to_speech.py     # Windows
python3 pdf_to_speech.py    # Mac/Linux

When prompted:

Enter the path to your PDF file
Enter the desired output MP3 filename (default: output.mp3)
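The two prompts fall back to defaults when left empty. A minimal sketch of that behaviour follows, assuming the defaults shown in the Example section (input/test.pdf and output/output.mp3). The function name `resolve_paths` and the choice to append ".mp3" to names that lack it are illustrative assumptions, not the script's confirmed logic.

```python
from pathlib import Path
from typing import Tuple

DEFAULT_PDF = "input/test.pdf"
DEFAULT_MP3 = "output/output.mp3"


def resolve_paths(pdf_answer: str, mp3_answer: str) -> Tuple[Path, Path]:
    """Turn the two prompt answers into paths, using defaults for empty answers.

    Hypothetical helper, not taken from pdf_to_speech.py.
    """
    pdf_path = Path(pdf_answer.strip() or DEFAULT_PDF)
    mp3_path = Path(mp3_answer.strip() or DEFAULT_MP3)

    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {pdf_path}")

    # Assumption: append ".mp3" rather than replace an existing extension.
    if mp3_path.suffix.lower() != ".mp3":
        mp3_path = mp3_path.with_name(mp3_path.name + ".mp3")

    return pdf_path, mp3_path
```

It could be called with the raw answers from `input()`, for example `resolve_paths(input("Enter path to PDF file: "), input("Enter the output MP3 file name: "))`.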

The script will:

Extract text from the PDF
Convert the text to speech using Google's API
Save the audio as an MP3 file
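The last two steps can be sketched without committing to a particular client library. In the sketch below the `synthesize` argument is a stand-in for whatever call sends one chunk of text to the Text-to-Speech API and returns audio bytes. The function name `synthesize_to_file` is hypothetical, and joining the per-chunk MP3 bytes by plain concatenation is an assumption about how the script combines audio.

```python
from pathlib import Path
from typing import Callable, List


def synthesize_to_file(
    chunks: List[str],
    synthesize: Callable[[str], bytes],
    output_path: str,
) -> int:
    """Synthesize each chunk, join the audio, write one file, return bytes written.

    Hypothetical helper; `synthesize` is injected so the flow can be
    exercised without network access.
    """
    audio_parts = []
    for number, chunk in enumerate(chunks, start=1):
        print(f"Processing chunk {number}/{len(chunks)}...")
        audio_parts.append(synthesize(chunk))

    # Assumption: per-chunk MP3 data is joined by simple concatenation.
    audio = b"".join(audio_parts)

    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create output/ if missing
    out.write_bytes(audio)
    return len(audio)
```

Passing the synthesizer in as a callable keeps the loop testable with a fake function and leaves the choice of API client open.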

Project Structure

pdf-to-speech/
├── pdf_to_speech.py        # Main conversion script
├── google_credentials.json # Google Cloud credentials
├── requirements.txt        # Dependencies
├── .gitignore              # Files to ignore in version control
├── static/
│   └── images/             # Images (pdf to text audiobook logo)
├── modules/
│   └── logger.py           # Logging module
├── input/
│   └── test.pdf            # Example PDF file (used if no filepath provided)
├── output/
│   └── output.mp3          # MP3 output (default output filepath if none provided)
└── README.md               # This file

Configuration

You can modify these aspects in the code:

Voice parameters (language, gender, specific voice model)
- see https://cloud.google.com/text-to-speech/docs/list-voices-and-types#list_of_all_supported_languages
Audio format (currently MP3)
- see https://cloud.google.com/speech-to-text/docs/encoding#audio-encodings
Text chunk size (default 4950 characters per API request)
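- Note: The API accepts at most 5000 bytes of input per request, so chunk sizes above the 4950-character default may cause errors. Text with multi-byte characters may need a smaller value, because the limit counts bytes rather than characters.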
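To illustrate the chunk size setting, here is a minimal sketch of a splitter that respects both a character limit and a UTF-8 byte limit. It is not the script's own implementation: the function name `split_text`, the word-boundary strategy, and the `max_bytes` parameter are assumptions made for the example.

```python
from typing import List


def split_text(text: str, max_chars: int = 4950, max_bytes: int = 5000) -> List[str]:
    """Split text on whitespace into chunks within a character and a byte limit.

    Hypothetical helper. Runs of whitespace (including newlines) are
    collapsed to single spaces as a side effect of splitting on words.
    """
    if max_chars < 1 or max_bytes < 4:
        raise ValueError("max_chars must be >= 1 and max_bytes >= 4")

    def fits(candidate: str) -> bool:
        return len(candidate) <= max_chars and len(candidate.encode("utf-8")) <= max_bytes

    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if fits(candidate):
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        # A single word that exceeds the limits on its own is cut into pieces.
        while not fits(word):
            cut = min(len(word), max_chars)
            while len(word[:cut].encode("utf-8")) > max_bytes:
                cut -= 1
            chunks.append(word[:cut])
            word = word[cut:]
        current = word
    if current:
        chunks.append(current)
    return chunks
```

Checking the encoded length matters for non-ASCII text: a 4950-character chunk of accented or non-Latin text can be well over 5000 bytes in UTF-8.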

Example

Without DEBUG logging

python3 pdf_to_speech.py
Enter path to PDF file (default: input/test.pdf):
Enter the output MP3 file name (default: output/output.mp3):
| Processing chunk 1/1...
| Audio content written to file 'output/output.mp3'

With DEBUG logging

python3 pdf_to_speech.py
| Starting PDF to Speech conversion app
Enter path to PDF file (default: input/test.pdf): 
| No PDF file path provided so default input/test.pdf used
| Valid PDF file found!
| File 'input/test.pdf' exists and is a PDF file.
Enter the output MP3 file name (default: output/output.mp3): 
| Output file name: output/output.mp3
| Starting conversion...
| Extracting text from input/test.pdf...
| 
Extracted text:


PDF to Speech :PDF to speech dot PY ,  is a Python script that converts PDF ﬁles to speech in the form of MP3 ﬁles, using the Google Cloud Text-to-Speech API, eﬀectively creating audiobooks from text documents.


| Text length: 212 characters, 37 words
| Converting text to Speech...
| Splitting text into chunks of 4950 characters
| Processing chunk 1/1...
| Chunk length: 212 characters, 37 words
| Chunk 1 processed.
| All chunks processed.
| Total audio length: 133248 bytes
| Writing audio content to file 'output/output.mp3'...
| Audio content written to file 'output/output.mp3'
| PDF to Speech conversion app finished

Future Enhancements

🔲 - Progress Tracking: Add a progress bar for large PDFs
🔲 - SSML Support: Implement Speech Synthesis Markup Language for better pronunciation
🔲 - GUI: Create a simple Flask web interface or Tkinter desktop app
🔲 - Configuration: Allow users to select different voices and languages
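As one possible direction for the voice and language item, the sketch below reads the selection from command-line flags. Everything about it is an assumption: the flag names, the defaults, and the function name `parse_voice` are hypothetical. The dictionary keys follow the voice fields of the Text-to-Speech REST request (`languageCode`, `ssmlGender`, `name`).

```python
import argparse
from typing import Dict, List, Optional

# Two of the values accepted by the API's voice gender field.
GENDERS = ("MALE", "FEMALE")


def parse_voice(argv: Optional[List[str]] = None) -> Dict[str, str]:
    """Build the voice part of a synthesis request from command-line flags.

    Hypothetical helper; flag names and defaults are illustrative.
    """
    parser = argparse.ArgumentParser(description="Choose a Text-to-Speech voice")
    parser.add_argument("--language", default="en-US",
                        help="BCP-47 language code, e.g. en-GB")
    parser.add_argument("--gender", default="FEMALE", type=str.upper,
                        choices=GENDERS, help="Voice gender")
    parser.add_argument("--voice-name", default=None,
                        help="Optional specific voice name")
    args = parser.parse_args(argv)

    voice = {"languageCode": args.language, "ssmlGender": args.gender}
    if args.voice_name:
        voice["name"] = args.voice_name
    return voice
```

For example, `parse_voice(["--language", "en-GB", "--gender", "male"])` yields a dictionary with `languageCode` set to `en-GB` and `ssmlGender` set to `MALE`.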

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements.

Fork the repository

Create your feature branch: git checkout -b feature/NewFeature
Commit your changes: git commit -m 'Add new feature'
Push to the branch: git push origin feature/NewFeature
Open a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Google Cloud Text-to-Speech API
PyPDF2 library for PDF text extraction
Python community for excellent tooling
