
Garschke/pdf-to-speech


PDF to Speech Converter

A Python script that converts PDF files to speech (MP3) using the Google Cloud Text-to-Speech API, effectively creating audiobooks from text documents.

Table of Contents
  1. PDF to Speech Converter
  2. Features
  3. Installation
  4. User Guide
  5. Project Structure
  6. Configuration
  7. Example
  8. Future Enhancements
  9. Contributing
  10. License
  11. Acknowledgments

Features

  • Extract text from PDF files
  • Convert text to natural-sounding speech
  • Output as MP3 audio files
  • Handle large PDFs by splitting the text into chunks that fit within the API's per-request size limit
  • Customizable voice options (language, gender), set in the code
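Google Cloud Text-to-Speech limits how much text a single synthesis request may contain (documented as 5,000 bytes), which is why long documents have to be split. The DEBUG log in the Example section shows this script using chunks of 4950 characters. Below is a minimal sketch of word-boundary chunking that assumes the 4950-character limit. The function name split_into_chunks is hypothetical and does not come from pdf_to_speech.py.

```python
from typing import List


def split_into_chunks(text: str, max_chars: int = 4950) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Words are kept whole where possible; runs of whitespace (including
    newlines from the PDF) collapse to single spaces. A word longer than
    max_chars is cut into max_chars-sized pieces as a fallback.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # Fallback: hard-split a single word that cannot fit in any chunk.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        # The +1 accounts for the space that joins word to the current chunk.
        if current and len(current) + 1 + len(word) > max_chars:
            chunks.append(current)
            current = word
        else:
            current = f"{current} {word}" if current else word
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be sent to the API as a separate request. The API limit is measured in bytes rather than characters. Text with many non-ASCII characters (which take more than one byte in UTF-8) may therefore need a smaller max_chars.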

Installation

Prerequisites

  • Python 3.7+
  • Google Cloud account (free tier available)
  • VS Code (recommended) or any Python IDE

Steps

  1. Clone the repository:
git clone https://github.com/garschke/pdf-to-speech.git
cd pdf-to-speech
  2. Create and activate a virtual environment:
python -m venv venv           # Windows
.\venv\Scripts\activate       # Windows

python3 -m venv .venv         # Mac/Linux
source .venv/bin/activate     # Mac/Linux
  3. Install dependencies:
pip3 install -r requirements.txt
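To confirm that the installation worked, you can run a small check like the one below inside the activated environment. This is a sketch with two assumptions. First, the module names PyPDF2 and google.cloud.texttospeech are inferred from the Acknowledgments section, not read from requirements.txt. Second, missing_modules and check_environment are hypothetical helpers.

```python
import importlib.util
import sys
from typing import List, Sequence

# Assumed import names for PyPDF2 and google-cloud-texttospeech;
# requirements.txt is the authoritative list of dependencies.
REQUIRED_MODULES = ("PyPDF2", "google.cloud.texttospeech")


def missing_modules(modules: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the names in modules that cannot be found in this environment."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec raises when a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


def check_environment() -> List[str]:
    """Return a list of problems; an empty list means the setup looks complete."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7+ is required")
    missing = missing_modules()
    if missing:
        problems.append("Missing modules: " + ", ".join(missing))
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks ready")
```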

Set up the Google Cloud Text-to-Speech API

  1. Go to the Google Cloud Console
  2. Create a new project
  3. Enable the Text-to-Speech API
  4. Create a service account and download the JSON key file
  5. Save the key file in the project folder as google_credentials.json
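Google Cloud client libraries find credentials through Application Default Credentials, which read the key-file path from the GOOGLE_APPLICATION_CREDENTIALS environment variable. This section does not say whether pdf_to_speech.py uses that variable or loads the key file directly. The sketch below is therefore a hypothetical helper, use_credentials_file. It checks that the file looks like a service-account key and exports its path before any client is created.

```python
import json
import os
from pathlib import Path


def use_credentials_file(path: str = "google_credentials.json") -> str:
    """Check a service-account key file and export its path for Google's client libraries.

    Application Default Credentials read GOOGLE_APPLICATION_CREDENTIALS, so
    this must run before a TextToSpeechClient is created.
    """
    key_path = Path(path).resolve()
    if not key_path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {key_path}")
    with key_path.open(encoding="utf-8") as fh:
        key = json.load(fh)
    # Service-account keys downloaded from the Cloud Console carry this marker.
    if not isinstance(key, dict) or key.get("type") != "service_account":
        raise ValueError(f"{key_path} does not look like a service-account key")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(key_path)
    return str(key_path)
```

The key grants access to your Google Cloud project, so keep google_credentials.json out of version control (for example, by listing it in .gitignore).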

User Guide

Run the script with:

python pdf_to_speech.py     # Windows
python3 pdf_to_speech.py    # Mac/Linux

When prompted:

  1. Enter the path to your PDF file (default: input/test.pdf)
  2. Enter the desired output MP3 file name (default: output/output.mp3)
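The following sketch shows how blank answers to these prompts could fall back to the defaults from the Example run, and how the answers could be validated. The names resolve_pdf_path and resolve_output_path are hypothetical. Treating a .pdf extension as proof of a PDF file is also an assumption; the script's own check may differ.

```python
from pathlib import Path

# Defaults as shown in the Example run; the script's constants may be named differently.
DEFAULT_PDF = "input/test.pdf"
DEFAULT_MP3 = "output/output.mp3"


def resolve_pdf_path(answer: str, default: str = DEFAULT_PDF) -> Path:
    """Return the PDF to read, using the default when the answer is blank."""
    path = Path(answer.strip() or default)
    # Assumption: a .pdf extension is treated as "is a PDF file".
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"'{path}' is not a .pdf file")
    if not path.is_file():
        raise FileNotFoundError(f"'{path}' does not exist")
    return path


def resolve_output_path(answer: str, default: str = DEFAULT_MP3) -> Path:
    """Return the MP3 file to write, adding .mp3 if missing and creating its folder."""
    path = Path(answer.strip() or default)
    if path.suffix.lower() != ".mp3":
        path = path.with_name(path.name + ".mp3")
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

In an interactive script, these helpers would wrap the prompts, for example resolve_pdf_path(input("Enter path to PDF file (default: input/test.pdf): ")).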

The script will:

  1. Extract text from the PDF
  2. Convert the text to speech with the Google Cloud Text-to-Speech API, chunk by chunk
  3. Save the audio as an MP3 file
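The three steps above can be sketched as a single function. This is a minimal sketch, not the code in pdf_to_speech.py, and it makes several assumptions:

- PyPDF2 is version 2.x or later, which provides the PdfReader API (1.x used PdfFileReader).
- The google-cloud-texttospeech client is version 2.x.
- The per-chunk MP3 bytes can be joined by simple concatenation.

The extractor and synthesizer are passed in as parameters, so the flow can be tested without a PDF or network access. All function names are hypothetical.

```python
from pathlib import Path
from typing import Callable, List, Optional


def extract_text(pdf_path: str) -> str:
    """Step 1: extract text from every page (PyPDF2 2.x/3.x PdfReader API)."""
    from PyPDF2 import PdfReader  # imported lazily so the sketch loads without PyPDF2

    reader = PdfReader(pdf_path)
    # extract_text() may return an empty string for image-only (scanned) pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def google_synthesizer(language_code: str = "en-US") -> Callable[[str], bytes]:
    """Step 2: return a function that turns one chunk of text into MP3 bytes."""
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()  # one client reused for all chunks
    voice = texttospeech.VoiceSelectionParams(language_code=language_code)
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    def synthesize(text: str) -> bytes:
        response = client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=text),
            voice=voice,
            audio_config=audio_config,
        )
        return response.audio_content

    return synthesize


def pack_words(text: str, max_chars: int) -> List[str]:
    """Greedy word packing; assumes no single word exceeds max_chars."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        if current and len(current) + 1 + len(word) > max_chars:
            chunks.append(current)
            current = word
        else:
            current = f"{current} {word}" if current else word
    if current:
        chunks.append(current)
    return chunks


def convert_pdf_to_mp3(
    pdf_path: str,
    mp3_path: str,
    extract: Callable[[str], str] = extract_text,
    synthesize: Optional[Callable[[str], bytes]] = None,
    max_chars: int = 4950,
) -> int:
    """Run extract -> chunk -> synthesize -> save and return the MP3 size in bytes."""
    text = extract(pdf_path)
    if synthesize is None:
        synthesize = google_synthesizer()
    chunks = pack_words(text, max_chars)
    # Step 3: join the per-chunk MP3 streams and write them to one file.
    audio = b"".join(synthesize(chunk) for chunk in chunks)
    Path(mp3_path).write_bytes(audio)
    return len(audio)
```

Calling convert_pdf_to_mp3("input/test.pdf", "output/output.mp3") with no other arguments would use PyPDF2 and Google's API with an en-US voice. MP3 is a sequence of self-contained frames, so byte concatenation generally plays back correctly. It is not a true audio merge, however.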

Project Structure

pdf-to-speech/
├── pdf_to_speech.py        # Main conversion script
├── google_credentials.json # Google Cloud credentials
├── requirements.txt        # Dependencies
├── .gitignore              # Files to ignore in version control
├── static/
│   └── image/              # Images (PDF-to-speech audiobook logo)
├── modules/
│   └── logger.py           # Logging module
├── input/
│   └── test.pdf            # Example PDF file (used if no filepath provided)
├── output/
│   └── output.mp3          # MP3 output (default output filepath if none provided)
└── README.md               # This file

Configuration

You can modify these aspects in the code:
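The Features list names the voice's language and gender as options adjusted in code. The sketch below is a hypothetical VoiceSettings class showing how such settings could be grouped and validated before they reach the Google client. The class name and the FEMALE default are assumptions. Only the VoiceSelectionParams and SsmlVoiceGender names come from google-cloud-texttospeech 2.x.

```python
from dataclasses import dataclass

# Values of the API's SsmlVoiceGender enum that a user would normally pick.
GENDERS = ("MALE", "FEMALE", "NEUTRAL")


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the language/gender voice options."""

    language_code: str = "en-US"  # BCP-47 tag such as "en-GB" or "de-DE"
    gender: str = "FEMALE"  # a preference: the API may substitute another gender

    def __post_init__(self) -> None:
        if not self.language_code:
            raise ValueError("language_code must not be empty")
        if self.gender.upper() not in GENDERS:
            raise ValueError(f"gender must be one of {GENDERS}, got {self.gender!r}")

    def to_voice_params(self):
        """Build texttospeech.VoiceSelectionParams (google-cloud-texttospeech 2.x)."""
        from google.cloud import texttospeech  # lazy import; only needed at synthesis time

        return texttospeech.VoiceSelectionParams(
            language_code=self.language_code,
            ssml_gender=texttospeech.SsmlVoiceGender[self.gender.upper()],
        )
```

Validating in __post_init__ catches typos such as an unknown gender before any request is made. The API itself treats ssml_gender only as a preference.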

Example

Without DEBUG logging

python3 pdf_to_speech.py
Enter path to PDF file (default: input/test.pdf):
Enter the output MP3 file name (default: output/output.mp3):
| Processing chunk 1/1...
| Audio content written to file 'output/output.mp3'

With DEBUG logging

python3 pdf_to_speech.py
| Starting PDF to Speech conversion app
Enter path to PDF file (default: input/test.pdf): 
| No PDF file path provided so default input/test.pdf used
| Valid PDF file found!
| File 'input/test.pdf' exists and is a PDF file.
Enter the output MP3 file name (default: output/output.mp3): 
| Output file name: output/output.mp3
| Starting conversion...
| Extracting text from input/test.pdf...
| Extracted text:

PDF to Speech :PDF to speech dot PY ,  is a Python script that converts PDF files to speech in the form of MP3 files, using the Google Cloud Text-to-Speech API, effectively creating audiobooks from text documents.

| Text length: 212 characters, 37 words
| Converting text to Speech...
| Splitting text into chunks of 4950 characters
| Processing chunk 1/1...
| Chunk length: 212 characters, 37 words
| Chunk 1 processed.
| All chunks processed.
| Total audio length: 133248 bytes
| Writing audio content to file 'output/output.mp3'...
| Audio content written to file 'output/output.mp3'
| PDF to Speech conversion app finished
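Both runs prefix every line with "| ", and some lines appear only in the DEBUG run. Together these suggest a logger whose level switches between INFO and DEBUG. modules/logger.py is not shown here, so the following is a hypothetical equivalent built on Python's standard logging module. The get_logger function and its debug flag are assumed names.

```python
import logging
import sys
from typing import Optional, TextIO


def get_logger(
    name: str = "pdf_to_speech", debug: bool = False, stream: Optional[TextIO] = None
) -> logging.Logger:
    """Return a logger that prints '| message' lines; DEBUG lines only when debug=True."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if not logger.handlers:  # attach one handler only, even if called repeatedly
        handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
        handler.setFormatter(logging.Formatter("| %(message)s"))
        logger.addHandler(handler)
        logger.propagate = False  # don't also print through the root logger
    return logger
```

With debug=False, a call such as log.debug("Starting conversion...") is suppressed while log.info("Processing chunk 1/1...") still prints. This matches the difference between the two runs.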

Future Enhancements

  1. 🔲 - Progress Tracking: Add a progress bar for large PDFs
  2. 🔲 - SSML Support: Implement Speech Synthesis Markup Language for better pronunciation
  3. 🔲 - GUI: Create a simple Flask web interface or Tkinter desktop app
  4. 🔲 - Configuration: Allow users to select different voices and languages

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements.

  1. Fork the repository.
  2. Create your feature branch: git checkout -b feature/NewFeature
  3. Commit your changes: git commit -m 'Add new feature'
  4. Push to the branch: git push origin feature/NewFeature
  5. Open a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Google Cloud Text-to-Speech API
  • PyPDF2 library for PDF text extraction
  • Python community for excellent tooling
