This project is a FastAPI-based service that extracts images from PDF files and compresses them proportionally so that each image is under 300 KB and all images together are under 3 MB. The service returns the compressed images as a ZIP file.

Shaukataliii/patwar-pdf-compressor


📄 PDF Image Extractor & Compressor 🚀

A FastAPI-based service to extract and compress images from PDFs while ensuring:
✅ Each image is < 300 KB
✅ Total compressed images are < 3 MB
✅ Adaptive, proportional compression for best quality
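
One way to read the three constraints above is as a per-image byte budget. The sketch below is an assumption about how "proportional" could work, not the repository's actual code: every image is scaled by the same factor when the set exceeds the total limit, and each result is then capped at the per-image limit. The function name `target_sizes` and the choice of 1 KB = 1024 bytes are hypothetical.

```python
MAX_IMAGE_BYTES = 300 * 1024       # per-image limit (assuming 1 KB = 1024 bytes)
MAX_TOTAL_BYTES = 3 * 1024 * 1024  # limit for all images together


def target_sizes(original_sizes: list[int]) -> list[int]:
    """Return a byte budget per image, proportional to its original size."""
    total = sum(original_sizes)
    if total == 0:
        return [0] * len(original_sizes)
    # Shrink all images by one common factor only if the set is over budget.
    # The "- 1" keeps the sum strictly below the limit ("<", not "<=").
    scale = min(1.0, (MAX_TOTAL_BYTES - 1) / total)
    # Cap each budget strictly below the per-image limit as well.
    return [min(int(size * scale), MAX_IMAGE_BYTES - 1) for size in original_sizes]


if __name__ == "__main__":
    print(target_sizes([1_000_000, 4_000_000, 100_000]))
```

Once the per-image cap applies, only the images below the cap remain proportional to each other, so a real implementation might redistribute the unused budget.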


🚀 Features

  • Extracts all images from a PDF
  • Compresses images adaptively to fit size constraints
  • Preserves as much image quality as the size limits allow
  • Returns a ZIP file containing the compressed images
  • Fast & scalable using FastAPI
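
Adaptive compression is often done by re-encoding at progressively lower quality until the result fits. The following is a minimal sketch of that loop, assuming a hypothetical `encode` callback that returns the encoded bytes for a given quality value; the function name, defaults, and step size are illustrative and not taken from this repository.

```python
from typing import Callable


def compress_to_budget(
    encode: Callable[[int], bytes],
    max_bytes: int,
    start_quality: int = 95,
    min_quality: int = 20,
    step: int = 5,
) -> tuple[bytes, int]:
    """Lower the quality setting until the encoded image is under max_bytes.

    `encode` is a hypothetical callback. With Pillow it could wrap
    Image.save(buffer, format="JPEG", quality=quality) on a BytesIO buffer.
    Returns the last encoding and the quality used; the result may still be
    too large if min_quality is reached first.
    """
    quality = start_quality
    data = encode(quality)
    # Keep stepping down while the image is too large and quality can still drop.
    while len(data) >= max_bytes and quality - step >= min_quality:
        quality -= step
        data = encode(quality)
    return data, quality


if __name__ == "__main__":
    # Stand-in encoder: output size grows linearly with quality.
    fake_encode = lambda q: b"x" * (q * 1000)
    data, quality = compress_to_budget(fake_encode, max_bytes=50_000)
    print(len(data), quality)
```

If the image is still over budget at the minimum quality, a caller would typically downscale the image dimensions and try again.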

🛠️ Installation

1⃣ Clone the repository

git clone https://github.com/Shaukataliii/patwar-pdf-compressor
cd patwar-pdf-compressor

2⃣ Create a virtual environment (optional but recommended)

python -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate

3⃣ Install dependencies

pip install -r requirements.txt

▶️ Usage

1⃣ Run the FastAPI server

uvicorn main:app --reload

By default, the API runs at http://127.0.0.1:8000.

2⃣ Use the API

Go to http://127.0.0.1:8000/docs to access Swagger UI. Use the /process endpoint to upload a PDF file.

Example API Call (cURL)

curl -X 'POST' 'http://127.0.0.1:8000/process' \
     -H 'accept: application/zip' \
     -H 'Content-Type: multipart/form-data' \
     -F 'file=@sample.pdf' \
     --output compressed_images.zip

📚 API Documentation

🔹 POST /process

Uploads a PDF, extracts images, compresses them, and returns a ZIP file.

Request

  • Method: POST
  • URL: /process
  • Headers: Content-Type: multipart/form-data
  • Body:
    • file (PDF file)

Response

  • Success: 200 OK – Returns a ZIP file containing the compressed images.
  • Errors:
    • 400 Bad Request – If the file is not a PDF or has no images.
    • 500 Internal Server Error – If an unknown error occurs.
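
A client can check the returned ZIP against the documented limits before using the images. The sketch below uses only the standard library; the helper name `check_zip_limits` is hypothetical, and it assumes the limits apply to the uncompressed size of each ZIP member with 1 KB = 1024 bytes.

```python
import io
import zipfile

MAX_IMAGE_BYTES = 300 * 1024       # per-image limit (assuming 1 KB = 1024 bytes)
MAX_TOTAL_BYTES = 3 * 1024 * 1024  # limit for all images together


def check_zip_limits(zip_bytes: bytes) -> dict[str, int]:
    """Return {member name: size in bytes}; raise ValueError if a limit is exceeded."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        sizes = {
            info.filename: info.file_size
            for info in archive.infolist()
            if not info.is_dir()
        }
    for name, size in sizes.items():
        if size >= MAX_IMAGE_BYTES:
            raise ValueError(f"{name} is {size} bytes, expected < {MAX_IMAGE_BYTES}")
    total = sum(sizes.values())
    if total >= MAX_TOTAL_BYTES:
        raise ValueError(f"total is {total} bytes, expected < {MAX_TOTAL_BYTES}")
    return sizes


if __name__ == "__main__":
    # Check the file written by the cURL example above.
    with open("compressed_images.zip", "rb") as fh:
        print(check_zip_limits(fh.read()))
```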

⚙️ Tech Stack

  • Python 🐍
  • FastAPI 🚀
  • PyMuPDF (fitz) 📄 (for extracting images)
  • Pillow (PIL) 🎨 (for image processing)

🤝 Contributing

Feel free to contribute! Fork the repository and submit a pull request.

🐟 License

This project is licensed under the MIT License.
