A powerful FastAPI-based service that extracts text and metadata from PDF documents using advanced AI vision models. This service converts PDFs to images and processes them through the Vintern AI model to extract structured information including dates, document numbers, authors, titles, and full text content.
- PDF to Image Conversion: Converts PDF pages to high-quality PNG images
- AI-Powered Text Extraction: Uses Vintern-1B-v3.5 vision model for intelligent text recognition
- Structured Data Extraction: Automatically extracts:
- Document dates (day, month, year)
- Document numbers and symbols
- Author information
- Document titles
- Full text content
- Multi-language Support: Optimized for Vietnamese documents with accent handling
- GPU Acceleration: CUDA support for faster processing
- Memory Management: Intelligent memory handling with lightweight mode for low-resource systems
- Local Model Caching: Downloads and caches AI models locally for offline use
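The features above form a simple pipeline: PDF → page images → vision-model inference → structured fields. A minimal sketch of that flow, with stub functions standing in for the real pdf2image, Vintern, and parser steps (all names here are illustrative, not the service's actual API):

```python
from dataclasses import dataclass


@dataclass
class ExtractionResult:
    # Illustrative subset of the fields returned by the service
    title: str = ""
    author: str = ""
    date: str = ""
    full_text: str = ""


def pdf_to_images(pdf_bytes: bytes) -> list:
    """Stub for the pdf2image step: one image per page."""
    # In the real service, pdf2image renders each page to a PNG.
    return ["<page-1.png>", "<page-2.png>"]


def run_vision_model(image) -> str:
    """Stub for the Vintern model call: image -> raw text."""
    return f"raw text extracted from {image}"


def parse_fields(raw_pages: list) -> ExtractionResult:
    """Stub for the parser step: raw text -> structured fields."""
    return ExtractionResult(full_text="\n".join(raw_pages))


def extract(pdf_bytes: bytes) -> ExtractionResult:
    images = pdf_to_images(pdf_bytes)
    raw = [run_vision_model(img) for img in images]
    return parse_fields(raw)
```

Each stage maps to a module in the project layout below (`pdf_service.py`, `vintern.py`, `parser.py`).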
```
PDF-to-Text/
├── app/
│   ├── main.py               # FastAPI application entry point
│   ├── routers/              # API route definitions
│   │   └── pdf_router.py     # PDF processing endpoints
│   ├── services/             # Core business logic
│   │   ├── pdf_service.py    # PDF to image conversion
│   │   ├── vintern.py        # AI model service
│   │   └── parser.py         # Data parsing utilities
│   ├── config/               # Configuration management
│   │   └── settings.py       # Environment variables and settings
│   ├── template/             # Response templates
│   │   └── result.py         # Standardized response format
│   └── utils/                # Utility functions
│       └── prom.py           # AI prompt templates
├── model-image/              # Local AI model cache
└── requirements.txt          # Python dependencies
```
- Backend Framework: FastAPI
- AI Model: Vintern-1B-v3.5 (5CD-AI)
- Deep Learning: PyTorch with CUDA support
- Image Processing: PIL (Pillow), torchvision
- PDF Processing: pdf2image
- Memory Management: psutil
- API Documentation: Auto-generated OpenAPI/Swagger
- Python 3.8+
- CUDA-compatible GPU (optional, for acceleration)
- At least 4GB RAM (8GB+ recommended)
- 6GB+ GPU memory if using CUDA
1. Clone the repository:

   ```shell
   git clone <repository-url>
   cd PDF-to-text
   ```

2. Create and activate a virtual environment:

   ```shell
   python -m venv env

   # On Windows
   env\Scripts\activate

   # On macOS/Linux
   source env/bin/activate
   ```

3. Install dependencies:

   ```shell
   pip install -r requirements.txt
   ```

4. Set up the environment. Create a `.env` file in the root directory:

   ```
   HOST=0.0.0.0
   PORT=8000
   API_PREFIX=/api/v1
   API_TITLE=PDF Storage API
   API_VERSION=1.0.0
   ```
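For reference, here is a minimal stdlib sketch of how `app/config/settings.py` might read these variables. The real project may well use pydantic's `BaseSettings` or similar; the structure below is an assumption for illustration:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the .env example; environment variables override them.
    host: str = os.getenv("HOST", "0.0.0.0")
    port: int = int(os.getenv("PORT", "8000"))
    api_prefix: str = os.getenv("API_PREFIX", "/api/v1")
    api_title: str = os.getenv("API_TITLE", "PDF Storage API")
    api_version: str = os.getenv("API_VERSION", "1.0.0")


settings = Settings()
```

Keeping settings in one frozen object makes configuration read-only at runtime and easy to inject into the FastAPI app.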
Start the development server:

```shell
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```
The service will be available at:
- API: http://localhost:8000/api/v1
- Documentation: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Endpoint: `POST /api/v1/upload/`
Request: multipart form data containing a PDF file
Response: structured JSON with the extracted information
```json
{
  "SheetTotal": 3,
  "IssuedYear": 2024,
  "Field1": "Author Name",
  "Field2": "DOC-001",
  "Field3": "SYM-001",
  "Field6": "15/12/2024",
  "Field7": "Document Title",
  "Field8": "Full Title Text",
  "Field13": 15,
  "Field14": 12,
  "Field15": 2024,
  "Field32": "Thường",
  "Field33": "Tiếng Việt",
  "Field34": "Bản chính",
  "ContentLength": 1250,
  "PageCountA4": 3,
  "SearchMeta": "processed search metadata"
}
```
```shell
curl -X POST "http://localhost:8000/api/v1/upload/" \
  -H "accept: application/json" \
  -F "file=@document.pdf"
```

(`-F` makes curl set the `multipart/form-data` Content-Type header, including the required boundary, automatically.)
```python
import requests

url = "http://localhost:8000/api/v1/upload/"
with open("document.pdf", "rb") as f:
    response = requests.post(url, files={"file": f})
result = response.json()

print(f"Document Title: {result['Field7']}")
print(f"Author: {result['Field1']}")
print(f"Date: {result['Field6']}")
```
| Variable | Default | Description |
|---|---|---|
| `HOST` | `0.0.0.0` | Server host address |
| `PORT` | `8000` | Server port |
| `API_PREFIX` | `/api/v1` | API endpoint prefix |
| `API_TITLE` | `PDF Storage API` | API title for documentation |
| `API_VERSION` | `1.0.0` | API version |
The service automatically detects system resources and configures the AI model accordingly:
- Lightweight Mode: For systems with limited memory (< 6GB GPU / < 8GB RAM)
- Standard Mode: For systems with adequate resources
- Device Selection: Automatically chooses between CPU and CUDA GPU
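The thresholds above can be expressed as a small decision function. This is a sketch of the selection logic as described in this section, not the service's actual code (names are illustrative):

```python
from typing import Optional, Tuple


def select_mode(gpu_mem_gb: Optional[float], ram_gb: float) -> Tuple[str, str]:
    """Pick (device, mode) from available resources.

    gpu_mem_gb is None when no CUDA GPU is present.
    """
    device = "cuda" if gpu_mem_gb is not None else "cpu"
    # Lightweight mode kicks in under 6 GB of GPU memory or 8 GB of RAM.
    if (gpu_mem_gb is not None and gpu_mem_gb < 6) or ram_gb < 8:
        mode = "lightweight"
    else:
        mode = "standard"
    return device, mode
```

In the real service the inputs would come from `torch.cuda` and `psutil` probes at startup.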
- Processing Speed: 2-5 seconds per page (GPU), 10-20 seconds per page (CPU)
- Memory Usage: 2-6GB GPU memory, 4-8GB RAM
- Concurrent Requests: Supports multiple simultaneous PDF uploads
- Model Loading: First request loads model (~30 seconds), subsequent requests are instant
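The load-once behavior described above is typically implemented as a lazy, thread-safe singleton: the first request pays the load cost, later requests reuse the cached model. A stdlib sketch (the loader function is a stand-in for the real Vintern initialization):

```python
import threading

_model = None
_lock = threading.Lock()


def load_model():
    """Stand-in for the expensive Vintern model load (~30 s in the real service)."""
    return object()


def get_model():
    """Return the cached model, loading it exactly once (thread-safe)."""
    global _model
    if _model is None:
        with _lock:
            # Double-checked locking: re-test after acquiring the lock so
            # concurrent first requests trigger only one load.
            if _model is None:
                _model = load_model()
    return _model
```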
- CUDA Out of Memory
  - The service automatically falls back to CPU mode
  - Check GPU memory usage with `nvidia-smi`
- Model Download Issues
  - Ensure a stable internet connection for the first run
  - Check available disk space (models require ~2GB)
- Performance Issues
  - Verify CUDA installation and drivers
  - Check system resource availability
- GPU Status: Check GPU usage and model placement
- System Resources: Monitor memory and performance
- Model Verification: Verify AI model loading status
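A hedged sketch of such a GPU status check: it shells out to `nvidia-smi` when the tool is on the PATH and degrades gracefully otherwise (the function name and returned fields are illustrative assumptions, not the service's actual API):

```python
import shutil
import subprocess


def gpu_status() -> dict:
    """Report whether a CUDA GPU is visible, via nvidia-smi if present."""
    if shutil.which("nvidia-smi") is None:
        return {"gpu_available": False, "detail": "nvidia-smi not found"}
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, timeout=5, check=True,
        )
        return {"gpu_available": True, "detail": out.stdout.strip()}
    except (subprocess.SubprocessError, OSError) as exc:
        return {"gpu_available": False, "detail": str(exc)}
```

Exposing this from a health endpoint lets operators confirm model placement without shelling into the host.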
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- 5CD-AI for the Vintern-1B-v3.5 vision model
- FastAPI team for the excellent web framework
- PyTorch community for deep learning tools
For support and questions:
- Create an issue in the repository
- Check the API documentation at `/docs`
- Review the troubleshooting section above
Note: This service is optimized for Vietnamese documents but works with documents in other languages as well. The AI model automatically adapts to different document types and languages.