A Flask-based REST API that supports uploading, storing, parsing, and retrieving files with real-time progress tracking. The application provides CRUD-style endpoints (upload, retrieve, list, delete) for file management, with support for various file formats including PDF, DOCX, TXT, and more.
- File Upload - Multipart file uploads, each assigned a unique ID
- Progress Tracking - Real-time upload and processing progress monitoring
- File Parsing - Document content extraction using LlamaIndex's SimpleDirectoryReader
- CRUD Operations - Create (upload), Read (single file or full list), and Delete endpoints
- Metadata Storage - File information including size, type, and timestamps
- Error Handling - Input validation with JSON error responses
- Format Support - PDF, DOCX, TXT, CSV, and other document formats
- Backend Framework: Flask
- File Parsing Engine: LlamaIndex Core (SimpleDirectoryReader)
- Storage: Local file system with metadata tracking
- Data Format: JSON responses
- Testing: Postman for API testing and debugging
POST /files
Content-Type: multipart/form-data
Body: file (form-data)
Success Response (200):
{
"message": "File uploaded successfully",
"file id": 1234,
"filename": "document.pdf",
"File Type": "application/pdf",
"File size": "2.45 MB",
"Date Created": "2024-01-15T10:30:00"
}
Error Response (400):
{
"message": "File already uploaded",
"Metadata": {
"file id": 1234,
"filename": "document.pdf",
"status": "ready to use",
"size": "2.45 MB",
"progress": "100%"
}
}
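The upload responses above combine a numeric `file id`, a human-readable size, and duplicate detection. A minimal sketch of how such records might be built, assuming an in-memory dict keyed by ID, random four-digit IDs, size in binary megabytes, and duplicates detected by filename. All of these, including the names `new_file_id` and `register_upload`, are hypothetical and not taken from app.py:

```python
import random
from datetime import datetime


def new_file_id(store, low=1000, high=9999):
    """Draw random IDs until one is unused (assumes the range is not exhausted)."""
    while True:
        candidate = random.randint(low, high)
        if candidate not in store:
            return candidate


def register_upload(store, filename, mimetype, size_bytes):
    """Return (status_code, body) shaped like the documented upload responses."""
    # Hypothetical duplicate check by filename, mirroring the 400 "File already uploaded" case
    for file_id, rec in store.items():
        if rec["filename"] == filename:
            return 400, {
                "message": "File already uploaded",
                "Metadata": {
                    "file id": file_id,
                    "filename": rec["filename"],
                    "status": rec["status"],
                    "size": rec["size"],
                    "progress": f"{rec['progress']}%",
                },
            }

    file_id = new_file_id(store)
    rec = {
        "filename": filename,
        "type": mimetype,
        # Assumes 1 MB = 1024 * 1024 bytes, formatted to two decimals
        "size": f"{size_bytes / (1024 * 1024):.2f} MB",
        "created": datetime.now().isoformat(timespec="seconds"),
        "status": "uploading",
        "progress": 0,
    }
    store[file_id] = rec
    return 200, {
        "message": "File uploaded successfully",
        "file id": file_id,
        "filename": filename,
        "File Type": mimetype,
        "File size": rec["size"],
        "Date Created": rec["created"],
    }
```

In a Flask view, `size_bytes` could come from the saved file via `os.path.getsize`, and the returned tuple maps directly onto a `(jsonify(body), status_code)` response.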
GET /files/{file_id}/progress
Response (200):
{
"file id": 1234,
"filename": "document.pdf",
"status": "processing",
"Size": "2.45 MB",
"Progress": "75%"
}
Status Values:
- uploading - File is being uploaded (0-19%)
- processing - File is being processed (20-99%)
- ready to use - File is ready for retrieval (100%)
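The status values map onto progress ranges, and the feature list describes the parsing step as a simulation. A minimal sketch of that idea, assuming progress is stored as an integer in a per-file dict and advanced by a background thread; the function names, step size, and delay are hypothetical:

```python
import threading
import time


def status_for_progress(progress):
    """Map a 0-100 progress value to the documented status strings."""
    if progress < 20:
        return "uploading"
    if progress < 100:
        return "processing"
    return "ready to use"


def simulate_processing(record, step=20, delay=0.5):
    """Advance record["progress"] in a background thread (hypothetical simulation).

    A single worker thread is the only writer; a production version might
    guard the record with a lock so progress and status always change together.
    """
    def run():
        while record["progress"] < 100:
            time.sleep(delay)
            record["progress"] = min(100, record["progress"] + step)
            record["status"] = status_for_progress(record["progress"])

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

The progress endpoint then only needs to read the record and format it, for example `f"{record['progress']}%"`.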
GET /files/{file_id}/
Response (200):
{
"file id": 1234,
"filename": "document.pdf",
"File Type": "application/pdf",
"File size": "2.45 MB",
"Date Created": "2024-01-15T10:30:00",
"file content": [
{
"doc_id": "uuid-string",
"text": "Extracted file content...",
"metadata": {
"page_label": "1",
"text": "Page content..."
}
}
]
}
GET /files
Response (200):
{
"Complete Data Info": {
"1234": ["document.pdf", "application/pdf", "2.45 MB", "2024-01-15T10:30:00"],
"5678": ["spreadsheet.xlsx", "application/xlsx", "1.23 MB", "2024-01-15T11:00:00"]
}
}
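The list response maps each file ID to a positional array of `[filename, type, size, created]`. A minimal sketch of building that body, assuming each stored record is a dict with `filename`, `type`, `size`, and `created` keys (hypothetical names, not confirmed by the source):

```python
def list_files(store):
    """Build the GET /files body: file_id (as string) -> [filename, type, size, created]."""
    return {
        "Complete Data Info": {
            str(file_id): [rec["filename"], rec["type"], rec["size"], rec["created"]]
            for file_id, rec in store.items()
        }
    }
```

`json.dumps` would convert integer keys to strings anyway; the explicit `str()` just makes the response shape obvious in the code.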
DELETE /files/{file_id}
Response (200):
{
"message": "File '/document.pdf' deleted successfully."
}
Error Response (404):
{
"error": "unknown file_id"
}
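A minimal sketch of the delete behavior, assuming metadata lives in an in-memory dict keyed by ID and files are saved under the upload folder by their original filename (both assumptions). The success message here omits the leading slash shown in the sample response:

```python
import os


def delete_file(store, upload_folder, file_id):
    """Return (status_code, body) mirroring the documented DELETE responses."""
    rec = store.pop(file_id, None)
    if rec is None:
        return 404, {"error": "unknown file_id"}

    path = os.path.join(upload_folder, rec["filename"])
    # Remove the file from disk if it is still there; the metadata is already gone
    if os.path.exists(path):
        os.remove(path)
    return 200, {"message": f"File '{rec['filename']}' deleted successfully."}
```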
- Python 3.9+ (recent llama-index-core releases do not support Python 3.7)
- pip (Python package manager)
git clone https://github.com/yourusername/file-parser-crud-api.git
cd file-parser-crud-api
pip install -r requirements.txt
Or install manually:
pip install flask llama-index-core
About LlamaIndex: LlamaIndex is a powerful data framework designed to help you build applications with large language models (LLMs). In this project, we use:
SimpleDirectoryReader, for document loading and parsing:
- Supports multiple file formats: PDF, DOCX, TXT, CSV, HTML, and more
- Automatically extracts text content and metadata from documents
- Returns a list of Document objects (for PDFs, typically one per page, with a page_label in the metadata)
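A minimal sketch of turning SimpleDirectoryReader output into the `file content` list returned by `GET /files/{file_id}/`. It assumes one file is parsed at a time via the `input_files` argument; the helper names `parse_file` and `documents_to_json` are hypothetical:

```python
def documents_to_json(documents):
    """Convert LlamaIndex Document objects into the 'file content' list shape."""
    return [
        {"doc_id": doc.doc_id, "text": doc.text, "metadata": dict(doc.metadata)}
        for doc in documents
    ]


def parse_file(path):
    """Load a single file with SimpleDirectoryReader and return JSON-ready content."""
    # Imported lazily so documents_to_json can be used without llama-index installed
    from llama_index.core import SimpleDirectoryReader

    documents = SimpleDirectoryReader(input_files=[path]).load_data()
    return documents_to_json(documents)
```

Keeping the conversion separate from the loading step makes it easy to test the JSON shape without parsing real documents.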
Update the UPLOAD_FOLDER path in the code to match your system:
UPLOAD_FOLDER = 'C:/Users/kowsh/OneDrive/Desktop/NITRO' # Update this path
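Instead of editing a hardcoded path, the folder could be read from the environment. A minimal sketch, assuming a hypothetical `UPLOAD_FOLDER` environment variable with a fallback to an `uploads/` directory next to app.py (neither is part of the original code):

```python
import os


def resolve_upload_folder(base_dir, env=None):
    """Pick the upload directory: the UPLOAD_FOLDER env var if set, else <base_dir>/uploads."""
    env = os.environ if env is None else env
    return env.get("UPLOAD_FOLDER") or os.path.join(base_dir, "uploads")


# Example wiring inside app.py (hypothetical):
# UPLOAD_FOLDER = resolve_upload_folder(os.path.dirname(os.path.abspath(__file__)))
# os.makedirs(UPLOAD_FOLDER, exist_ok=True)
```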
python app.py
The API will be available at http://localhost:5000
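# Upload a file (curl -F sets the multipart/form-data Content-Type automatically)
curl -X POST \
  http://localhost:5000/files \
  -F 'file=@/path/to/your/document.pdf'

# Check upload and processing progress for file ID 1234
curl -X GET http://localhost:5000/files/1234/progress

# Retrieve file metadata and parsed content
curl -X GET http://localhost:5000/files/1234/

# List all files
curl -X GET http://localhost:5000/files

# Delete a file
curl -X DELETE http://localhost:5000/files/1234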
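Because processing runs after the upload returns, a client typically polls the progress endpoint until the status reaches `ready to use`. A minimal standard-library sketch, assuming the server runs at `http://localhost:5000`; the function names, polling interval, and timeout are hypothetical choices:

```python
import json
import time
from urllib.request import urlopen


def wait_until_ready(fetch_progress, file_id, interval=1.0, timeout=60.0):
    """Poll until status is 'ready to use', raising TimeoutError after `timeout` seconds.

    fetch_progress(file_id) must return the parsed JSON of GET /files/{file_id}/progress.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_progress(file_id)
        if body.get("status") == "ready to use":
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"file {file_id} not ready after {timeout} seconds")
        time.sleep(interval)


def http_fetch_progress(file_id, base_url="http://localhost:5000"):
    """Fetch the progress JSON over HTTP."""
    with urlopen(f"{base_url}/files/{file_id}/progress") as resp:
        return json.load(resp)
```

Usage against a running server: `wait_until_ready(http_fetch_progress, 1234)`. Passing the fetch function in keeps the polling loop testable without a live server.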
file-parser-crud-api/
│
├── app.py              # Main Flask application
├── README.md           # Project documentation
├── requirements.txt    # Python dependencies
└── uploads/            # Directory for uploaded files (created automatically)
This API has been thoroughly tested using Postman for debugging and validation. A complete Postman collection is included with:
- All API endpoints with sample requests
- Environment variables for easy testing
- Pre-configured test cases
- Response validation scripts
- File upload examples
- Open Postman
- Click "New" in the top left
- Select the request type (GET, POST, or DELETE) and paste the URL of the running Flask server
- Set the base URL to http://localhost:5000 as needed
You can test all endpoints using the curl examples above or the provided Postman collection for a more user-friendly testing experience.
- File upload with unique ID assignment
- Progress tracking with status updates
- Asynchronous file parsing simulation
- Complete CRUD operations
- Error handling and validation
- JSON API responses
- File size calculation and display
- Timestamp tracking
- Duplicate file detection
- Automatic directory creation
- Content type detection
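The sample 400 response suggests duplicates are detected per file, but the source does not say how. One alternative, shown here as a swapped-in technique rather than the project's method, is to compare SHA-256 content hashes so a renamed copy is still caught. This sketch assumes each stored record keeps a hypothetical `sha256` field; with Flask, `request.files["file"].stream` is a suitable stream to hash:

```python
import hashlib


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks, then rewind it so it can still be saved."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    stream.seek(0)
    return digest.hexdigest()


def find_duplicate(store, digest):
    """Return the file_id whose stored content hash matches, or None."""
    for file_id, rec in store.items():
        if rec.get("sha256") == digest:
            return file_id
    return None
```

Reading in fixed-size chunks keeps memory use flat for large uploads, and rewinding afterwards means the same stream can be passed to `save()`.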
The API provides comprehensive error handling:
- 404 Not Found: Invalid file ID
- 400 Bad Request: Missing file or duplicate upload
- 500 Internal Server Error: Server-side processing errors
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Ajay Vasan - Computer Science Engineering Student at Lovely Professional University
- LinkedIn: linkedin.com/in/ajay-vasan
- Email: mrajayvasan@gmail.com
- GitHub: github.com/AjayVasan
- Google Adversarial Nibbler Project - Adversarial Tester (Oct 2024 - Jan 2025)
- IBM Cybersecurity Program - Internship at Allsoft Solutions (June - July 2024)
- Specialized in Machine Learning, AI Safety Testing, and Cybersecurity
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues or have questions, please open an issue on GitHub.
Note: This project demonstrates advanced file parsing capabilities using LlamaIndex and comprehensive API design. Tested extensively with Postman for reliability and performance. For production use, consider implementing additional security measures, database integration, and proper authentication mechanisms.
Built with ❤️ by Ajay Vasan