AjayVasan/File-Parser-CRUD-API-with-Progress-Tracking-NITRO

File Parser CRUD API

A Flask-based REST API that supports uploading, storing, parsing, and retrieving files with real-time progress tracking. The application provides complete CRUD functionality for file management with support for various file formats including PDF, DOCX, TXT, and more.

Features

  • ✅ File Upload - Support for multipart file uploads with unique ID assignment
  • ✅ Progress Tracking - Real-time upload and processing progress monitoring
  • ✅ File Parsing - Intelligent document content extraction using LlamaIndex
  • ✅ CRUD Operations - Complete Create, Read, Update, Delete functionality
  • ✅ Metadata Storage - File information including size, type, and timestamps
  • ✅ Error Handling - Comprehensive error responses and validation
  • ✅ Format Support - PDF, DOCX, TXT, CSV, and other document formats

Tech Stack

  • Backend Framework: Flask
  • File Parsing Engine: LlamaIndex Core (SimpleDirectoryReader)
  • Storage: Local file system with metadata tracking
  • Data Format: JSON responses
  • Testing: Postman for API testing and debugging

API Endpoints

1. Upload File

POST /files
Content-Type: multipart/form-data

Body: file (form-data)

Success Response (200):

{
  "message": "File uploaded successfully",
  "file id": 1234,
  "filename": "document.pdf",
  "File Type": "application/pdf",
  "File size": "2.45 MB",
  "Date Created": "2024-01-15T10:30:00"
}

Error Response (400):

{
  "message": "File already uploaded",
  "Metadata": {
    "file id": 1234,
    "filename": "document.pdf",
    "status": "ready to use",
    "size": "2.45 MB",
    "progress": "100%"
  }
}
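The two responses above suggest that the server keeps a record per file and rejects repeat uploads. A minimal sketch of that bookkeeping follows, assuming an in-memory dictionary keyed by filename and a simple counter for IDs. The names `register_upload` and `format_size`, the counter, and the 1 MB = 1024 * 1024 bytes convention are all assumptions made for illustration; the project's actual ID scheme and rounding are not documented here.

```python
import itertools
from datetime import datetime

_next_id = itertools.count(1000)  # hypothetical ID source, not the project's scheme


def format_size(num_bytes):
    """Render a byte count like "2.45 MB" (assumes 1 MB = 1024 * 1024 bytes)."""
    return f"{num_bytes / (1024 * 1024):.2f} MB"


def register_upload(registry, filename, content_type, num_bytes, now=None):
    """Return (status_code, body) for an upload attempt against `registry`."""
    if filename in registry:
        # Duplicate upload: report the existing record with a 400.
        existing = registry[filename]
        return 400, {
            "message": "File already uploaded",
            "Metadata": {
                "file id": existing["file id"],
                "filename": filename,
                "status": existing["status"],
                "size": existing["File size"],
                "progress": existing["progress"],
            },
        }
    now = now or datetime.now()
    record = {
        "file id": next(_next_id),
        "filename": filename,
        "File Type": content_type,
        "File size": format_size(num_bytes),
        "Date Created": now.isoformat(timespec="seconds"),
    }
    # Status and progress are tracked alongside the metadata shown to the client.
    registry[filename] = {**record, "status": "uploading", "progress": "0%"}
    return 200, {"message": "File uploaded successfully", **record}
```

Keying the registry by filename is one possible way to detect duplicates; hashing the file content would be another.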

2. Get Upload Progress

GET /files/{file_id}/progress

Response (200):

{
  "file id": 1234,
  "filename": "document.pdf",
  "status": "processing",
  "Size": "2.45 MB",
  "Progress": "75%"
}

Status Values:

  • uploading - File is being uploaded (0-19%)
  • processing - File is being processed (20-99%)
  • ready to use - File is ready for retrieval (100%)
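The status bands above can be expressed as a small lookup. This is a sketch that assumes progress is a number between 0 and 100; `status_for_progress` is a hypothetical name, not a function taken from the project.

```python
def status_for_progress(percent):
    """Map a progress percentage to the status string documented above."""
    if not 0 <= percent <= 100:
        raise ValueError(f"progress must be between 0 and 100, got {percent}")
    if percent < 20:
        return "uploading"      # 0-19%
    if percent < 100:
        return "processing"     # 20-99%
    return "ready to use"       # 100%
```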

3. Get File Content

GET /files/{file_id}/

Response (200):

{
  "file id": 1234,
  "filename": "document.pdf",
  "File Type": "application/pdf",
  "File size": "2.45 MB",
  "Date Created": "2024-01-15T10:30:00",
  "file content": [
    {
      "doc_id": "uuid-string",
      "text": "Extracted file content...",
      "metadata": {
        "page_label": "1",
        "text": "Page content..."
      }
    }
  ]
}
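A client will often want the plain text rather than the nested structure. The sketch below assumes the response has been decoded into a Python dictionary with the shape shown above; `extract_text` and `page_labels` are hypothetical helper names.

```python
def extract_text(response_body):
    """Join the `text` of every entry in "file content" into one string."""
    documents = response_body.get("file content", [])
    return "\n\n".join(doc["text"] for doc in documents)


def page_labels(response_body):
    """Collect page labels where the parser provided them (PDFs, for example)."""
    labels = []
    for doc in response_body.get("file content", []):
        label = doc.get("metadata", {}).get("page_label")
        if label is not None:
            labels.append(label)
    return labels
```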

4. List All Files

GET /files

Response (200):

{
  "Complete Data Info": {
    "1234": ["document.pdf", "application/pdf", "2.45 MB", "2024-01-15T10:30:00"],
    "5678": ["spreadsheet.xlsx", "application/xlsx", "1.23 MB", "2024-01-15T11:00:00"]
  }
}
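Each entry in the listing is a positional array, so a client may find it easier to convert it into named fields. The sketch below assumes the order shown in the sample (filename, type, size, creation date); the field names and `parse_listing` are hypothetical.

```python
# Assumed positional order, taken from the sample response above.
FIELDS = ("filename", "file_type", "size", "date_created")


def parse_listing(response_body):
    """Turn the "Complete Data Info" mapping into a list of dictionaries."""
    records = []
    for file_id, values in response_body["Complete Data Info"].items():
        record = {"file_id": int(file_id)}
        record.update(zip(FIELDS, values))
        records.append(record)
    return records
```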

5. Delete File

DELETE /files/{file_id}

Response (200):

{
  "message": "File '/document.pdf' deleted successfully."
}

Error Response (404):

{
  "error": "unknown file_id"
}

Installation & Setup

Prerequisites

  • Python 3.7+
  • pip (Python package manager)

1. Clone the Repository

git clone https://github.com/yourusername/file-parser-crud-api.git
cd file-parser-crud-api

2. Install Dependencies

pip install -r requirements.txt

Or install manually:

pip install flask llama-index-core

About LlamaIndex: LlamaIndex is a data framework for building applications with large language models (LLMs). This project uses its SimpleDirectoryReader, which:

  • Loads and parses documents from disk
  • Supports multiple file formats: PDF, DOCX, TXT, CSV, HTML, and more
  • Automatically extracts text content and metadata from documents
  • Returns the content as structured document chunks for better content organization
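A minimal sketch of how a single uploaded file might be parsed with SimpleDirectoryReader and reshaped into the `file content` structure shown in the API responses. The function names `load_documents` and `documents_to_json` are hypothetical and not taken from the project; depending on the installed LlamaIndex version, some formats may also need additional reader packages.

```python
def load_documents(path):
    """Parse one file with LlamaIndex and return its Document objects."""
    # Imported lazily so documents_to_json can be used without LlamaIndex installed.
    from llama_index.core import SimpleDirectoryReader

    return SimpleDirectoryReader(input_files=[path]).load_data()


def documents_to_json(documents):
    """Convert Document-like objects into JSON-serialisable dictionaries."""
    return [
        {
            "doc_id": doc.doc_id,
            "text": doc.text,
            "metadata": dict(doc.metadata),
        }
        for doc in documents
    ]
```

Calling `documents_to_json(load_documents("document.pdf"))` would then yield a list that can be returned in a JSON response.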

3. Configure Upload Directory

Update the UPLOAD_FOLDER path in the code to match your system:

UPLOAD_FOLDER = 'C:/Users/kowsh/OneDrive/Desktop/NITRO'  # Update this path
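As an alternative to editing a hard-coded path, the folder could be read from the environment and created on demand. This is a sketch only: the `UPLOAD_FOLDER` environment variable, the `uploads` default, and the `resolve_upload_folder` helper are assumptions, not part of the project's code.

```python
import os
from pathlib import Path


def resolve_upload_folder(env=None, default="uploads"):
    """Return the upload directory as a Path, creating it if it is missing."""
    env = os.environ if env is None else env
    folder = Path(env.get("UPLOAD_FOLDER", default)).expanduser()
    # parents=True builds intermediate directories; exist_ok avoids errors on reruns.
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

With this approach the same code runs unchanged on Windows, macOS, and Linux, since the path is supplied at launch time.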

4. Run the Application

python app.py

The API will be available at http://localhost:5000

Usage Examples

Upload a File

curl -X POST \
  http://localhost:5000/files \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@/path/to/your/document.pdf'
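The same upload can be prepared from Python using only the standard library. The sketch below builds the multipart body by hand and mirrors the `file` form field used by the curl command; `build_upload_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import mimetypes
import urllib.request
import uuid


def build_upload_request(url, filename, data):
    """Build a multipart/form-data POST request carrying one file."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + data + tail,
        method="POST",
        # The boundary must be announced in the header so the server can split parts.
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the returned object to `urllib.request.urlopen` sends the upload to a running server.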

Check Progress

curl -X GET http://localhost:5000/files/1234/progress
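In practice a client checks progress repeatedly until the file is ready. The sketch below shows one way to poll, assuming the caller supplies a `fetch_progress` function that returns the decoded JSON body of the progress endpoint; `wait_until_ready` and its parameters are hypothetical.

```python
import time


def wait_until_ready(fetch_progress, file_id, interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll until the status is "ready to use", then return the final body."""
    for _ in range(max_polls):
        body = fetch_progress(file_id)
        if body["status"] == "ready to use":
            return body
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"file {file_id} was not ready after {max_polls} polls")
```

Injecting `fetch_progress` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to exercise without a running server.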

Retrieve File Content

curl -X GET http://localhost:5000/files/1234/

List All Files

curl -X GET http://localhost:5000/files

Delete a File

curl -X DELETE http://localhost:5000/files/1234

Project Structure

file-parser-crud-api/
│
├── app.py                 # Main Flask application
├── README.md             # Project documentation
├── requirements.txt      # Python dependencies
└── uploads/             # Directory for uploaded files (created automatically)

Testing

This API has been thoroughly tested using Postman for debugging and validation. A complete Postman collection is included with:

Postman Collection Features

  • ✅ All API endpoints with sample requests
  • ✅ Environment variables for easy testing
  • ✅ Pre-configured test cases
  • ✅ Response validation scripts
  • ✅ File upload examples

Import Postman Collection

  1. Open Postman
  2. Click "New" in the top left
  3. Select the request type and paste the URL of the running Flask server
  4. Set the base URL to http://localhost:5000 if needed

Manual Testing Endpoints

You can test all endpoints using the curl examples above, or use the provided Postman collection for a more user-friendly testing experience.

Features Implemented

✅ Core Requirements

  • File upload with unique ID assignment
  • Progress tracking with status updates
  • Asynchronous file parsing simulation
  • Complete CRUD operations
  • Error handling and validation
  • JSON API responses

⚡ Additional Features

  • File size calculation and display
  • Timestamp tracking
  • Duplicate file detection
  • Automatic directory creation
  • Content type detection

Error Handling

The API provides comprehensive error handling:

  • 404 Not Found: Invalid file ID
  • 400 Bad Request: Missing file or duplicate upload
  • 500 Internal Server Error: Server-side processing errors
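On the client side, these status codes can be translated into exceptions so that callers do not have to inspect every response by hand. This is a sketch only: `check_response` is a hypothetical helper, and the choice of exception types is an assumption rather than something the API prescribes.

```python
def check_response(status_code, body):
    """Return `body` on success, otherwise raise an exception matching the status."""
    if status_code == 200:
        return body
    # Error bodies in the examples use either an "error" or a "message" key.
    detail = body.get("error") or body.get("message") or "no detail provided"
    if status_code == 404:
        raise LookupError(f"file not found: {detail}")
    if status_code == 400:
        raise ValueError(f"bad request: {detail}")
    raise RuntimeError(f"server error ({status_code}): {detail}")
```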

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

About the Developer

Ajay Vasan - Computer Science Engineering Student at Lovely Professional University

Experience Highlights:

  • Google Adversarial Nibbler Project - Adversarial Tester (Oct 2024 - Jan 2025)
  • IBM Cybersecurity Program - Internship at Allsoft Solutions (June - July 2024)
  • Specialized in Machine Learning, AI Safety Testing, and Cybersecurity

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

If you encounter any issues or have questions, please open an issue on GitHub.


Note: This project demonstrates advanced file parsing capabilities using LlamaIndex and comprehensive API design. Tested extensively with Postman for reliability and performance. For production use, consider implementing additional security measures, database integration, and proper authentication mechanisms.

Built with ❤️ by Ajay Vasan
