CSV Uploader API

About

The CSV Uploader API is a FastAPI-based application designed to facilitate uploading CSV files containing massive data (100 MB+). It supports bulk import from CSV into a relational database, uses multiprocessing for parallel processing (throughput scales with the number of CPU cores), and allows querying the imported data with various filters. Built with modern technologies, it provides a robust backend for managing and exploring data.
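
To make the approach concrete, below is a minimal sketch of a chunked, multiprocess import. It uses pandas for chunked reading; the table name and DATABASE_URL are placeholders for illustration, and the repository's actual implementation may differ.

    import multiprocessing as mp

    import pandas as pd
    from sqlalchemy import create_engine

    DATABASE_URL = "postgresql://postgres:postgres@localhost:5432/csv_uploader"  # placeholder
    CHUNK_SIZE = 50_000  # rows handed to each worker

    def insert_chunk(chunk: pd.DataFrame) -> int:
        # Each worker process creates its own engine; SQLAlchemy engines
        # must not be shared across process boundaries.
        engine = create_engine(DATABASE_URL)
        chunk.to_sql("games", engine, if_exists="append", index=False, method="multi")
        engine.dispose()
        return len(chunk)

    def import_csv(path: str) -> int:
        # Read the CSV lazily in chunks so a 100 MB+ file never has to fit
        # in memory at once, then fan the chunks out across all CPU cores.
        with mp.Pool(processes=mp.cpu_count()) as pool:
            inserted = pool.map(insert_chunk, pd.read_csv(path, chunksize=CHUNK_SIZE))
        return sum(inserted)

    if __name__ == "__main__":
        print(f"Inserted {import_csv('sample_data.csv')} rows")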

Tech Stack

  • FastAPI: A modern web framework for building APIs with Python 3.7+ based on standard Python type hints.
  • PostgreSQL: A powerful, open-source relational database system known for its reliability, feature robustness, and performance.
  • SQLAlchemy ORM: An SQL toolkit and Object-Relational Mapping (ORM) library for Python, providing a full suite of well-known enterprise-level persistence patterns.
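
To illustrate the ORM layer, here is a hypothetical SQLAlchemy model consistent with the /games/ filters documented below; the column names are assumptions, not the repository's exact schema.

    from sqlalchemy import Column, Date, Integer, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Game(Base):
        __tablename__ = "games"

        id = Column(Integer, primary_key=True)
        name = Column(String, index=True)
        age = Column(Integer)
        release_date = Column(Date, index=True)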

Local Setup

To set up the project locally, follow these steps:

  1. Clone the repository:

    git clone https://github.com/rohit114/csv-uploader.git
    
    cd csv-uploader
    
  2. Create virtual environment:

    python3 -m venv venv
    
  3. Activate virtual environment:

    (Linux/Mac)
    source venv/bin/activate 
    
    (Windows)
    venv\Scripts\activate
    
  4. Install dependencies:

    pip install -r requirements.txt
    
  5. Run the application. NOTE 1: rename sample.env to .env and set DATABASE_URL and API_KEY (a sample sketch follows this step). NOTE 2: create a database in Postgres as per DATABASE_URL.

    uvicorn app.main:app --reload
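
A minimal sketch of the renamed .env; the values are placeholders for your local Postgres setup (POSTGRES_DB appears again in the upload notes below):

    DATABASE_URL=postgresql://postgres:postgres@localhost:5432/csv_uploader
    API_KEY=xxxxxxx
    POSTGRES_DB=csv_uploader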
    

Docker Setup

  • docker-compose build
  • docker-compose up
  • The application listens on the port exposed in the docker-compose file (see the sketch below)
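
For reference, a minimal docker-compose.yml along these lines; the image tag, ports, and credentials are assumptions, and the repository's actual file may differ:

    services:
      db:
        image: postgres:15
        environment:
          POSTGRES_DB: csv_uploader
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
        ports:
          - "5432:5432"
      app:
        build: .
        env_file: .env
        ports:
          - "8000:8000"
        depends_on:
          - db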

API Documentation

  1. Generate sample CSV data: 10 lakh (1,000,000) records, around 160 MB, written to sample_data.csv

    • python3 app/utils/csv_seeder.py
  2. Upload CSV

    • About:

      • make sure to create a database matching POSTGRES_DB in the .env file
      • supports large CSV files (tested locally with 10 lakh records; the bulk insert takes around 17-20 seconds on an 8-core CPU)
    • METHOD: POST

    • URL: {{BASE_URL}}/upload/

    • HEADER: x_api_key: xxxxxxx (as per .env), Content-Type: multipart/form-data

    • BODY: form-data: key=file, value=sample_data.csv (generated in step 1)

    • The API returns 200 OK { "status": "File processed successfully"} on success, otherwise an error. See the client sketch after this list.

  3. Explore Game data:

    • METHOD: GET
    • URL: {{BASE_URL}}/games/
    • HEADER: x_api_key as per the .env file
    • Query Params:
      • limit (optional default 10)
      • offset (optional default 0)
      • name (optional)
      • age (optional)
      • release_date_gte (optional)
      • release_date_lte (optional)
    • The API returns 200 OK { "data": [list of games], "next_offset": 10 }
  4. For more, refer to the interactive API docs that FastAPI serves at {{BASE_URL}}/docs by default
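
For illustration, here is a small client script that exercises steps 2 and 3 with the requests library. BASE_URL and the API key value are placeholders; the header name follows the x_api_key convention above.

    import requests

    BASE_URL = "http://localhost:8000"   # placeholder: uvicorn's default bind
    HEADERS = {"x_api_key": "xxxxxxx"}   # the API key configured in .env

    # Step 2: upload the generated CSV. Passing files= makes requests send
    # multipart/form-data with the key "file" automatically.
    with open("sample_data.csv", "rb") as f:
        resp = requests.post(f"{BASE_URL}/upload/", headers=HEADERS, files={"file": f})
    resp.raise_for_status()
    print(resp.json())  # {"status": "File processed successfully"}

    # Step 3: page through games released in 2020, ten at a time.
    params = {
        "limit": 10,
        "offset": 0,
        "release_date_gte": "2020-01-01",
        "release_date_lte": "2020-12-31",
    }
    resp = requests.get(f"{BASE_URL}/games/", headers=HEADERS, params=params)
    resp.raise_for_status()
    page = resp.json()
    print(len(page["data"]), "games; next_offset =", page["next_offset"])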

Contact
