


OMCHOKSI108/text-classifier-model-api


Text Classification API

A high-performance, production-ready API for sentiment analysis and text classification using advanced machine learning models. This API provides accurate sentiment analysis with support for batch processing, comprehensive health monitoring, and optimized performance for both development and production environments.

Overview

The Text Classification API is built with FastAPI and leverages multiple machine learning algorithms, including XGBoost, LightGBM, CatBoost, and neural networks. It handles requests asynchronously, running model inference in a thread pool to make better use of the CPU, caches model responses with an LRU cache, and includes comprehensive error handling. The API is containerized with Docker for easy deployment and ships with MkDocs documentation.
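The Overview names thread pools and LRU caching without showing how they fit together. Below is a minimal stdlib sketch of that pattern, assuming a synchronous, CPU-bound predict function; `run_model` is a hypothetical stand-in for the real model, and the cache and pool sizes are arbitrary choices, not values from this project.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


def run_model(text: str) -> dict:
    # Hypothetical stand-in for the real ensemble; the repository's model
    # object and its predict interface are not documented in this README.
    label = "positive" if "great" in text.lower() else "negative"
    return {"label": label, "confidence": 0.9}


@lru_cache(maxsize=1024)  # assumed cache size
def cached_predict(text: str) -> tuple:
    # Identical input strings are served from the cache after the first call.
    result = run_model(text)
    # Return an immutable tuple so callers cannot mutate a cached value.
    return result["label"], result["confidence"]


# Bounded worker pool for blocking inference (worker count is an assumption).
_pool = ThreadPoolExecutor(max_workers=4)


async def predict(text: str) -> dict:
    loop = asyncio.get_running_loop()
    # Run the blocking call in the pool so the event loop keeps accepting requests.
    label, confidence = await loop.run_in_executor(_pool, cached_predict, text)
    return {"text": text, "label": label, "confidence": confidence}


print(asyncio.run(predict("Great service!")))
# {'text': 'Great service!', 'label': 'positive', 'confidence': 0.9}
```

Thread pools only add real parallelism when the inference code releases the GIL, as native extensions often do. In FastAPI, a plain `def` endpoint is already dispatched to a thread pool, so the explicit executor here mainly makes the mechanism visible.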

Key Features

  • High-performance sentiment analysis with 86.3% accuracy validated on 10,000+ test samples
  • Support for multiple ML algorithms (XGBoost, LightGBM, CatBoost, TensorFlow)
  • Asynchronous batch processing with concurrent execution
  • Comprehensive health monitoring and metrics endpoints
  • Docker containerization optimized for production deployment
  • Memory-efficient design suitable for free-tier hosting
  • API documentation built with MkDocs
  • RESTful API design with automatically generated OpenAPI documentation (FastAPI serves it at /docs by default)

Performance Metrics

The API was benchmarked on 10,000 randomly generated samples, with the following results:

  • Overall accuracy: 86.3%
  • Average response time: 74.26ms per prediction
  • Throughput: 608+ predictions per second
  • Average confidence: 83.3%
  • Memory usage: ~994MB with all models loaded
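The latency and throughput figures only fit together if many predictions are processed at the same time. As a rough consistency check with Little's law, assuming both figures come from the same steady-state run (the README does not say):

```latex
% Little's law: mean number of requests in flight = throughput x mean latency
L = \lambda W \approx 608\,\text{s}^{-1} \times 0.07426\,\text{s} \approx 45.2

% Serving one prediction at a time would cap throughput at
\lambda_{\text{serial}} = \frac{1}{W} = \frac{1}{0.07426\,\text{s}} \approx 13.5\ \text{predictions/s}
```

So roughly 45 predictions would need to be in flight at once, for example through batch requests or concurrent clients; the concurrency level used in the benchmark is not documented here.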

Dataset

This project utilizes a comprehensive dataset created by merging and cleaning 12 different text classification datasets. The datasets include customer support tickets, banking conversations, news articles, sentiment analysis data, spam detection samples, and toxic content classification data. The merged dataset was carefully preprocessed to remove duplicates, handle missing values, balance class distributions, and ensure data quality for training robust machine learning models.
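The README does not say how the 12 sources were merged or how classes were balanced. A minimal stdlib sketch of the listed steps (merge, drop missing values, deduplicate, balance), assuming every source has already been mapped to `(text, label)` pairs and that balancing means random downsampling to the smallest class; the function name and sample rows are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Iterable, List, Optional, Tuple

Row = Tuple[Optional[str], Optional[str]]


def merge_and_clean(sources: Iterable[Iterable[Row]], seed: int = 0) -> List[Tuple[str, str]]:
    seen = set()
    rows = []
    for source in sources:
        for text, label in source:
            # Handle missing values by dropping rows without text or label.
            if text is None or label is None or not text.strip():
                continue
            # Deduplicate on lowercased, whitespace-normalized text.
            key = " ".join(text.lower().split())
            if key in seen:
                continue
            seen.add(key)
            rows.append((text.strip(), label))
    if not rows:
        return []

    # Balance classes by randomly downsampling each one to the smallest class size.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = [row for group in by_label.values() for row in rng.sample(group, smallest)]
    rng.shuffle(balanced)
    return balanced


# Hypothetical rows from two sources; the second contains a near-duplicate and a missing text.
support_tickets = [("My card was declined", "negative"), ("Thanks, it works!", "positive")]
reviews = [("thanks,  it works!", "positive"), (None, "negative"), ("Awful app", "negative")]
cleaned = merge_and_clean([support_tickets, reviews])
print(Counter(label for _, label in cleaned))
```

Downsampling discards data; class weights or oversampling are common alternatives, and the README does not say which approach the project used.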

Quick Start

Prerequisites

  • Python 3.11+
  • Docker (optional, for containerized deployment)

Local Development

  1. Clone the repository:
git clone https://github.com/OMCHOKSI108/text-classifier-api.git
cd text-classifier-api
  2. Install dependencies:
pip install -r requirements.txt
  3. Run the API:
python run_api.py

The API will be available at http://localhost:8000

Docker Deployment

  1. Build and run with Docker:
python run_docker.py

Documentation

To view the documentation locally:

python run_docs.py

Documentation will be available at http://localhost:8001

API Usage

Single Prediction

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "This product is amazing!"}'
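The same request can be sent from Python. This is a stdlib-only sketch (no third-party HTTP client assumed) that sends exactly the JSON body from the curl example; since the response fields are not documented in this README, it returns the decoded JSON as-is. `build_predict_request` and `predict` are illustrative names, not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default local address from Quick Start


def build_predict_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # Mirrors the curl call: POST /predict with a JSON body {"text": ...}.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, base_url: str = BASE_URL, timeout: float = 10.0):
    # Sends the request and returns the decoded JSON response unchanged.
    with urllib.request.urlopen(build_predict_request(text, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API running locally, `predict("This product is amazing!")` should return the same JSON that the curl command prints.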

Batch Prediction

curl -X POST "http://localhost:8000/batch_predict" \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Great service!", "Terrible experience"], "batch_size": 10}'
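The README does not define how `batch_size` is used on the server. One plausible reading, sketched here as an assumption, is that the server splits `texts` into chunks of `batch_size` and scores the chunks concurrently in a thread pool, in line with the asynchronous batch processing listed under Key Features. `chunk`, `batch_predict` and `toy_predict_many` are hypothetical.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def chunk(items: Sequence[str], size: int) -> List[Sequence[str]]:
    # Split the request's texts into consecutive slices of at most `size` items.
    if size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


async def batch_predict(
    texts: Sequence[str],
    batch_size: int,
    predict_many: Callable[[List[str]], List[str]],
    pool: ThreadPoolExecutor,
) -> List[str]:
    loop = asyncio.get_running_loop()
    # Submit every chunk to the pool so chunks are scored concurrently.
    futures = [
        loop.run_in_executor(pool, predict_many, list(part))
        for part in chunk(texts, batch_size)
    ]
    # gather() returns results in submission order, so labels line up with texts.
    per_chunk = await asyncio.gather(*futures)
    return [label for labels in per_chunk for label in labels]


def toy_predict_many(batch: List[str]) -> List[str]:
    # Hypothetical stand-in for vectorized model inference on one chunk.
    return ["positive" if "great" in t.lower() else "negative" for t in batch]


with ThreadPoolExecutor(max_workers=2) as pool:
    labels = asyncio.run(
        batch_predict(["Great service!", "Terrible experience"], 10, toy_predict_many, pool)
    )
print(labels)  # ['positive', 'negative']
```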

Health Check

curl http://localhost:8000/health
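Because loading all models takes time and memory (~994MB per the Performance Metrics), a client or deployment script may want to wait for `/health` to respond before sending traffic. A stdlib sketch, assuming any 2xx response means healthy (the response body is not specified here); `http_probe` and `wait_until_healthy` are hypothetical names.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:8000/health", timeout: float = 2.0) -> bool:
    # Any 2xx response counts as healthy; the body's format is not assumed.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (non-2xx) and socket timeouts all derive from OSError.
        return False


def wait_until_healthy(
    probe: Callable[[], bool] = http_probe,
    attempts: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    # Poll until the probe succeeds or the attempts run out.
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, after starting the container, `wait_until_healthy()` polls `/health` once per second, up to 30 times, and returns `False` if the API never becomes healthy.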

Project Structure

text-classifier-api/
├── api/                          # API source code and configuration
│   ├── main.py                   # FastAPI application
│   ├── api_requirements.txt      # Python dependencies
│   ├── Dockerfile                # Docker configuration
│   ├── docker-compose.yml        # Docker composition
│   ├── datasets/                 # Dataset files
│   ├── mkdocs.yml                # Documentation configuration
│   └── docs/                     # MkDocs documentation

Model Training

The machine learning model was trained on the merged dataset using multiple algorithms with hyperparameter optimization. The final model combines predictions from XGBoost, LightGBM, and CatBoost classifiers using a weighted ensemble approach for improved accuracy and robustness.
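The README does not give the ensemble weights or say whether the models vote on labels or on probabilities. Assuming weighted soft voting (a weighted average of each model's class probabilities), the combination step could look like this sketch; the probabilities, weights and function name are made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def weighted_soft_vote(
    probabilities: Dict[str, Sequence[float]],
    weights: Dict[str, float],
    labels: Sequence[str],
) -> Tuple[str, List[float]]:
    # Normalize the weights of the models that produced a prediction.
    total = sum(weights[name] for name in probabilities)
    if total <= 0:
        raise ValueError("ensemble weights must sum to a positive value")
    combined = [0.0] * len(labels)
    for name, probs in probabilities.items():
        share = weights[name] / total
        for i, p in enumerate(probs):
            combined[i] += share * p
    # Pick the class with the highest weighted average probability.
    best = max(range(len(labels)), key=combined.__getitem__)
    return labels[best], combined


# Hypothetical per-model class probabilities and weights for one input text.
label, scores = weighted_soft_vote(
    {"xgboost": [0.2, 0.8], "lightgbm": [0.4, 0.6], "catboost": [0.7, 0.3]},
    {"xgboost": 0.40, "lightgbm": 0.35, "catboost": 0.25},
    labels=["negative", "positive"],
)
print(label)  # positive
```

scikit-learn's `VotingClassifier(voting="soft", weights=...)` implements the same rule for estimators with a scikit-learn interface, which XGBoost, LightGBM and CatBoost each provide wrappers for.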

Testing

Comprehensive performance testing is available through the included testing scripts. The API has been validated with 10,000+ test samples showing consistent performance across different text types and lengths.
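The testing scripts themselves are not shown in this README. A sketch of how metrics like those under Performance Metrics (accuracy, average response time, throughput, average confidence) can be computed from labeled samples; `benchmark` and `toy_predict` are hypothetical, and `toy_predict` would be replaced by a call to the API or the loaded model.

```python
import time
from typing import Callable, Sequence, Tuple


def benchmark(
    predict: Callable[[str], Tuple[str, float]],
    samples: Sequence[Tuple[str, str]],
) -> dict:
    # Score (text, expected_label) pairs one at a time and collect the metrics.
    if not samples:
        raise ValueError("need at least one sample")
    correct, latency_total, confidence_total = 0, 0.0, 0.0
    start = time.perf_counter()
    for text, expected in samples:
        t0 = time.perf_counter()
        label, confidence = predict(text)
        latency_total += time.perf_counter() - t0
        correct += label == expected
        confidence_total += confidence
    elapsed = time.perf_counter() - start
    n = len(samples)
    return {
        "accuracy": correct / n,
        "avg_latency_ms": 1000 * latency_total / n,
        "throughput_per_s": n / elapsed if elapsed > 0 else float("inf"),
        "avg_confidence": confidence_total / n,
    }


def toy_predict(text: str) -> Tuple[str, float]:
    # Hypothetical stand-in for a call to the running API or the loaded model.
    return ("positive" if "great" in text else "negative"), 0.8


report = benchmark(toy_predict, [("great", "positive"), ("bad", "negative"),
                                 ("great but slow", "negative"), ("fine", "positive")])
print(report["accuracy"])  # 0.5
```

Because this loop scores one sample at a time, its throughput is roughly the reciprocal of its mean latency; measuring throughput under concurrent load needs parallel clients.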

Deployment

The API is designed for easy deployment in various environments:

  • Local Development: Run directly with Python
  • Docker: Containerized deployment for consistency
  • Cloud Platforms: Compatible with Heroku, Railway, and similar platforms
  • Free Tier: Memory-optimized for cost-effective hosting
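Platforms such as Heroku tell the application which port to bind through the `PORT` environment variable, so a hosted deployment usually cannot hard-code port 8000. A sketch of a launcher that honors `PORT` and falls back to 8000 locally; it assumes uvicorn and an `app` object in `api/main.py`, neither of which this README confirms, and `resolve_port` and `serve` are hypothetical names.

```python
import os


def resolve_port(default: int = 8000) -> int:
    # Cloud platforms such as Heroku pass the port to bind through $PORT;
    # locally we fall back to the README's default of 8000.
    raw = os.environ.get("PORT", "").strip()
    if not raw:
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"PORT must be an integer, got {raw!r}") from None


def serve() -> None:
    # Assumptions: uvicorn is the ASGI server, and the FastAPI instance in
    # api/main.py is named `app`. Neither is confirmed by this README.
    import uvicorn

    # Bind all interfaces so the platform's router can reach the process.
    uvicorn.run("api.main:app", host="0.0.0.0", port=resolve_port())
```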

Documentation Deployment

To deploy the MkDocs documentation to GitHub Pages:

python deploy_docs.py

The documentation will be available at: https://OMCHOKSI108.github.io/text-classifier-api/

Git Repository

Files Included in Repository

  • api/ - Complete API source code and configuration
  • run_*.py - Runner scripts for different deployment modes
  • README.md - Project documentation
  • .gitignore - Git ignore rules for large files and sensitive data

Files Excluded from Repository

The following files are excluded via .gitignore for size and security reasons:

  • Model files (*.pkl) - Download separately or train locally
  • Dataset files (*.csv) - Generate or download as needed
  • Test outputs and logs
  • Virtual environments and cache files
  • Jupyter notebooks and development artifacts
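Since `*.pkl` model files are not tracked in git, code that loads them should fail with a clear message when a file is missing. A minimal sketch using the standard `pickle` module; the function name is hypothetical, and if the models were saved with joblib instead, `joblib.load` is the matching loader.

```python
import pickle
from pathlib import Path
from typing import Any, Union


def load_model(path: Union[str, Path]) -> Any:
    model_path = Path(path)
    if not model_path.is_file():
        # Point the user at the two options the README gives for obtaining models.
        raise FileNotFoundError(
            f"Model file {model_path} not found. *.pkl files are not tracked in git; "
            "download them separately or train the model locally."
        )
    # Unpickling can execute arbitrary code: only load files from a trusted source.
    with model_path.open("rb") as fh:
        return pickle.load(fh)
```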

Contributing

Contributions are welcome. Please ensure code quality and add appropriate tests for new features.

License

This project is licensed under the MIT License.

Author

OMCHOKSI108
