A high-performance, production-ready API for sentiment analysis and text classification using advanced machine learning models. This API provides accurate sentiment analysis with support for batch processing, comprehensive health monitoring, and optimized performance for both development and production environments.
The Text Classification API is built with FastAPI and leverages multiple machine learning algorithms including XGBoost, LightGBM, CatBoost, and neural networks. It features asynchronous processing with thread pools for optimal CPU utilization, LRU caching for model responses, and comprehensive error handling. The API is containerized with Docker for easy deployment and includes professional MkDocs documentation.
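The thread-pool-plus-cache pattern can be sketched with the standard library alone; `run_model` below is a stand-in for the real ensemble inference, and the pool size and cache size are illustrative assumptions, not the API's actual settings:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Thread pool for CPU-bound inference: model calls are handed off to it
# so the async event loop stays responsive while predictions run.
executor = ThreadPoolExecutor(max_workers=4)

@lru_cache(maxsize=1024)
def run_model(text: str) -> str:
    """Stand-in for ensemble inference (hypothetical logic).
    Repeated texts are served from the LRU cache without recomputing."""
    return "positive" if "amazing" in text.lower() else "negative"

async def predict(text: str) -> dict:
    # Run the blocking model call on the pool instead of the event loop.
    loop = asyncio.get_running_loop()
    label = await loop.run_in_executor(executor, run_model, text)
    return {"text": text, "sentiment": label}

result = asyncio.run(predict("This product is amazing!"))
```

In the real application the coroutine body would sit inside a FastAPI route handler; the handoff via `run_in_executor` is what lets one worker process serve many concurrent requests.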
- High-performance sentiment analysis with 86.3% accuracy validated on 10,000+ test samples
- Support for multiple ML algorithms (XGBoost, LightGBM, CatBoost, TensorFlow)
- Asynchronous batch processing with concurrent execution
- Comprehensive health monitoring and metrics endpoints
- Docker containerization optimized for production deployment
- Memory-efficient design suitable for free tier hosting
- Professional API documentation with MkDocs
- RESTful API design with automatic OpenAPI documentation
The API has been thoroughly tested with 10,000 randomly generated samples, achieving:
- Overall accuracy: 86.3%
- Average response time: 74.26ms per prediction
- Throughput: 608+ predictions per second
- Average confidence: 83.3%
- Memory usage: ~994MB with all models loaded
This project utilizes a comprehensive dataset created by merging and cleaning 12 different text classification datasets. The datasets include customer support tickets, banking conversations, news articles, sentiment analysis data, spam detection samples, and toxic content classification data. The merged dataset was carefully preprocessed to remove duplicates, handle missing values, balance class distributions, and ensure data quality for training robust machine learning models.
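A minimal sketch of this kind of merge-and-clean pipeline in pandas; the inline frames, column names, and downsampling strategy are illustrative assumptions, not the project's actual files or preprocessing code:

```python
import pandas as pd

# Stand-ins for per-source datasets (real pipeline reads 12 CSV files).
frames = [
    pd.DataFrame({"text": ["Great service!", "Great service!", None],
                  "label": ["positive", "positive", "negative"]}),
    pd.DataFrame({"text": ["Card was declined"], "label": ["negative"]}),
]
merged = pd.concat(frames, ignore_index=True)
merged = merged.dropna(subset=["text"])           # handle missing values
merged = merged.drop_duplicates(subset=["text"])  # remove duplicate texts

# Balance classes by downsampling each label to the smallest class size.
min_count = merged["label"].value_counts().min()
balanced = (merged.groupby("label", group_keys=False)
                  .apply(lambda g: g.sample(min_count, random_state=0)))
```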
- Python 3.11+
- Docker (optional, for containerized deployment)
- Clone the repository:

```bash
git clone https://github.com/OMCHOKSI108/text-classifier-api.git
cd text-classifier-api
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Run the API:

```bash
python run_api.py
```

The API will be available at http://localhost:8000
- Build and run with Docker:

```bash
python run_docker.py
```

To view the documentation locally:

```bash
python run_docs.py
```

Documentation will be available at http://localhost:8001
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "This product is amazing!"}'
```

```bash
curl -X POST "http://localhost:8000/batch_predict" \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Great service!", "Terrible experience"], "batch_size": 10}'
```

```bash
curl http://localhost:8000/health
```

```
text-classifier-api/
├── api/                     # API source code and configuration
│   ├── main.py              # FastAPI application
│   ├── api_requirements.txt # Python dependencies
│   ├── Dockerfile           # Docker configuration
│   ├── docker-compose.yml   # Docker composition
│   ├── datasets/            # Dataset files
│   ├── mkdocs.yml           # Documentation configuration
│   └── docs/                # MkDocs documentation
```
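The curl requests shown earlier can also be issued from Python using only the standard library; the endpoint URL and payload shape follow those examples, while the helper names below are hypothetical:

```python
import json
from urllib import request

API_URL = "http://localhost:8000"  # assumes a locally running instance

def build_predict_request(text: str) -> request.Request:
    """Build a POST request for the /predict endpoint."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return request.Request(
        f"{API_URL}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(text: str) -> dict:
    """Send one prediction request and decode the JSON response."""
    with request.urlopen(build_predict_request(text), timeout=10) as resp:
        return json.loads(resp.read())

# predict("This product is amazing!")  # requires the API to be running
```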
The machine learning model was trained on the merged dataset using multiple algorithms with hyperparameter optimization. The final model combines predictions from XGBoost, LightGBM, and CatBoost classifiers using a weighted ensemble approach for improved accuracy and robustness.
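The weighted-ensemble step amounts to a weighted average of each model's class probabilities; the probability matrices and weights below are made-up illustrations, not the trained models' outputs:

```python
import numpy as np

# Hypothetical class-probability outputs from three trained models
# for two input texts (columns: negative, positive).
p_xgb  = np.array([[0.2, 0.8], [0.7, 0.3]])
p_lgbm = np.array([[0.3, 0.7], [0.6, 0.4]])
p_cat  = np.array([[0.1, 0.9], [0.8, 0.2]])

# Weights would typically reflect each model's validation accuracy.
weights = np.array([0.4, 0.35, 0.25])

stacked = np.stack([p_xgb, p_lgbm, p_cat])        # (models, samples, classes)
ensemble = np.tensordot(weights, stacked, axes=1)  # weighted average
labels = ensemble.argmax(axis=1)                   # 0 = negative, 1 = positive
```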
Comprehensive performance testing is available through the included testing scripts. The API has been validated with 10,000+ test samples showing consistent performance across different text types and lengths.
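One way such a latency test can be structured is sketched below; the `predict` stub stands in for a real API call, so the numbers it produces are not the benchmark figures quoted above:

```python
import statistics
import time

def time_predictions(predict, texts):
    """Measure per-call latency (ms) and overall throughput (calls/s)."""
    latencies = []
    start = time.perf_counter()
    for text in texts:
        t0 = time.perf_counter()
        predict(text)
        latencies.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - start
    return {
        "mean_ms": statistics.mean(latencies),
        "throughput_per_s": len(texts) / elapsed,
    }

# Stub predictor; swap in a real client call to benchmark the live API.
stats = time_predictions(lambda t: "positive", ["sample text"] * 100)
```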
The API is designed for easy deployment in various environments:
- Local Development: Run directly with Python
- Docker: Containerized deployment for consistency
- Cloud Platforms: Compatible with Heroku, Railway, and similar platforms
- Free Tier: Memory-optimized for cost-effective hosting
To deploy the MkDocs documentation to GitHub Pages:
```bash
python deploy_docs.py
```

The documentation will be available at: https://OMCHOKSI108.github.io/text-classifier-api/
- `api/` - Complete API source code and configuration
- `run_*.py` - Runner scripts for different deployment modes
- `README.md` - Project documentation
- `.gitignore` - Git ignore rules for large files and sensitive data
The following files are excluded via .gitignore for size and security reasons:
- Model files (`*.pkl`) - Download separately or train locally
- Dataset files (`*.csv`) - Generate or download as needed
- Test outputs and logs
- Virtual environments and cache files
- Jupyter notebooks and development artifacts
Contributions are welcome. Please ensure code quality and add appropriate tests for new features.
This project is licensed under the MIT License.
OMCHOKSI108