A high-performance, production-ready API for sentiment analysis and text classification using advanced machine learning models. This API provides accurate sentiment analysis with support for batch processing, comprehensive health monitoring, and optimized performance for both development and production environments.
The Text Classification API is built with FastAPI and leverages multiple machine learning algorithms including XGBoost, LightGBM, CatBoost, and neural networks. It features asynchronous processing with thread pools for optimal CPU utilization, LRU caching for model responses, and comprehensive error handling. The API is containerized with Docker for easy deployment and includes professional MkDocs documentation.
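The thread-pool-plus-cache pattern can be sketched with the standard library alone; `run_model` below is a stand-in for the real ensemble inference, and the pool size and cache size are illustrative assumptions, not the API's actual settings:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Thread pool for CPU-bound inference: model calls are handed off to it
# so the async event loop stays responsive while predictions run.
executor = ThreadPoolExecutor(max_workers=4)

@lru_cache(maxsize=1024)
def run_model(text: str) -> str:
    """Stand-in for ensemble inference (hypothetical logic).
    Repeated texts are served from the LRU cache without recomputing."""
    return "positive" if "amazing" in text.lower() else "negative"

async def predict(text: str) -> dict:
    # Run the blocking model call on the pool instead of the event loop.
    loop = asyncio.get_running_loop()
    label = await loop.run_in_executor(executor, run_model, text)
    return {"text": text, "sentiment": label}

result = asyncio.run(predict("This product is amazing!"))
```

In the real application the coroutine body would sit inside a FastAPI route handler; the handoff via `run_in_executor` is what lets one worker process serve many concurrent requests.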
- High-performance sentiment analysis with 86.3% accuracy validated on 10,000+ test samples
- Support for multiple ML algorithms (XGBoost, LightGBM, CatBoost, TensorFlow)
- Asynchronous batch processing with concurrent execution
- Comprehensive health monitoring and metrics endpoints
- Docker containerization optimized for production deployment
- Memory-efficient design suitable for free tier hosting
- Professional API documentation with MkDocs
- RESTful API design with automatic OpenAPI documentation
The API has been thoroughly tested with 10,000 randomly generated samples, achieving:
- Overall accuracy: 86.3%
- Average response time: 74.26ms per prediction
- Throughput: 608+ predictions per second
- Average confidence: 83.3%
- Memory usage: ~994MB with all models loaded
This project utilizes a comprehensive dataset created by merging and cleaning 12 different text classification datasets. The datasets include customer support tickets, banking conversations, news articles, sentiment analysis data, spam detection samples, and toxic content classification data. The merged dataset was carefully preprocessed to remove duplicates, handle missing values, balance class distributions, and ensure data quality for training robust machine learning models.
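A minimal sketch of this kind of merge-and-clean pipeline in pandas; the inline frames, column names, and downsampling strategy are illustrative assumptions, not the project's actual files or preprocessing code:

```python
import pandas as pd

# Stand-ins for per-source datasets (real pipeline reads 12 CSV files).
frames = [
    pd.DataFrame({"text": ["Great service!", "Great service!", None],
                  "label": ["positive", "positive", "negative"]}),
    pd.DataFrame({"text": ["Card was declined"], "label": ["negative"]}),
]
merged = pd.concat(frames, ignore_index=True)
merged = merged.dropna(subset=["text"])           # handle missing values
merged = merged.drop_duplicates(subset=["text"])  # remove duplicate texts

# Balance classes by downsampling each label to the smallest class size.
min_count = merged["label"].value_counts().min()
balanced = (merged.groupby("label", group_keys=False)
                  .apply(lambda g: g.sample(min_count, random_state=0)))
```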
- Python 3.11+
- Docker (optional, for containerized deployment)
- Clone the repository:

```bash
git clone https://github.com/OMCHOKSI108/text-classifier-api.git
cd text-classifier-api
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Run the API:

```bash
python run_api.py
```

The API will be available at http://localhost:8000
- Build and run with Docker:

```bash
python run_docker.py
```

To view the documentation locally:

```bash
python run_docs.py
```

Documentation will be available at http://localhost:8001
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "This product is amazing!"}'
```

```bash
curl -X POST "http://localhost:8000/batch_predict" \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Great service!", "Terrible experience"], "batch_size": 10}'
```

```bash
curl http://localhost:8000/health
```

```
text-classifier-api/
├── api/                     # API source code and configuration
│   ├── main.py              # FastAPI application
│   ├── api_requirements.txt # Python dependencies
│   ├── Dockerfile           # Docker configuration
│   ├── docker-compose.yml   # Docker composition
│   ├── datasets/            # Dataset files
│   ├── mkdocs.yml           # Documentation configuration
│   └── docs/                # MkDocs documentation
```
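The curl requests shown earlier can also be issued from Python using only the standard library; the endpoint URL and payload shape follow those examples, while the helper names below are hypothetical:

```python
import json
from urllib import request

API_URL = "http://localhost:8000"  # assumes a locally running instance

def build_predict_request(text: str) -> request.Request:
    """Build a POST request for the /predict endpoint."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return request.Request(
        f"{API_URL}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(text: str) -> dict:
    """Send one prediction request and decode the JSON response."""
    with request.urlopen(build_predict_request(text), timeout=10) as resp:
        return json.loads(resp.read())

# predict("This product is amazing!")  # requires the API to be running
```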
The machine learning model was trained on the merged dataset using multiple algorithms with hyperparameter optimization. The final model combines predictions from XGBoost, LightGBM, and CatBoost classifiers using a weighted ensemble approach for improved accuracy and robustness.
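The weighted-ensemble step amounts to a weighted average of each model's class probabilities; the probability matrices and weights below are made-up illustrations, not the trained models' outputs:

```python
import numpy as np

# Hypothetical class-probability outputs from three trained models
# for two input texts (columns: negative, positive).
p_xgb  = np.array([[0.2, 0.8], [0.7, 0.3]])
p_lgbm = np.array([[0.3, 0.7], [0.6, 0.4]])
p_cat  = np.array([[0.1, 0.9], [0.8, 0.2]])

# Weights would typically reflect each model's validation accuracy.
weights = np.array([0.4, 0.35, 0.25])

stacked = np.stack([p_xgb, p_lgbm, p_cat])        # (models, samples, classes)
ensemble = np.tensordot(weights, stacked, axes=1)  # weighted average
labels = ensemble.argmax(axis=1)                   # 0 = negative, 1 = positive
```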
Comprehensive performance testing is available through the included testing scripts. The API has been validated with 10,000+ test samples showing consistent performance across different text types and lengths.
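One way such a latency test can be structured is sketched below; the `predict` stub stands in for a real API call, so the numbers it produces are not the benchmark figures quoted above:

```python
import statistics
import time

def time_predictions(predict, texts):
    """Measure per-call latency (ms) and overall throughput (calls/s)."""
    latencies = []
    start = time.perf_counter()
    for text in texts:
        t0 = time.perf_counter()
        predict(text)
        latencies.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - start
    return {
        "mean_ms": statistics.mean(latencies),
        "throughput_per_s": len(texts) / elapsed,
    }

# Stub predictor; swap in a real client call to benchmark the live API.
stats = time_predictions(lambda t: "positive", ["sample text"] * 100)
```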
The API is designed for easy deployment in various environments:
- Local Development: Run directly with Python
- Docker: Containerized deployment for consistency
- Cloud Platforms: Compatible with Heroku, Railway, and similar platforms
- Free Tier: Memory-optimized for cost-effective hosting
To deploy the MkDocs documentation to GitHub Pages:
```bash
python deploy_docs.py
```

The documentation will be available at: https://OMCHOKSI108.github.io/text-classifier-api/
- `api/` - Complete API source code and configuration
- `run_*.py` - Runner scripts for different deployment modes
- `README.md` - Project documentation
- `.gitignore` - Git ignore rules for large files and sensitive data
The following files are excluded via .gitignore for size and security reasons:
- Model files (`*.pkl`) - Download separately or train locally
- Dataset files (`*.csv`) - Generate or download as needed
- Test outputs and logs
- Virtual environments and cache files
- Jupyter notebooks and development artifacts
Contributions are welcome. Please ensure code quality and add appropriate tests for new features.
This project is licensed under the MIT License.
OMCHOKSI108