Zenith NLP Framework


  ____..--'    .-''-.  ,---.   .--..-./`) ,---------. .---.  .---.         ,---.   .--.  .---.     .-------.  
 |        |  .'_ _   \ |    \  |  |\ .-.')\          \|   |  |_ _|         |    \  |  |  | ,_|     \  _(`)_ \ 
 |   .-'  ' / ( ` )   '|  ,  \ |  |/ `-' \ `--.  ,---'|   |  ( ' )         |  ,  \ |  |,-./  )     | (_ o._)| 
 |.-'.'   /. (_ o _)  ||  |\_ \|  | `-'`"`    |   \   |   '-(_{;}_)        |  |\_ \|  |\  '_ '`)   |  (_,_) / 
    /   _/ |  (_,_)___||  _( )_\  | .---.     :_ _:   |      (_,_)         |  _( )_\  | > (_)  )   |   '-.-'  
  .'._( )_ '  \   .---.| (_ o _)  | |   |     (_I_)   | _ _--.   |         | (_ o _)  |(  .  .-'   |   |      
.'  (_'o._) \  `-'    /|  (_,_)\  | |   |    (_(=)_)  |( ' ) |   |         |  (_,_)\  | `-'`-'|___ |   |      
|    (_,_)|  \       / |  |    |  | |   |     (_I_)   (_{;}_)|   |         |  |    |  |  |        \/   )      
|_________|   `'-..-'  '--'    '--' '---'     '---'   '(_,_) '---'         '--'    '--'  `--------``---'

A Framework for Advanced Natural Language Processing

Zenith NLP is a from-scratch NLP framework built with PyTorch for training, fine-tuning, and deploying transformer-based models. It is aimed at NLP practitioners and researchers, and combines a modular architecture with MLOps tooling for configuration, experiment tracking, containerization, and continuous integration.

✨ Features

State-of-the-Art Model Architectures: From-scratch implementations of:
- BERT (Encoder-only) for tasks like classification and NER.
- GPT (Decoder-only) for causal language modeling and text generation.
- Seq2SeqTransformer (Encoder-Decoder) for translation and summarization.
Advanced Training Techniques:
- Parameter-Efficient Fine-Tuning (PEFT): Integrated LoRA (Low-Rank Adaptation) for efficient fine-tuning of large models.
- Distributed Training: Support for multi-GPU training using PyTorch's DistributedDataParallel.
- Advanced Optimization: Includes learning rate scheduling with warm-up and gradient clipping.
Full MLOps Pipeline:
- Configuration Management: Powered by Hydra, allowing for flexible and reproducible experiments through YAML files.
- Experiment Tracking: Integrated with MLflow to log parameters, metrics, and model artifacts automatically.
- Containerization: Fully containerized with Docker and Docker Compose for reproducible environments and easy deployment of the MLflow UI.
- Continuous Integration: Automated testing pipeline with GitHub Actions and pytest.
Flexible API for Deployment:
- A ready-to-use FastAPI server that can dynamically load and serve any model trained with the framework.
Custom Core Components:
- A trainable Byte-Pair Encoding (BPE) Tokenizer built from scratch.
- Modular implementations of MultiHeadAttention, PositionalEncoding, and other core transformer building blocks.
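
To make the Byte-Pair Encoding item above more concrete, here is a minimal sketch of the classic BPE merge loop in plain Python. It is an illustration of the general technique, not the framework's tokenizer: the function names (`most_frequent_pair`, `merge_pair`, `train_bpe`) are hypothetical, and real tokenizers also handle word boundaries, special tokens, and vocabulary serialization.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, weighted by word count."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    # Each word starts as a tuple of single characters.
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # nothing left to merge
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


if __name__ == "__main__":
    learned, vocab = train_bpe("low low low lower newest newest", num_merges=3)
    print(learned)
    print(vocab)
```

Each learned merge becomes one vocabulary entry, so the number of merges controls the final vocabulary size.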

🚀 Getting Started

1. Installation (from PyPI)

Note: The package is not yet published on PyPI. Once it is, you will be able to install it with:

pip install zenith-nlp-framework

2. Local Development Setup

# 1. Clone the repository
git clone https://github.com/cattolatte/zenith-nlp-framework.git
cd zenith-nlp-framework

# 2. Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# 3. Install all dependencies
pip install -r requirements.txt

# 4. Install the project in editable mode
pip install -e .

📖 Tutorial: Training a Text Classifier

This framework is designed for flexibility. Here’s how you can train your own text classification model.

1. Prepare Your Data and Configs

Place your training data (e.g., my_data.csv) in a local data/ directory. Use the configs/ directory as a template. You can modify config.yaml or create a new one to point to your data file and adjust model/training parameters.
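
If you do not have a dataset yet, the sketch below writes a tiny CSV you can use to try the pipeline. The column names `text` and `label` are assumptions for illustration only; check the data loader and your config for the schema the framework actually expects. The helper name `write_sample_csv` is hypothetical as well.

```python
import csv
from pathlib import Path


def write_sample_csv(path, rows):
    """Write rows with (assumed) 'text' and 'label' columns to a CSV file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create data/ if missing
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "label"])
        writer.writeheader()
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    samples = [
        {"text": "I loved this film", "label": 1},
        {"text": "The plot was dull", "label": 0},
    ]
    print(write_sample_csv("data/my_data.csv", samples))
```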

2. Run Training

Run the text classification task script. All parameters are managed by the Hydra configuration files in the configs/ directory.

# Run with default settings from the config files
python3 -m my_nlp_framework.tasks.text_classification

You can easily override any parameter from the command line:

# Train for more epochs with a different learning rate
python3 -m my_nlp_framework.tasks.text_classification training.epochs=10 training.learning_rate=0.0005

# Train with LoRA enabled
python3 -m my_nlp_framework.tasks.text_classification model.use_lora=True model.lora_rank=8
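
To show how dotted overrides such as `training.epochs=10` relate to the nested YAML config, here is a simplified sketch in plain Python. It is not Hydra's implementation: `apply_overrides` and `parse_value` are hypothetical helpers, the default values are made up, and real Hydra is stricter (for example about keys that do not already exist in the config).

```python
import ast
from copy import deepcopy


def parse_value(raw):
    """Turn an override string into a bool, number, or plain string."""
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    try:
        return ast.literal_eval(raw)  # handles ints and floats such as 10, 0.0005
    except (ValueError, SyntaxError):
        return raw  # fall back to the raw string


def apply_overrides(config, overrides):
    """Return a copy of `config` with 'a.b.c=value' overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Simplification: missing sections are created silently.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return result


if __name__ == "__main__":
    # Hypothetical defaults standing in for the values in configs/.
    defaults = {
        "training": {"epochs": 3, "learning_rate": 0.001},
        "model": {"use_lora": False},
    }
    print(apply_overrides(defaults, ["training.epochs=10", "model.use_lora=True"]))
```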

3. Track Experiments with MLflow

Launch the MLflow UI before you start training so that you can follow your runs in real time. The docker-compose.yml file is pre-configured for this.

# Start the MLflow server in the background
docker-compose up -d
# With Docker Compose V2, the equivalent command is: docker compose up -d

Navigate to http://localhost:5000 in your browser to view the MLflow dashboard.

🌐 Serving Your Model via API

Once you have a trained model (.pth file) and its tokenizer (.json file), you can serve them with the built-in FastAPI server.

python3 -m my_nlp_framework.inference.api \
    --model-path /path/to/your/trained_model.pth \
    --tokenizer-path /path/to/your/tokenizer.json \
    --vocab-size 10000 \
    --num-classes 2

The server listens on http://localhost:8000. Open http://localhost:8000/docs for FastAPI's interactive API documentation, where you can send test requests from the browser.
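
You can also call the server from a script. The sketch below uses only the Python standard library. The `/predict` route and the `{"text": ...}` payload are assumptions for illustration, and the helper names are hypothetical; check the /docs page of your running server for the actual route and request schema.

```python
import json
from urllib import request


def build_predict_request(text, base_url="http://localhost:8000", route="/predict"):
    """Build a JSON POST request. The route and payload shape are assumed."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return request.Request(
        url=base_url.rstrip("/") + route,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, **kwargs):
    """Send the request and decode the JSON response from the server."""
    with request.urlopen(build_predict_request(text, **kwargs), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the API server to be running locally.
    print(predict("This framework is easy to use."))
```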

🐳 Running with Docker

You can also run the entire training process inside a Docker container, which gives you a reproducible environment.

# 1. Build the Docker image
docker build -t zenith-nlp-framework:latest .

# 2. Run a task (mounting your local data directory)
docker run --rm -v "$(pwd)/data":/app/data zenith-nlp-framework:latest \
  python -m my_nlp_framework.tasks.text_classification

🏛️ Framework Architecture

This framework is organized into several key modules:

- src/my_nlp_framework/core: Contains the fundamental building blocks like attention mechanisms, LoRA layers, and tokenizers.
- src/my_nlp_framework/models: Defines high-level model architectures like BERT and GPT.
- src/my_nlp_framework/data: Includes flexible data loaders.
- src/my_nlp_framework/training: A powerful, centralized training engine with advanced features.
- src/my_nlp_framework/tasks: Example scripts that show how to use the framework to solve end-to-end problems.
- src/my_nlp_framework/inference: Code for deploying and serving trained models.
- configs/: Centralized YAML configuration files for Hydra.
- tests/: Unit and integration tests for the framework.

🤝 Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

Made with ❤️ by K Satya Sai Nischal
