
Commit 2fee13b

Merge pull request #295 from stochasticai/glennko/gpt-oss-support
feat: Add comprehensive OpenAI GPT-OSS model support
2 parents 1db4ec3 + 462940a commit 2fee13b

23 files changed (+921, -51 lines)

.github/workflows/ci.yml

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+name: CI
+
+on:
+  push:
+    branches: [ main, master, dev ]
+  pull_request:
+    branches: [ main, master, dev ]
+
+jobs:
+  pre-commit:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.10'
+
+      - name: Install pre-commit
+        run: |
+          python -m pip install --upgrade pip
+          pip install pre-commit
+
+      - name: Run pre-commit
+        run: |
+          pre-commit run -a --show-diff-on-failure
+
+      - name: Verify key files present
+        run: |
+          test -f AGENTS.md

AGENTS.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# Repository Guidelines
+
+## Project Structure & Module Organization
+- Source code: `src/xturing/` (key packages: `models/`, `engines/`, `datasets/`, `preprocessors/`, `trainers/`, `cli/`, `utils/`, `config/`, `ui/`, `self_instruct/`, `model_apis/`, `registry.py`).
+- Tests: `tests/xturing/` mirroring package layout (`test_*.py`).
+- Examples & docs: `examples/`, `docs/`; assets and workflows: `.github/`.
+
+## Build, Test, and Development Commands
+- Setup (editable + dev tools):
+  - `python -m venv .venv && source .venv/bin/activate`
+  - `pip install -e .`
+  - `pip install -r requirements-dev.txt`
+- Formatting & lint (pre-commit):
+  - `pre-commit install && pre-commit install --hook-type commit-msg`
+  - `pre-commit run -a` (runs black, isort, autoflake, yaml checks, gitlint, absolufy-imports)
+- Run tests (pytest):
+  - `pytest -q` or target a subset: `pytest tests/xturing/models -k gpt2`
+  - CPU-only example: `CUDA_VISIBLE_DEVICES=-1 pytest -q -k cpu`
+- Local CLI/UI:
+  - `xturing chat -m <path-to-model-dir>`
+  - `python -c "from xturing.ui import Playground; Playground().launch()"`
+
+## Coding Style & Naming Conventions
+- Python, 4-space indent; keep functions small and typed where practical.
+- Tools: black (PEP 8, 88 cols), isort (`--profile black`), autoflake (remove unused), absolufy-imports.
+- Naming: modules/functions `snake_case`, classes `PascalCase`, constants `UPPER_SNAKE`.
+- Prefer explicit imports from `xturing.*`; avoid unused/relative imports.
+
+## Testing Guidelines
+- Framework: `pytest`. Place tests under `tests/xturing/<area>/test_*.py`.
+- Keep tests fast and deterministic; avoid network and large model downloads.
+- Use small fixtures (e.g., `TextDataset`, `InstructionDataset`) and CPU where possible.
+- Add tests for new code paths and regressions; run `pytest -q` before pushing.
+
+## Commit & Pull Request Guidelines
+- Conventional, clear titles (<= 80 chars); avoid “WIP”. Provide context in body (wrap ~120 cols).
+- Examples: `feat(models): add llama2 INT4 path`, `fix(datasets): handle missing target`.
+- Run `pre-commit` and tests locally.
+- Open PRs against `dev`; include description, linked issues, and screenshots/CLI snippets when UI/UX changes.
+
+## Security & Configuration Tips
+- Do not commit secrets. For API-backed features, export keys: `OPENAI_API_KEY`, `COHERE_API_KEY`, `AI21_API_KEY`.
+- Tunable defaults live in `src/xturing/config/*.yaml` (generation/finetuning). Document changes impacting behavior.
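
The testing guidance above favors tiny, CPU-only tests. A minimal sketch of such a test follows; the dict-based `TextDataset` constructor and the file path are assumptions for illustration, not taken from this diff:

```python
# tests/xturing/datasets/test_text_dataset_smoke.py  (hypothetical path)
# Fast, deterministic, CPU-only: no network access, no model downloads.
from xturing.datasets import TextDataset


def test_text_dataset_smoke():
    # Assumption: TextDataset accepts an in-memory dict with a "text" column.
    ds = TextDataset({"text": ["hello world", "fine-tuning is fun"]})
    assert len(ds) == 2
```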

CLAUDE.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+xTuring is a Python library for fine-tuning, evaluation and data generation for Large Language Models (LLMs). It provides fast, efficient fine-tuning of open-source LLMs like Mistral, LLaMA, GPT-J with memory-efficient methods including LoRA and quantization (INT8/INT4).
+
+## Development Commands
+
+### Environment Setup
+```bash
+python -m venv .venv && source .venv/bin/activate
+pip install -e .
+pip install -r requirements-dev.txt
+```
+
+### Code Quality & Pre-commit
+```bash
+pre-commit install && pre-commit install --hook-type commit-msg
+pre-commit run -a # runs black, isort, autoflake, yaml checks, gitlint, absolufy-imports
+```
+
+### Testing
+```bash
+pytest -q # run all tests
+pytest tests/xturing/models -k gpt2 # run specific tests
+CUDA_VISIBLE_DEVICES=-1 pytest -q -k cpu # CPU-only tests
+```
+
+### CLI Usage
+```bash
+xturing chat -m <path-to-model-dir> # chat interface
+python -c "from xturing.ui import Playground; Playground().launch()" # UI playground
+```
+
+## Architecture Overview
+
+### Core Components
+- **Models** (`src/xturing/models/`): Registry-based system supporting 15+ LLM architectures (LLaMA, GPT-2, Falcon, etc.) with variants for LoRA, INT8, and INT4 quantization
+- **Engines** (`src/xturing/engines/`): Inference engines handling model loading, generation, and quantization optimizations
+- **Datasets** (`src/xturing/datasets/`): Dataset abstractions for text, instruction, and text-to-image data
+- **Trainers** (`src/xturing/trainers/`): PyTorch Lightning-based training pipeline with DeepSpeed integration
+- **CLI** (`src/xturing/cli/`): Command-line interface with chat, UI, and API commands
+
+### Registry Pattern
+The codebase uses a registry pattern (`src/xturing/registry.py`) where models, datasets, and engines register themselves by name:
+```python
+# Models register like: BaseModel.add_to_registry("llama_lora", LlamaLora)
+model = BaseModel.create("llama_lora")  # Factory method access
+```
+
+### Model Variants
+Models follow a naming convention:
+- Base: `llama`, `gpt2`, `falcon`
+- LoRA: `llama_lora`, `gpt2_lora`
+- INT8: `llama_int8`, `gpt2_int8`
+- Combined: `llama_lora_int8`
+- INT4: Use `GenericLoraKbitModel('<model_path>')` class
+
+### Key Directories
+- `config/`: YAML configuration files for model defaults
+- `preprocessors/`: Data preprocessing utilities
+- `self_instruct/`: Self-instruction data generation
+- `model_apis/`: Integration with OpenAI, Cohere, AI21 APIs
+- `ui/`: Gradio-based UI components
+- `utils/`: Shared utilities and external logger configuration
+
+## Development Guidelines
+
+### Code Style
+- Python with 4-space indentation
+- Tools: black (88 cols), isort (--profile black), autoflake, absolufy-imports
+- Naming: `snake_case` for functions/modules, `PascalCase` for classes
+
+### Testing
+- Framework: pytest with markers for `slow` and `gpu` tests
+- Keep tests fast and deterministic, avoid large model downloads
+- Use small fixtures and CPU where possible
+
+### Environment Variables
+For API-backed features, export keys:
+- `OPENAI_API_KEY`
+- `COHERE_API_KEY`
+- `AI21_API_KEY`
+
+### Pull Requests
+- Target `dev` branch
+- Run pre-commit and tests locally before submitting
+- Follow conventional commit format: `feat(models): add llama2 INT4 path`
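
The registry pattern summarized in CLAUDE.md can be pictured roughly as below. This is an illustrative sketch only, assuming a plain dict-backed registry; it is not the actual contents of `src/xturing/registry.py`:

```python
# Rough sketch of name-based registration plus factory access (illustrative only).
class BaseModel:
    registry: dict = {}

    @classmethod
    def add_to_registry(cls, name: str, model_cls) -> None:
        # Map a string key such as "llama_lora" to a concrete model class.
        cls.registry[name] = model_cls

    @classmethod
    def create(cls, name: str, *args, **kwargs):
        # Factory access: look up the registered class and instantiate it.
        return cls.registry[name](*args, **kwargs)
```

Datasets and engines would follow the same register-by-name, create-by-name flow.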

CONTRIBUTING.md

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@
 
 We welcome and appreciate contributions to xTuring! Whether it's a bug fix, a new feature, or simply a typo, every little bit helps.
 
+Before starting, please skim the [Repository Guidelines](AGENTS.md) for project structure, local commands, style, and testing conventions.
+
 ## Getting Started
 
 1. Fork the repository on GitHub

README.md

Lines changed: 52 additions & 34 deletions
@@ -2,7 +2,7 @@
 <img src=".github/stochastic_logo_light.svg#gh-light-mode-only" width="250" alt="Stochastic.ai"/>
 <img src=".github/stochastic_logo_dark.svg#gh-dark-mode-only" width="250" alt="Stochastic.ai"/>
 </p>
-<h3 align="center">Build, modify, and control your own personalized LLMs</h3>
+<h3 align="center">Fine‑tune, evaluate, and run private, personalized LLMs</h3>
 
 <p align="center">
 <a href="https://pypi.org/project/xturing/">
@@ -20,17 +20,14 @@
 
 ___
 
-`xTuring` provides fast, efficient and simple fine-tuning of open-source LLMs, such as Mistral, LLaMA, GPT-J, and more.
-By providing an easy-to-use interface for fine-tuning LLMs to your own data and application, xTuring makes it
-simple to build, modify, and control LLMs. The entire process can be done inside your computer or in your
-private cloud, ensuring data privacy and security.
+`xTuring` makes it simple, fast, and cost‑efficient to fine‑tune open‑source LLMs (e.g., GPT‑OSS, LLaMA/LLaMA 2, Falcon, GPT‑J, GPT‑2, OPT, Bloom, Cerebras, Galactica) on your own data — locally or in your private cloud.
 
-With `xTuring` you can,
-- Ingest data from different sources and preprocess them to a format LLMs can understand
-- Scale from single to multiple GPUs for faster fine-tuning
-- Leverage memory-efficient methods (i.e. INT4, LoRA fine-tuning) to reduce hardware costs by up to 90%
-- Explore different fine-tuning methods and benchmark them to find the best performing model
-- Evaluate fine-tuned models on well-defined metrics for in-depth analysis
+Why xTuring:
+- Simple API for data prep, training, and inference
+- Private by default: run locally or in your VPC
+- Efficient: LoRA and low‑precision (INT8/INT4) to cut costs
+- Scales from CPU/laptop to multi‑GPU easily
+- Evaluate models with built‑in metrics (e.g., perplexity)
 
 <br>
 
@@ -43,32 +40,52 @@ pip install xturing
 
 ## 🚀 Quickstart
 
+Run a small, CPU‑friendly example first:
+
 ```python
 from xturing.datasets import InstructionDataset
 from xturing.models import BaseModel
 
-# Load the dataset
-instruction_dataset = InstructionDataset("./examples/models/llama/alpaca_data")
+# Load a toy instruction dataset (Alpaca format)
+dataset = InstructionDataset("./examples/models/llama/alpaca_data")
 
-# Initialize the model
-model = BaseModel.create("llama_lora")
+# Start small for quick iterations (works on CPU)
+model = BaseModel.create("distilgpt2_lora")
 
-# Finetune the model
-model.finetune(dataset=instruction_dataset)
+# Fine‑tune and then generate
+model.finetune(dataset=dataset)
+output = model.generate(texts=["Explain quantum computing for beginners."])
+print(f"Model output: {output}")
+```
 
-# Perform inference
-output = model.generate(texts=["Why LLM models are becoming so important?"])
+Want bigger models and reasoning controls? Try GPT‑OSS variants (requires significant resources):
+
+```python
+from xturing.models import BaseModel
 
-print("Generated output by the model: {}".format(output))
+# 120B or 20B variants; also support LoRA/INT8/INT4 configs
+model = BaseModel.create("gpt_oss_20b_lora")
 ```
 
 You can find the data folder [here](examples/models/llama/alpaca_data).
 
 <br>
 
 ## 🌟 What's new?
-We are excited to announce the latest enhancements to our `xTuring` library:
-1. __`LLaMA 2` integration__ - You can use and fine-tune the _`LLaMA 2`_ model in different configurations: _off-the-shelf_, _off-the-shelf with INT8 precision_, _LoRA fine-tuning_, _LoRA fine-tuning with INT8 precision_ and _LoRA fine-tuning with INT4 precision_ using the `GenericModel` wrapper and/or you can use the `Llama2` class from `xturing.models` to test and finetune the model.
+Highlights from recent updates:
+1. __GPT‑OSS integration__ – Use and fine‑tune `gpt_oss_120b` and `gpt_oss_20b` with off‑the‑shelf, INT8, LoRA, LoRA+INT8, and LoRA+INT4 options. Includes configurable reasoning levels and harmony response format support.
+```python
+from xturing.models import BaseModel
+
+# Use the production-ready 120B model
+model = BaseModel.create('gpt_oss_120b_lora')
+
+# Or use the efficient 20B model for faster inference
+model = BaseModel.create('gpt_oss_20b_lora')
+
+# Both models support reasoning levels via system prompts
+```
+2. __LLaMA 2 integration__ – Off‑the‑shelf, INT8, LoRA, LoRA+INT8, and LoRA+INT4 via `GenericModel` or `Llama2`.
 ```python
 from xturing.models import Llama2
 model = Llama2()
@@ -78,7 +95,7 @@ from xturing.models import BaseModel
 model = BaseModel.create('llama2')
 
 ```
-2. __`Evaluation`__ - Now you can evaluate any `Causal Language Model` on any dataset. The metrics currently supported is [`perplexity`](https://en.wikipedia.org/wiki/Perplexity).
+3. __Evaluation__ – Evaluate any causal LM on any dataset. Currently supports [`perplexity`](https://en.wikipedia.org/wiki/Perplexity).
 ```python
 # Make the necessary imports
 from xturing.datasets import InstructionDataset
@@ -87,8 +104,8 @@ from xturing.models import BaseModel
 # Load the desired dataset
 dataset = InstructionDataset('../llama/alpaca_data')
 
-# Load the desired model
-model = BaseModel.create('gpt2')
+# Load the desired model (try GPT-OSS for advanced reasoning)
+model = BaseModel.create('gpt_oss_20b')
 
 # Run the Evaluation of the model on the dataset
 result = model.evaluate(dataset)
@@ -97,7 +114,7 @@ result = model.evaluate(dataset)
 print(f"Perplexity of the evalution: {result}")
 
 ```
-3. __`INT4` Precision__ - You can now use and fine-tune any LLM with `INT4 Precision` using `GenericLoraKbitModel`.
+4. __INT4 precision__ – Fine‑tune many LLMs with INT4 using `GenericLoraKbitModel`.
 ```python
 # Make the necessary imports
 from xturing.datasets import InstructionDataset
@@ -113,7 +130,7 @@ model = GenericLoraKbitModel('tiiuae/falcon-7b')
 model.finetune(dataset)
 ```
 
-4. __CPU inference__ - The CPU, including laptop CPUs, is now fully equipped to handle LLM inference. We integrated [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) to conserve memory by compressing the model with [weight-only quantization algorithms](https://github.com/intel/intel-extension-for-transformers/blob/main/docs/weightonlyquant.md) and accelerate the inference by leveraging its highly optimized kernel on Intel platforms.
+5. __CPU inference__ – Run inference on CPUs (including laptops) via [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers), using weight-only quantization and optimized kernels on Intel platforms.
 
 ```python
 # Make the necessary imports
@@ -128,7 +145,7 @@ output = model.generate(texts=["Why LLM models are becoming so important?"])
 print(output)
 ```
 
-5. __Batch integration__ - By tweaking the 'batch_size' in the .generate() and .evaluate() functions, you can expedite results. Using a 'batch_size' greater than 1 typically enhances processing efficiency.
+6. __Batching__ – Set `batch_size` in `.generate()` and `.evaluate()` to speed up processing.
 ```python
 # Make the necessary imports
 from xturing.datasets import InstructionDataset
@@ -220,7 +237,7 @@ Contribute to this by submitting your performance results on other GPUs by creat
 
 <br>
 
-## 📎 Fine-tuned model checkpoints
+## 📎 Finetuned model checkpoints
 We have already fine-tuned some models that you can use as your base or start playing with.
 Here is how you would load them:
 
@@ -246,25 +263,26 @@ Below is a list of all the supported models via `BaseModel` class of `xTuring` a
 |DistilGPT-2 | distilgpt2|
 |Falcon-7B | falcon|
 |Galactica | galactica|
+|GPT-OSS (20B/120B) | gpt_oss_20b, gpt_oss_120b|
 |GPT-J | gptj|
 |GPT-2 | gpt2|
-|LlaMA | llama|
-|LlaMA2 | llama2|
+|LLaMA | llama|
+|LLaMA2 | llama2|
 |OPT-1.3B | opt|
 
-The above mentioned are the base variants of the LLMs. Below are the templates to get their `LoRA`, `INT8`, `INT8 + LoRA` and `INT4 + LoRA` versions.
+The above are the base variants. Use these templates for `LoRA`, `INT8`, and `INT8 + LoRA` versions:
 
 | Version | Template |
 | -- | -- |
 | LoRA| <model_key>_lora|
 | INT8| <model_key>_int8|
 | INT8 + LoRA| <model_key>_lora_int8|
 
-** In order to load any model's __`INT4+LoRA`__ version, you will need to make use of `GenericLoraKbitModel` class from `xturing.models`. Below is how to use it:
+To load a model’s __INT4 + LoRA__ version, use the `GenericLoraKbitModel` class:
 ```python
 model = GenericLoraKbitModel('<model_path>')
 ```
-The `model_path` can be replaced with you local directory or any HuggingFace library model like `facebook/opt-1.3b`.
+Replace `<model_path>` with a local directory or a Hugging Face model like `facebook/opt-1.3b`.
 
 ## 📈 Roadmap
 - [x] Support for `LLaMA`, `GPT-J`, `GPT-2`, `OPT`, `Cerebras-GPT`, `Galactica` and `Bloom` models
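
For the "Batching" item in the README changes above, a minimal usage sketch, assuming `batch_size` is passed as a keyword to `.generate()` and `.evaluate()` as the README text describes; the prompts and model key are placeholders:

```python
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

dataset = InstructionDataset("./examples/models/llama/alpaca_data")
model = BaseModel.create("distilgpt2_lora")

# A batch_size greater than 1 typically improves throughput on capable hardware.
outputs = model.generate(texts=["Why are LLMs important?"] * 4, batch_size=4)
result = model.evaluate(dataset, batch_size=4)
```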

pyproject.toml

Lines changed: 5 additions & 3 deletions
@@ -43,13 +43,15 @@ keywords = [
 dependencies = [
     "torch >= 1.9.0",
     "pytorch-lightning",
-    "transformers==4.39.3",
+    "transformers>=4.53.0",
     "datasets==2.14.5",
+    "pyarrow >= 8.0.0, < 21.0.0",
+    "scipy >= 1.0.0",
     "evaluate==0.4.0",
     "bitsandbytes==0.41.1",
     "sentencepiece",
-    "deepspeed==0.9.5",
-    "gradio",
+    "deepspeed>=0.15.1",
+    "gradio>=5.31.0",
     "click",
     "wget",
     "ai21",

pytest.ini

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+[pytest]
+testpaths = tests
+addopts = -q
+markers =
+    slow: marks tests as slow (deselect with '-m "not slow"')
+    gpu: requires a GPU-enabled environment
+filterwarnings =
+    ignore::DeprecationWarning
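
The `slow` and `gpu` markers registered above are applied with standard `pytest.mark` decorators; the test names below are hypothetical:

```python
import pytest


@pytest.mark.slow
def test_full_finetune_cycle():
    ...  # long-running fine-tuning path


@pytest.mark.gpu
def test_int8_weights_load_on_gpu():
    ...  # requires a CUDA-capable device
```

Deselect both locally with `pytest -m "not slow and not gpu"`.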

requirements-dev.txt

Lines changed: 3 additions & 1 deletion
@@ -1,4 +1,6 @@
 pre-commit
 pytest
 autoflake
-absoulify-imports
+absolufy-imports
+pyarrow >= 8.0.0, < 21.0.0
+scipy >= 1.0.0
