This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
xTuring is a Python library for fine-tuning, evaluating, and generating data with Large Language Models (LLMs). It provides fast, efficient fine-tuning of open-source LLMs such as Mistral, LLaMA, and GPT-J, using memory-efficient methods including LoRA and quantization (INT8/INT4).
- **Models** (`src/xturing/models/`): Registry-based system supporting 15+ LLM architectures (LLaMA, GPT-2, Falcon, etc.) with variants for LoRA, INT8, and INT4 quantization
- **Engines** (`src/xturing/engines/`): Inference engines handling model loading, generation, and quantization optimizations
- **Datasets** (`src/xturing/datasets/`): Dataset abstractions for text, instruction, and text-to-image data
- **Trainers** (`src/xturing/trainers/`): PyTorch Lightning-based training pipeline with DeepSpeed integration
- **CLI** (`src/xturing/cli/`): Command-line interface with chat, UI, and API commands
### Registry Pattern
The codebase uses a registry pattern (`src/xturing/registry.py`) where models, datasets, and engines register themselves by name:
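Below is a minimal sketch of what such a registry typically looks like; the class and method names are illustrative assumptions, not the exact contents of `registry.py`:

```python
# Illustrative sketch of a name-based registry; names here are assumptions,
# not the actual xTuring registry.py API.
class BaseModel:
    registry: dict = {}

    @classmethod
    def add_to_registry(cls, name: str, obj) -> None:
        # Associate a string key with a model class
        cls.registry[name] = obj

    @classmethod
    def create(cls, name: str, *args, **kwargs):
        # Instantiate the class registered under `name`
        return cls.registry[name](*args, **kwargs)


# A variant registers itself under its key, e.g.:
#   BaseModel.add_to_registry("llama_lora", LlamaLora)
# and callers construct it by name:
#   model = BaseModel.create("llama_lora")
```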
<h3 align="center">Fine‑tune, evaluate, and run private, personalized LLMs</h3>

<p align="center">
  <a href="https://pypi.org/project/xturing/">
___
`xTuring` makes it simple, fast, and cost‑efficient to fine‑tune open‑source LLMs (e.g., GPT‑OSS, LLaMA/LLaMA 2, Falcon, GPT‑J, GPT‑2, OPT, Bloom, Cerebras, Galactica) on your own data — locally or in your private cloud.
Why xTuring:
- Simple API for data prep, training, and inference
- Private by default: run locally or in your VPC
- Efficient: LoRA and low‑precision (INT8/INT4) to cut costs
- Scales from CPU/laptop to multi‑GPU easily
- Evaluate models with built‑in metrics (e.g., perplexity)
print("Generated output by the model: {}".format(output))
66
+
# 120B or 20B variants; also support LoRA/INT8/INT4 configs
67
+
model = BaseModel.create("gpt_oss_20b_lora")
63
68
```
64
69
65
70
You can find the data folder [here](examples/models/llama/alpaca_data).
<br>
## 🌟 What's new?
Highlights from recent updates:
1. __GPT‑OSS integration__ – Use and fine‑tune `gpt_oss_120b` and `gpt_oss_20b` with off‑the‑shelf, INT8, LoRA, LoRA+INT8, and LoRA+INT4 options. Includes configurable reasoning levels and harmony response format support.
```python
from xturing.models import BaseModel

# Use the production-ready 120B model
model = BaseModel.create('gpt_oss_120b_lora')

# Or use the efficient 20B model for faster inference
model = BaseModel.create('gpt_oss_20b_lora')

# Both models support reasoning levels via system prompts
```
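For example, a reasoning level can be requested through the prompt itself; the prompt string below is an illustrative assumption, with the exact formatting defined by the harmony response format:

```python
# Illustrative only: the "Reasoning: high" prompt convention is an assumption;
# consult the GPT-OSS harmony format for the exact syntax.
output = model.generate(
    texts=["System: Reasoning: high\n\nExplain INT4 quantization in two sentences."]
)
print(output)
```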
2. __LLaMA 2 integration__ – Off‑the‑shelf, INT8, LoRA, LoRA+INT8, and LoRA+INT4 via `GenericModel` or `Llama2`.
```python
from xturing.models import Llama2
model = Llama2()

# or, via the registry key
from xturing.models import BaseModel
model = BaseModel.create('llama2')
```
3. __Evaluation__ – Evaluate any causal LM on any dataset. Currently supports [`perplexity`](https://en.wikipedia.org/wiki/Perplexity).
```python
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')

# Load the desired model (try GPT-OSS for advanced reasoning)
model = BaseModel.create('gpt_oss_20b')

# Run the evaluation of the model on the dataset
result = model.evaluate(dataset)

# Print the result
print(f"Perplexity of the evaluation: {result}")
```
4. __INT4 precision__ – Fine‑tune many LLMs with INT4 using `GenericLoraKbitModel`.
```python
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel

# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')

# Load the desired model for INT4 bit fine-tuning
model = GenericLoraKbitModel('tiiuae/falcon-7b')

# Run the fine-tuning
model.finetune(dataset)
```
5. __CPU inference__ – Run inference on CPUs (including laptops) via [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers), using weight‑only quantization and optimized kernels on Intel platforms.
```python
# Make the necessary imports
from xturing.models import BaseModel

# Load a model for CPU inference (the INT8 key here is an illustrative choice)
model = BaseModel.create("llama2_int8")

# Run inference
output = model.generate(texts=["Why LLM models are becoming so important?"])
print(output)
```
6. __Batching__ – Set `batch_size` in `.generate()` and `.evaluate()` to speed up processing; a `batch_size` greater than 1 typically improves throughput.
```python
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel

# Load the dataset and model
dataset = InstructionDataset('../llama/alpaca_data')
model = GenericLoraKbitModel('tiiuae/falcon-7b')

# Batched generation: batch_size > 1 typically improves throughput
outputs = model.generate(dataset=dataset, batch_size=10)
```
<br>
## 📎 Fine‑tuned model checkpoints
We have already fine-tuned some models that you can use as your base or start playing with.
Here is how you would load them:
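For example (the checkpoint name below is illustrative):

```python
from xturing.models import BaseModel

# Load a published fine-tuned checkpoint by name
model = BaseModel.load("x/distilgpt2_lora_finetuned_alpaca")
```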
Below is a list of all the supported models via the `BaseModel` class of `xTuring` and their corresponding keys.

| Model | Key |
| -- | -- |
| DistilGPT-2 | distilgpt2 |
| Falcon-7B | falcon |
| Galactica | galactica |
| GPT-OSS (20B/120B) | gpt_oss_20b, gpt_oss_120b |
| GPT-J | gptj |
| GPT-2 | gpt2 |
| LLaMA | llama |
| LLaMA2 | llama2 |
| OPT-1.3B | opt |
The above are the base variants. Use these templates for `LoRA`, `INT8`, and `INT8 + LoRA` versions:
| Version | Template |
| -- | -- |
| LoRA | <model_key>_lora |
| INT8 | <model_key>_int8 |
| INT8 + LoRA | <model_key>_lora_int8 |
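For instance, applying the `INT8 + LoRA` template to the `llama` key from the table above:

```python
from xturing.models import BaseModel

# <model_key>_lora_int8 template applied to the llama key
model = BaseModel.create("llama_lora_int8")
```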
To load a model’s __INT4 + LoRA__ version, use the `GenericLoraKbitModel` class:
```python
from xturing.models import GenericLoraKbitModel

model = GenericLoraKbitModel('<model_path>')
```
Replace `<model_path>` with a local directory or a Hugging Face model like `facebook/opt-1.3b`.
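For example, with a Hugging Face Hub id:

```python
from xturing.models import GenericLoraKbitModel

# Hub id substituted for <model_path>
model = GenericLoraKbitModel('facebook/opt-1.3b')
```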
## 📈 Roadmap
- [x] Support for `LLaMA`, `GPT-J`, `GPT-2`, `OPT`, `Cerebras-GPT`, `Galactica` and `Bloom` models