Commit d6d0869

Merge pull request #1 from bigcode-project/finetune
Add instructions and code for loading and finetuning StarCoder2 models
2 parents b8d318c + ce4e46f commit d6d0869

File tree: 3 files changed, +331 −1 lines changed


README.md

Lines changed: 177 additions & 1 deletion
@@ -1 +1,177 @@
# StarCoder 2

<p align="center"><a href="https://huggingface.co/bigcode">[🤗 Models]</a> | <a href="">[Paper]</a> | <a href="https://marketplace.visualstudio.com/items?itemName=HuggingFace.huggingface-vscode">[VSCode]</a>
</p>

StarCoder2 is a family of code generation models (3B, 7B, and 15B) trained on 600+ programming languages from [The Stack v2]() and some natural language text such as Wikipedia, arXiv, and GitHub issues. The models use Grouped Query Attention, a context window of 16,384 tokens, and sliding window attention of 4,096 tokens. The 3B & 7B models were trained on 3+ trillion tokens, while the 15B was trained on 4+ trillion tokens.
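
If you want to check these architecture settings for a given checkpoint, you can read them off the model config. A minimal sketch (the attribute names are assumptions based on the standard `transformers` config fields for this architecture):
```python
from transformers import AutoConfig

# inspect the architecture settings of a checkpoint
# (attribute names assume the standard transformers config for StarCoder2)
config = AutoConfig.from_pretrained("bigcode/starcoder2-3b")
print(config.max_position_embeddings)  # context window, expected 16384
print(config.sliding_window)           # sliding window attention size, expected 4096
print(config.num_key_value_heads)      # fewer KV heads than attention heads => Grouped Query Attention
print(config.num_attention_heads)
```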


# Disclaimer

Before you can use the models, go to `hf.co/bigcode/starcoder2-15b`, accept the agreement, and make sure you are logged into the Hugging Face Hub:
```bash
huggingface-cli login
```
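
If you prefer to authenticate from Python (for example, inside a notebook), the `huggingface_hub` login helper does the same thing. A small sketch, assuming your token is available in the `HF_TOKEN` environment variable:
```python
import os
from huggingface_hub import login

# equivalent to `huggingface-cli login`; assumes HF_TOKEN holds a token
# that has access to the gated StarCoder2 repositories
login(token=os.environ["HF_TOKEN"])
```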

# Table of Contents
1. [Quickstart](#quickstart)
   - [Installation](#installation)
   - [Model usage and memory footprint](#model-usage-and-memory-footprint)
   - [Text-generation-inference code](#text-generation-inference)
2. [Fine-tuning](#fine-tuning)
   - [Setup](#setup)
   - [Training](#training)
3. [Evaluation](#evaluation)

# Quickstart
StarCoder2 models are intended for code completion; they are not instruction-tuned, so prompts like "Write a function that computes the square root." do not work well.
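
In practice, this means prompting with the beginning of the code you want completed rather than with an instruction. A hypothetical illustration of the difference:
```python
# completion-style prompt: give the model code to continue (works well)
good_prompt = 'def square_root(x):\n    """Return the square root of x."""\n'

# instruction-style prompt: the base models are not tuned for this (works poorly)
bad_prompt = "Write a function that computes the square root."
```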

## Installation
First, install all the libraries listed in `requirements.txt`:
```bash
pip install -r requirements.txt
# export your HF token, found here: https://huggingface.co/settings/account
export HF_TOKEN=xxx
```

## Model usage and memory footprint
Here are some examples of loading the model and generating code, along with the memory footprint of the largest model, `StarCoder2-15B`. Make sure you have installed `transformers` from source (this is the case if you used `requirements.txt`):
```bash
pip install git+https://github.com/huggingface/transformers.git
```

### Running the model on CPU/GPU/multi-GPU
* _Using full precision_
```python
# pip install git+https://github.com/huggingface/transformers.git # TODO: merge PR to main
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# to use multiple GPUs, do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

* _Using `torch.bfloat16`_
```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigcode/starcoder2-15b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# for fp16 use `torch_dtype=torch.float16` instead
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```bash
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 32251.33 MB
```
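
The snippets above call `model.generate(inputs)` with its defaults, which decodes greedily and produces only a short continuation. For longer or sampled completions you can pass standard generation arguments; a sketch with illustrative (not tuned) values, assuming `model`, `tokenizer`, and `device` are set up as above:
```python
inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(
    inputs,
    max_new_tokens=128,   # length of the completion
    do_sample=True,       # sample instead of greedy decoding
    temperature=0.2,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # silences the missing-pad-token warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```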

### Quantized Versions through `bitsandbytes`
* _Using 8-bit precision (int8)_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# to use 4bit use `load_in_4bit=True` instead
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

checkpoint = "bigcode/starcoder2-15b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=quantization_config)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```bash
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
# load_in_8bit
Memory footprint: 16900.18 MB
# load_in_4bit
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 9224.60 MB
```
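
For 4-bit loading you can either pass `load_in_4bit=True` as noted above, or spell out the NF4 settings explicitly; the sketch below mirrors the quantization config used by `finetune.py` further down:
```python
# pip install bitsandbytes accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization with bfloat16 compute, matching the fine-tuning script
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

checkpoint = "bigcode/starcoder2-15b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=quantization_config, device_map="auto")

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```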

You can also use `pipeline` for the generation:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

checkpoint = "bigcode/starcoder2-15b"

model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
print(pipe("def hello():"))
```

## Text-generation-inference
TODO

```bash
docker run -p 8080:80 -v $PWD/data:/data -e HUGGING_FACE_HUB_TOKEN=<YOUR BIGCODE ENABLED TOKEN> -d ghcr.io/huggingface/text-generation-inference:latest --model-id bigcode/starcoder2-15b --max-total-tokens 8192
```
For more details, see [here](https://github.com/huggingface/text-generation-inference).
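
Once the container is running, you can send generation requests to it over HTTP. A minimal sketch using `requests` against TGI's `/generate` endpoint (the port and parameters assume the `docker run` command above):
```python
import requests

# assumes the text-generation-inference container above is listening on localhost:8080
response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "def print_hello_world():",
        "parameters": {"max_new_tokens": 64},
    },
)
print(response.json()["generated_text"])
```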


# Fine-tuning

Here, we showcase how you can fine-tune StarCoder2 models.

## Setup

Install `pytorch` ([see the documentation](https://pytorch.org/)); for example, the following command works with CUDA 12.1:
```bash
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
```
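
You can quickly verify that the install picked up CUDA before moving on:
```python
import torch

# should print your PyTorch version and True if the CUDA build is active
print(torch.__version__, torch.cuda.is_available())
```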

Install the requirements (this installs `transformers` from source to support the StarCoder2 architecture):
```bash
pip install -r requirements.txt
```

Before you run any of the scripts, make sure you are logged into `wandb` and the Hugging Face Hub so you can push the checkpoints:
```bash
wandb login
huggingface-cli login
```
Now that everything is set up, you can clone the repository and change into the corresponding directory.

## Training
To fine-tune efficiently and cheaply, we use the [PEFT](https://github.com/huggingface/peft) library for Low-Rank Adaptation (LoRA) training and [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) for 4-bit quantization. We also use the `SFTTrainer` from [TRL](https://github.com/huggingface/trl); a conceptual sketch of this setup follows.
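
Roughly, the base model is loaded in 4-bit, its weights are frozen, and small trainable LoRA adapters are attached; `SFTTrainer` does this for you when given a `peft_config`, as in `finetune.py` below. The snippet here is only an illustration of the idea, not part of the training script:
```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# load the base model in 4-bit (NF4), as finetune.py does
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained("bigcode/starcoder2-3b", quantization_config=bnb_config)

# freeze the quantized weights and attach small trainable LoRA adapters
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(r=8, target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters are trainable
```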


For this example, we will fine-tune StarCoder2-3b on the `Rust` subset of [the-stack-smol](https://huggingface.co/datasets/bigcode/the-stack-smol). This is just for illustration purposes; for a larger and cleaner dataset of Rust code, you can use [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup).

To launch the training:
```bash
accelerate launch finetune.py \
    --model_id "bigcode/starcoder2-3b" \
    --dataset_name "bigcode/the-stack-smol" \
    --subset "data/rust" \
    --dataset_text_field "content" \
    --split "train" \
    --max_seq_length 1024 \
    --max_steps 10000 \
    --micro_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --learning_rate 2e-5 \
    --warmup_steps 20 \
    --num_proc "$(nproc)"
```

If you want to fine-tune on other text datasets, change the `dataset_text_field` argument to the name of the column containing the code/text you want to train on; the sketch below shows how to check the column names.
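
If you are unsure which column holds the text, you can inspect the dataset schema first; a small sketch (the dataset below is just the example used above):
```python
from datasets import load_dataset

# load a small slice just to inspect the schema; replace with your own dataset
data = load_dataset("bigcode/the-stack-smol", data_dir="data/rust", split="train[:10]")
print(data.column_names)  # pick the column to pass as --dataset_text_field, e.g. "content"
```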

# Evaluation
To evaluate StarCoder2 and its derivatives, use the [BigCode-Evaluation-Harness](https://github.com/bigcode-project/bigcode-evaluation-harness), a framework for evaluating code LLMs.

finetune.py

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
# Code adapted from https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama/scripts/supervised_finetuning.py
# and https://huggingface.co/blog/gemma-peft
import argparse
import multiprocessing
import os

import torch
import transformers
from accelerate import PartialState
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    logging,
    set_seed,
)
from trl import SFTTrainer


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_id", type=str, default="bigcode/starcoder2-3b")
    parser.add_argument("--dataset_name", type=str, default="the-stack-smol")
    parser.add_argument("--subset", type=str, default="data/rust")
    parser.add_argument("--split", type=str, default="train")
    parser.add_argument("--dataset_text_field", type=str, default="content")

    parser.add_argument("--max_seq_length", type=int, default=1024)
    parser.add_argument("--max_steps", type=int, default=1000)
    parser.add_argument("--micro_batch_size", type=int, default=1)
    parser.add_argument("--gradient_accumulation_steps", type=int, default=4)
    parser.add_argument("--weight_decay", type=float, default=0.01)
    parser.add_argument("--bf16", type=bool, default=True)

    parser.add_argument("--attention_dropout", type=float, default=0.1)
    parser.add_argument("--learning_rate", type=float, default=2e-4)
    parser.add_argument("--lr_scheduler_type", type=str, default="cosine")
    parser.add_argument("--warmup_steps", type=int, default=100)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--output_dir", type=str, default="finetune_starcoder2")
    parser.add_argument("--num_proc", type=int, default=None)
    parser.add_argument("--push_to_hub", type=bool, default=True)
    return parser.parse_args()


def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )


def main(args):
    # config
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    lora_config = LoraConfig(
        r=8,
        target_modules=[
            "q_proj",
            "o_proj",
            "k_proj",
            "v_proj",
            "gate_proj",
            "up_proj",
            "down_proj",
        ],
        task_type="CAUSAL_LM",
    )

    # load model and dataset
    token = os.environ.get("HF_TOKEN", None)
    model = AutoModelForCausalLM.from_pretrained(
        args.model_id,
        quantization_config=bnb_config,
        device_map={"": PartialState().process_index},
        token=token,
        attention_dropout=args.attention_dropout,
    )
    print_trainable_parameters(model)

    data = load_dataset(
        args.dataset_name,
        data_dir=args.subset,
        split=args.split,
        token=token,
        num_proc=args.num_proc if args.num_proc else multiprocessing.cpu_count(),
    )

    # setup the trainer
    trainer = SFTTrainer(
        model=model,
        train_dataset=data,
        max_seq_length=args.max_seq_length,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=args.micro_batch_size,
            gradient_accumulation_steps=args.gradient_accumulation_steps,
            warmup_steps=args.warmup_steps,
            max_steps=args.max_steps,
            learning_rate=args.learning_rate,
            lr_scheduler_type=args.lr_scheduler_type,
            weight_decay=args.weight_decay,
            bf16=args.bf16,
            logging_strategy="steps",
            logging_steps=10,
            output_dir=args.output_dir,
            optim="paged_adamw_8bit",
            seed=args.seed,
            run_name=f"train-{args.model_id.split('/')[-1]}",
            report_to="wandb",
        ),
        peft_config=lora_config,
        dataset_text_field=args.dataset_text_field,
    )

    # launch
    print("Training...")
    trainer.train()

    print("Saving the last checkpoint of the model")
    model.save_pretrained(os.path.join(args.output_dir, "final_checkpoint/"))
    if args.push_to_hub:
        trainer.push_to_hub("Upload model")
    print("Training Done! 💥")


if __name__ == "__main__":
    args = get_args()
    set_seed(args.seed)
    os.makedirs(args.output_dir, exist_ok=True)

    logging.set_verbosity_error()

    main(args)

requirements.txt

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
git+https://github.com/huggingface/transformers.git
accelerate==0.27.1
datasets>=2.16.1
bitsandbytes==0.41.3
peft==0.8.2
trl==0.7.10
wandb==0.16.3
huggingface_hub==0.20.3
