Commit d6d0869

Merge pull request #1 from bigcode-project/finetune
Add instructions and code for loading and finetuning StarCoder2 models
2 parents b8d318c + ce4e46f commit d6d0869

File tree: 3 files changed, +331 −1 lines changed


README.md

Lines changed: 177 additions & 1 deletion
@@ -1 +1,177 @@
# StarCoder 2

<p align="center"><a href="https://huggingface.co/bigcode">[🤗 Models]</a> | <a href="">[Paper]</a> | <a href="https://marketplace.visualstudio.com/items?itemName=HuggingFace.huggingface-vscode">[VSCode]</a>
</p>

StarCoder2 is a family of code generation models (3B, 7B, and 15B) trained on 600+ programming languages from [The Stack v2]() and some natural language text such as Wikipedia, arXiv, and GitHub issues. The models use Grouped Query Attention, a context window of 16,384 tokens, and sliding window attention of 4,096 tokens. The 3B & 7B models were trained on 3+ trillion tokens, while the 15B was trained on 4+ trillion tokens.
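
If you want to check these architecture settings for a given checkpoint, you can read them off the model config. A minimal sketch (the attribute names are assumptions based on the standard `transformers` config fields for this architecture):
```python
from transformers import AutoConfig

# inspect the architecture settings of a checkpoint
# (attribute names assume the standard transformers config for StarCoder2)
config = AutoConfig.from_pretrained("bigcode/starcoder2-3b")
print(config.max_position_embeddings)  # context window, expected 16384
print(config.sliding_window)           # sliding window attention size, expected 4096
print(config.num_key_value_heads)      # fewer KV heads than attention heads => Grouped Query Attention
print(config.num_attention_heads)
```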


# Disclaimer

Before you can use the models, go to `hf.co/bigcode/starcoder2-15b`, accept the agreement, and make sure you are logged into the Hugging Face Hub:
```bash
huggingface-cli login
```
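
If you prefer to authenticate from Python (for example, inside a notebook), the `huggingface_hub` login helper does the same thing. A small sketch, assuming your token is available in the `HF_TOKEN` environment variable:
```python
import os
from huggingface_hub import login

# equivalent to `huggingface-cli login`; assumes HF_TOKEN holds a token
# that has access to the gated StarCoder2 repositories
login(token=os.environ["HF_TOKEN"])
```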

# Table of Contents
1. [Quickstart](#quickstart)
   - [Installation](#installation)
   - [Model usage and memory footprint](#model-usage-and-memory-footprint)
   - [Text-generation-inference code](#text-generation-inference)
2. [Fine-tuning](#fine-tuning)
   - [Setup](#setup)
   - [Training](#training)
3. [Evaluation](#evaluation)

# Quickstart
StarCoder2 models are intended for code completion; they are not instruction-tuned, so prompts like "Write a function that computes the square root." do not work well.
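
In practice, this means prompting with the beginning of the code you want completed rather than with an instruction. A hypothetical illustration of the difference:
```python
# completion-style prompt: give the model code to continue (works well)
good_prompt = 'def square_root(x):\n    """Return the square root of x."""\n'

# instruction-style prompt: the base models are not tuned for this (works poorly)
bad_prompt = "Write a function that computes the square root."
```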

## Installation
First, install all the libraries listed in `requirements.txt`:
```bash
pip install -r requirements.txt
# export your HF token, found here: https://huggingface.co/settings/account
export HF_TOKEN=xxx
```

## Model usage and memory footprint
Here are some examples of loading the model and generating code, along with the memory footprint of the largest model, `StarCoder2-15B`. Make sure you have installed `transformers` from source (this is the case if you used `requirements.txt`):
```bash
pip install git+https://github.com/huggingface/transformers.git
```

### Running the model on CPU/GPU/multi-GPU
* _Using full precision_
```python
# pip install git+https://github.com/huggingface/transformers.git # TODO: merge PR to main
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# to use multiple GPUs, do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

* _Using `torch.bfloat16`_
```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigcode/starcoder2-15b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# for fp16 use `torch_dtype=torch.float16` instead
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```bash
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 32251.33 MB
```
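
The snippets above call `model.generate(inputs)` with its defaults, which decodes greedily and produces only a short continuation. For longer or sampled completions you can pass standard generation arguments; a sketch with illustrative (not tuned) values, assuming `model`, `tokenizer`, and `device` are set up as above:
```python
inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(
    inputs,
    max_new_tokens=128,   # length of the completion
    do_sample=True,       # sample instead of greedy decoding
    temperature=0.2,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # silences the missing-pad-token warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```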

### Quantized Versions through `bitsandbytes`
* _Using 8-bit precision (int8)_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# to use 4bit use `load_in_4bit=True` instead
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

checkpoint = "bigcode/starcoder2-15b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=quantization_config)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
```bash
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
# load_in_8bit
Memory footprint: 16900.18 MB
# load_in_4bit
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 9224.60 MB
```
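
For 4-bit loading you can either pass `load_in_4bit=True` as noted above, or spell out the NF4 settings explicitly; the sketch below mirrors the quantization config used by `finetune.py` further down:
```python
# pip install bitsandbytes accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization with bfloat16 compute, matching the fine-tuning script
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

checkpoint = "bigcode/starcoder2-15b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=quantization_config, device_map="auto")

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```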

You can also use `pipeline` for the generation:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

checkpoint = "bigcode/starcoder2-15b"

model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
print(pipe("def hello():"))
```

## Text-generation-inference
TODO

```bash
docker run -p 8080:80 -v $PWD/data:/data -e HUGGING_FACE_HUB_TOKEN=<YOUR BIGCODE ENABLED TOKEN> -d ghcr.io/huggingface/text-generation-inference:latest --model-id bigcode/starcoder2-15b --max-total-tokens 8192
```
For more details, see [here](https://github.com/huggingface/text-generation-inference).
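
Once the container is running, you can send generation requests to it over HTTP. A minimal sketch using `requests` against TGI's `/generate` endpoint (the port and parameters assume the `docker run` command above):
```python
import requests

# assumes the text-generation-inference container above is listening on localhost:8080
response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "def print_hello_world():",
        "parameters": {"max_new_tokens": 64},
    },
)
print(response.json()["generated_text"])
```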


# Fine-tuning

Here, we showcase how you can fine-tune StarCoder2 models.

## Setup

Install `pytorch` ([see the documentation](https://pytorch.org/)); for example, the following command works with CUDA 12.1:
```bash
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
```
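
You can quickly verify that the install picked up CUDA before moving on:
```python
import torch

# should print your PyTorch version and True if the CUDA build is active
print(torch.__version__, torch.cuda.is_available())
```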

Install the requirements (this installs `transformers` from source to support the StarCoder2 architecture):
```bash
pip install -r requirements.txt
```

Before you run any of the scripts, make sure you are logged into `wandb` and the Hugging Face Hub so you can push the checkpoints:
```bash
wandb login
huggingface-cli login
```
Now that everything is set up, you can clone the repository and change into the corresponding directory.

## Training
To fine-tune efficiently and cheaply, we use the [PEFT](https://github.com/huggingface/peft) library for Low-Rank Adaptation (LoRA) training and [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) for 4-bit quantization. We also use the `SFTTrainer` from [TRL](https://github.com/huggingface/trl); a conceptual sketch of this setup follows.
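
Roughly, the base model is loaded in 4-bit, its weights are frozen, and small trainable LoRA adapters are attached; `SFTTrainer` does this for you when given a `peft_config`, as in `finetune.py` below. The snippet here is only an illustration of the idea, not part of the training script:
```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# load the base model in 4-bit (NF4), as finetune.py does
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained("bigcode/starcoder2-3b", quantization_config=bnb_config)

# freeze the quantized weights and attach small trainable LoRA adapters
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(r=8, target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters are trainable
```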


For this example, we will fine-tune StarCoder2-3b on the `Rust` subset of [the-stack-smol](https://huggingface.co/datasets/bigcode/the-stack-smol). This is just for illustration purposes; for a larger and cleaner dataset of Rust code, you can use [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup).

To launch the training:
```bash
accelerate launch finetune.py \
    --model_id "bigcode/starcoder2-3b" \
    --dataset_name "bigcode/the-stack-smol" \
    --subset "data/rust" \
    --dataset_text_field "content" \
    --split "train" \
    --max_seq_length 1024 \
    --max_steps 10000 \
    --micro_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --learning_rate 2e-5 \
    --warmup_steps 20 \
    --num_proc "$(nproc)"
```

If you want to fine-tune on other text datasets, change the `dataset_text_field` argument to the name of the column containing the code/text you want to train on; the sketch below shows how to check the column names.
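
If you are unsure which column holds the text, you can inspect the dataset schema first; a small sketch (the dataset below is just the example used above):
```python
from datasets import load_dataset

# load a small slice just to inspect the schema; replace with your own dataset
data = load_dataset("bigcode/the-stack-smol", data_dir="data/rust", split="train[:10]")
print(data.column_names)  # pick the column to pass as --dataset_text_field, e.g. "content"
```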

# Evaluation
To evaluate StarCoder2 and its derivatives, use the [BigCode-Evaluation-Harness](https://github.com/bigcode-project/bigcode-evaluation-harness), a framework for evaluating code LLMs.

finetune.py

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
# Code adapted from https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama/scripts/supervised_finetuning.py
# and https://huggingface.co/blog/gemma-peft
import argparse
import multiprocessing
import os

import torch
import transformers
from accelerate import PartialState
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    logging,
    set_seed,
)
from trl import SFTTrainer


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_id", type=str, default="bigcode/starcoder2-3b")
    parser.add_argument("--dataset_name", type=str, default="the-stack-smol")
    parser.add_argument("--subset", type=str, default="data/rust")
    parser.add_argument("--split", type=str, default="train")
    parser.add_argument("--dataset_text_field", type=str, default="content")

    parser.add_argument("--max_seq_length", type=int, default=1024)
    parser.add_argument("--max_steps", type=int, default=1000)
    parser.add_argument("--micro_batch_size", type=int, default=1)
    parser.add_argument("--gradient_accumulation_steps", type=int, default=4)
    parser.add_argument("--weight_decay", type=float, default=0.01)
    parser.add_argument("--bf16", type=bool, default=True)

    parser.add_argument("--attention_dropout", type=float, default=0.1)
    parser.add_argument("--learning_rate", type=float, default=2e-4)
    parser.add_argument("--lr_scheduler_type", type=str, default="cosine")
    parser.add_argument("--warmup_steps", type=int, default=100)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--output_dir", type=str, default="finetune_starcoder2")
    parser.add_argument("--num_proc", type=int, default=None)
    parser.add_argument("--push_to_hub", type=bool, default=True)
    return parser.parse_args()


def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )


def main(args):
    # config
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    lora_config = LoraConfig(
        r=8,
        target_modules=[
            "q_proj",
            "o_proj",
            "k_proj",
            "v_proj",
            "gate_proj",
            "up_proj",
            "down_proj",
        ],
        task_type="CAUSAL_LM",
    )

    # load model and dataset
    token = os.environ.get("HF_TOKEN", None)
    model = AutoModelForCausalLM.from_pretrained(
        args.model_id,
        quantization_config=bnb_config,
        device_map={"": PartialState().process_index},
        token=token,
        attention_dropout=args.attention_dropout,
    )
    print_trainable_parameters(model)

    data = load_dataset(
        args.dataset_name,
        data_dir=args.subset,
        split=args.split,
        token=token,
        num_proc=args.num_proc if args.num_proc else multiprocessing.cpu_count(),
    )

    # setup the trainer
    trainer = SFTTrainer(
        model=model,
        train_dataset=data,
        max_seq_length=args.max_seq_length,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=args.micro_batch_size,
            gradient_accumulation_steps=args.gradient_accumulation_steps,
            warmup_steps=args.warmup_steps,
            max_steps=args.max_steps,
            learning_rate=args.learning_rate,
            lr_scheduler_type=args.lr_scheduler_type,
            weight_decay=args.weight_decay,
            bf16=args.bf16,
            logging_strategy="steps",
            logging_steps=10,
            output_dir=args.output_dir,
            optim="paged_adamw_8bit",
            seed=args.seed,
            run_name=f"train-{args.model_id.split('/')[-1]}",
            report_to="wandb",
        ),
        peft_config=lora_config,
        dataset_text_field=args.dataset_text_field,
    )

    # launch
    print("Training...")
    trainer.train()

    print("Saving the last checkpoint of the model")
    model.save_pretrained(os.path.join(args.output_dir, "final_checkpoint/"))
    if args.push_to_hub:
        trainer.push_to_hub("Upload model")
    print("Training Done! 💥")


if __name__ == "__main__":
    args = get_args()
    set_seed(args.seed)
    os.makedirs(args.output_dir, exist_ok=True)

    logging.set_verbosity_error()

    main(args)

requirements.txt

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
git+https://github.com/huggingface/transformers.git
accelerate==0.27.1
datasets>=2.16.1
bitsandbytes==0.41.3
peft==0.8.2
trl==0.7.10
wandb==0.16.3
huggingface_hub==0.20.3
