This repository provides an overview of all components from the paper Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models.
| Data | CommitPackFT+OASST | Filtered version of CommitPack and OASST for high-quality commit messages that resemble instructions |
|---|---|---|
| Model | Astraios-1B | Collection of StarCoderBase-1B models instruction tuned on CommitPackFT + OASST with different tuning methods |
| | Astraios-3B | Collection of StarCoderBase-3B (3B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods |
| | Astraios-7B | Collection of StarCoderBase-7B (7B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods |
| | Astraios-16B | Collection of StarCoderBase-16B (16B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods |
| Evaluation | BigCloneBench | Dataset for clone detection; we use 2,000 samples for evaluation |
| | Devign | Dataset for defect detection; we use 2,000 samples for evaluation |
| | HumanEvalPack | Extension of OpenAI's HumanEval to cover 3 scenarios across 6 languages |
| | ReCode | Dataset for the robustness of code generation, covering 4 variants |
| | Asleep At The Keyboard | Datasets for security of code generation; we use DoW for evaluation |
Setup: Run the bash code below to set up the PEFT methods used in our work. We additionally implement the AdapterH, AdapterP and Parallel methods based on peft==0.6.0.dev0. For more information, please refer to the peft folder.
pip install git+https://github.com/bigcode-project/astraios#subdirectory=peft
Notes:
- As Prefix Tuning does not work for StarCoder training, we do not evaluate this method.
- For any configuration issues, please refer to the original PEFT.
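Once the package is installed, a tuned checkpoint can in principle be loaded on top of its base model from Python. The following is a minimal sketch, not the authors' own loading code: it assumes the adapters are published in the standard PEFT format and reuses the model identifiers from the evaluation commands in this README (bigcode/starcoderbase-1b and bigcode/astraios-1b-lora). The function name load_astraios_adapter is hypothetical.

```python
def load_astraios_adapter(base_id="bigcode/starcoderbase-1b",
                          adapter_id="bigcode/astraios-1b-lora"):
    """Load a base model and attach a PEFT adapter to it (illustrative sketch)."""
    # Imports are local so that defining this function does not require
    # transformers or peft to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Assumption: adapter_id points to a checkpoint saved in PEFT format.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode
    return tokenizer, model
```

Calling load_astraios_adapter() downloads both checkpoints from the Hugging Face Hub, and access to StarCoderBase may require accepting its license and logging in first.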
- Setup: Run the bash code below to set up the evaluation repository.
git clone -b astraios https://github.com/bigcode-project/bigcode-evaluation-harness
cd bigcode-evaluation-harness
pip install -q -r requirements.txt
accelerate config
- Run: All evaluation scripts are in the evaluation folder. Run each script via bash.
We use astraios-1b-lora as an example and use the bash code below to run the following tasks:
- Clone Detection
accelerate launch main.py \
--model bigcode/starcoderbase-1b \
--peft_model bigcode/astraios-1b-lora \
--tasks clone_detection \
--do_sample False \
--batch_size 1 \
--save_generations \
--trust_remote_code \
--save_generations_path generations_clone_detection_astraios-1b-lora.json \
--max_length_generation 512
- Defect Detection
accelerate launch main.py \
--model bigcode/starcoderbase-1b \
--peft_model bigcode/astraios-1b-lora \
--tasks defect_detection \
--do_sample False \
--batch_size 1 \
--save_generations \
--trust_remote_code \
--save_generations_path generations_defect_detection_astraios-1b-lora.json \
--max_length_generation 512
- HumanEvalSynthesize-Python
accelerate launch main.py \
--model bigcode/starcoderbase-1b \
--peft_model bigcode/astraios-1b-lora \
--tasks humanevalsynthesize-python \
--do_sample True \
--temperature 0.2 \
--n_samples 20 \
--batch_size 5 \
--allow_code_execution \
--save_generations \
--trust_remote_code \
--prompt octocoder \
--save_generations_path generations_humanevalsynthesizepython_astraios-1b-lora.json \
--metric_output_path evaluation_humanevalsynthesizepython_astraios-1b-lora.json \
--max_length_generation 2048
- HumanEvalFix-Python
accelerate launch main.py \
--model bigcode/starcoderbase-1b \
--peft_model bigcode/astraios-1b-lora \
--tasks humanevalfixtests-python \
--do_sample True \
--temperature 0.2 \
--n_samples 20 \
--batch_size 1 \
--allow_code_execution \
--save_generations \
--trust_remote_code \
--prompt octocoder \
--save_generations_path generations_humanevalfixpython_astraios-1b-lora.json \
--metric_output_path evaluation_humanevalfixpython_astraios-1b-lora.json \
--max_length_generation 2048
- HumanEvalExplain-Python
accelerate launch main.py \
--model bigcode/starcoderbase-1b \
--peft_model bigcode/astraios-1b-lora \
--tasks humanevalexplaindescribe-python \
--generation_only \
--do_sample True \
--temperature 0.2 \
--n_samples 20 \
--batch_size 5 \
--allow_code_execution \
--save_generations \
--trust_remote_code \
--prompt octocoder \
--save_generations_path generations_humanevalexplaindescribe-python_astraios-1b-lora.json \
--max_length_generation 2048
accelerate launch main.py \
--model bigcode/starcoderbase-1b \
--peft_model bigcode/astraios-1b-lora \
--tasks humanevalexplainsynthesize-python \
--do_sample True \
--temperature 0.2 \
--n_samples 1 \
--batch_size 1 \
--allow_code_execution \
--save_generations \
--trust_remote_code \
--prompt octocoder \
--load_data_path generations_humanevalexplaindescribe-python_astraios-1b-lora.json \
--save_generations_path generations_humanevalexplainsynthesize-python_astraios-1b-lora.json \
--metric_output_path evaluation_humanevalexplainpython_astraios-1b-lora.json \
--max_length_generation 2048
- ReCode-Format
accelerate launch main.py \
--model bigcode/starcoderbase-1b \
--peft_model bigcode/astraios-1b-lora \
--tasks perturbed-humaneval-format-num_seeds_5 \
--do_sample False \
--batch_size 1 \
--allow_code_execution \
--save_generations \
--trust_remote_code \
--n_samples 1 \
--prompt octocoder \
--save_generations_path generations_perturbed-humaneval-format-num_seeds_5_astraios-1b-lora.json \
--metric_output_path evaluation_perturbed-humaneval-format-num_seeds_5_astraios-1b-lora.json \
--max_length_generation 1024
- AATK-DoW
accelerate launch main.py \
--model bigcode/starcoderbase-1b \
--peft_model bigcode/astraios-1b-lora \
--tasks asleep_completion \
--do_sample True \
--temperature 0.2 \
--n_samples 20 \
--batch_size 1 \
--generation_only \
--save_generations \
--trust_remote_code \
--prompt octocoder \
--save_generations_path generations_asleep_completion_astraios-1b-lora.json \
--metric_output_path evaluation_asleep_completion_astraios-1b-lora.json \
--max_length_generation 1024
Note:
- When evaluating FFT models, --peft_model should be removed and the FFT model name should be passed via --model, e.g.:
accelerate launch main.py \
--model bigcode/astraios-1b-fft \
--tasks clone_detection \
--do_sample False \
--batch_size 1 \
--save_generations \
--trust_remote_code \
--save_generations_path generations_clone_detection_astraios-1b-fft.json \
--max_length_generation 512
- The evaluation notebook for Clone Detection and Defect Detection is stored in evaluation/eval_code_comprehension.ipynb.
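The --model / --peft_model distinction described in the note above can be sketched as a small command builder. This is an illustration rather than part of the evaluation scripts: the function build_eval_command is hypothetical, recognising FFT checkpoints by a "-fft" name suffix is an assumption inferred from bigcode/astraios-1b-fft, and the output file naming is simplified to the pattern used in the Clone Detection example.

```python
import shlex


def build_eval_command(model_name, task,
                       base_model="bigcode/starcoderbase-1b", extra_args=()):
    """Build the argument list for one `accelerate launch main.py` run.

    PEFT checkpoints are passed via --peft_model on top of the base model;
    FFT checkpoints are passed directly via --model.
    """
    short_name = model_name.split("/")[-1]
    cmd = ["accelerate", "launch", "main.py"]
    # Assumption: FFT checkpoints are recognised by a "-fft" name suffix.
    if short_name.endswith("-fft"):
        cmd += ["--model", model_name]
    else:
        cmd += ["--model", base_model, "--peft_model", model_name]
    cmd += [
        "--tasks", task,
        "--save_generations",
        "--trust_remote_code",
        "--save_generations_path", f"generations_{task}_{short_name}.json",
    ]
    cmd += list(extra_args)
    return cmd


# Flags shared by both Clone Detection runs shown in this README.
common = ["--do_sample", "False", "--batch_size", "1",
          "--max_length_generation", "512"]
lora_cmd = build_eval_command("bigcode/astraios-1b-lora", "clone_detection",
                              extra_args=common)
fft_cmd = build_eval_command("bigcode/astraios-1b-fft", "clone_detection",
                             extra_args=common)
print(shlex.join(lora_cmd))
print(shlex.join(fft_cmd))
```

The two printed commands differ only in how the checkpoint is passed, which makes it easy to loop over several tuning methods with the same task settings.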
The finetuning Python script is finetune.py. The corresponding PEFT configurations are stored in peft_config.py.
To train all models with PEFT, run the bash code:
sh run_peft.sh
Note: --gradient_accumulation_steps 32 is for a single GPU. If the model is trained with 8 GPUs, gradient_accumulation_steps should be adjusted to 4.
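The adjustment in the note follows from keeping the effective batch size constant: the number of sequences per optimizer step is the per-device batch size times the number of GPUs times the accumulation steps. A minimal sketch of that arithmetic, assuming the per-device batch size is unchanged when moving to several GPUs (the function name is illustrative and not part of run_peft.sh):

```python
def accumulation_steps(single_gpu_steps: int, num_gpus: int) -> int:
    """Accumulation steps that keep the effective batch size equal
    to the single-GPU setting when training on num_gpus devices."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    if single_gpu_steps % num_gpus != 0:
        raise ValueError("single_gpu_steps must be divisible by num_gpus")
    return single_gpu_steps // num_gpus


print(accumulation_steps(32, 8))  # 4
```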
To train all models with FFT, run the bash code:
sh run_fft.sh
Note: --gradient_accumulation_steps 32 is for a single GPU. If the model is trained with 8 GPUs, gradient_accumulation_steps should be adjusted to 4.
Outputs are stored under their corresponding subfolders in the outputs folder.
Note:
- Some outputs may be missing as they were not saved initially.
Figures:
- All figures are created via this colab notebook.
- The figure of Astraios is generated via DALL-E with the prompt "A fantasy-inspired, adorable illustration of Astraios, the Greek god of dusk. The setting is a serene evening landscape with a gradient sky transition".
Everything is licensed as permissively as possible for us.
The evaluation repository, Code Generation LM Evaluation Harness, is licensed under the Apache-2.0 license.
The PEFT library is licensed under the Apache-2.0 license.
All Astraios models are licensed under the same license as StarCoder (Commercial except for use cases deemed harmful).
The remaining code originally created in this repository is licensed under the MIT License.
- Organize the file names in the outputs/ folder.
- Organize the bash scripts in the evaluation/ folder.
- Merge the PR to bigcode-evaluation-harness.
@article{zhuo2024astraios,
title={Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models},
author={Terry Yue Zhuo and Armel Zebaze and Nitchakarn Suppattarachai and Leandro von Werra and Harm de Vries and Qian Liu and Niklas Muennighoff},
journal={https://arxiv.org/abs/2401.00788},
year={2024}
}