
Merge branch 'main' into doc-edit
hendrydong committed Aug 7, 2023
2 parents 2679fb9 + 4e385b5 commit c0ab0e6
Showing 23 changed files with 232 additions and 50 deletions.
16 changes: 13 additions & 3 deletions README.md
@@ -21,7 +21,7 @@
[![Doc](https://img.shields.io/badge/Website-Doc-ff69b4.svg)](https://optimalscale.github.io/LMFlow/)
[![Embark](https://img.shields.io/badge/Discord-LMFlow-%237289da.svg?logo=discord)](https://discord.gg/u9VJNpzhvA)
[![slack badge](https://img.shields.io/badge/Slack-Join-blueviolet?logo=slack&amp)](https://join.slack.com/t/lmflow/shared_invite/zt-1wju9nicy-woXbNtS~5MavHSAtiMxmxQ)
[![WeChat badge](https://img.shields.io/badge/WeChat-Join-brightgreen?logo=wechat&amp)](https://i.imgloc.com/2023/07/13/VgJyaZ.jpeg)
[![WeChat badge](https://img.shields.io/badge/WeChat-Join-brightgreen?logo=wechat&amp)](https://s1.ax1x.com/2023/08/06/pPAQTPI.jpg)

An extensible, convenient, and efficient toolbox for finetuning large machine learning models, designed to be user-friendly, speedy and reliable, and accessible to the entire community.

@@ -33,6 +33,7 @@ Large Model for All.


## Latest News
* [2023-08-07] Support [Flash Attention-2](https://crfm.stanford.edu/2023/07/17/flash2.html). Check out [flash_attention](https://github.com/OptimalScale/LMFlow/blob/main/readme/flash_attn2.md) for more details.
* [2023-08-02] Support [Llama2](https://ai.meta.com/llama/), [ChatGLM2](https://huggingface.co/THUDM/chatglm2-6b), and [Baichuan](https://huggingface.co/baichuan-inc/Baichuan-7B) models.
* [2023-07-23] :rocket: [LMFlow multimodal chatbot](https://github.com/OptimalScale/LMFlow/blob/main/scripts/run_vis_chatbot_gradio_minigpt4.sh) is now available! It supports multimodal inputs of images and text. An [Online Demo](http://multimodal.lmflow.online) is also provided. (The service runs on a single GPU, so you may see "queuing" or "application busy" messages when multiple users access it at the same time; please wait and try again later.) :rocket: ![image](https://github.com/OptimalScale/LMFlow/blob/rpan-vision-encoder/assets/multimodal-chatbot-demo.gif)
* [2023-06-22] [LMFlow paper](https://arxiv.org/abs/2306.12420) is out! Check out our implementation details at https://arxiv.org/abs/2306.12420
@@ -213,7 +214,7 @@ cd LMFlow
conda create -n lmflow python=3.9 -y
conda activate lmflow
conda install mpi4py
pip install -e .
./install.sh
```

## 2. Prepare Dataset
@@ -336,6 +337,16 @@ You can configure DeepSpeed under configs. Details can be found at [DeepSpee

Thanks to the great efforts of [llama.cpp](https://github.com/ggerganov/llama.cpp), everyone can run their LLaMA models on a CPU with 4-bit quantization. We provide a script to convert LLaMA LoRA weights to `.pt` files; you then only need `convert-pth-to-ggml.py` in llama.cpp to perform the quantization.

### 4.4 Vocabulary List Extension

Now you can train your own SentencePiece tokenizer and merge it with the model's original Hugging Face tokenizer. Check out [vocab_extension](https://github.com/OptimalScale/LMFlow/blob/main/scripts/vocab_extension) for more details.

### 4.5 Position Interpolation for LLaMA Models
Now LMFlow supports the latest Linear & NTK (Neural Tangent Kernel) scaling techniques for LLaMA models. Check out [position_interpolation](https://github.com/OptimalScale/LMFlow/blob/main/readme/Position_Interpolation.md) for more details.

### 4.6 FlashAttention-2
Now LMFlow supports the latest [FlashAttention-2](https://crfm.stanford.edu/2023/07/17/flash2.html). Check out [flash_attention](https://github.com/OptimalScale/LMFlow/blob/main/readme/flash_attn2.md) for more details.

## 5. Model Release

@@ -385,7 +396,6 @@ Then you can check the model performance at our [Doc](https://optimalscale.githu
Please refer to our [Documentation](https://optimalscale.github.io/LMFlow/) for more API reference and experimental results.



## Acknowledgement
LMFlow draws inspiration from various studies, including but not limited to:
- Alpaca: https://github.com/tatsu-lab/stanford_alpaca
8 changes: 8 additions & 0 deletions install.sh
@@ -0,0 +1,8 @@
#!/bin/bash

pip install -e .

# Install FlashAttention-2 only when a GPU known to support it (A100 / A40) is detected
gpu_state="$(nvidia-smi --query-gpu=name --format=csv,noheader)"
if [[ "${gpu_state}" == *"A100"* || "${gpu_state}" == *"A40"* ]]; then
  pip install flash-attn==2.0.2
fi
40 changes: 40 additions & 0 deletions readme/Position_Interpolation.md
@@ -0,0 +1,40 @@
# Position Interpolation
Now LMFlow supports the latest Linear & NTK (Neural Tangent Kernel) scaling techniques for LLaMA models. \
For more details on these techniques, check out the links below:
* Linear scaling: \
https://arxiv.org/abs/2306.15595
* NTK scaling: \
https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/
## Usage
To use the position interpolation techniques, set the following options:
```
--truncate_to_model_max_length False
--do_rope_scaling True
```
For linear scaling, set the extension ratio with:
```
--rope_pi_ratio 4
```
For NTK scaling, set the extension ratio with:
```
--rope_ntk_ratio 4
```
Here is an example of an evaluation script:
```
#!/bin/bash
CUDA_VISIBLE_DEVICES=0 \
deepspeed examples/evaluation.py \
--answer_type text \
--model_name_or_path pinkmanlove/llama-7b-hf \
--dataset_path data/wiki_en_eval \
--deepspeed examples/ds_config.json \
--inference_batch_size_per_device 1 \
--truncate_to_model_max_length False \
--block_size 4096 \
--use_flash_attention True \
--do_rope_scaling True \
--rope_pi_ratio 2 \
--rope_ntk_ratio 4 \
--metric ppl
```
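
Conceptually, the two options modify the rotary position embedding (RoPE) in different ways: linear scaling (`--rope_pi_ratio`) compresses position indices into the trained range, while NTK scaling (`--rope_ntk_ratio`) enlarges the rotary base so high-frequency dimensions are stretched less. The sketch below only illustrates the idea and is not LMFlow's actual implementation; the function name and the exact NTK formula are assumptions.
```
import torch

def rope_angles(dim, max_pos, base=10000.0, pi_ratio=1.0, ntk_ratio=1.0):
    """Illustrative RoPE angle table with linear / NTK scaling applied."""
    # NTK scaling: grow the rotary base so low-frequency dimensions absorb
    # most of the context extension (common community formula, assumed here).
    if ntk_ratio > 1.0:
        base = base * ntk_ratio ** (dim / (dim - 2))
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(max_pos).float()
    # Linear scaling (position interpolation): shrink position indices by the
    # extension ratio so they stay within the original training window.
    if pi_ratio > 1.0:
        positions = positions / pi_ratio
    return torch.outer(positions, inv_freq)  # shape: (max_pos, dim // 2)

# e.g. a 4x NTK extension for a model with head dim 128 and 8192 positions
angles = rope_angles(dim=128, max_pos=8192, ntk_ratio=4.0)
```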
18 changes: 18 additions & 0 deletions readme/flash_attn2.md
@@ -0,0 +1,18 @@
# Flash Attention 2.0
We're thrilled to announce that LMFlow now supports training and inference using **FlashAttention-2**! This cutting-edge feature will take your language modeling to the next level. To use it, simply add `--use_flash_attention True` to the corresponding bash script.
Here is an example of how to use it:
```
#!/bin/bash
pip install flash_attn==2.0.2
deepspeed --master_port=11000 \
examples/chatbot.py \
--deepspeed configs/ds_config_chatbot.json \
--model_name_or_path LMFlow/Full-Robin-7b-v2 \
--max_new_tokens 1024 \
--prompt_structure "###Human: {input_text}###Assistant:" \
--end_string "#" \
--use_flash_attention True
```
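
Under the hood, the `--use_flash_attention True` flag patches the model's attention to call a flash-attn kernel instead of the standard PyTorch implementation (the real patches live under `src/lmflow/utils/flash_attention/`). The snippet below is only a rough sketch of that idea with assumed tensor shapes; it is not LMFlow's exact code.
```
# Rough sketch: calling a FlashAttention-2 kernel directly (illustrative only).
# Assumes flash-attn >= 2.0 is installed and tensors are fp16/bf16 on a GPU.
import torch
from flash_attn.flash_attn_interface import flash_attn_func

batch, seq_len, num_heads, head_dim = 2, 1024, 32, 128
q = torch.randn(batch, seq_len, num_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Causal self-attention computed by the fused flash-attn kernel.
out = flash_attn_func(q, k, v, causal=True)  # (batch, seq_len, num_heads, head_dim)
```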

Upgrade to LMFlow now and experience the future of language modeling!
4 changes: 4 additions & 0 deletions scripts/run_evaluation.sh
@@ -1,5 +1,9 @@
#!/bin/bash

if [ ! -d data/MedQA-USMLE ]; then
cd data && ./download.sh MedQA-USMLE && cd -
fi

CUDA_VISIBLE_DEVICES=0 \
deepspeed examples/evaluation.py \
--answer_type medmcqa \
4 changes: 4 additions & 0 deletions scripts/run_evaluation_accelerator.sh
@@ -1,5 +1,9 @@
#!/bin/bash

if [ ! -d data/MedQA-USMLE ]; then
cd data && ./download.sh MedQA-USMLE && cd -
fi

CUDA_VISIBLE_DEVICES=0 accelerate launch --config_file configs/accelerator_singlegpu_config.yaml examples/evaluation.py \
--answer_type usmle \
--model_name_or_path gpt2-large \
5 changes: 5 additions & 0 deletions scripts/run_evaluation_with_lora.sh
@@ -3,6 +3,11 @@
# --model_name_or_path specifies the original huggingface model
# --lora_model_path specifies the model difference introduced by finetuning,
# i.e. the one saved by ./scripts/run_finetune_with_lora.sh

if [ ! -d data/alpaca ]; then
cd data && ./download.sh alpaca && cd -
fi

CUDA_VISIBLE_DEVICES=0 \
deepspeed examples/evaluation.py \
--answer_type text \
5 changes: 4 additions & 1 deletion scripts/run_finetune.sh
@@ -14,6 +14,9 @@ output_dir=${project_dir}/output_models/${exp_id}
log_dir=${project_dir}/log/${exp_id}

dataset_path=${project_dir}/data/alpaca/train
if [ ! -d ${dataset_path} ]; then
cd data && ./download.sh alpaca && cd -
fi

mkdir -p ${output_dir} ${log_dir}

@@ -27,7 +30,7 @@ deepspeed ${deepspeed_args} \
--block_size 512 \
--per_device_train_batch_size 1 \
--deepspeed configs/ds_config_zero3.json \
--bf16 \
--fp16 \
--run_name finetune \
--validation_split_percentage 0 \
--logging_steps 20 \
5 changes: 4 additions & 1 deletion scripts/run_finetune_with_lora.sh
@@ -12,6 +12,9 @@ output_dir=${project_dir}/output_models/${exp_id}
log_dir=${project_dir}/log/${exp_id}

dataset_path=${project_dir}/data/alpaca/train
if [ ! -d ${dataset_path} ]; then
cd data && ./download.sh alpaca && cd -
fi

mkdir -p ${output_dir} ${log_dir}

@@ -28,7 +31,7 @@ deepspeed ${deepspeed_args} \
--lora_r 8 \
--save_aggregated_lora 0\
--deepspeed configs/ds_config_zero2.json \
--bf16 \
--fp16 \
--run_name finetune_with_lora \
--validation_split_percentage 0 \
--logging_steps 20 \
5 changes: 4 additions & 1 deletion scripts/run_finetune_with_lora_save_aggregated_weights.sh
@@ -13,6 +13,9 @@ log_dir=${project_dir}/log/${exp_id}

dataset_path=${project_dir}/data/alpaca/train
eval_dataset_path=${project_dir}/data/alpaca/test
if [ ! -d ${dataset_path} ]; then
cd data && ./download.sh alpaca && cd -
fi

mkdir -p ${output_dir} ${log_dir}

@@ -29,7 +32,7 @@ deepspeed ${deepspeed_args} \
--lora_r 8 \
--save_aggregated_lora 1\
--deepspeed configs/ds_config_zero2.json \
--bf16 \
--fp16 \
--run_name finetune_with_lora \
--validation_split_percentage 0 \
--logging_steps 20 \
3 changes: 3 additions & 0 deletions scripts/run_multistage_finetune.sh
@@ -11,6 +11,9 @@ project_dir=$(cd "$(dirname $0)"/..; pwd)
output_dir=${project_dir}/output_models/${exp_id}
log_dir=${project_dir}/log/${exp_id}
dataset_path="${project_dir}/data/example_dataset/train"
if [ ! -d ${dataset_path} ]; then
cd data && ./download.sh example_dataset && cd -
fi

mkdir -p ${output_dir} ${log_dir}

4 changes: 4 additions & 0 deletions scripts/run_raft_align.sh
@@ -11,6 +11,10 @@ project_dir=$(cd "$(dirname $0)"/..; pwd)
output_dir=${project_dir}/output_models/${exp_id}
log_dir=${project_dir}/log/${exp_id}

if [ ! -d data/hh_rlhf ]; then
cd data && ./download.sh hh_rlhf && cd -
fi

mkdir -p ${output_dir} ${log_dir}

export PYTHONPATH=.
3 changes: 3 additions & 0 deletions scripts/run_reward_modeling.sh
@@ -14,6 +14,9 @@ output_dir=${project_dir}/output_models/${exp_id}
log_dir=${project_dir}/log/${exp_id}

dataset_path=${project_dir}/data/hh_rlhf/rm/hh_rlhf_rm_training.json
if [ ! -d data/hh_rlhf ]; then
cd data && ./download.sh hh_rlhf && cd -
fi

mkdir -p ${output_dir} ${log_dir}

23 changes: 23 additions & 0 deletions scripts/vocab_extension/README.md
@@ -0,0 +1,23 @@
# Vocab Extension
## Train & Merge Tokenizer
To automatically convert data, train a SentencePiece tokenizer, and merge the tokenizer, you can run the following script:
```
bash scripts/vocab_extension/train_merge_tokenizer.sh
```
Alternatively, you can run each of the three steps separately:

## Convert JSON Data to TXT
To convert JSON data to TXT for sentencepiece tokenizer training, run:
```
bash scripts/vocab_extension/convert_json_to_txt.sh
```
## Train SentencePiece Tokenizer
To train a SentencePiece tokenizer, run:
```
bash scripts/vocab_extension/train_tokenizer.sh
```
## Merge the New Tokenizer with the Original One
To merge a new tokenizer with the original one, run:
```
bash scripts/vocab_extension/merge_tokenizer.sh
```
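
As a rough illustration of what the merge step does, the sketch below follows the common SentencePiece-merge recipe (it is an assumption, not necessarily how `utils/merge_tokenizer.py` is implemented): the newly trained tokenizer's pieces are appended to the base model's SentencePiece proto.
```
# Illustrative sketch of merging a newly trained SentencePiece model into the
# base tokenizer; paths and the 0.0 score for new pieces are assumptions.
from transformers import LlamaTokenizer
from sentencepiece import sentencepiece_model_pb2 as sp_pb2_model

base_tokenizer = LlamaTokenizer.from_pretrained("openlm-research/open_llama_3b")
base_spm = sp_pb2_model.ModelProto()
base_spm.ParseFromString(base_tokenizer.sp_model.serialized_model_proto())

new_spm = sp_pb2_model.ModelProto()
with open("./output_models/new_tokenizer/example.model", "rb") as f:
    new_spm.ParseFromString(f.read())

existing_pieces = {p.piece for p in base_spm.pieces}
for p in new_spm.pieces:
    if p.piece not in existing_pieces:          # only append genuinely new tokens
        piece = sp_pb2_model.ModelProto().SentencePiece()
        piece.piece, piece.score = p.piece, 0.0
        base_spm.pieces.append(piece)

with open("./output_models/merged_tokenizer/merged.model", "wb") as f:
    f.write(base_spm.SerializeToString())       # load later via LlamaTokenizer(vocab_file=...)
```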
2 changes: 1 addition & 1 deletion scripts/vocab_extension/merge_tokenizer.sh
@@ -1,5 +1,5 @@
#!/bin/bash
mkdir -p ./output_models/new_tokenizer
python utils/merge_tokenizer.py --tokenizer_dir pinkmanlove/llama-7b-hf \
python utils/merge_tokenizer.py --tokenizer_dir openlm-research/open_llama_3b \
--chinese_sp_model_file ./output_models/new_tokenizer/example.model \
--output_dir ./output_models/merged_tokenizer \
5 changes: 3 additions & 2 deletions scripts/vocab_extension/train_merge_tokenizer.sh
@@ -14,10 +14,11 @@ python utils/train_tokenizer.py --dataset_path ./data/wiki_zh_eval/converted_dat
--model_type bpe \
--output_dir ./output_models/new_tokenizer \
--user_defined_symbols 0,1,2,3,4,5,6,7,8,9,% \
--vocab_size 20000
--vocab_size 20000 \
--max_sentencepiece_length 4

# merge the new tokenizer with the old one
mkdir -p ./output_models/merged_tokenizer
python utils/merge_tokenizer.py --chinese_sp_model_file ./output_models/new_tokenizer/example.model \
--tokenizer_dir pinkmanlove/llama-7b-hf \
--tokenizer_dir openlm-research/open_llama_3b \
--output_dir ./output_models/merged_tokenizer
3 changes: 2 additions & 1 deletion scripts/vocab_extension/train_tokenizer.sh
@@ -4,4 +4,5 @@ python utils/train_tokenizer.py --dataset_path ./data/wiki_zh_eval/converted_dat
--model_type bpe \
--output_dir ./output_models/new_tokenizer \
--user_defined_symbols 0,1,2,3,4,5,6,7,8,9,% \
--vocab_size 20000
--vocab_size 20000 \
--max_sentencepiece_length 4
51 changes: 34 additions & 17 deletions src/lmflow/models/hf_decoder_model.py
@@ -75,12 +75,7 @@
"A100": ["LlamaForCausalLM", "GPTNeoForCausalLM", "GPT2ForCausalLM", "BloomForCausalLM"],
"A40": ["LlamaForCausalLM","GPTNeoForCausalLM", "GPT2ForCausalLM", "BloomForCausalLM"]
}
if int(flash_attn.__version__.split(".")[0]) == 1:
GPU_SUPPORT_FLASH_ATTENTION = {
"A100": ["LlamaForCausalLM", "GPTNeoForCausalLM", "GPT2ForCausalLM", "BloomForCausalLM"],
"A40": ["GPTNeoForCausalLM", "GPT2ForCausalLM", "BloomForCausalLM"]
}
except ImportError:
except:
pass

class HFDecoderModel(DecoderModel, Tunable):
@@ -140,18 +135,40 @@ def __init__(
"revision": model_args.model_revision,
"use_auth_token": True if model_args.use_auth_token else None,
}
if model_args.tokenizer_name:
tokenizer = AutoTokenizer.from_pretrained(model_args.tokenizer_name, **tokenizer_kwargs)
elif model_args.model_name_or_path:
tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, **tokenizer_kwargs)
else:
raise ValueError(
"You are instantiating a new tokenizer from scratch. This is"
" not supported by this script. You can do it from another"
" script, save it, and load it from here, using"
" --tokenizer_name."
)

try:
if model_args.tokenizer_name:
tokenizer = AutoTokenizer.from_pretrained(model_args.tokenizer_name, **tokenizer_kwargs)
elif model_args.model_name_or_path:
tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, **tokenizer_kwargs)
else:
raise ValueError(
"You are instantiating a new tokenizer from scratch. This is"
" not supported by this script. You can do it from another"
" script, save it, and load it from here, using"
" --tokenizer_name."
)

except RecursionError:
logger.warning("The tokenizer_config.json file doesn't set the special tokens. Using default values: <unk>, <s>, </s> for unknown token, bos token and eos token respectively.")
if model_args.tokenizer_name:
tokenizer = AutoTokenizer.from_pretrained(model_args.tokenizer_name, unk_token="<unk>",
bos_token="<s>",
eos_token="</s>",
**tokenizer_kwargs)
elif model_args.model_name_or_path:
tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, unk_token="<unk>",
bos_token="<s>",
eos_token="</s>",
**tokenizer_kwargs)
else:
raise ValueError(
"You are instantiating a new tokenizer from scratch. This is"
" not supported by this script. You can do it from another"
" script, save it, and load it from here, using"
" --tokenizer_name."
)

self.tokenizer = tokenizer

torch_dtype = (
8 changes: 4 additions & 4 deletions src/lmflow/utils/flash_attention/gpt2_flash_attention.py
@@ -8,11 +8,11 @@

from einops import rearrange

import flash_attn
if int(flash_attn.__version__.split(".")[0]) == 1:
from flash_attn.flash_attn_interface import flash_attn_unpadded_qkvpacked_func
if int(flash_attn.__version__.split(".")[0]) == 2:
#try to import flash_attn 2.x.x, if not, import flash_attn 1.x.x
try:
from flash_attn.flash_attn_interface import flash_attn_varlen_qkvpacked_func as flash_attn_unpadded_qkvpacked_func
except:
from flash_attn.flash_attn_interface import flash_attn_unpadded_qkvpacked_func

from flash_attn.bert_padding import unpad_input, pad_input

8 changes: 4 additions & 4 deletions src/lmflow/utils/flash_attention/gpt_neo_flash_attention.py
@@ -4,11 +4,11 @@
import transformers
from einops import rearrange

import flash_attn
if int(flash_attn.__version__.split(".")[0]) == 1:
from flash_attn.flash_attn_interface import flash_attn_unpadded_qkvpacked_func
if int(flash_attn.__version__.split(".")[0]) == 2:
#try to import flash_attn 2.x.x, if not, import flash_attn 1.x.x
try:
from flash_attn.flash_attn_interface import flash_attn_varlen_qkvpacked_func as flash_attn_unpadded_qkvpacked_func
except:
from flash_attn.flash_attn_interface import flash_attn_unpadded_qkvpacked_func

from flash_attn.bert_padding import unpad_input, pad_input
