Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages
This repository contains the official code for the ACL 2024 paper: Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages.
Requirements:

- torch
- transformers
- fire
To install the required packages, run the following commands:
```bash
CUDA=cu118  # change to your CUDA version
pip install torch --index-url https://download.pytorch.org/whl/$CUDA

# If you do not need `chat.py`, you can install the CPU-only torch build instead:
pip install torch

pip install transformers fire
```
To extract the chat vector, use the following command:
```bash
BASE_MODEL_PATH=meta-llama/Meta-Llama-3-8B
CHAT_MODEL_PATH=meta-llama/Meta-Llama-3-8B-Instruct
CHAT_VECTOR_PATH=ckpt_tv/llama3-8b-instruct

python extract_chat_vector.py $BASE_MODEL_PATH $CHAT_MODEL_PATH $CHAT_VECTOR_PATH
```
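Conceptually, the extracted chat vector is the element-wise weight difference between the chat model and the base model, one delta tensor per parameter. The sketch below illustrates that arithmetic; the plain `torch.save` file layout is an illustrative assumption, not necessarily the script's on-disk format:

```python
import os
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16)
chat = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16)

with torch.no_grad():
    base_sd = base.state_dict()
    # Chat vector: tau = theta_chat - theta_base, per parameter tensor.
    chat_vector = {name: p - base_sd[name] for name, p in chat.state_dict().items()}

os.makedirs("ckpt_tv/llama3-8b-instruct", exist_ok=True)
torch.save(chat_vector, "ckpt_tv/llama3-8b-instruct/chat_vector.pt")  # assumed format
```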
To add the chat vector to the model, use the following command:
```bash
CP_MODEL_PATH=ckpt/llama3-8b_cp
OUTPUT_PATH=ckpt/llama3-8b-cp_cv-llama3

python add_chat_vector.py $CP_MODEL_PATH "['$CHAT_VECTOR_PATH']" $OUTPUT_PATH \
    --ratio "[1]"  # chat vector ratio
```
If the model has trouble generating text in the target language, lower the `--ratio` value.
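Under the hood, adding the chat vector is plain task arithmetic: each weight of the continually pretrained (CP) model gets `ratio * delta` added to it. A minimal sketch, continuing the assumed file format from the extraction sketch above:

```python
import torch
from transformers import AutoModelForCausalLM

cp_model = AutoModelForCausalLM.from_pretrained("ckpt/llama3-8b_cp", torch_dtype=torch.bfloat16)
chat_vector = torch.load("ckpt_tv/llama3-8b-instruct/chat_vector.pt")  # assumed format
ratio = 1.0  # lower this if the output drifts out of the target language

with torch.no_grad():
    # state_dict() shares storage with the parameters, so the in-place
    # add updates the model directly: theta = theta_cp + ratio * tau.
    state_dict = cp_model.state_dict()
    for name, delta in chat_vector.items():
        state_dict[name] += ratio * delta

cp_model.save_pretrained("ckpt/llama3-8b-cp_cv-llama3")
```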
If you continued pretraining with extended word embeddings, use the `--skip_embed` option to avoid adding the chat vector to the embedding and `lm_head` layers, whose shapes no longer match:

```bash
CP_MODEL_PATH=ckpt/llama3-8b_cp
OUTPUT_PATH=ckpt/llama3-8b-cp_cv-llama3

python add_chat_vector.py $CP_MODEL_PATH "['$CHAT_VECTOR_PATH']" $OUTPUT_PATH --skip_embed True
```
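Conceptually, `--skip_embed` just filters those mismatched tensors out of the addition loop sketched above; the parameter names below are an assumption based on Llama-style checkpoints:

```python
# Skip these when the CP model's vocabulary was extended: their shapes
# no longer match the chat vector's (Llama-style names; an assumption).
SKIP_KEYS = {"model.embed_tokens.weight", "lm_head.weight"}

with torch.no_grad():
    for name, delta in chat_vector.items():
        if name in SKIP_KEYS:
            continue  # keep the CP model's embedding and lm_head as-is
        state_dict[name] += ratio * delta
```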
If certain special tokens in the chat template (such as `<|eot_id|>`) were not trained during continual pretraining, set `--special_tokens_map` to replace the CP model's special-token embeddings with the chat model's. For example, with Llama 3:
```bash
# --special_tokens_map maps CP_MODEL_TOKEN_ID -> CHAT_MODEL_TOKEN_ID
python add_chat_vector.py $CP_MODEL_PATH "['$CHAT_VECTOR_PATH']" $OUTPUT_PATH \
    --ratio "[1]" \
    --skip_embed True \
    --special_tokens_map "{128006:128006,128007:128007,128009:128009}"
```
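Conceptually, the map copies the chat model's learned embedding rows for those token ids into the otherwise untouched CP embedding and `lm_head` matrices, so tokens like `<|eot_id|>` behave as they do in the chat model. A sketch continuing the snippets above, with parameter names again assumed Llama-style:

```python
# CP-model token id -> chat-model token id (Llama 3 header/eot special tokens).
special_tokens_map = {128006: 128006, 128007: 128007, 128009: 128009}

with torch.no_grad():
    chat_sd = chat.state_dict()  # the chat model from the extraction sketch
    for cp_id, chat_id in special_tokens_map.items():
        state_dict["model.embed_tokens.weight"][cp_id] = chat_sd["model.embed_tokens.weight"][chat_id]
        state_dict["lm_head.weight"][cp_id] = chat_sd["lm_head.weight"][chat_id]
```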
If the model still does not generate text properly, consider fine-tuning the resulting model after adding the chat vector.
To merge multiple chat vectors, use the following command:
```bash
OUTPUT_PATH=ckpt/llama3-8b-cp_cv-llama3-openhermes
CV1_PATH=ckpt_tv/llama3-8b-instruct
CV2_PATH=ckpt_tv/llama3-8b-openhermes

python add_chat_vector.py $CP_MODEL_PATH "['$CV1_PATH','$CV2_PATH']" $OUTPUT_PATH \
    --ratio "[0.5,0.5]"
# Enable `--skip_embed` and `--special_tokens_map` if needed.
```
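Merging is a weighted sum of the deltas before they are added to the CP model, i.e. `theta = theta_cp + sum_i ratio_i * tau_i`. A sketch under the same assumed file format:

```python
import torch

# Assumed file layout from the extraction sketch above.
paths = [
    "ckpt_tv/llama3-8b-instruct/chat_vector.pt",
    "ckpt_tv/llama3-8b-openhermes/chat_vector.pt",
]
ratios = [0.5, 0.5]

vectors = [torch.load(p) for p in paths]
with torch.no_grad():
    # Weighted sum across chat vectors, per parameter tensor.
    merged = {name: sum(r * v[name] for r, v in zip(ratios, vectors)) for name in vectors[0]}
# `merged` is then added to the CP model exactly like a single chat vector.
```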
- Set the tokenizer's `chat_template` to `$CV2_PATH`'s chat template (see the sketch below).
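For example, copying the template over with `transformers` (a sketch; it assumes the donor chat model's tokenizer is available, since the template lives in the tokenizer config, and the donor path is hypothetical):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("ckpt/llama3-8b-cp_cv-llama3-openhermes")
# Hypothetical path: the chat model that CV2 was extracted from.
donor = AutoTokenizer.from_pretrained("path/to/cv2-source-chat-model")

tok.chat_template = donor.chat_template
tok.save_pretrained("ckpt/llama3-8b-cp_cv-llama3-openhermes")
```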
To chat with the resulting model, run:

```bash
python chat.py $OUTPUT_PATH  # model path
# Optional flags:
#   --sys_prompt "你是一個樂於助人的助理。"  # system prompt ("You are a helpful assistant.")
#   --<other generation config>
```
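If you prefer plain `transformers` over `chat.py`, an equivalent generation call might look like the sketch below (model path and sampling settings are illustrative, and it assumes the output directory also contains the tokenizer):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "ckpt/llama3-8b-cp_cv-llama3"  # a model produced above
tok = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16).to("cuda")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello! Please introduce yourself."},
]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")
output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tok.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```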
If you find this paper helpful, please use the following citation:
```bibtex
@misc{huang2024chat,
      title={Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages},
      author={Shih-Cheng Huang* and Pin-Zu Li* and Yu-Chi Hsu and Kuang-Ming Chen and Yu Tung Lin and Shih-Kai Hsiao and Richard Tzong-Han Tsai and Hung-yi Lee*},
      year={2024},
      eprint={2310.04799},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
We appreciate the support and resources provided by the TAIDE project.