This repository contains the dataset and code for our EMNLP'23 publication: "HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science".
Single GPU
- for LLaMA (first unzip peft.zip and place the extracted files under ./peft/)
python uniform_finetune.py --model_type llama --model_name_or_path yahma/llama-7b-hf \
--data ./data/formatted_cot_data/train_instructions_from_chatgpt.json --lora_target_modules q_proj v_proj \
--per_gpu_train_batch_size 4 --learning_rate 1e-4 --epochs 10
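The `--data` file is expected to hold instruction-tuning records. The sketch below shows one record in the common Alpaca-style `instruction`/`input`/`output` layout; these field names and the example content are assumptions, so check the released data on Zenodo for the authoritative schema.

```python
import json

# Hypothetical record in the assumed Alpaca-style schema; verify the exact
# field names against the released HoneyBee data before relying on them.
record = {
    "instruction": "What is the chemical formula of quartz?",
    "input": "",
    "output": "Quartz is a crystalline form of silicon dioxide, SiO2.",
}

# train_instructions_from_chatgpt.json is assumed to be a JSON list of such records.
with open("example_instructions.json", "w", encoding="utf-8") as f:
    json.dump([record], f, indent=2)
```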
Multiple GPUs
- for LLaMA (first unzip peft.zip and place the extracted files under ./peft/)
python -m torch.distributed.launch --nproc_per_node 4 \
--nnodes=1 --node_rank=0 --master_addr=xxx --master_port=yyy uniform_finetune.py \
--model_type llama --model_name_or_path yahma/llama-13b-hf \
--data ./data/formatted_cot_data/train_instructions_from_chatgpt.json --lora_target_modules q_proj v_proj \
--per_gpu_train_batch_size 4 --learning_rate 1e-4 --epochs 10
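Before launching, it can help to check how many GPUs are visible and what global batch size the run will use. The snippet below is only a sanity-check sketch; `gradient_accumulation_steps` is a hypothetical value included to show the arithmetic and is not a flag taken from the commands above.

```python
import torch

# Should match --nproc_per_node in the launch command.
n_gpus = torch.cuda.device_count() or 1

per_gpu_train_batch_size = 4     # matches the command above
gradient_accumulation_steps = 1  # hypothetical; set to your actual value

# Global batch size seen by the optimizer per update step.
effective_batch_size = n_gpus * per_gpu_train_batch_size * gradient_accumulation_steps
print(f"GPUs: {n_gpus}, effective batch size: {effective_batch_size}")
```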
python generate.py --data ./data/formatted_cot_data/train_instructions_from_chatgpt.json --model_type llama
python predict.py --model_type llama --size 7b --data ./data/formatted_cot_data/test_xxx.json --predict_batch_size 4 --cutoff_len 2048 --lora_dir ./saved_models/llama-7b-hf/lora
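For reference, the sketch below shows one way the saved LoRA adapter could be loaded for inference with `transformers` and the stock `peft` package (rather than the bundled peft.zip). The base model name and `lora_dir` are taken from the commands above but are otherwise assumptions; this is an illustration, not a substitute for `predict.py`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "yahma/llama-7b-hf"         # base model used for finetuning
lora_dir = "./saved_models/llama-7b-hf/lora"  # assumed LoRA output directory

# Load the frozen base model and attach the trained LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(
    base_model_name, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, lora_dir)
model.eval()

prompt = "Describe the crystal structure of NaCl."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```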
You can find our instruction-based data for HoneyBee training and testing on Zenodo.
If you have any questions about this code, feel free to email yu.song@umontreal.ca. I will respond as soon as possible.