Add MPT documentation
haotian-liu committed May 6, 2023
1 parent 3815e34 commit b25c527
Showing 2 changed files with 102 additions and 0 deletions.
28 changes: 28 additions & 0 deletions README.md
@@ -15,6 +15,7 @@

## Release

- [5/6] 🔥 We are releasing [LLaVA-Lightning-MPT-7B-preview](https://huggingface.co/liuhaotian/LLaVA-Lightning-MPT-7B-preview), based on MPT-7B-Chat! See [here](#LLaVA-MPT-7b) for more details.
- [5/2] 🔥 We are releasing LLaVA-Lightning! Train a lightweight, multimodal GPT-4 with just $40 in 3 hours! See [here](#train-llava-lightning) for more details.
- [5/2] We have upgraded the LLaVA package to v0.1 to support Vicuna v0 and v1 checkpoints; please upgrade following the instructions [here](#install).
- [4/30] Our checkpoint with Vicuna-7b-v0 has been released [here](#llava-7b)! This checkpoint is more accessible and device-friendly. Stay tuned for a major upgrade next week!
@@ -549,6 +550,33 @@ bash ./scripts/train_lightning.sh {v0,v1}
| Hyperparameter | Global Batch Size | Learning rate | Epochs | Max length | Weight decay |
| --- | ---: | ---: | ---: | ---: | ---: |
| LLaVA-Lightning-7B | 128 | 2e-5 | 1 | 2048 | 0 |

#### LLaVA-MPT-7b
Thanks to LLaVA-Lightning, we are able to train a checkpoint based on MPT-7B-Chat on 8x A100 GPUs in just 3 hours, including both pretraining and finetuning.

**NOTE**: This is a research preview of LLaVA-Lightning based on the MPT-7B-Chat checkpoint. Usage of this model must comply with the MPT-7B-Chat license and agreements.

**NOTE**: Unlike other LLaVA models, this model should be used directly, without delta weights conversion (see the sketch below)!

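For comparison, other LLaVA releases ship as delta weights that must first be merged onto the base LLM; this checkpoint skips that step. A minimal sketch of the difference (paths are placeholders):

```Shell
# Other LLaVA checkpoints: merge the released delta onto the base LLM first
python -m llava.model.apply_delta \
    --base /path/to/llama-7b \
    --target /path/to/output/llava-7b \
    --delta liuhaotian/LLaVA-7b-delta-v0

# LLaVA-MPT-7b: no conversion step; pass the Hugging Face model name
# directly via --model-path (see Usage below)
```
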
**NOTE**: You need to upgrade to our latest codebase to use LLaVA-MPT-7b! A typical upgrade is sketched below.

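The following is a minimal upgrade sketch, assuming an existing editable install of this repository; adjust to your environment:

```Shell
git pull
pip uninstall -y llava
pip install -e .
```
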
1. Usage

You do not need to download our checkpoint manually; it will be loaded directly from our Hugging Face model: [`liuhaotian/LLaVA-Lightning-MPT-7B-preview`](https://huggingface.co/liuhaotian/LLaVA-Lightning-MPT-7B-preview).

```Shell
# Launch the controller, the model worker, and the Gradio web server
# (run each command in its own terminal/process)
python -m llava.serve.controller --host 0.0.0.0 --port 10000
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path liuhaotian/LLaVA-Lightning-MPT-7B-preview
python -m llava.serve.gradio_web_server --controller http://localhost:10000
```

2. Training

We use the same training dataset and hyperparameters as the other Lightning checkpoints.

```Shell
bash ./scripts/train_lightning_mpt.sh
```

## Acknowledgement

- [Vicuna](https://github.com/lm-sys/FastChat): the codebase we built upon, and our base model Vicuna-13B that has the amazing language capabilities!
74 changes: 74 additions & 0 deletions scripts/train_lightning_mpt.sh
@@ -0,0 +1,74 @@
#!/bin/bash

# Pretraining (~2 hours on 8x A100): feature alignment; only the multimodal
# projector is trained (--tune_mm_mlp_adapter True) while the base LLM stays frozen
torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
llava/train/train_mem.py \
--model_name_or_path mosaicml/mpt-7b-chat \
--version v1 \
--data_path /path/to/blip_laion_cc_sbu_558k.json \
--image_folder /path/to/blip_laion_cc_sbu_558k \
--vision_tower openai/clip-vit-large-patch14 \
--tune_mm_mlp_adapter True \
--mm_vision_select_layer -2 \
--mm_use_im_start_end True \
--bf16 True \
--output_dir ./checkpoints/llava-lightning-mpt-7b-pretrain \
--num_train_epochs 1 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2400 \
--save_total_limit 1 \
--learning_rate 2e-3 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--dataloader_num_workers 4 \
--lazy_preprocess True \
--report_to wandb

# Extract the pretrained multimodal projector weights for the fine-tuning stage
python scripts/extract_mm_projector.py \
--model_name_or_path ./checkpoints/llava-lightning-mpt-7b-pretrain \
--output ./checkpoints/mm_projector/llava-lightning-mpt-7b-pretrain.bin

# Visual instruction tuning (~1 hour): fine-tune the full model, initializing
# the projector from the pretrained weights extracted above
torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
llava/train/train_mem.py \
--model_name_or_path mosaicml/mpt-7b-chat \
--version v1 \
--data_path /path/to/llava_instruct_80k.json \
--image_folder /path/to/coco/train2014 \
--vision_tower openai/clip-vit-large-patch14 \
--pretrain_mm_mlp_adapter ./checkpoints/mm_projector/llava-lightning-mpt-7b-pretrain.bin \
--mm_vision_select_layer -2 \
--mm_use_im_start_end True \
--bf16 True \
--output_dir ./checkpoints/llava-lightning-mpt-7b \
--num_train_epochs 1 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 5000 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'MPTBlock' \
--model_max_length 2048 \
--gradient_checkpointing True \
--dataloader_num_workers 4 \
--lazy_preprocess True \
--report_to wandb
