This repository is the official PyTorch implementation of the papers LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding (ICLR 2025) and LANTERN++: Enhanced Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models (ICLRW - SCOPE 2025, Oral). It supports the functionalities related to LANTERN, including model inference, drafter model training, drafter training data generation, and image decoding for image generation.
- [2025-03-05] 🎉🎉🎉 LANTERN is released! 🎉🎉🎉
The main directory structure of the project is as follows:
.
├── models/                             # Model and related modules
│   ├── base_models/                    # Base model modules
│   │   ├── lumina_mgpt
│   │   │   ├── modeling_lumina_mgpt.py
│   │   │   └── other files...
│   │   └── other models...
│   ├── kv_variants/                    # Key-Value variant models
│   │   ├── modeling_lumina_mgpt_kv.py
│   │   ├── modeling_anole_kv.py
│   │   └── other models...
│   ├── drafters/                       # Drafter model modules
│   │   ├── kv_cache.py
│   │   ├── choices
│   │   ├── cnets_lumina_mgpt.py
│   │   ├── cnets_anole.py
│   │   ├── cnets_{other_models}.py ...
│   │   └── utils.py
│   ├── configs/                        # Configuration modules
│   │   ├── configs.py
│   │   ├── configuration_lumina_mgpt.py
│   │   ├── configuration_anole.py
│   │   └── configuration_{other_models}.py ...
│   ├── ea_model_lumina_mgpt.py         # EAGLE models
│   ├── ea_model_anole.py
│   └── ea_model_{other_models}.py ...
├── data/
│   ├── configs/
│   │   ├── lumina_mgpt_config.json     # Configuration for model init
│   │   ├── anole_config.json
│   │   └── configs for other models...
│   ├── prompts/                        # Prompts for image generation
│   ├── self_distilled_data/            # Self-distilled data for drafter training
│   └── drafter_train_data/             # Train data for drafter
├── ckpts/                              # Model checkpoints folder
│   ├── lumina_mgpt/
│   │   ├── chameleon/
│   │   ├── Lumina-mGPT-7B-768/         # Model and tokenizer files
│   │   ├── trained_drafters/           # Trained drafter models
│   │   │   └── ...state_20/
│   │   │       ├── config.json         # config.json for drafter model
│   │   │       └── other files...
│   │   └── vq_distances/               # Pre-computed VQ distances for LANTERN
│   └── other models...
├── entrypoints/                        # Execution entry points
│   ├── train_drafter/
│   │   ├── data_utils.py
│   │   └── main.py
│   ├── generate_codebook.py
│   ├── generate_images.py
│   ├── generate_train_data.py
│   └── other files...
├── third_party/                        # Third-party libraries
│   └── vllm
├── main.py                             # Main execution script
├── requirements.txt                    # Project dependencies
├── environment.yaml
├── .gitignore
└── README.md
Here is a brief description of each directory.

- `models/` - Contains model implementations and related modules.
  - `base_models/` - Base model implementations (e.g., Lumina-mGPT, LlamaGen, Anole).
  - `kv_variants/` - Modified base models with Key-Value cache adaptations for enhanced compatibility with EAGLE's architecture.
  - `drafters/` - Modules and auxiliary code for drafter models.
  - `configs/` - Configuration modules for each model (e.g., `ChameleonConfig` for Lumina-mGPT).
- `data/` - Stores configuration files, text prompts, self-distilled data, and drafter training data.
- `ckpts/` - Checkpoints for all models, including trained drafters and VQ distances for relaxed speculative decoding.
- `entrypoints/` - Primary scripts for tasks such as image generation, codebook generation, and drafter training.
- `third_party/` - Custom external libraries, including modifications for specific functionality.
- Install Required Packages

  Requirements
  - Python >= 3.10
  - PyTorch >= 2.4.0

  Install the dependencies listed in `requirements.txt`.

      git clone https://github.com/jadohu/LANTERN
      cd LANTERN
      pip install -r requirements.txt
- Additional Setup
  - Lumina-mGPT

    For Lumina-mGPT, we need to install the `flash_attention` and `xllmx` packages.

        pip install flash-attn --no-build-isolation
        cd models/base_models/lumina_mgpt
        pip install -e .

  - (Optional) vLLM

    Install and set up `vLLM` with the required modifications. Note that we use `vLLM==0.6.3` and build from source. The required modifications are specified in `third_party/vllm`. The installation procedure is as follows.

        pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/fd47e57f4b0d5f7920903490bce13bc9e49d8dba/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
        git clone https://github.com/vllm-project/vllm
        cd vllm
        git checkout tags/v0.6.3
        cd ..
        cp -rf third_party/vllm/* vllm/
        cd vllm
        python python_only_dev.py
- Checkpoints

  All model weights and other required data should be stored in `ckpts/`.

  - Lumina-mGPT

    For Lumina-mGPT, since the Chameleon implementation in transformers currently does not contain the VQ-VAE decoder, please manually download the original VQ-VAE weights provided by Meta and put them in the following directory:

        ckpts
        └── lumina_mgpt
            └── chameleon
                └── tokenizer
                    ├── text_tokenizer.json
                    ├── vqgan.yaml
                    └── vqgan.ckpt

    Also download the original model `Lumina-mGPT-7B-768` from Huggingface 🤗 and put it in the following directory:

        ckpts
        └── lumina_mgpt
            └── Lumina-mGPT-7B-768
                ├── config.json
                ├── generation_config.json
                ├── model-00001-of-00002.safetensors
                └── other files...
  - LlamaGen

    For the LlamaGen T2I model, download `LlamaGen-T2I` and/or `LlamaGen-T2I-2`, which are Huggingface-style converted models from `LlamaGen`.

    In addition, you should download the `VQ-VAE` and `flan-t5-xl` weights.

        ckpts
        └── llamagen
            ├── LlamaGen-T2I
            │   ├── config.json
            │   ├── generation_config.json
            │   ├── model.safetensors
            │   └── other files...
            ├── LlamaGen-T2I-2
            │   ├── config.json
            │   ├── generation_config.json
            │   ├── model.safetensors
            │   └── other files...
            ├── vq_ds16_t2i.pt
            └── t5
                └── flan-t5-xl
                    ├── config.json
                    ├── generation_config.json
                    ├── model-00001-of-00002.safetensors
                    └── other files...

    (Optional) Trained drafter

    To use a trained drafter, download `llamagen_drafter` and/or `llamagen2_drafter` and save it under the `trained_drafters` directory.

        ckpts
        └── llamagen
            └── trained_drafters
                ├── llamagen_drafter
                │   ├── config.json
                │   ├── generation_config.json
                │   ├── pytorch_model.bin
                │   └── other files...
                └── llamagen2_drafter
                    ├── config.json
                    ├── generation_config.json
                    ├── pytorch_model.bin
                    └── other files...
  - Anole

    For Anole, download `Anole-7b-v0.1-hf`, which is a Huggingface-style converted model from `Anole`.

    In addition, you should download the original VQ-VAE weights provided by Meta and put them in the following directory:

        ckpts
        └── anole
            ├── Anole-7b-v0.1-hf
            │   ├── config.json
            │   ├── generation_config.json
            │   ├── model-00001-of-00003.safetensors
            │   └── other files...
            └── chameleon
                └── tokenizer
                    ├── text_tokenizer.json
                    ├── vqgan.yaml
                    └── vqgan.ckpt

    (Optional) Trained drafter

    To use a trained drafter, download `anole_drafter` and save it under the `trained_drafters` directory.

        ckpts
        └── anole
            └── trained_drafters
                └── anole_drafter
                    ├── config.json
                    ├── generation_config.json
                    ├── pytorch_model.bin
                    └── other files...
All functionalities can be run either through `main.py` or directly via `entrypoints/{function}.py`.

Currently, `llamagen` (LlamaGen Stage I), `llamagen2` (LlamaGen Stage II), `anole`, and `lumina_mgpt` are supported as `--model`.

🚧 Lumina-mGPT is still under construction, so some functions may not work properly yet. You can follow the procedures here, but you may encounter a few exceptions.
- Generate Images

      python main.py generate_images --model <model_name> --model_type <model_type; e.g., base, vllm, eagle> --model_path <model_path> --drafter_path <drafter_path> --output_dir <output_dir> ...

  or

      python -m entrypoints.generate_images --model <model_name> --model_type <model_type; e.g., base, vllm, eagle> --model_path <model_path> --drafter_path <drafter_path> --output_dir <output_dir> ...
  💡 How to use LANTERN and LANTERN++ for image generation
  - For LANTERN, set `--model_type eagle`, turn on the `--lantern` option, and set the `--lantern_k` and `--lantern_delta` options (see the example invocation below).
  - For LANTERN++, use the `--static_tree` option and use `--lantern_delta` to set the $\lambda$ value.
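  For example, a LANTERN run with LlamaGen Stage I might look like the sketch below. The checkpoint paths follow the layout described in the Checkpoints section, while the output directory and the `--lantern_k` / `--lantern_delta` values are illustrative placeholders rather than recommended settings.

  ```bash
  # Illustrative LANTERN invocation; paths and hyperparameter values are examples only.
  python main.py generate_images \
      --model llamagen \
      --model_type eagle \
      --model_path ckpts/llamagen/LlamaGen-T2I \
      --drafter_path ckpts/llamagen/trained_drafters/llamagen_drafter \
      --output_dir outputs/llamagen_lantern \
      --lantern \
      --lantern_k 10 \
      --lantern_delta 0.1
  ```

  For LANTERN++, the same command would use `--static_tree` and set `--lantern_delta` to the desired $\lambda$, as described above.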
- Generate Training Data for Drafter

      python main.py generate_train_data --model <model_name> --data_path <path_to_image_tokens> --output_dir <output_dir> --num_samples <num_samples>

  or

      python -m entrypoints.generate_train_data --model <model_name> --data_path <path_to_image_tokens> --output_dir <output_dir> --num_samples <num_samples>

  For LlamaGen and Anole, you have to extract the image codes and the T5 embeddings (T5 embeddings only for LlamaGen) for the training data.
  - Place the image and caption files in the following format:

        image_folder
        ├── {file_1}.jpg
        ├── {file_1}.txt
        ├── {file_2}.jpg
        ├── {file_2}.txt
        └── ...

    Then run the following command before running `generate_train_data`:

        python main.py extract_code --model <model_type> --data_path <path_to_image_and_caption> --output_dir <output_dir> --num_samples <num_samples>

    or

        python -m entrypoints.extract_code --model <model_type> --data_path <path_to_image_and_caption> --output_dir <output_dir> --num_samples <num_samples>
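  As a hypothetical end-to-end example for LlamaGen, the two-step pipeline could look like the sketch below. All paths and sample counts are placeholders, and wiring the output of `extract_code` directly into `generate_train_data` is an assumption based on the argument names rather than a documented contract.

  ```bash
  # Step 1 (hypothetical paths): extract image codes and T5 embeddings from (image, caption) pairs.
  python main.py extract_code \
      --model llamagen \
      --data_path data/my_image_caption_folder \
      --output_dir data/self_distilled_data/llamagen \
      --num_samples 50000

  # Step 2: build drafter training data from the extracted image tokens.
  python main.py generate_train_data \
      --model llamagen \
      --data_path data/self_distilled_data/llamagen \
      --output_dir data/drafter_train_data/llamagen \
      --num_samples 50000
  ```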
- Train Drafter Model

      python main.py train_drafter --model <model_type> --base_path <base_model_path> --config_path <path_to_config.json> --data_dir <data_dir> --save_dir <save_dir> --lr <lr> --bs <bs> --gradient_accumlation_steps <gradient_accumulation_steps> ...

  or

      python -m entrypoints.train_drafter.main --model <model_type> --base_path <base_model_path> --config_path <path_to_config.json> --data_dir <data_dir> --save_dir <save_dir> --lr <lr> --bs <bs> --gradient_accumlation_steps <gradient_accumulation_steps> ...

  For multi-GPU training with accelerate, you can use

      accelerate launch -m entrypoints.train_drafter.main --model <model_type> --base_path <base_model_path> --config_path <path_to_config.json> --data_dir <data_dir> --save_dir <save_dir> --lr <lr> --bs <bs> --gradient_accumlation_steps <gradient_accumulation_steps> ...
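  For reference, a multi-GPU drafter training run for LlamaGen might look like the sketch below. The config filename is assumed to follow the `data/configs/` naming pattern, and the learning rate, batch size, and accumulation steps are illustrative values, not tuned settings from the paper.

  ```bash
  # Hypothetical multi-GPU training run; all paths and hyperparameters are placeholders.
  accelerate launch -m entrypoints.train_drafter.main \
      --model llamagen \
      --base_path ckpts/llamagen/LlamaGen-T2I \
      --config_path data/configs/llamagen_config.json \
      --data_dir data/drafter_train_data/llamagen \
      --save_dir ckpts/llamagen/trained_drafters/my_drafter \
      --lr 1e-4 \
      --bs 4 \
      --gradient_accumlation_steps 4
  ```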
- Generate VQ Distances

      python main.py generate_codebook --model <model_name> --save_path <save_path>

  or

      python -m entrypoints.generate_codebook --model <model_name> --save_path <save_path>
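  For example, the pre-computed distances for LlamaGen could be written under the `vq_distances` location suggested by the checkpoint layout; whether `--save_path` expects a directory or a file is an assumption here.

  ```bash
  # Hypothetical save path following the ckpts/<model>/vq_distances convention.
  python main.py generate_codebook --model llamagen --save_path ckpts/llamagen/vq_distances
  ```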
- Evaluate Generated Images

  We support FID, CLIP score, Precision/Recall, and HPSv2 for image evaluation.

      python main.py eval_fid_clip --fake_dir <path_to_generated_image> --ref_dir <path_to_reference_image> --caption_path <path_to_prompt> --how_many <number_of_images_for_evaluation> ...
      python main.py eval_prec_recall --fake_dir <path_to_generated_image> --ref_dir <path_to_reference_image> ...
      python main.py eval_hpsv2 --image_path <path_to_generated_image> --prompt_path <path_to_prompt>

  or

      python -m entrypoints.eval_fid_clip --fake_dir <path_to_generated_image> --ref_dir <path_to_reference_image> --caption_path <path_to_prompt> --how_many <number_of_images_for_evaluation> ...
      python -m entrypoints.eval_prec_recall --fake_dir <path_to_generated_image> --ref_dir <path_to_reference_image> ...
      python -m entrypoints.eval_hpsv2 --image_path <path_to_generated_image> --prompt_path <path_to_prompt>
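  A hypothetical evaluation of the images generated in the earlier example could look like the following; the reference image directory, prompt file, and image count are placeholders for whatever validation set you use.

  ```bash
  # Placeholder paths; substitute your own reference images and prompt file.
  python main.py eval_fid_clip \
      --fake_dir outputs/llamagen_lantern \
      --ref_dir data/my_reference_images \
      --caption_path data/prompts/my_prompts.txt \
      --how_many 5000
  ```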
`config.json` should be in `ckpts/{model_name}/trained_drafters/{drafter_path}`. Since the `Model` in `cnets_{model_name}.py` is initialized according to the `config.json` in the `drafter_path`, you need to place the drafter's `config.json` correctly. Note that this `config.json` should be the same as the base model's `config.json` except for `num_hidden_layers`.
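As a sketch of how such a drafter `config.json` could be produced, the snippet below copies the base model's config and overrides only `num_hidden_layers`; the paths and the single-layer depth are assumptions for illustration, not values mandated by the repository.

```bash
# Sketch: derive a drafter config.json from the base model's config.
# The paths and the num_hidden_layers value (1) are illustrative assumptions.
python - <<'EOF'
import json
from pathlib import Path

base_cfg = Path("ckpts/llamagen/LlamaGen-T2I/config.json")                    # assumed base model config
drafter_cfg = Path("ckpts/llamagen/trained_drafters/my_drafter/config.json")  # assumed drafter location

cfg = json.loads(base_cfg.read_text())
cfg["num_hidden_layers"] = 1  # assumed drafter depth; every other field stays identical to the base model
drafter_cfg.parent.mkdir(parents=True, exist_ok=True)
drafter_cfg.write_text(json.dumps(cfg, indent=2))
EOF
```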
This project is distributed under the Chameleon License by Meta Platforms, Inc. For more information, please see the LICENSE
file in the repository.
This repository is built with extensive reference to FoundationVision/LlamaGen, Alpha-VLLM/Lumina-mGPT and SafeAILab/EAGLE, leveraging many of their core components and approaches.
@article{jang2024lantern,
  title={LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding},
  author={Jang, Doohyuk and Park, Sihwan and Yang, June Yong and Jung, Yeonsung and Yun, Jihun and Kundu, Souvik and Kim, Sung-Yub and Yang, Eunho},
  journal={arXiv preprint arXiv:2410.03355},
  year={2024}
}

@article{park2025lanternenhancedrelaxedspeculative,
  title={LANTERN++: Enhanced Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models},
  author={Park, Sihwan and Jang, Doohyuk and Kim, Sungyub and Kundu, Souvik and Yang, Eunho},
  journal={arXiv preprint arXiv:2410.03355},
  year={2025}
}