This repository is the official PyTorch implementation of the papers LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding (ICLR 2025) and LANTERN++: Enhanced Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models (ICLR 2025 Workshop SCOPE, Oral). It supports the functionalities related to LANTERN, including model inference, drafter model training, drafter training data generation, and image decoding for image generation.
- [2025-03-05] 🎉🎉🎉 LANTERN is released! 🎉🎉🎉
The main directory structure of the project is as follows:
```
.
├── models/                             # Model and related modules
│   ├── base_models/                    # Base model modules
│   │   ├── lumina_mgpt
│   │   │   ├── modeling_lumina_mgpt.py
│   │   │   └── other files...
│   │   └── other models...
│   ├── kv_variants/                    # Key-Value variant models
│   │   ├── modeling_lumina_mgpt_kv.py
│   │   ├── modeling_anole_kv.py
│   │   └── other models...
│   ├── drafters/                       # Drafter model modules
│   │   ├── kv_cache.py
│   │   ├── choices
│   │   ├── cnets_lumina_mgpt.py
│   │   ├── cnets_anole.py
│   │   ├── cnets_{other_models}.py ...
│   │   └── utils.py
│   ├── configs/                        # Configuration modules
│   │   ├── configs.py
│   │   ├── configuration_lumina_mgpt.py
│   │   ├── configuration_anole.py
│   │   └── configuration_{other_models}.py...
│   ├── ea_model_lumina_mgpt.py         # EAGLE models
│   ├── ea_model_anole.py
│   └── ea_model_{other_models}.py...
├── data/
│   ├── configs/
│   │   ├── lumina_mgpt_config.json     # Configuration for model init
│   │   ├── anole_config.json
│   │   └── configs for other models...
│   ├── prompts/                        # Prompts for image generation
│   ├── self_distilled_data/            # Self-distilled data for drafter training
│   └── drafter_train_data/             # Train data for drafter
├── ckpts/                              # Model checkpoints folder
│   ├── lumina_mgpt/
│   │   ├── chameleon/
│   │   ├── Lumina-mGPT-7B-768/         # Model and tokenizer files
│   │   ├── trained_drafters/           # Trained drafter models
│   │   │   └── ...state_20/
│   │   │       ├── config.json         # config.json for drafter model
│   │   │       └── other files...
│   │   └── vq_distances/               # Pre-computed VQ distances for LANTERN
│   └── other models...
├── entrypoints/                        # Execution entry points
│   ├── train_drafter/
│   │   ├── data_utils.py
│   │   └── main.py
│   ├── generate_codebook.py
│   ├── generate_images.py
│   ├── generate_train_data.py
│   └── other files...
├── third_party/                        # Third-party libraries
│   └── vllm
├── main.py                             # Main execution script
├── requirements.txt                    # Project dependencies
├── environment.yaml
├── .gitignore
└── README.md
```
Here is a brief description for each directory.
- `models/`: Contains model implementations and related modules.
  - `base_models/`: Base model implementations (e.g., Lumina-mGPT, LlamaGen, Anole).
  - `kv_variants/`: Modified base models with Key-Value cache adaptations for enhanced compatibility with EAGLE's architecture.
  - `drafters/`: Modules and auxiliary code for drafter models.
  - `configs/`: Configuration modules for each model (e.g., `ChameleonConfig` for Lumina-mGPT).
- `data/`: Stores configuration files, text prompts, self-distilled data, and drafter training data.
- `ckpts/`: Checkpoints for all models, including trained drafters and VQ distances for relaxed speculative decoding.
- `entrypoints/`: Primary scripts for tasks such as image generation, codebook generation, and drafter training.
- `third_party/`: Custom external libraries, including modifications for specific functionality.
**Install Required Packages**

Requirements:
- Python >= 3.10
- PyTorch >= 2.4.0

Install the dependencies listed in `requirements.txt`:
```bash
git clone https://github.com/jadohu/LANTERN
cd LANTERN
pip install -r requirements.txt
```
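The repository also ships an `environment.yaml` (see the directory tree above). If you prefer conda, a minimal sketch, assuming the file defines a complete environment (the environment name is whatever `environment.yaml` specifies):
```bash
# Create the conda environment from the provided spec and activate it
conda env create -f environment.yaml
conda activate <env_name>   # replace <env_name> with the name defined in environment.yaml
```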
**Additional Setup**

- Lumina-mGPT

  For Lumina-mGPT, we need to install the `flash_attention` and `xllmx` packages.
  ```bash
  pip install flash-attn --no-build-isolation
  cd models/base_models/lumina_mgpt
  pip install -e .
  ```

- (Optional) vLLM

  Install and set up `vLLM` with the required modifications. Note that we use `vLLM==0.6.3` and build from source. The required modifications are specified in `third_party/vllm`. The installation procedure is as follows.
  ```bash
  pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/fd47e57f4b0d5f7920903490bce13bc9e49d8dba/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
  git clone https://github.com/vllm-project/vllm
  cd vllm
  git checkout tags/v0.6.3
  cd ..
  cp -rf third_party/vllm/* vllm/   # copy the modified files over the vLLM source tree
  cd vllm
  python python_only_dev.py
  ```
**Checkpoints**

All model weights and other required data should be stored in `ckpts/`.

- Lumina-mGPT

  For Lumina-mGPT, since the current Chameleon implementation in `transformers` does not include the VQ-VAE decoder, please manually download the original VQ-VAE weights provided by Meta and put them in the following directory:
  ```
  ckpts
  └── lumina_mgpt
      └── chameleon
          └── tokenizer
              ├── text_tokenizer.json
              ├── vqgan.yaml
              └── vqgan.ckpt
  ```
  Also download the original model `Lumina-mGPT-7B-768` from Huggingface 🤗 and put it in the following directory:
  ```
  ckpts
  └── lumina_mgpt
      └── Lumina-mGPT-7B-768
          ├── config.json
          ├── generation_config.json
          ├── model-00001-of-00002.safetensors
          └── other files...
  ```
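  For reference, a hedged sketch of downloading the model with `huggingface-cli`; the repo ID `Alpha-VLLM/Lumina-mGPT-7B-768` is an assumption, so substitute the actual Huggingface repository if it differs:
  ```bash
  # Repo ID is an assumption; adjust it to the actual Huggingface repository if needed.
  huggingface-cli download Alpha-VLLM/Lumina-mGPT-7B-768 --local-dir ckpts/lumina_mgpt/Lumina-mGPT-7B-768
  ```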
- LlamaGen

  For the LlamaGen T2I model, download `LlamaGen-T2I` and/or `LlamaGen-T2I-2`, which are Huggingface-style converted models from `LlamaGen`. In addition, you should download `VQ-VAE` and `flan-t5-xl`.
  ```
  ckpts
  └── llamagen
      ├── LlamaGen-T2I
      │   ├── config.json
      │   ├── generation_config.json
      │   ├── model.safetensors
      │   └── other files...
      ├── LlamaGen-T2I-2
      │   ├── config.json
      │   ├── generation_config.json
      │   ├── model.safetensors
      │   └── other files...
      ├── vq_ds16_t2i.pt
      └── t5
          └── flan-t5-xl
              ├── config.json
              ├── generation_config.json
              ├── model-00001-of-00002.safetensors
              └── other files...
  ```
  (Optional) Trained drafter: To use a trained drafter, download `llamagen_drafter` and/or `llamagen2_drafter` and save it under the `trained_drafters` directory.
  ```
  ckpts
  └── llamagen
      └── trained_drafters
          ├── llamagen_drafter
          │   ├── config.json
          │   ├── generation_config.json
          │   ├── pytorch_model.bin
          │   └── other files...
          └── llamagen2_drafter
              ├── config.json
              ├── generation_config.json
              ├── pytorch_model.bin
              └── other files...
  ```
- Anole

  For Anole, download `Anole-7b-v0.1-hf`, which is a Huggingface-style converted model from `Anole`. In addition, you should download the original VQ-VAE weights provided by Meta and put them in the following directory:
  ```
  ckpts
  └── anole
      ├── Anole-7b-v0.1-hf
      │   ├── config.json
      │   ├── generation_config.json
      │   ├── model-00001-of-00003.safetensors
      │   └── other files...
      └── chameleon
          └── tokenizer
              ├── text_tokenizer.json
              ├── vqgan.yaml
              └── vqgan.ckpt
  ```
  (Optional) Trained drafter: To use a trained drafter, download `anole_drafter` and save it under the `trained_drafters` directory.
  ```
  ckpts
  └── anole
      └── trained_drafters
          └── anole_drafter
              ├── config.json
              ├── generation_config.json
              ├── pytorch_model.bin
              └── other files...
  ```
All functionalities can be run either through `main.py` or directly via `entrypoints/{function}.py`.

Currently, `llamagen` (LlamaGen Stage I), `llamagen2` (LlamaGen Stage II), `anole`, and `lumina_mgpt` are supported as `--model`.

🚧 Lumina-mGPT is still under construction, so some functions may not work properly yet. You can follow the procedures here, but you may encounter a few exceptions.
- Generate Images
  ```bash
  python main.py generate_images --model <model_name> --model_type <model_type; e.g., base, vllm, eagle> --model_path <model_path> --drafter_path <drafter_path> --output_dir <output_dir> ...
  ```
  or
  ```bash
  python -m entrypoints.generate_images --model <model_name> --model_type <model_type; e.g., base, vllm, eagle> --model_path <model_path> --drafter_path <drafter_path> --output_dir <output_dir> ...
  ```
  💡 How to use LANTERN and LANTERN++ for image generation
  - For LANTERN, set `--model_type eagle`, turn on the `--lantern` option, and set the `--lantern_k` and `--lantern_delta` options.
  - For LANTERN++, use the `--static_tree` option and use `--lantern_delta` to set the $\lambda$ value (see the example commands below).
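  A minimal sketch of concrete invocations, assuming LlamaGen Stage I; the paths and the `--lantern_k`/`--lantern_delta` values are illustrative placeholders, not recommended settings:
  ```bash
  # LANTERN: relaxed speculative decoding with EAGLE-style drafting (k and delta values are illustrative)
  python main.py generate_images --model llamagen --model_type eagle \
      --model_path ckpts/llamagen/LlamaGen-T2I \
      --drafter_path ckpts/llamagen/trained_drafters/llamagen_drafter \
      --output_dir outputs/llamagen_lantern \
      --lantern --lantern_k 10 --lantern_delta 0.1

  # LANTERN++: static tree drafting; --lantern_delta sets the lambda value (value is illustrative)
  python main.py generate_images --model llamagen --model_type eagle \
      --model_path ckpts/llamagen/LlamaGen-T2I \
      --drafter_path ckpts/llamagen/trained_drafters/llamagen_drafter \
      --output_dir outputs/llamagen_lanternpp \
      --static_tree --lantern_delta 0.3
  ```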
- Generate Training Data for Drafter
  ```bash
  python main.py generate_train_data --model <model_name> --data_path <path_to_image_tokens> --output_dir <output_dir> --num_samples <num_samples>
  ```
  or
  ```bash
  python -m entrypoints.generate_train_data --model <model_name> --data_path <path_to_image_tokens> --output_dir <output_dir> --num_samples <num_samples>
  ```
  For LlamaGen and Anole, you have to extract image codes and T5 embeddings (T5 embeddings are needed only for LlamaGen) for the training data.
  - Place the image and caption files in the format below and run the following command before running `generate_train_data` (a combined sketch follows at the end of this item):

    Data format:
    - image_folder
      - {file_1}.jpg
      - {file_1}.txt
      - {file_2}.jpg
      - {file_2}.txt
      - ...

    ```bash
    python main.py extract_code --model <model_type> --data_path <path_to_image_and_caption> --output_dir <output_dir> --num_samples <num_samples>
    ```
    or
    ```bash
    python -m entrypoints.extract_code --model <model_type> --data_path <path_to_image_and_caption> --output_dir <output_dir> --num_samples <num_samples>
    ```
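  A minimal end-to-end sketch of the data-preparation order, assuming LlamaGen; the directory names and sample count are illustrative placeholders:
  ```bash
  # 1) Extract image codes and T5 embeddings from raw image/caption pairs (paths are placeholders)
  python main.py extract_code --model llamagen --data_path data/raw_images --output_dir data/self_distilled_data/llamagen --num_samples 50000

  # 2) Build drafter training data from the extracted tokens
  python main.py generate_train_data --model llamagen --data_path data/self_distilled_data/llamagen --output_dir data/drafter_train_data/llamagen --num_samples 50000
  ```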
- Train Drafter Model
  ```bash
  python main.py train_drafter --model <model_type> --base_path <base_model_path> --config_path <path_to_config.json> --data_dir <data_dir> --save_dir <save_dir> --lr <lr> --bs <bs> --gradient_accumlation_steps <gradient_accumulation_steps> ...
  ```
  or
  ```bash
  python -m entrypoints.train_drafter.main --model <model_type> --base_path <base_model_path> --config_path <path_to_config.json> --data_dir <data_dir> --save_dir <save_dir> --lr <lr> --bs <bs> --gradient_accumlation_steps <gradient_accumulation_steps> ...
  ```
  For multi-GPU training with accelerate, you can use
  ```bash
  accelerate launch -m entrypoints.train_drafter.main --model <model_type> --base_path <base_model_path> --config_path <path_to_config.json> --data_dir <data_dir> --save_dir <save_dir> --lr <lr> --bs <bs> --gradient_accumlation_steps <gradient_accumulation_steps> ...
  ```
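  If you have not configured accelerate yet, a brief sketch; `--num_processes` sets the number of GPUs, and the paths, config file name, and hyperparameter values below are assumptions rather than recommended settings:
  ```bash
  # One-time interactive setup of the accelerate environment (choose multi-GPU when prompted)
  accelerate config

  # Example 4-GPU launch; paths and lr/bs/accumulation values are illustrative placeholders
  accelerate launch --num_processes 4 -m entrypoints.train_drafter.main \
      --model llamagen \
      --base_path ckpts/llamagen/LlamaGen-T2I \
      --config_path data/configs/llamagen_config.json \
      --data_dir data/drafter_train_data/llamagen \
      --save_dir ckpts/llamagen/trained_drafters/llamagen_drafter \
      --lr 1e-4 --bs 4 --gradient_accumlation_steps 4
  ```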
- Generate VQ Distances
  ```bash
  python main.py generate_codebook --model <model_name> --save_path <save_path>
  ```
  or
  ```bash
  python -m entrypoints.generate_codebook --model <model_name> --save_path <save_path>
  ```
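  For example, following the `ckpts/{model_name}/vq_distances/` layout shown in the directory tree (the exact save path is an assumption):
  ```bash
  # Pre-compute VQ codebook distances used by LANTERN's relaxed acceptance (path is illustrative)
  python main.py generate_codebook --model llamagen --save_path ckpts/llamagen/vq_distances
  ```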
- Evaluate Generated Images

  We support FID, CLIP score, Precision/Recall, and HPSv2 for image evaluation.
  ```bash
  python main.py eval_fid_clip --fake_dir <path_to_generated_image> --ref_dir <path_to_reference_image> --caption_path <path_to_prompt> --how_many <number_of_images_for_evaluation> ...
  python main.py eval_prec_recall --fake_dir <path_to_generated_image> --ref_dir <path_to_reference_image> ...
  python main.py eval_hpsv2 --image_path <path_to_generated_image> --prompt_path <path_to_prompt>
  ```
  or
  ```bash
  python -m entrypoints.eval_fid_clip --fake_dir <path_to_generated_image> --ref_dir <path_to_reference_image> --caption_path <path_to_prompt> --how_many <number_of_images_for_evaluation> ...
  python -m entrypoints.eval_prec_recall --fake_dir <path_to_generated_image> --ref_dir <path_to_reference_image> ...
  python -m entrypoints.eval_hpsv2 --image_path <path_to_generated_image> --prompt_path <path_to_prompt>
  ```
Note: the drafter's `config.json` should be placed in `ckpts/{model_name}/trained_drafters/{drafter_path}`. Since the `Model` in `cnets_{model_name}.py` is initialized according to the `config.json` in `drafter_path`, you need to place the drafter's `config.json` correctly. It should be the same as the base model's `config.json` except for `num_hidden_layers`.
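As a minimal sketch, you can derive the drafter's `config.json` from the base model's and override only `num_hidden_layers`; the paths and the layer count below are assumptions, and `jq` is just one convenient tool:
```bash
# Copy the base model's config and override num_hidden_layers for the drafter
# (1 is an assumed value; use the layer count your drafter was trained with).
jq '.num_hidden_layers = 1' ckpts/llamagen/LlamaGen-T2I/config.json \
    > ckpts/llamagen/trained_drafters/llamagen_drafter/config.json
```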
This project is distributed under the Chameleon License by Meta Platforms, Inc. For more information, please see the LICENSE file in the repository.
This repository is built with extensive reference to FoundationVision/LlamaGen, Alpha-VLLM/Lumina-mGPT and SafeAILab/EAGLE, leveraging many of their core components and approaches.
```bibtex
@article{jang2024lantern,
  title={LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding},
  author={Jang, Doohyuk and Park, Sihwan and Yang, June Yong and Jung, Yeonsung and Yun, Jihun and Kundu, Souvik and Kim, Sung-Yub and Yang, Eunho},
  journal={arXiv preprint arXiv:2410.03355},
  year={2024}
}
```
```bibtex
@article{park2025lanternenhancedrelaxedspeculative,
  title={LANTERN++: Enhanced Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models},
  author={Park, Sihwan and Jang, Doohyuk and Kim, Sungyub and Kundu, Souvik and Yang, Eunho},
  journal={arXiv preprint arXiv:2410.03355},
  year={2025}
}
```