LANTERN

This repository is the official PyTorch implementation of LANTERN: Accelerating Visual Autoregressive Models via Relaxed Speculative Decoding (ICLR 2025) and LANTERN++: Enhanced Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models (ICLRW-SCOPE 2025, Oral). It supports the functionality related to LANTERN, including model inference, drafter model training, drafter training-data generation, and image decoding for image generation.


📰 News

  • [2025-03-05] 🎉🎉🎉 LANTERN is released! 🎉🎉🎉

📚 Table of Contents

  1. Directory Structure
  2. Installation
  3. Key Features
  4. Usage
  5. License
  6. Acknowledgement
  7. Citation

🗂️ Directory Structure

The main directory structure of the project is as follows:

.
├── models/                                     # Model and related modules
│   ├── base_models/                            # Base model modules
│   │   ├── lumina_mgpt
│   │   │   ├── modeling_lumina_mgpt.py
│   │   │   └── other files...
│   │   └── other models...
│   ├── kv_variants/                            # Key-Value variant models
│   │   ├── modeling_lumina_mgpt_kv.py
│   │   ├── modeling_anole_kv.py
│   │   └── other models...
│   ├── drafters/                               # Drafter model modules
│   │   ├── kv_cache.py
│   │   ├── choices
│   │   ├── cnets_lumina_mgpt.py
│   │   ├── cnets_anole.py
│   │   ├── cnets_{other_models}.py ...
│   │   └── utils.py
│   ├── configs/                                # Configuration modules
│   │   ├── configs.py
│   │   ├── configuration_lumina_mgpt.py
│   │   ├── configuration_anole.py
│   │   └── configuration_{other_models}.py ...
│   ├── ea_model_lumina_mgpt.py                 # EAGLE models
│   ├── ea_model_anole.py
│   └── ea_model_{other_models}.py ...
├── data/
│   ├── configs/
│   │   ├── lumina_mgpt_config.json             # Configuration for model init
│   │   ├── anole_config.json
│   │   └── configs for other models...
│   ├── prompts/                                # Prompts for image generation
│   ├── self_distilled_data/                    # Self-distilled data for drafter training
│   └── drafter_train_data/                     # Train data for drafter
├── ckpts/                                      # Model checkpoints folder
│   ├── lumina_mgpt/
│   │   ├── chameleon/
│   │   ├── Lumina-mGPT-7B-768/                 # Model and tokenizer files
│   │   ├── trained_drafters/                   # Trained drafter models
│   │   │   └── ...state_20/
│   │   │       ├── config.json                 # config.json for drafter model
│   │   │       └── other files...
│   │   └── vq_distances/                       # Pre-computed VQ distances for LANTERN
│   └── other models...
├── entrypoints/                                # Execution entry points
│   ├── train_drafter/
│   │   ├── data_utils.py
│   │   └── main.py
│   ├── generate_codebook.py
│   ├── generate_images.py
│   ├── generate_train_data.py
│   └── other files...
├── third_party/                                # Third-party libraries
│   └── vllm
├── main.py                                     # Main execution script
├── requirements.txt                            # Project dependencies
├── environment.yaml
├── .gitignore
└── README.md

Here is a brief description of each directory.

  1. models/ - Contains model implementations and related modules.

    • base_models/ - Base model implementations (e.g., Lumina-mGPT, LlamaGen, Anole).
    • kv_variants/ - Modified base models with Key-Value cache adaptations for enhanced compatibility with EAGLE's architecture.
    • drafters/ - Modules and auxiliary code for drafter models.
    • configs/ - Configuration modules for each model (e.g., ChameleonConfig for Lumina-mGPT).
  2. data/ - Stores configuration files, text prompts, self-distilled data, and drafter training data.

  3. ckpts/ - Checkpoints for all models, including trained drafters and VQ distances for relaxed speculative decoding.

  4. entrypoints/ - Primary scripts for tasks such as image generation, codebook generation, and drafter training.

  5. third_party/ - Custom external libraries, including modifications for specific functionality.


⚙️ Installation

  1. Install Required Packages

    Requirements:

    • Python >= 3.10
    • PyTorch >= 2.4.0

    Install the dependencies listed in requirements.txt.

    git clone https://github.com/jadohu/LANTERN
    cd LANTERN
    pip install -r requirements.txt
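
    Alternatively, the repository includes an environment.yaml. As a sketch, assuming the file defines a complete conda environment (check it for the environment name), you can create one with:

    conda env create -f environment.yaml
    conda activate <env_name>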
  2. Additional Setup

    1. Lumina-mGPT: For Lumina-mGPT, we need to install the flash-attn and xllmx packages.
      pip install flash-attn --no-build-isolation
      cd models/base_models/lumina_mgpt
      pip install -e .
      1. (Optional) vLLM: Install and set up vLLM with the required modifications. Note that we use vLLM==0.6.3 and build it from source. The required modifications are specified in third_party/vllm. The installation procedure is as follows.
        pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/fd47e57f4b0d5f7920903490bce13bc9e49d8dba/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
        git clone https://github.com/vllm-project/vllm
        cd vllm
        git checkout tags/v0.6.3
        
        cd ..
        cp -rf third_party/vllm/* vllm/
        cd vllm
        python python_only_dev.py
  3. Checkpoints: All model weights and other required data should be stored in ckpts/.

    1. Lumina-mGPT: For Lumina-mGPT, since the Chameleon implementation in transformers currently does not include the VQ-VAE decoder, please manually download the original VQ-VAE weights provided by Meta and put them in the following directory:

      ckpts
      └── lumina_mgpt
          └── chameleon
              └── tokenizer
                  ├── text_tokenizer.json
                  ├── vqgan.yaml
                  └── vqgan.ckpt
      

      Also download the original Lumina-mGPT-7B-768 model from Hugging Face 🤗 and place it in the following directory:

      ckpts
      └── lumina_mgpt
          └── Lumina-mGPT-7B-768
              ├── config.json
              ├── generation_config.json
              ├── model-00001-of-00002.safetensors
              └── other files...
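
      If you prefer a scripted download, here is a minimal sketch using huggingface_hub's snapshot_download (the repo id Alpha-VLLM/Lumina-mGPT-7B-768 is an assumption; verify it against the model card you actually use):

      from huggingface_hub import snapshot_download

      # Assumed Hugging Face repo id; adjust if the model is hosted under a different name.
      snapshot_download(
          repo_id="Alpha-VLLM/Lumina-mGPT-7B-768",
          local_dir="ckpts/lumina_mgpt/Lumina-mGPT-7B-768",
      )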
      
    2. LlamaGen: For the LlamaGen T2I models, download LlamaGen-T2I and/or LlamaGen-T2I-2, which are Hugging Face-style converted models from LlamaGen.

      In addition, you should download the VQ-VAE weights and flan-t5-xl.

      ckpts
      └── llamagen
          ├── LlamaGen-T2I
          │   ├── config.json
          │   ├── generation_config.json
          │   ├── model.safetensors
          │   └── other files...
          ├── LlamaGen-T2I-2
          │   ├── config.json
          │   ├── generation_config.json
          │   ├── model.safetensors
          │   └── other files...
          ├── vq_ds16_t2i.pt
          └── t5
              └── flan-t5-xl
                  ├── config.json
                  ├── generation_config.json
                  ├── model-00001-of-00002.safetensors
                  └── other files...
      

      (Optional) Trained drafter: To use a trained drafter, download llamagen_drafter and/or llamagen2_drafter and save them under the trained_drafters directory.

      ckpts
      └── llamagen
          └── trained_drafters
              ├── llamagen_drafter
              │   ├── config.json
              │   ├── generation_config.json
              │   ├── pytorch_model.bin
              │   └── other files...
              └── llamagen2_drafter
                  ├── config.json
                  ├── generation_config.json
                  ├── pytorch_model.bin
                  └── other files...
      
    3. Anole: For Anole, download Anole-7b-v0.1-hf, which is a Hugging Face-style converted model from Anole.

      In addition, you should download the original VQ-VAE weights provided by Meta and put them in the following directory:

      ckpts
      └── anole
          ├── Anole-7b-v0.1-hf
          │   ├── config.json
          │   ├── generation_config.json
          │   ├── model-00001-of-00003.safetensors
          │   └── other files...
          └── chameleon
              └── tokenizer
                  ├── text_tokenizer.json
                  ├── vqgan.yaml
                  └── vqgan.ckpt
      

      (Optional) Trained drafter: To use a trained drafter, download anole_drafter and save it under the trained_drafters directory.

      ckpts
      └── anole
          └── trained_drafters
              └── anole_drafter
                  ├── config.json
                  ├── generation_config.json
                  ├── pytorch_model.bin
                  └── other files...
      

✨ Usage

All functionalities can be run either through main.py or by directly running entrypoints/{function}.py. Currently, "llamagen" (LlamaGen-Stage I), "llamagen2" (LlamaGen-Stage II), "anole", and "lumina_mgpt" are supported as --model.

🚧 Lumina-mGPT is still under construction, so some functions may not work properly yet. You can follow the procedures here, but you may encounter a few exceptions.

  1. Generate Images

    python main.py generate_images --model <model_name> --model_type <model_type; e.g., base, vllm, eagle> --model_path <model_path> --drafter_path <drafter_path> --output_dir <output_dir> ...

    or

    python -m entrypoints.generate_images --model <model_name> --model_type <model_type; e.g., base, vllm, eagle> --model_path <model_path> --drafter_path <drafter_path> --output_dir <output_dir> ...

    💡 How to use LANTERN and LANTERN++ for image generation

    • For LANTERN, set --model_type eagle, enable the --lantern option, and set the --lantern_k and --lantern_delta options.
    • For LANTERN++, use the --static_tree option and set the $\lambda$ value with --lantern_delta.
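
    For example, a LANTERN run might look like the following sketch (the model choice, paths, and the --lantern_k / --lantern_delta values are illustrative placeholders, not recommended settings):

    python main.py generate_images --model llamagen --model_type eagle --model_path ckpts/llamagen/LlamaGen-T2I --drafter_path ckpts/llamagen/trained_drafters/llamagen_drafter --output_dir outputs/llamagen_lantern --lantern --lantern_k 10 --lantern_delta 0.1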
  2. Generate Training Data for Drafter

    python main.py generate_train_data --model <model_name> --data_path <path_to_image_tokens> --output_dir <output_dir> --num_samples <num_samples>

    or

    python -m entrypoints.generate_train_data --model <model_name> --data_path <path_to_image_tokens> --output_dir <output_dir> --num_samples <num_samples>

    For LlamaGen and Anole, you have to extract image codes and T5 embeddings (T5 embeddings only for LlamaGen) for the training data.

    • Place image and caption files in the format given below and execute the following command before running generate_train_data:

    Data Format:

    • image_folder
      • {file_1}.jpg
      • {file_1}.txt
      • {file_2}.jpg
      • {file_2}.txt ...
    python main.py extract_code --model <model_type> --data_path <path_to_image_and_caption> --output_dir <output_dir> --num_samples <num_samples>

    or

    python -m entrypoints.extract_code --model <model_type> --data_path <path_to_image_and_caption> --output_dir <output_dir> --num_samples <num_samples>
  3. Train Drafter Model

     python main.py train_drafter --model <model_type> --base_path <base_model_path> --config_path <path_to_config.json> --data_dir <data_dir> --save_dir <save_dir> --lr <lr> --bs <bs> --gradient_accumlation_steps <gradient_accumulation_steps> ...

    or

    python -m entrypoints.train_drafter.main --model <model_type> --base_path <base_model_path> --config_path <path_to_config.json> --data_dir <data_dir> --save_dir <save_dir> --lr <lr> --bs <bs> --gradient_accumlation_steps <gradient_accumulation_steps> ...

    For multi-GPU training with accelerate, you can use:

     accelerate launch -m entrypoints.train_drafter.main --model <model_type> --base_path <base_model_path> --config_path <path_to_config.json> --data_dir <data_dir> --save_dir <save_dir> --lr <lr> --bs <bs> --gradient_accumlation_steps <gradient_accumulation_steps> ...
  4. Generate VQ Distances

    python main.py generate_codebook --model <model_name> --save_path <save_path>

    or

    python -m entrypoints.generate_codebook --model <model_name> --save_path <save_path>
  5. Evaluate Generated Images: We support FID, CLIP score, Precision/Recall, and HPSv2 for image evaluation.

    python main.py eval_fid_clip --fake_dir <path_to_generated_image> --ref_dir <path_to_reference_image> --caption_path <path_to_prompt> --how_many <number_of_images_for_evaluation> ...
    python main.py eval_prec_recall --fake_dir <path_to_generated_image> --ref_dir <path_to_reference_image> ...
    python main.py eval_hpsv2 --image_path <path_to_generated_image> --prompt_path <path_to_prompt>

    or

    python -m entrypoints.eval_fid_clip --fake_dir <path_to_generated_image> --ref_dir <path_to_reference_image> --caption_path <path_to_prompt> --how_many <number_of_images_for_evaluation> ...
    python -m entrypoints.eval_prec_recall --fake_dir <path_to_generated_image> --ref_dir <path_to_reference_image> ...
    python -m entrypoints.eval_hpsv2 --image_path <path_to_generated_image> --prompt_path <path_to_prompt>

⚠️ CAUTIONS

  1. config.json should be in ckpts/{model_name}/trained_drafters/{drafter_path}. Since the Model in cnets_{model_name}.py is initialized according to the config.json in the drafter_path, you need to place the drafter's config.json correctly. Note that this config.json should be the same as the base model's config.json except for num_hidden_layers.
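
    As a rough illustration (a sketch, not part of the repository; the paths and the num_hidden_layers value are assumptions to adapt to your setup), the drafter's config.json can be derived from the base model's config.json like this:

    import json
    from pathlib import Path

    # Assumed example paths; point these at your own base model and drafter directories.
    base_config_path = Path("ckpts/llamagen/LlamaGen-T2I/config.json")
    drafter_dir = Path("ckpts/llamagen/trained_drafters/llamagen_drafter")

    # Keep every field identical to the base model's config and override only the layer count.
    config = json.loads(base_config_path.read_text())
    config["num_hidden_layers"] = 1  # assumed drafter depth; use the value your drafter was trained with

    drafter_dir.mkdir(parents=True, exist_ok=True)
    (drafter_dir / "config.json").write_text(json.dumps(config, indent=2))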

⚖️ License

This project is distributed under the Chameleon License by Meta Platforms, Inc. For more information, please see the LICENSE file in the repository.


🙏 Acknowledgement

This repository is built with extensive reference to FoundationVision/LlamaGen, Alpha-VLLM/Lumina-mGPT and SafeAILab/EAGLE, leveraging many of their core components and approaches.


📄 Citation

@article{jang2024lantern,
  title={LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding},
  author={Jang, Doohyuk and Park, Sihwan and Yang, June Yong and Jung, Yeonsung and Yun, Jihun and Kundu, Souvik and Kim, Sung-Yub and Yang, Eunho},
  journal={arXiv preprint arXiv:2410.03355},
  year={2024}
}
@article{park2025lanternenhancedrelaxedspeculative,
  title={LANTERN++: Enhanced Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models}, 
  author={Sihwan Park and Doohyuk Jang and Sungyub Kim and Souvik Kundu and Eunho Yang},
  journal={arXiv preprint arXiv:2410.03355},
  year={2025}
}
