Benchmark: Discrete Image Tokenizers

Model	Approach	Token Type	Training Resolution	Inference Resolution	# Tokens per Image (or Compression Factor)	Codebook Size	Training Data Augmented	Image Understanding	Image Generation	Pretraining Data
Open-MAGVIT2	VQ-VAE + MLM	Spatial (2D Grid)	256×256	Flexible (e.g., 256×256)	16×16 Compression	262,144	Unknown	✅	✅	ImageNet 2012
Emu3-VisionTokenizer	VQ-GAN (MoVQGAN)	Spatial (2D Grid)	≥ 512×512	Flexible (e.g., 512×512)	8×8 Compression	32,768	Unknown	✅	✅	laion-high-resolution
Cosmos	VQ-AE (Discrete)	Spatial (2D Grid)	Flexible (256px to 4K)	Original	16×16 or 8×8 Compression	64,000	Unknown	✅	✅	VIDEO: Driving (11%), Hand motion and object manipulation (16%), Human motion and activity (10%), Spatial awareness and navigation (16%), First person point-of-view (8%), Nature dynamics (20%), Dynamic camera movements (8%), Synthetically rendered (4%), Others (7%)
FlowMo Hi	Diffusion Autoencoder (Transformer-based)	Sequential (1D latent)	256×256	256×256	1,024	16,384	Unknown	—	✅	ImageNet 2012
TiTok	1D VQ-VAE (Transformer-based)	Sequential (1D latent)	256×256, 512×512	256×256, 512×512	256	4,096	Unknown	—	✅	ImageNet
Selftok	Diffusion-based AR Prior	Sequential (Autoregressive Prior)	256×256	256×256	512 / 1,024 / 1,536	32,768	Unknown	✅	✅	DataComp: 25.45%, LAION-2B En: 25.36%, LAION-2B Multi: 24.26%, COYO-700M: 12.96%, In-house T2I: 7.98%, In-house Text: 4.00%
UniTok	VQ-VAE	Sequential (1D latent)	256×256	Flexible	Flexible (8 × 256 for 256×256)	8 × 16,000	Unknown	✅	✅	DataComp-1B
DetailFlow	Autoregressive	Sequential (AR coarse-to-fine, next-detail-prediction)	256×256	256×256	128 / 256 / 512	8,192	Unknown	—	✅	ImageNet-1K
TokenFlow	VQ-VAE (Transformer-based)	Spatial (2D latent, next-scale-prediction)	256×256 / 384×384	256×256 / 384×384	16×16 / 27×27	32,768	Unknown	✅	✅	LAION and COYO-700M (no OCR data)
VILA-U	RQ-VAE	Spatial (2D latent)	256×256	256×256	16×16×4	16,384	Unknown	✅	✅	COYO-700M
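The token-count and codebook columns together fix the raw size of an image's discrete code. As a rough sketch (not code from this repository), the snippet below turns a few table rows into tokens per image, bits per token (log2 of the codebook size) and bits per pixel. `TokenizerConfig` and `ROWS` are hypothetical names. It assumes that "16×16 Compression" means a downsampling factor of 16 per side, and it reads VILA-U's "16x16x4" as a 16×16 grid with residual-quantization depth 4. Rows whose grid does not divide the resolution evenly (e.g. TokenFlow's 27×27 at 384×384) and multi-codebook schemes such as UniTok are not modeled.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class TokenizerConfig:
    """Hypothetical description of one table row (not an API from the repo)."""
    name: str
    height: int          # input resolution in pixels
    width: int
    downsample: int      # assumed per-side compression, e.g. 16 for "16×16 Compression"
    depth: int           # codes per grid cell (1, or the RQ depth for residual quantizers)
    codebook_size: int

    def grid(self) -> tuple[int, int]:
        # Latent grid size; requires the resolution to be divisible by the factor.
        if self.height % self.downsample or self.width % self.downsample:
            raise ValueError(f"{self.name}: resolution not divisible by {self.downsample}")
        return self.height // self.downsample, self.width // self.downsample

    def tokens_per_image(self) -> int:
        h, w = self.grid()
        return h * w * self.depth

    def bits_per_token(self) -> float:
        # A raw index into a codebook of size K needs log2(K) bits (no entropy coding).
        return math.log2(self.codebook_size)

    def bits_per_pixel(self) -> float:
        total_bits = self.tokens_per_image() * self.bits_per_token()
        return total_bits / (self.height * self.width)


# Values taken from the table above; the interpretation of each column is an assumption.
ROWS = [
    TokenizerConfig("Open-MAGVIT2", 256, 256, downsample=16, depth=1, codebook_size=262_144),
    TokenizerConfig("Emu3-VisionTokenizer", 512, 512, downsample=8, depth=1, codebook_size=32_768),
    TokenizerConfig("VILA-U", 256, 256, downsample=16, depth=4, codebook_size=16_384),
]

if __name__ == "__main__":
    for cfg in ROWS:
        print(f"{cfg.name}: {cfg.tokens_per_image()} tokens, "
              f"{cfg.bits_per_token():.2f} bits/token, {cfg.bits_per_pixel():.4f} bpp")
```

Under these assumptions, Open-MAGVIT2 at 256×256 produces 256 tokens of 18 bits each, and Emu3 at 512×512 produces 4,096 tokens of 15 bits each. Bits per pixel is useful when comparing rows that use different resolutions. It is only an upper bound on storage, because a learned prior or entropy coder can compress the indices further.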

Folders and files
Tokenizer
assets
download_dataset
notebook
vision_tokenization
.gitignore
README.md
Tiler.py
calculate_metrics.py
conftest.py
emu3_reconstruct_helper.py
emu3_vllm_inferencer.py
expr.ipynb
lpips_comparison_plot.png
lpips_vs_tokens_composite_plot.png
lpips_vs_tokens_scatter.png
metrics_results.csv
metrics_results.md
molmo_tiler.py
pytest.ini
requirements_metrics.txt
test_conditional_generation.py
utils_benchmark.py

Contributors 4

swiss-ai/benchmark-image-tokenzier
