14-stage Fusion Pipeline for LLM token compression — reversible compression, AST-aware code analysis, intelligent content routing. Zero LLM inference cost. MIT licensed.
[TMLR 2026] Survey: https://arxiv.org/pdf/2507.20198
📚 Collection of token-level model compression resources.
The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
Token-Oriented Object Notation - A compact data format for reducing token consumption when sending structured data to LLMs (PHP implementation)
Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"
You say it. AutoCode builds it. 38 professional skills, persistent memory, 60%+ dev cost savings. Zero dependencies. Free forever.
[CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding
[ICLR 2026 Oral] FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging
[ICLR 2026] Official code repository for "⚡️VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration"
AI gateway with token compression for Claude Code, Codex, and more
[ICLR 2026] MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
😎 Awesome papers on token redundancy reduction
This repo integrates DyCoke's token compression method with VLMs such as Gemma3 and InternVL3
[ICLR 2026] Official code of PPE: Positional Preservation Embedding for Token Compression in Multimodal Large Language Models.
⚡ Compress Claude Code context by 60-90%. Six noise filters RTK doesn't have.
Rust Local Token Compression Proxy for coding agents, built solo for GenAI Genesis 2026. 🏆 1st Google Sustainability Hack
Official implementation of TCSVT 2025 paper: DiViCo: Disentangled Visual Token Compression For Efficient Large Vision-Language Model
[Arxiv 2025 Preprint] HiPrune, a training-free visual token pruning method for VLM acceleration.
Token compression + context memory for Claude Code. Runs automatically. No configuration required.