Support Micro Batching and Lossless Compression #39

Merged
HaibaraAiChan merged 24 commits into ai-decentralized:main from JiuChen0:integrate/upstream on Feb 20, 2026
@JiuChen0
Contributor

  • Added micro-batching support: large batches can be split into micro-batches, with GPU slot reuse (multiplexing) on the KV cache.

    • Files: microbatch_config.py, memory_cache_manager.py, block_functions.py, handler.py
  • Added cross-stage overlap (compute/communication overlap): enables micro-batch–level asynchronous push/consume pipelining.

    • Files: handler.py, block_functions.py, microbatch_config.py
  • Added Zstd lossless compression for activation transport: activations are packed before transfer and unpacked after it.

    • Files: lossless_transport.py, lossless_wrapper_config.py
  • Merged the Speculative Decoding path into the existing inference pipeline. For now, speculative decoding remains on the full batch-size path (does not use micro-batching) to ensure correct alignment across tree/draft/KV.

    • Files: speculative_model.py, inference_session.py, handler.py, block_functions.py
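
For reviewers, the micro-batching and cross-stage overlap described above can be pictured with a minimal sketch. This is not the code in handler.py or block_functions.py: the function names, the plain-list "batch", and the single queue between two stages are all assumptions made for illustration. Stage 1 pushes each micro-batch as soon as it is done, and stage 2 consumes it while stage 1 works on the next one.

```python
import queue
import threading
from typing import Callable, List, Sequence


def split_into_micro_batches(batch: Sequence, micro_batch_size: int) -> List[list]:
    """Split a batch into consecutive micro-batches (the last one may be smaller)."""
    if micro_batch_size <= 0:
        raise ValueError("micro_batch_size must be positive")
    return [
        list(batch[i:i + micro_batch_size])
        for i in range(0, len(batch), micro_batch_size)
    ]


def run_two_stage_pipeline(
    batch: Sequence,
    micro_batch_size: int,
    stage1: Callable,
    stage2: Callable,
) -> list:
    """Hypothetical two-stage pipeline with micro-batch-level overlap.

    Stage 1 runs in a background thread and pushes each finished
    micro-batch to a FIFO queue; stage 2 consumes from the queue in the
    calling thread, so the two stages can work on different micro-batches
    at the same time. Output order matches input order.
    """
    handoff: queue.Queue = queue.Queue()
    done = object()  # sentinel marking the end of the stream

    def producer() -> None:
        try:
            for micro_batch in split_into_micro_batches(batch, micro_batch_size):
                handoff.put([stage1(x) for x in micro_batch])
        finally:
            handoff.put(done)  # always unblock the consumer

    worker = threading.Thread(target=producer)
    worker.start()

    results = []
    while True:
        item = handoff.get()
        if item is done:
            break
        results.extend(stage2(x) for x in item)
    worker.join()
    return results
```

In this sketch the queue plays the role of the transport between stages; the real pipeline also has to manage KV-cache slots per micro-batch, which is not modeled here.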
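
The lossless transport step can likewise be sketched as a pack/unpack round trip. The PR uses Zstd; the sketch below deliberately substitutes the standard library's `zlib` so it runs without extra dependencies, and the function names and the 4-byte length header are assumptions, not the format used in lossless_transport.py.

```python
import struct
import zlib
from array import array
from typing import Iterable


def pack_activations(values: Iterable[float], level: int = 3) -> bytes:
    """Serialize float32 values and compress them losslessly.

    zlib stands in for Zstd here (assumption for a dependency-free sketch).
    The payload starts with the uncompressed byte length as a sanity check.
    """
    raw = array("f", values).tobytes()
    return struct.pack("<I", len(raw)) + zlib.compress(raw, level)


def unpack_activations(payload: bytes) -> array:
    """Decompress a payload produced by pack_activations back to float32 values."""
    (raw_len,) = struct.unpack_from("<I", payload, 0)
    raw = zlib.decompress(payload[4:])
    if len(raw) != raw_len:
        raise ValueError("decompressed size does not match header")
    out = array("f")
    out.frombytes(raw)
    return out
```

Because the compression is lossless, the receiver gets back exactly the bytes that were sent, so model outputs should not change; only the transfer size does.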

@HaibaraAiChan HaibaraAiChan merged commit 74e0ff0 into ai-decentralized:main Feb 20, 2026
JiuChen0 added a commit to JiuChen0/BloomBee that referenced this pull request Mar 22, 2026
* micro batching slice

* cross stage

* cross stage overlap

* fix shape mismatch

* fix cross stage error

* cross stage pipeline

* micro batch size reuse

* pipeline

* overlap

* fix kvcache BH_dst

* fix batch mismatch

* delete debug print

* cross stage overlap

* fix micro batching

* micro batching

* add timer tracking

* finish micro batching

* merge code of spec decoding

* spec decoding max length test

* add compression

* spec decoding token limit error

* fix disable micro batch error

* fix micro batching index unstable problem
2 participants