Support Micro Batching and Lossless Compression by JiuChen0 · Pull Request #39 · ai-decentralized/BloomBee

JiuChen0 · 2026-02-20T01:26:01Z

Added micro-batching support: large batches can be split into micro-batches that reuse (multiplex) GPU slots in the KV cache.
- Files: microbatch_config.py, memory_cache_manager.py, block_functions.py, handler.py
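
To illustrate the splitting and slot reuse described above, here is a minimal sketch in plain Python. It is not the code in `microbatch_config.py` or `memory_cache_manager.py`; the names `micro_batch_ranges` and `run_in_shared_slot` are hypothetical, and a Python list stands in for a GPU KV-cache slot.

```python
from typing import List, Tuple


def micro_batch_ranges(batch_size: int, micro_batch_size: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) row ranges that cover the full batch."""
    if batch_size < 0 or micro_batch_size <= 0:
        raise ValueError("batch_size must be >= 0 and micro_batch_size must be > 0")
    return [
        (start, min(start + micro_batch_size, batch_size))
        for start in range(0, batch_size, micro_batch_size)
    ]


def run_in_shared_slot(batch: List[int], micro_batch_size: int) -> List[int]:
    """Process a batch one micro-batch at a time through a single reused slot.

    Assumption: `slot` is a stand-in for a fixed-size GPU buffer; the real
    KV-cache layout is not described in the PR text.
    """
    slot: List[object] = [None] * micro_batch_size  # allocated once, reused
    outputs: List[int] = []
    for start, end in micro_batch_ranges(len(batch), micro_batch_size):
        rows = end - start
        slot[:rows] = batch[start:end]               # load this micro-batch
        outputs.extend(x + 1 for x in slot[:rows])   # placeholder "compute"
    return outputs
```

The point of the sketch is that peak slot memory scales with the micro-batch size rather than the full batch size, while the concatenated outputs keep the original row order.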
Added cross-stage compute/communication overlap: micro-batches are pushed and consumed asynchronously, so adjacent stages pipeline at micro-batch granularity.
- Files: handler.py, block_functions.py, microbatch_config.py
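
The push/consume pattern above can be sketched with `asyncio` as a bounded producer/consumer queue. This is an illustration under assumptions, not the logic in `handler.py`: `compute_stage`, `transfer_stage`, and `run_pipeline` are hypothetical names, and `asyncio.sleep(0)` stands in for GPU compute and network transfer.

```python
import asyncio
from typing import List, Optional, Tuple

Item = Tuple[int, List[int]]


async def compute_stage(micro_batches: List[List[int]], queue: "asyncio.Queue[Optional[Item]]") -> None:
    """Compute each micro-batch and push it downstream as soon as it is ready."""
    for index, micro_batch in enumerate(micro_batches):
        await asyncio.sleep(0)                    # stand-in for GPU compute
        await queue.put((index, [x * 2 for x in micro_batch]))
    await queue.put(None)                         # sentinel: no more micro-batches


async def transfer_stage(queue: "asyncio.Queue[Optional[Item]]", received: List[Item]) -> None:
    """Consume micro-batches while the producer is still computing later ones."""
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0)                    # stand-in for network transfer
        received.append(item)


async def run_pipeline(micro_batches: List[List[int]]) -> List[Item]:
    # A bounded queue applies backpressure if transfer falls behind compute.
    queue: "asyncio.Queue[Optional[Item]]" = asyncio.Queue(maxsize=2)
    received: List[Item] = []
    await asyncio.gather(
        compute_stage(micro_batches, queue),
        transfer_stage(queue, received),
    )
    return received


def run(micro_batches: List[List[int]]) -> List[Item]:
    return asyncio.run(run_pipeline(micro_batches))
```

With a single producer and a single consumer, micro-batches arrive in the order they were pushed, so the receiving stage can reassemble the batch by index.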
Added Zstd lossless compression for activation transport: payloads are compressed before transfer and decompressed after.
- Files: lossless_transport.py, lossless_wrapper_config.py
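
A lossless round trip of this kind might look like the sketch below. It is not the implementation in `lossless_transport.py`: `pack_activations` and `unpack_activations` are hypothetical names, the one-byte codec tag is an assumption, and the code falls back to `zlib` from the standard library when the third-party `zstandard` package is not installed.

```python
import zlib
from array import array
from typing import List

try:
    import zstandard  # third-party Zstd bindings; optional in this sketch
except ImportError:
    zstandard = None


def pack_activations(values: List[float], level: int = 3) -> bytes:
    """Serialize float32 values and compress them losslessly before transfer."""
    raw = array("f", values).tobytes()
    if zstandard is not None:
        return b"Z" + zstandard.ZstdCompressor(level=level).compress(raw)
    return b"D" + zlib.compress(raw, level)   # stand-in codec, not Zstd


def unpack_activations(payload: bytes) -> List[float]:
    """Decompress a payload produced by pack_activations and restore the values."""
    tag, body = payload[:1], payload[1:]
    if tag == b"Z":
        if zstandard is None:
            raise RuntimeError("payload is Zstd-compressed but zstandard is not installed")
        raw = zstandard.ZstdDecompressor().decompress(body)
    elif tag == b"D":
        raw = zlib.decompress(body)
    else:
        raise ValueError("unknown codec tag")
    restored = array("f")
    restored.frombytes(raw)
    return restored.tolist()
```

Because the codec is lossless, the receiver recovers the exact bytes that were sent, so compression changes transfer size but not numerical results.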
Merged the speculative decoding path into the existing inference pipeline. For now, speculative decoding stays on the full-batch path and does not use micro-batching, which keeps tree/draft/KV alignment correct.
- Files: speculative_model.py, inference_session.py, handler.py, block_functions.py
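
The routing rule above can be expressed as a small guard. This is a hypothetical sketch, not code from the PR: the function name `effective_micro_batch_size` and the convention that a non-positive size disables micro-batching are assumptions.

```python
def effective_micro_batch_size(batch_size: int, micro_batch_size: int, speculative: bool) -> int:
    """Pick the micro-batch size to use for one request.

    Speculative decoding always runs on the full batch. Assumption: a
    non-positive micro_batch_size means micro-batching is disabled.
    """
    if speculative or micro_batch_size <= 0:
        return batch_size
    return min(micro_batch_size, batch_size)
```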

JiuChen0 added 24 commits December 30, 2025 01:12

micro batching slice

22ac0ad

cross stage

6e09a82

cross stage overlap

370377e

fix shape mismatch

2d58388

fix cross stage error

b28ffab

cross stage pipeline

8054db6

micro batch size reuse

c117da0

pipeline

29c598d

overlap

a3494ae

fix kvcache BH_dst

0a73b63

fix batch mismatch

39daa03

delete debug print

213e44f

cross stage overlap

a3122b6

fix micro batching

4130144

micro batching

64c1d01

add timer tracking

7cd494a

merge code of spec decoding

c908ae0

finish micro batching

018b89c

merge code of sepc decoding

93523f7

spec decoding max length test

0db73ae

add compression

7bf582f

spec decoding token limit error

a7a71bd

fix disable micro batch error

d70545b

fix micro batching index unstable problem

22bafe8

HaibaraAiChan approved these changes Feb 20, 2026

HaibaraAiChan merged commit 74e0ff0 into ai-decentralized:main Feb 20, 2026

HaibaraAiChan merged 24 commits into ai-decentralized:main from JiuChen0:integrate/upstream
