RadixMesh: Distributed Radix Tree Cache for Accelerating LLM Serving

This project is implemented based on the radix_cache from the SGLang project.

Why need RadixMesh

RadixMesh can reuse kv cache in any routing policy.
RadixMesh can gc dup kv cache.
RadixMesh can be used by prefill or decode node to redirect req to other node if overloaded.
Updating the routing rule based on real-time event makes RadixMesh faster.
RadixMesh is capable of finding best match for both prefill and decode nodes.

Run

Config

(optional) venv

python3 -m venv venv
source venv/bin/activate

requirements

cd python
pip3 install .

Correctness Test

#cd python
python3 -m src.test.correctness

Benchmark

#cd python
python3 -m src.test.benchmark

Future Roadmap

How to be Distributed & Consistent

Eventual Consistency

same base state
ordered oplog applied
oplog is idempotent
master free policy to resolve possible multi-write conflict

Notice

The tree structures among the different nodes may not be identical, but the query (lookup) results will be consistent.

Tech Detail

tcp connection with simple protocol

 4-byte
[length][data]

keep alive

Send heartbeat requests at regular intervals to keep the connection active and detect topology anomalies.
Heartbeat with ttl to avoid infinite loop
Hierarchy heartbeat
- next node short heartbeat
- ring check with long gap

node topo check

Two-rounds verification for ring-topo

cache-aware routing

warmup stage use random hash routing strategy to avoid all request go only one node
prefix match first
- give prefill node & decode node
consistent hash if no nodes has cache

gc dup kv cache

local ref
- When to gc: periodical
- How to gc
  - each tree send try gc req, collect agree
  - if all agree, gc
global ref
- When to gc: periodical & ref reach 0
- How to gc
  - insert oplog with initial ref count
  - update ref count when used/freed (idempotent impl is complex)
  - when ref reach 0, send try gc req, collect agree
  - each tree send try gc req, collect agree
  - if all agree, gc

same tree but diff value

router value & prefill value & normal value have different value class

Limitation

All nodes' config should be same except local_cache_addr

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
docker		docker
python		python
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

RadixMesh: Distributed Radix Tree Cache for Accelerating LLM Serving

Why need RadixMesh

Run

Config

Correctness Test

Benchmark

Future Roadmap

How to be Distributed & Consistent

Tech Detail

tcp connection with simple protocol

keep alive

node topo check

cache-aware routing

gc dup kv cache

same tree but diff value

Limitation

About

Uh oh!

Releases

Packages

Languages

zzt93/RadixMesh

Folders and files

Latest commit

History

Repository files navigation

RadixMesh: Distributed Radix Tree Cache for Accelerating LLM Serving

Why need RadixMesh

Run

Config

Correctness Test

Benchmark

Future Roadmap

How to be Distributed & Consistent

Tech Detail

tcp connection with simple protocol

keep alive

node topo check

cache-aware routing

gc dup kv cache

same tree but diff value

Limitation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages