
DisCEdge: Distributed Context Management for Large Language Models at the Edge

DisCEdge is a distributed context management system designed to enable efficient, low-latency Large Language Model (LLM) inference in edge computing environments.

Deploying LLMs at the edge offers significant privacy and latency benefits, but managing state across geo-distributed nodes is a major challenge. DisCEdge addresses this by replicating user context (such as session history and preferences) in tokenized form. By maintaining context as token sequences rather than raw text, the system avoids redundant tokenization overhead, minimizes network bandwidth usage, and ensures data consistency as mobile clients roam between edge nodes.

DisCEdge processes requests in one of three context modes: (i) raw text mode, where the context is maintained as raw text; (ii) tokenized mode, where the context is maintained as tokenized data; and (iii) client-side mode, where the context is maintained on the client device and requests are simply forwarded to the LLM Service by the Context Manager.

Figure: DisCEdge Architecture Overview. The system consists of modular edge nodes containing a Context Manager, LLM Service, and Distributed KV Store.

Research

If you use this software in a publication, please cite it as:

Text

M. Malekabbasi, M. Wang, and D. Bermbach, DisCEdge: Distributed Context Management for Large Language Models at the Edge, 2025. (arXiv:2511.22599)

BibTeX

@article{malekabbasi2025discedge,
  title={DisCEdge: Distributed Context Management for Large Language Models at the Edge},
  author={Malekabbasi, Mohammadreza and Wang, Minghe and Bermbach, David},
  year={2025},
  url={https://arxiv.org/abs/2511.22599}
}

Usage

The Context Manager can be run in two modes:

Server Mode

When runServerMode is true, the Context Manager starts an HTTP server that listens for completion requests. It manages session and context persistence automatically.

The API payload is compatible with the LLaMa.cpp /completion endpoint but includes additional parameters for context management:

  • mode: raw, tokenized, or client-side.
  • session_id (optional): To continue an existing session. If omitted, a new session is created.
  • user_id (optional): To associate the session with a user.
  • turn: A client-side counter for the conversation turn, used for synchronization.

Example Request:

{
	"model": "Qwen1.5-0.5B-Chat-Q4_K_M:latest",
	"prompt": "What language does people speak there",
	"temperature": 0,
	"seed": 123,
	"stream": false,
	"mode": "raw",
	"user_id": "u1"
}

Example Response: The response from LLaMa.cpp is augmented with session information.

{
    "content": "...",
    "session_id": "...",
    "user_id": "...",
    "mode": "raw"
}
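For reference, the example above could be sent with curl as sketched below (assuming the default serverListenAddr of :8081 and that the Context Manager exposes the LLaMa.cpp-compatible /completion path; check both against your deployment). The second call continues the session using the session_id returned by the first response, with an incremented client-side turn counter:

curl -s http://localhost:8081/completion \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen1.5-0.5B-Chat-Q4_K_M:latest", "prompt": "What language do people speak there", "temperature": 0, "seed": 123, "stream": false, "mode": "raw", "user_id": "u1"}'

# continue the same session (session_id copied from the first response)
curl -s http://localhost:8081/completion \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen1.5-0.5B-Chat-Q4_K_M:latest", "prompt": "And is it an official language", "stream": false, "mode": "raw", "user_id": "u1", "session_id": "<session_id from first response>", "turn": 2}'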

Scenario Mode

When runServerMode is false, the Context Manager runs in a non-interactive test mode driven by a scenario file. This mode is useful for development and testing.

  • It reads a sequence of user messages from a YAML file specified by scenarioFilePath (a sketch of such a file is shown after this list).
  • At startup, it prompts the user to choose the context method (raw or tokenized) for the entire scenario run.
  • It processes the messages sequentially, simulating a conversation and logging performance metrics to a CSV file in testdata/log/.
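The exact scenario schema is defined by the files in testdata/ (e.g., example_robo_longer.yml); the following YAML sketch is purely illustrative, and the messages key is an assumed name:

# hypothetical schema; consult the files in testdata/ for the real format
messages:
  - "Where is the capital of France?"
  - "What language do people speak there?"
  - "And what currency do they use?"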

Configuration

  • runServerMode: Set to true for server mode or false for scenario mode.
  • serverListenAddr: The address and port for the server to listen on (e.g., :8081).
  • scenarioFilePath: Path to the YAML file for scenario mode (e.g., testdata/example_robo_longer.yml).
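The concrete configuration format depends on the Context Manager implementation; as an illustrative sketch only (key names taken from this section, layout assumed), a configuration covering these options might look like:

runServerMode: true                                    # true = server mode, false = scenario mode
serverListenAddr: ":8081"                              # address and port the HTTP server listens on
scenarioFilePath: "testdata/example_robo_longer.yml"   # only used when runServerMode is false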

Run DisCEdge (paper version)

  1. run fred/etd.sh on a node
  2. clear the etcd data: etcdctl del "" --from-key
  3. run LLaMa.cpp-fastencode on nodes. This fork is modified to accept a pre-tokenized context, which is required for the tokenized mode (see the sketch after these steps).
    • ./server -m ./Qwen1.5-0.5B-Chat-Q4_K_M.gguf -c 2048 -n 128 -b 512 -ngl 33 (-ngl had to be specified explicitly for the Jetson TX2 to run on the GPU)
    • Parameters:
      • -c N: size of the prompt context (default: 512)
      • -n N: maximum tokens to predict (default: -1)
      • -ngl N: number of layers to store in VRAM
      • -b N: batch size for prompt processing (default: 512)
  4. run fred/edge-node-*.sh on nodes
  5. run the LLM Context Manager on nodes (make sure fredAddr is set correctly)
  6. (optional) run fred_traffic_monitor.sh on edge nodes to capture inter-node DB traffic
    • Example: ./fred_traffic_monitor.sh raw-TX2 250 or ./fred_traffic_monitor.sh tokenized-TX2 250
    • See fred_traffic_monitor.md for detailed usage, prerequisites, and experiment workflow
  7. run client from discedge-client-experiments repository
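Regarding step 3: upstream LLaMa.cpp's /completion endpoint accepts the prompt either as a string or as an array of token IDs, so a pre-tokenized call in that style might look roughly like the sketch below. The token values here are invented, and the fastencode fork's exact interface for pre-tokenized context may differ.

{
    "prompt": [151644, 872, 198, 3838, 4128],
    "n_predict": 128,
    "stream": false
}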

Notes

  • All FReD scripts are in fred/.
  • The config.sh script sets up environment variables such as node IP addresses. There is no need to run it manually; it is sourced automatically by the other scripts.
  • You need to generate certificates for etcd/FReD with gen-cert.sh.

etcd client useful commands

# list all keys
etcdctl --cacert=cert/ca.crt \
        --cert=cert/frededge1.crt \
        --key=cert/frededge1.key \
        get "" --prefix --keys-only

# delete all keys
etcdctl --cacert=cert/ca.crt \
        --cert=cert/frededge1.crt \
        --key=cert/frededge1.key \
        del "" --from-key

License

This project is licensed under the MIT License - see the LICENSE file for details.
