DisCEdge is a distributed context management system designed to enable efficient, low-latency Large Language Model (LLM) inference in edge computing environments.
Deploying LLMs at the edge offers significant privacy and latency benefits, but managing state across geo-distributed nodes is a major challenge. DisCEdge addresses this by replicating user context (such as session history and preferences) in tokenized form. By maintaining context as token sequences rather than raw text, the system avoids redundant tokenization overhead, minimizes network bandwidth usage, and ensures data consistency as mobile clients roam between edge nodes.
DisCEdge processes requests in three context modes: (i) `raw` mode, where the context is maintained as raw text; (ii) `tokenized` mode, where the context is maintained as token sequences; and (iii) `client-side` mode, where the context is kept on the client device, so the Context Manager simply forwards requests to the LLM Service.
*DisCEdge architecture overview: the system consists of modular edge nodes, each containing a Context Manager, an LLM Service, and a Distributed KV Store.*
If you use this software in a publication, please cite it as:
M. Malekabbasi, M. Wang, and D. Bermbach, "DisCEdge: Distributed Context Management for Large Language Models at the Edge," 2025. arXiv:2511.22599.
```bibtex
@article{malekabbasi2025discedge,
  title={DisCEdge: Distributed Context Management for Large Language Models at the Edge},
  author={Malekabbasi, Mohammadreza and Wang, Minghe and Bermbach, David},
  year={2025},
  url={https://arxiv.org/abs/2511.22599}
}
```

The Context Manager can be run in two modes:
When `runServerMode` is `true`, the Context Manager starts an HTTP server that listens for completion requests. It manages session and context persistence automatically.

The API payload is compatible with the LLaMa.cpp `/completion` endpoint but includes additional parameters for context management:

- `mode`: `raw`, `tokenized`, or `client-side`.
- `session_id` (optional): To continue an existing session. If omitted, a new session is created.
- `user_id` (optional): To associate the session with a user.
- `turn`: A client-side counter for the conversation turn, used for synchronization.
Example Request:

```json
{
  "model": "Qwen1.5-0.5B-Chat-Q4_K_M:latest",
  "prompt": "What language do people speak there?",
  "temperature": 0,
  "seed": 123,
  "stream": false,
  "mode": "raw",
  "user_id": "u1"
}
```
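For reference, a request like the one above can be sent with curl. This is a minimal sketch assuming the Context Manager runs locally on `:8081` (the example `serverListenAddr` value below) and exposes the LLaMa.cpp-compatible `/completion` path:

```bash
# Sketch: assumes the Context Manager listens locally on :8081 (serverListenAddr)
# and serves the LLaMa.cpp-compatible /completion path.
curl -s http://localhost:8081/completion \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "Qwen1.5-0.5B-Chat-Q4_K_M:latest",
        "prompt": "What language do people speak there?",
        "temperature": 0,
        "seed": 123,
        "stream": false,
        "mode": "raw",
        "user_id": "u1"
      }'
```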
Example Response: The response from LLaMa.cpp is augmented with session information.

```json
{
  "content": "...",
  "session_id": "...",
  "user_id": "...",
  "mode": "raw"
}
```

When `runServerMode` is `false`, the Context Manager runs in a non-interactive test mode based on a scenario file. This mode is useful for development and testing.
- It reads a sequence of user messages from a YAML file specified by `scenarioFilePath` (a sample is sketched below).
- At startup, it prompts the user to choose the context method (`raw` or `tokenized`) for the entire scenario run.
- It processes the messages sequentially, simulating a conversation and logging performance metrics to a CSV file in `testdata/log/`.
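The scenario schema is defined by the Context Manager and is not reproduced here; as a rough sketch, assuming a scenario is just an ordered list of user messages (the `messages` key is hypothetical):

```yaml
# Hypothetical layout for a scenario file such as testdata/example_robo_longer.yml.
# The actual schema is defined by the Context Manager; the "messages" key is an assumption.
messages:
  - "Where is the Eiffel Tower located?"
  - "What language do people speak there?"
  - "How tall is it?"
```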
- `runServerMode`: Set to `true` for server mode or `false` for scenario mode.
- `serverListenAddr`: The address and port for the server to listen on (e.g., `:8081`).
- `scenarioFilePath`: Path to the YAML file for scenario mode (e.g., `testdata/example_robo_longer.yml`).
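For illustration, a configuration carrying these options might look as follows; only the three option names and example values come from this README, the YAML layout itself is an assumption:

```yaml
# Sketch: option names and example values are from the docs above;
# the concrete config file format is an assumption.
runServerMode: true
serverListenAddr: ":8081"
scenarioFilePath: "testdata/example_robo_longer.yml"
```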
To set up and run the experiments:

- Run `fred/etd.sh` on a node.
- Clear the etcd data: `etcdctl del "" --from-key`
- Run LLaMa.cpp-fastencode on the nodes. This fork is modified to accept a pre-tokenized context, which is required for the `tokenized` mode (see the request sketch after this list).
  - `./server -m ./Qwen1.5-0.5B-Chat-Q4_K_M.gguf -c 2048 -n 128 -b 512 -ngl 33` (`-ngl` had to be specified explicitly for the Jetson TX2 to run on the GPU)
  - Parameters:
    - `-c N`: size of the prompt context (default: 512)
    - `-n N`: maximum tokens to predict (default: -1)
    - `-ngl N`: number of layers to store in VRAM
    - `-b N`: batch size for prompt processing (default: 512)
- Run `fred/edge-node-*.sh` on the nodes.
- Run the LLM Context Manager on the nodes (check that `fredAddr` is correct).
- (optional) Run `fred_traffic_monitor.sh` on the edge nodes to capture inter-node DB traffic.
  - Example: `./fred_traffic_monitor.sh raw-TX2 250` or `./fred_traffic_monitor.sh tokenized-TX2 250`
  - See `fred_traffic_monitor.md` for detailed usage, prerequisites, and the experiment workflow.
- Run the client from the discedge-client-experiments repository.
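Regarding the pre-tokenized context mentioned in the LLaMa.cpp-fastencode step above: the fork's exact request shape is not documented here, but upstream LLaMa.cpp's `/completion` endpoint already accepts an array of token IDs as the `prompt`, so a pre-tokenized request could look roughly like this (the token IDs are made up for illustration):

```json
{
  "prompt": [151644, 872, 198, 3838, 4128, 653, 1251],
  "n_predict": 128,
  "temperature": 0,
  "stream": false
}
```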
- All FReD scripts are in `fred/`.
- The `config.sh` script sets up environment variables such as node IP addresses. There is no need to run it manually; it is sourced automatically by the other scripts.
- You need to generate certificates with `gen-cert.sh` for etcd/FReD (example below).
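A minimal invocation sketch, assuming `gen-cert.sh` lives in `fred/` with the other scripts and writes its output into the `cert/` directory used by the etcdctl commands below:

```bash
# Assumed location and output layout; the cert/ paths match those
# used by the etcdctl commands below.
cd fred
./gen-cert.sh
ls cert/   # expect ca.crt, frededge1.crt, frededge1.key, ...
```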
```bash
# list all keys
etcdctl --cacert=cert/ca.crt \
  --cert=cert/frededge1.crt \
  --key=cert/frededge1.key \
  get "" --prefix --keys-only

# delete all keys
etcdctl --cacert=cert/ca.crt \
  --cert=cert/frededge1.crt \
  --key=cert/frededge1.key \
  del "" --from-key
```
This project is licensed under the MIT License - see the LICENSE file for details.