Jalla (Just Another Large Language Aggregator) is a responses middleware transformer with full responses coverage.
It provides a single endpoint layer that can:
- route requests to multiple upstream providers,
- normalize behavior across provider APIs,
- fail over between backends,
- expose streaming and websocket flows.

Contents:
- Architecture at a glance
- Quickstart
- Configuration model
- Full config walkthrough (config.toml and config.example.toml)
- Environment variables
- How model routing works
- Common configuration patterns
- Debugging configuration issues
- Development

Architecture at a glance
Jalla has three key configuration layers:
- Providers: upstream APIs (OpenAI, Anthropic, Gemini, etc.)
- Models: your public aliases mapped to provider-native model names
- Load balancing rules: optional failover behavior for aliases
At runtime:
- a client calls Jalla using a model alias (default, fast, reasoning, etc.),
- Jalla resolves alias -> provider model,
- Jalla executes the upstream request,
- optional failover rules trigger on configured status codes.
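
The alias resolution step above can be sketched as a lookup in a small registry. This is a minimal illustration, not Jalla's actual implementation: `ModelEntry`, `resolve_alias`, and `example_registry` are hypothetical names that mirror the `[[model]]` fields, and the upstream name used for the `reasoning` alias is a placeholder. Exact, case-sensitive matching is also an assumption.

```rust
/// Hypothetical mirror of one `[[model]]` entry; not Jalla's real type.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelEntry {
    /// Public alias that clients send in the request.
    pub alias: String,
    /// Provider-native model identifier.
    pub name: String,
    /// Name of the provider section that serves this alias.
    pub provider: String,
}

/// Resolve a client-facing alias to its registry entry.
/// Matching is exact and case-sensitive in this sketch (an assumption).
pub fn resolve_alias<'a>(registry: &'a [ModelEntry], alias: &str) -> Option<&'a ModelEntry> {
    registry.iter().find(|entry| entry.alias == alias)
}

/// Build a two-entry registry for illustration (hypothetical helper).
pub fn example_registry() -> Vec<ModelEntry> {
    vec![
        ModelEntry {
            alias: "default".to_string(),
            name: "gpt-4.1".to_string(),
            provider: "openai".to_string(),
        },
        ModelEntry {
            alias: "reasoning".to_string(),
            // Placeholder: substitute the real upstream model name.
            name: "your-anthropic-model".to_string(),
            provider: "anthropic".to_string(),
        },
    ]
}
```

With this registry, `resolve_alias(&example_registry(), "default")` returns the entry whose provider is `openai` and whose name is `gpt-4.1`, while an unknown alias returns `None`.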

Quickstart
Prerequisites:
- Rust toolchain (edition 2024 compatible)
Set provider API keys:
export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."
If a provider is enabled in config.toml, its api_key_env variable must be set.
Run the server:
cargo run
Default bind address from the example config:
0.0.0.0:8787
Run the tests:
cargo test

Configuration model
Jalla uses config.toml as the primary declarative config.
Top-level sections:
- [server]: bind/auth settings
- [store]: response storage backend
- [providers.*]: provider connection settings
- [[model]]: alias registry
- [[loadbalancing.rule]]: failover behavior
- [[pricing.model]]: optional pricing metadata

Full config walkthrough (config.toml and config.example.toml)
The repository includes:
- config.toml: minimal local defaults for quick startup
- config.example.toml: full OSS template for real deployments
To bootstrap a full config:
cp config.example.toml config.toml

[server]
bind_addr = "0.0.0.0:8787"
require_auth = true
- bind_addr: HTTP bind address and port.
- require_auth: whether requests must be authenticated by Jalla.
[store]
backend = "memory"
# backend = "valkey"
# url = "redis://127.0.0.1:6379"
# ttl_secs = 3600
# key_prefix = "jalla:resp:"
- backend = "memory" is the easiest local mode.
- For persistence, use backend = "valkey" and set url.
- ttl_secs and key_prefix are optional tuning knobs.
Example:
[providers.openai]
enabled = true
base_url = "https://api.openai.com"
api_key_env = "OPENAI_API_KEY"
responses_path = "/v1/responses"
chat_completions_path = "/v1/chat/completions"
Provider fields:
- enabled: whether provider is available for routing.
- base_url: upstream provider base URL.
- api_key_env: env var used for provider API key.
- provider-specific endpoint paths:
  - OpenAI: responses_path, chat_completions_path
  - Anthropic: messages_path
  - Gemini: generate_content_path
Example:
[[model]]
name = "gpt-4.1"
alias = "default"
provider = "openai"
Meaning:
- alias is what your clients send.
- name is the upstream provider model identifier.
- provider must match an enabled provider section.
Optional model fields:
- chat_completions = true: force chat-completions compatibility flow.
- defaults like temperature/top_p/max output may be set per model when supported.
- headers = { ... }: static upstream headers for that alias.
Example:
[[loadbalancing.rule]]
alias = "default"
primary = "openai"
[[loadbalancing.rule.failover]]
backend = "anthropic"
model = "reasoning"
on_status = [429, 500, 502, 503, 504]
Behavior:
- for alias default, route to openai first,
- if upstream returns any status in on_status,
- retry through failover backend/model.
Pricing entries are optional and useful for cost accounting/analytics:
[[pricing.model]]
name = "gpt-4.1"
input_per_mtok_usd = 2.00
cached_input_per_mtok_usd = 0.20
output_per_mtok_usd = 8.00

Environment variables
At minimum, set keys for enabled providers:
export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."
export GEMINI_API_KEY="..." # only if providers.gemini.enabled = true
Useful runtime flags:
export JALLA_DEBUG_PRETTY_RESPONSE=1
export RUST_LOG="jalla::request=debug,jalla::response=debug,jalla::error=debug,jalla::proxy=debug,jalla::auth=debug,tower_http=debug"
Then run:
cargo run

How model routing works
Given this request model:
model = "default"
Jalla resolves:
- default -> [[model]] entry
- entry selects provider = "openai" and name = "gpt-4.1"
- if request fails with configured failover status, use loadbalancing rule
- route to fallback model = "reasoning" on anthropic

Common configuration patterns
For a single-provider setup, use only [providers.openai] and define aliases like default, fast, reasoning.
For failover, keep two providers enabled and add [[loadbalancing.rule]] for critical aliases.
For local development, start with:
- require_auth = false
- backend = "memory"
- one provider enabled
Then add auth and a persistent store before production.
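
Debugging configuration issues
If a provider is unavailable:
- Confirm the provider is enabled = true
- Confirm the api_key_env variable exists in your shell
If a model alias is not resolved:
- Confirm the model alias in the request matches [[model]].alias
- Confirm no typo/case mismatch
If failover does not trigger:
- Confirm there is a [[loadbalancing.rule]] for that alias
- Confirm the returned status is listed in on_status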
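
The first two groups of checks above can be automated with a small pre-flight routine. This is a sketch under stated assumptions: `check_provider` and `check_alias` are hypothetical helpers, not part of Jalla, and the parameters only mirror the config fields of the same name.

```rust
use std::env;

/// Hypothetical pre-flight check for one provider section.
/// `enabled` and `api_key_env` mirror the config fields of the same name.
pub fn check_provider(name: &str, enabled: bool, api_key_env: &str) -> Result<(), String> {
    if !enabled {
        return Err(format!("provider {name} is not enabled"));
    }
    // The variable must exist and be non-empty in the current environment.
    match env::var(api_key_env) {
        Ok(value) if !value.is_empty() => Ok(()),
        _ => Err(format!("{api_key_env} is not set for provider {name}")),
    }
}

/// Check that the requested alias matches a configured alias exactly.
/// Case-sensitive comparison is an assumption of this sketch.
pub fn check_alias(requested: &str, configured: &[&str]) -> Result<(), String> {
    if configured.iter().any(|alias| *alias == requested) {
        Ok(())
    } else {
        Err(format!("alias {requested} does not match any [[model]].alias"))
    }
}
```

Running both checks at startup, before the first request, reports a disabled provider, a missing key, or a mistyped alias as a readable error message.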

Development
Run:
cargo test
Run with debug logs:
JALLA_DEBUG_PRETTY_RESPONSE=1 \
RUST_LOG=jalla::request=debug,jalla::response=debug,jalla::error=debug,jalla::proxy=debug,jalla::auth=debug,tower_http=debug \
cargo run