Jalla

Jalla (Just Another Large Language Aggregator) is a middleware transformer for the Responses API, with full Responses coverage.

It provides a single endpoint layer that can:

route requests to multiple upstream providers,
normalize behavior across provider APIs,
fail over between backends,
expose streaming and WebSocket flows.

Architecture at a glance

Jalla has three key configuration layers:

Providers: upstream APIs (OpenAI, Anthropic, Gemini, etc.)
Models: your public aliases mapped to provider-native model names
Load balancing rules: optional failover behavior for aliases

At runtime:

a client calls Jalla using a model alias (default, fast, reasoning, etc.),
Jalla resolves alias -> provider model,
Jalla executes the upstream request,
optional failover rules trigger on configured status codes.

Quickstart

Requirements

Rust toolchain with edition 2024 support (Rust 1.85 or later)

1) Configure API keys

export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."

If a provider is enabled in config.toml, the environment variable named by its api_key_env must be set.

2) Start Jalla

cargo run

Default bind address from the example config:

0.0.0.0:8787

3) Run tests

cargo test

Configuration model

Jalla uses config.toml as the primary declarative config.

Top-level sections:

[server]: bind/auth settings
[store]: response storage backend
[providers.*]: provider connection settings
[[model]]: alias registry
[[loadbalancing.rule]]: failover behavior
[[pricing.model]]: optional pricing metadata

Full config walkthrough (`config.toml` and `config.example.toml`)

The repository includes:

config.toml: minimal local defaults for quick startup
config.example.toml: full OSS template for real deployments

To bootstrap a full config:

cp config.example.toml config.toml

`[server]`

[server]
bind_addr = "0.0.0.0:8787"
require_auth = true

bind_addr: HTTP bind address and port.
require_auth: whether requests must be authenticated by Jalla.

`[store]`

[store]
backend = "memory"
# backend = "valkey"
# url = "redis://127.0.0.1:6379"
# ttl_secs = 3600
# key_prefix = "jalla:resp:"

backend = "memory" is the easiest local mode.
For persistence, use backend = "valkey" and set url.
ttl_secs and key_prefix are optional tuning knobs.

`[providers.<name>]`

Example:

[providers.openai]
enabled = true
base_url = "https://api.openai.com"
api_key_env = "OPENAI_API_KEY"
responses_path = "/v1/responses"
chat_completions_path = "/v1/chat/completions"

Provider fields:

enabled: whether the provider is available for routing.
base_url: base URL of the upstream provider.
api_key_env: name of the environment variable that holds the provider API key.
provider-specific endpoint paths:
- OpenAI: responses_path, chat_completions_path
- Anthropic: messages_path
- Gemini: generate_content_path

`[[model]]` aliases

Example:

[[model]]
name = "gpt-4.1"
alias = "default"
provider = "openai"

Meaning:

alias is what your clients send.
name is the upstream provider model identifier.
provider must match an enabled provider section.
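
The alias lookup described above can be sketched as follows. This is a minimal illustration, not Jalla's actual code: ModelEntry, resolve_alias, and the enabled_providers map are hypothetical names introduced here, and the exact-match comparison is an assumption based on the troubleshooting advice about case mismatches.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of one [[model]] entry.
#[derive(Debug, Clone, PartialEq)]
struct ModelEntry {
    alias: String,    // what clients send
    name: String,     // upstream provider model identifier
    provider: String, // must match an enabled provider section
}

/// Looks up `alias` (exact, case-sensitive match) and checks that the
/// entry's provider is marked as enabled.
fn resolve_alias<'a>(
    models: &'a [ModelEntry],
    enabled_providers: &HashMap<String, bool>,
    alias: &str,
) -> Result<&'a ModelEntry, String> {
    let entry = models
        .iter()
        .find(|m| m.alias == alias)
        .ok_or_else(|| format!("alias not found: {alias}"))?;

    // A provider that is missing from the map is treated as disabled.
    let enabled = enabled_providers
        .get(&entry.provider)
        .copied()
        .unwrap_or(false);

    if enabled {
        Ok(entry)
    } else {
        Err(format!("provider not enabled: {}", entry.provider))
    }
}
```

With the example entry, resolve_alias(&models, &enabled, "default") would yield the entry whose name is "gpt-4.1", while "Default" would return an error under the exact-match assumption.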

Optional model fields:

chat_completions = true: force chat-completions compatibility flow.
defaults such as temperature, top_p, and a maximum output limit may be set per model when the provider supports them.
headers = { ... }: static upstream headers for that alias.

`[[loadbalancing.rule]]` failover

Example:

[[loadbalancing.rule]]
alias = "default"
primary = "openai"

[[loadbalancing.rule.failover]]
backend = "anthropic"
model = "reasoning"
on_status = [429, 500, 502, 503, 504]

Behavior:

for alias default, route to openai first,
if upstream returns any status in on_status,
retry through failover backend/model.
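
The failover decision can be sketched as a simple membership check on the upstream status. This is a hedged illustration only: the Failover struct and failover_target function are hypothetical and merely mirror the fields of the example rule. How Jalla actually retries is not shown here.

```rust
/// Hypothetical mirror of one [[loadbalancing.rule.failover]] entry.
struct Failover {
    backend: String,
    model: String,
    on_status: Vec<u16>,
}

/// Returns the (backend, model) pair to retry with when the upstream
/// status is listed in on_status, or None when no failover applies.
fn failover_target(rule: &Failover, upstream_status: u16) -> Option<(&str, &str)> {
    if rule.on_status.contains(&upstream_status) {
        Some((rule.backend.as_str(), rule.model.as_str()))
    } else {
        None
    }
}
```

Under this sketch, a 429 from the primary would select anthropic with the reasoning model, while a 404 would be returned to the client unchanged because it is not in on_status.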

`[[pricing.model]]` metadata

Pricing entries are optional and useful for cost accounting/analytics:

[[pricing.model]]
name = "gpt-4.1"
input_per_mtok_usd = 2.00
cached_input_per_mtok_usd = 0.20
output_per_mtok_usd = 8.00
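
As an illustration of how such an entry could feed cost accounting, the sketch below multiplies token counts by the per-million-token rates. The Pricing struct and estimate_cost_usd function are hypothetical, and the sketch assumes cached input tokens are counted separately from regular input tokens. Jalla's own accounting may differ.

```rust
/// Hypothetical mirror of one [[pricing.model]] entry (USD per million tokens).
struct Pricing {
    input_per_mtok_usd: f64,
    cached_input_per_mtok_usd: f64,
    output_per_mtok_usd: f64,
}

/// Estimates the cost of one request in USD.
/// Assumption: `cached_input_tokens` is NOT included in `input_tokens`.
fn estimate_cost_usd(
    pricing: &Pricing,
    input_tokens: u64,
    cached_input_tokens: u64,
    output_tokens: u64,
) -> f64 {
    const TOKENS_PER_MTOK: f64 = 1_000_000.0;
    (input_tokens as f64 * pricing.input_per_mtok_usd
        + cached_input_tokens as f64 * pricing.cached_input_per_mtok_usd
        + output_tokens as f64 * pricing.output_per_mtok_usd)
        / TOKENS_PER_MTOK
}
```

With the rates above, 1,000 input tokens, 500 cached input tokens, and 200 output tokens come to (2000 + 100 + 1600) / 1,000,000, or about 0.0037 USD.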

Environment variables

At minimum, set keys for enabled providers:

export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."
export GEMINI_API_KEY="..." # only if providers.gemini.enabled = true
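
A startup check for these variables could look like the sketch below. It is an assumption-laden example, not Jalla's implementation: missing_api_keys is a hypothetical helper, and the lookup is passed in as a closure so the logic can be exercised without touching the real process environment.

```rust
/// Returns the api_key_env names for which `lookup` yields nothing
/// or an empty string.
fn missing_api_keys<F>(required: &[&str], lookup: F) -> Vec<String>
where
    F: Fn(&str) -> Option<String>,
{
    required
        .iter()
        .filter(|&&name| lookup(name).map_or(true, |value| value.is_empty()))
        .map(|&name| String::from(name))
        .collect()
}
```

In a real process the lookup would typically be `|key| std::env::var(key).ok()`, and a non-empty result would be reported before any provider is contacted.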

Useful runtime environment variables for debugging:

export JALLA_DEBUG_PRETTY_RESPONSE=1
export RUST_LOG="jalla::request=debug,jalla::response=debug,jalla::error=debug,jalla::proxy=debug,jalla::auth=debug,tower_http=debug"

Then run:

cargo run

How model routing works

Given this request model:

model = "default"

Jalla resolves:

default -> the matching [[model]] entry
the entry selects provider = "openai" and name = "gpt-4.1"
if the upstream status is listed in on_status, the [[loadbalancing.rule]] for default applies
the request is retried with model = "reasoning" on anthropic

Common configuration patterns

Pattern 1: Single provider, multiple aliases

Use only [providers.openai] and define aliases like default, fast, reasoning.

Pattern 2: Multi-provider reliability

Keep two providers enabled and add [[loadbalancing.rule]] for critical aliases.

Pattern 3: Local-first development

Start with:

require_auth = false
backend = "memory"
one provider enabled

Then enable auth (require_auth = true) and switch to a persistent store (backend = "valkey") before production.

Debugging configuration issues

Provider key errors

Confirm the provider section has enabled = true
Confirm the variable named by api_key_env is exported in your shell

Alias not found

Confirm the model value in the request matches a [[model]] alias
Check for typos and case mismatches

Failover not triggering

Confirm there is a [[loadbalancing.rule]] for that alias
Confirm returned status is listed in on_status

Development

Run:

cargo test

Run with debug logs:

JALLA_DEBUG_PRETTY_RESPONSE=1 \
RUST_LOG=jalla::request=debug,jalla::response=debug,jalla::error=debug,jalla::proxy=debug,jalla::auth=debug,tower_http=debug \
cargo run
