Jalla

Jalla (Just Another Large Language Aggregator) is a middleware layer that translates OpenAI-style /responses requests for multiple upstream providers, with full coverage of the responses API.

It provides a single endpoint layer that can:

  • route requests to multiple upstream providers,
  • normalize behavior across provider APIs,
  • fail over between backends,
  • expose streaming and WebSocket flows.

Architecture at a glance

Jalla has three key configuration layers:

  1. Providers: upstream APIs (OpenAI, Anthropic, Gemini, etc.)
  2. Models: your public aliases mapped to provider-native model names
  3. Load balancing rules: optional failover behavior for aliases

At runtime:

  • a client calls Jalla using a model alias (default, fast, reasoning, etc.),
  • Jalla resolves alias -> provider model,
  • Jalla executes the upstream request,
  • optional failover rules trigger on configured status codes.

Quickstart

Requirements

  • Rust toolchain with edition 2024 support (Rust 1.85 or newer)

1) Configure API keys

export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."

If a provider is enabled in config.toml, the environment variable named by its api_key_env field must be set.

2) Start Jalla

cargo run

Default bind address from the example config:

  • 0.0.0.0:8787

3) Run tests

cargo test

Configuration model

Jalla uses config.toml as the primary declarative config.

Top-level sections:

  • [server]: bind/auth settings
  • [store]: response storage backend
  • [providers.*]: provider connection settings
  • [[model]]: alias registry
  • [[loadbalancing.rule]]: failover behavior
  • [[pricing.model]]: optional pricing metadata

Full config walkthrough (config.toml and config.example.toml)

The repository includes:

  • config.toml: minimal local defaults for quick startup
  • config.example.toml: full OSS template for real deployments

To bootstrap a full config:

cp config.example.toml config.toml

[server]

[server]
bind_addr = "0.0.0.0:8787"
require_auth = true

  • bind_addr: HTTP bind address and port.
  • require_auth: whether requests must be authenticated by Jalla.

[store]

[store]
backend = "memory"
# backend = "valkey"
# url = "redis://127.0.0.1:6379"
# ttl_secs = 3600
# key_prefix = "jalla:resp:"

  • backend = "memory" is the easiest local mode.
  • For persistence, use backend = "valkey" and set url.
  • ttl_secs and key_prefix are optional tuning knobs.

[providers.<name>]

Example:

[providers.openai]
enabled = true
base_url = "https://api.openai.com"
api_key_env = "OPENAI_API_KEY"
responses_path = "/v1/responses"
chat_completions_path = "/v1/chat/completions"

Provider fields:

  • enabled: whether the provider is available for routing.
  • base_url: base URL of the upstream provider.
  • api_key_env: name of the environment variable that holds the provider API key.
  • provider-specific endpoint paths:
    • OpenAI: responses_path, chat_completions_path
    • Anthropic: messages_path
    • Gemini: generate_content_path

[[model]] aliases

Example:

[[model]]
name = "gpt-4.1"
alias = "default"
provider = "openai"

Meaning:

  • alias is what your clients send.
  • name is the upstream provider model identifier.
  • provider must match an enabled provider section.
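The lookup described above can be pictured with a small sketch. This is not Jalla's implementation (Jalla is written in Rust); ModelEntry and resolve_alias are hypothetical names, and the exact, case-sensitive match is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str      # upstream provider model identifier
    alias: str     # public name that clients send
    provider: str  # key of a [providers.<name>] section


def resolve_alias(alias, models, enabled_providers):
    """Return (provider, upstream_name) for an alias, or raise LookupError."""
    for entry in models:
        if entry.alias == alias:  # assumption: exact, case-sensitive match
            if entry.provider not in enabled_providers:
                raise LookupError(f"provider {entry.provider!r} is not enabled")
            return entry.provider, entry.name
    raise LookupError(f"alias {alias!r} not found")


models = [ModelEntry(name="gpt-4.1", alias="default", provider="openai")]
print(resolve_alias("default", models, {"openai"}))  # ('openai', 'gpt-4.1')
```

Under these assumptions, a request for "Default" or for an alias whose provider is disabled raises LookupError instead of reaching an upstream.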

Optional model fields:

  • chat_completions = true: force chat-completions compatibility flow.
  • defaults like temperature/top_p/max output may be set per model when supported.
  • headers = { ... }: static upstream headers for that alias.

[[loadbalancing.rule]] failover

Example:

[[loadbalancing.rule]]
alias = "default"
primary = "openai"

[[loadbalancing.rule.failover]]
backend = "anthropic"
model = "reasoning"
on_status = [429, 500, 502, 503, 504]

Behavior:

  • requests for alias default go to openai first,
  • if the upstream returns a status listed in on_status, Jalla retries through the failover backend and model (here anthropic with reasoning).
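As a rough illustration of the rule above, the following sketch retries a request when the returned status is listed in on_status. It is only a model of the documented behavior: route_with_failover, send, and fake_send are hypothetical names, and trying failover entries in order is an assumption.

```python
def route_with_failover(send, primary, failovers):
    """Send to the primary target, then retry on failover targets as configured.

    send(backend, model) is a hypothetical callable returning (status, body).
    failovers is a list of dicts with the keys backend, model, and on_status.
    """
    status, body = send(*primary)
    for rule in failovers:
        # Retry only when the latest status is one this entry is configured for.
        if status in rule["on_status"]:
            status, body = send(rule["backend"], rule["model"])
    return status, body


def fake_send(backend, model):
    # Stand-in for an upstream call: openai is rate limited, others succeed.
    if backend == "openai":
        return 429, "rate limited"
    return 200, f"ok from {backend}/{model}"


failovers = [
    {"backend": "anthropic", "model": "reasoning",
     "on_status": [429, 500, 502, 503, 504]},
]
result = route_with_failover(fake_send, ("openai", "gpt-4.1"), failovers)
print(result)  # (200, 'ok from anthropic/reasoning')
```

A status that is not in on_status (for example 400) is returned from the primary unchanged in this sketch.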

[[pricing.model]] metadata

Pricing entries are optional and useful for cost accounting and analytics. Rates are in USD per million tokens:

[[pricing.model]]
name = "gpt-4.1"
input_per_mtok_usd = 2.00
cached_input_per_mtok_usd = 0.20
output_per_mtok_usd = 8.00
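To show how per-million-token rates turn into a request cost, here is a small worked sketch. request_cost_usd is a hypothetical helper, the token counts are made up, and it assumes cached input tokens are a subset of the input tokens billed at the cached rate; Jalla's own accounting may differ.

```python
def request_cost_usd(pricing, input_tokens, cached_input_tokens, output_tokens):
    """Estimate the USD cost of one request from per-million-token rates."""
    # Assumption: cached tokens are counted inside input_tokens.
    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * pricing["input_per_mtok_usd"]
        + cached_input_tokens * pricing["cached_input_per_mtok_usd"]
        + output_tokens * pricing["output_per_mtok_usd"]
    )
    return total / 1_000_000  # rates are per million tokens


pricing = {
    "name": "gpt-4.1",
    "input_per_mtok_usd": 2.00,
    "cached_input_per_mtok_usd": 0.20,
    "output_per_mtok_usd": 8.00,
}
cost = request_cost_usd(pricing, input_tokens=10_000,
                        cached_input_tokens=4_000, output_tokens=1_000)
print(f"{cost:.4f}")  # 0.0208
```

Here 6,000 uncached input tokens cost 0.0120, 4,000 cached tokens cost 0.0008, and 1,000 output tokens cost 0.0080.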

Environment variables

At minimum, set keys for enabled providers:

export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."
export GEMINI_API_KEY="..." # only if providers.gemini.enabled = true

Useful runtime variables for debugging:

export JALLA_DEBUG_PRETTY_RESPONSE=1
export RUST_LOG="jalla::request=debug,jalla::response=debug,jalla::error=debug,jalla::proxy=debug,jalla::auth=debug,tower_http=debug"

Then run:

cargo run

How model routing works

Given a request with:

  • model = "default"

Jalla resolves it using the example configuration above:

  1. default matches the [[model]] entry with alias = "default"
  2. that entry selects provider = "openai" and name = "gpt-4.1"
  3. if the upstream response status is listed in on_status, the [[loadbalancing.rule]] for default applies
  4. the request is retried with model = "reasoning" on anthropic

Common configuration patterns

Pattern 1: Single provider, multiple aliases

Use only [providers.openai] and define aliases like default, fast, reasoning.

Pattern 2: Multi-provider reliability

Keep two providers enabled and add [[loadbalancing.rule]] for critical aliases.

Pattern 3: Local-first development

Start with:

  • require_auth = false
  • backend = "memory"
  • one provider enabled

Then add auth and persistent store before production.


Debugging configuration issues

Provider key errors

  • Confirm the provider has enabled = true
  • Confirm the variable named by api_key_env is exported in the shell that runs Jalla
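These two checks can be scripted before starting the server. The sketch below is a standalone helper, not part of Jalla: missing_api_keys is a hypothetical name, and the providers dictionary mirrors the [providers.*] fields by hand rather than reading config.toml.

```python
import os


def missing_api_keys(providers, env=None):
    """Return {provider: env_var} for enabled providers whose key is unset or empty."""
    env = os.environ if env is None else env
    return {
        name: cfg["api_key_env"]
        for name, cfg in providers.items()
        if cfg.get("enabled") and not env.get(cfg["api_key_env"])
    }


providers = {
    "openai": {"enabled": True, "api_key_env": "OPENAI_API_KEY"},
    "anthropic": {"enabled": True, "api_key_env": "ANTHROPIC_API_KEY"},
    "gemini": {"enabled": False, "api_key_env": "GEMINI_API_KEY"},
}
# Pass an explicit env for the demo; omit it to check the real environment.
print(missing_api_keys(providers, env={"OPENAI_API_KEY": "sk-example"}))
# {'anthropic': 'ANTHROPIC_API_KEY'}
```

Disabled providers are skipped, which matches the rule that only enabled providers need a key.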

Alias not found

  • Confirm model alias in request matches [[model]].alias
  • Confirm no typo/case mismatch

Failover not triggering

  • Confirm there is a [[loadbalancing.rule]] for that alias
  • Confirm returned status is listed in on_status

Development

Run the test suite:

cargo test

Run with debug logs:

JALLA_DEBUG_PRETTY_RESPONSE=1 \
RUST_LOG=jalla::request=debug,jalla::response=debug,jalla::error=debug,jalla::proxy=debug,jalla::auth=debug,tower_http=debug \
cargo run

About

Jalla - 100% OpenAI /responses compatible middleware for Anthropic, OpenAI, and Gemini with first-party SDKs
