What problem does this solve?
Many LLM providers enforce rate and concurrency limits (requests per minute, maximum concurrent requests) on their APIs. Currently, Clawith has no built-in mechanism to control the number of simultaneous outgoing LLM requests. This leads to several problems:
- No concurrency control – When multiple Agents are active simultaneously (e.g. via the Aware/Pulse Engine triggers, A2A conversations, or multi-channel IM messages), the system can fire an unbounded number of parallel LLM calls, easily exceeding a provider's rate limit and resulting in HTTP 429 errors or dropped connections.
- Shared API key, shared limit – The current LLM configuration (`PROVIDER_REGISTRY` in `llm_client.py`) is keyed per-provider, not per-API-key. Multiple model entries (e.g. `gpt-4o`, `gpt-4o-mini`, `o1-preview`) may all use the same OpenAI API key and therefore share a single concurrency quota from the provider, but the system treats them independently, with no awareness of this shared ceiling.
- No way to group models – There is no facility to declare "these 5 models share a pool of N concurrent requests" or "these two providers should be merged into one limit group because they route through the same gateway."
- Silent failures – When the provider rejects a request due to concurrency overflow, the user sees a generic `LLMError` rather than a queued/retried response.
Proposed solution
Introduce a Concurrency Limit Management layer with the following capabilities:
1. Per-group concurrency semaphores
Allow administrators to define named concurrency groups. Each group holds:
- A list of LLM config IDs (or provider + model patterns) that belong to the group
- A maximum concurrent request count (`max_concurrency`)

Internally this could be backed by `asyncio.Semaphore` instances keyed by group name.
2. Default auto-grouping by provider
Out of the box, group all models under the same provider (e.g. all openai models → one semaphore). Admins can override this by creating explicit custom groups.
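One possible lookup order for this override behavior (a sketch; `resolve_group` and its parameters are assumed names, not existing code): check explicit custom groups first, then fall back to the provider name as the implicit default group.

```python
def resolve_group(model_id: str, provider: str,
                  custom_groups: dict[str, list[str]]) -> str:
    """Map a model to its concurrency group.

    An explicit custom group wins; otherwise the provider name acts as
    the implicit default group, so all of a provider's models share one
    semaphore out of the box.
    """
    for group_name, members in custom_groups.items():
        if model_id in members:
            return group_name
    return provider
```

For example, `resolve_group("gpt-4o", "openai", {"custom-gateway": ["gpt-4o"]})` would yield the custom group, while `resolve_group("gpt-4o-mini", "openai", {})` falls back to the `openai` default group.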
3. Custom cross-provider grouping
Support merging models from different providers into a single group. This covers scenarios where:
- A reverse proxy / API gateway fronts multiple providers under one rate limit
- An organization uses a shared quota across multiple API keys
- Specific high-traffic models need a dedicated pool separate from the default
Example configuration concept:
```yaml
concurrency_groups:
  - name: openai-shared
    max_concurrency: 5
    models: ["gpt-4o", "gpt-4o-mini", "o1-preview"]
  - name: deepseek-group
    max_concurrency: 3
    providers: ["deepseek"]
  - name: custom-gateway
    max_concurrency: 10
    models: ["claude-3-opus", "gpt-4o"]  # behind same proxy
```
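Once parsed (e.g. from YAML into a plain dict), such a config could be materialized into semaphores in one pass. This is a sketch under the assumption that the key names mirror the example above:

```python
import asyncio

# Assumed shape: the YAML example above, already parsed into a dict.
parsed_config = {
    "concurrency_groups": [
        {"name": "openai-shared", "max_concurrency": 5,
         "models": ["gpt-4o", "gpt-4o-mini", "o1-preview"]},
        {"name": "deepseek-group", "max_concurrency": 3,
         "providers": ["deepseek"]},
        {"name": "custom-gateway", "max_concurrency": 10,
         "models": ["claude-3-opus", "gpt-4o"]},
    ]
}

def build_limiters(cfg: dict) -> dict[str, asyncio.Semaphore]:
    """Build one asyncio.Semaphore per configured group."""
    return {group["name"]: asyncio.Semaphore(group["max_concurrency"])
            for group in cfg["concurrency_groups"]}

limiters = build_limiters(parsed_config)
```

Note that `gpt-4o` appears in both `openai-shared` and `custom-gateway` in the example, so the design would also need a precedence rule (or validation) for models listed in more than one group.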
4. Graceful queuing & visibility
- When a group is at capacity, new LLM requests wait in a queue (with configurable timeout) rather than failing immediately
- Expose current concurrency utilization via an admin API endpoint and optionally in the frontend dashboard
- Log warnings when approaching the limit (e.g. 80% utilization)
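The queuing, timeout, and 80%-utilization warning could be combined in one small class. A minimal sketch, assuming illustrative names (`ConcurrencyGroup`, `queue_timeout` are not existing identifiers):

```python
import asyncio
import logging

logger = logging.getLogger("concurrency")

class ConcurrencyGroup:
    """Queue callers at capacity and warn when utilization is high."""

    def __init__(self, name: str, max_concurrency: int,
                 queue_timeout: float = 30.0):
        self.name = name
        self.max_concurrency = max_concurrency
        self.queue_timeout = queue_timeout
        self._sem = asyncio.Semaphore(max_concurrency)
        self._in_flight = 0

    async def run(self, make_request):
        # Wait for a free slot instead of failing immediately, but give up
        # with a clear error after the configurable timeout rather than
        # surfacing the provider's 429 as a generic LLMError.
        try:
            await asyncio.wait_for(self._sem.acquire(), self.queue_timeout)
        except asyncio.TimeoutError:
            raise RuntimeError(
                f"group '{self.name}' at capacity; queue timed out")
        self._in_flight += 1
        if self._in_flight >= 0.8 * self.max_concurrency:
            logger.warning("group %s at %d/%d concurrent requests",
                           self.name, self._in_flight, self.max_concurrency)
        try:
            return await make_request()
        finally:
            self._in_flight -= 1
            self._sem.release()
```

The `_in_flight` counter doubles as the data source for the admin API endpoint: utilization per group is simply `_in_flight / max_concurrency`.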
5. Admin UI
Add a concurrency management section in the enterprise settings page to configure groups without editing config files.
Why this is useful
- Reliability – Avoids HTTP 429 errors by keeping request volume within provider limits
- Fairness – Ensures no single Agent or trigger monopolizes the LLM capacity
- Flexibility – Adapts to any deployment topology (single provider, multi-provider, gateway-proxied)
- Cost control – Prevents unexpected burst usage that could spike API costs
- Production readiness – Essential for multi-tenant SaaS deployments where many Agents run concurrently
Willing to contribute?
Yes – I would be interested in working on this.