Cortex provides OpenAI-compatible and Anthropic-compatible endpoints for unified access to multiple AI providers.
- Authentication
- OpenAI-Compatible Endpoints
- Anthropic-Compatible Endpoints
- Provider Management Endpoints
- OAuth Endpoints
- Admin Monitoring Endpoints
- Runtime Configuration Endpoints
- Usage Tracking Endpoints
- Error Handling
All API requests require authentication using an API key. Include your API key in the request headers:
OpenAI-Compatible Endpoints:
Authorization: Bearer YOUR_API_KEY
Anthropic-Compatible Endpoints:
x-api-key: YOUR_API_KEY
API keys are configured in cortex.yaml and can have specific permissions, rate limits, and access controls.
POST /v1/chat/completions
Create a chat completion using the OpenAI-compatible API.
Request:
curl -X POST http://localhost:8090/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer cortex-dev-key-001" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
],
"max_tokens": 100,
"temperature": 0.7
}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID or alias (e.g., "gpt-4o", "claude-3-5-sonnet-20241022") |
| `messages` | array | Yes | Array of message objects with role and content |
| `max_tokens` | integer | No | Maximum tokens to generate |
| `temperature` | float | No | Sampling temperature (0.0 to 2.0) |
| `stream` | boolean | No | Enable streaming responses |
| `tools` | array | No | Array of tool definitions for function calling |
| `tool_choice` | string/object | No | Control tool usage ("auto", "none", or a specific tool) |
Response:
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1699472000,
"model": "gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 15,
"completion_tokens": 8,
"total_tokens": 23
}
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique completion ID |
| `object` | string | Object type (always "chat.completion") |
| `created` | integer | Unix timestamp of creation |
| `model` | string | Model used for completion |
| `choices` | array | Array of completion choices |
| `choices[].index` | integer | Choice index |
| `choices[].message` | object | Generated message with role and content |
| `choices[].finish_reason` | string | Reason the completion finished ("stop", "length", "tool_calls") |
| `usage` | object | Token usage statistics |
| `usage.prompt_tokens` | integer | Tokens in the prompt |
| `usage.completion_tokens` | integer | Tokens in the completion |
| `usage.total_tokens` | integer | Total tokens used |
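Because the endpoint mirrors OpenAI's schema, a client usually only needs `choices[0].message.content` and the `usage` block. A minimal Python sketch of that extraction (the `sample` dict below is the documented example response; a real client would obtain it from an HTTP POST to `/v1/chat/completions`):

```python
def extract_completion(resp: dict) -> tuple[str, int]:
    """Pull the assistant text and total token count out of a
    chat.completion response body."""
    message = resp["choices"][0]["message"]
    usage = resp["usage"]
    # total_tokens is prompt_tokens + completion_tokens
    assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
    return message["content"], usage["total_tokens"]

sample = {
    "choices": [{"index": 0,
                 "message": {"role": "assistant",
                             "content": "The capital of France is Paris."},
                 "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 15, "completion_tokens": 8, "total_tokens": 23},
}

text, total = extract_completion(sample)  # → ("The capital of France is Paris.", 23)
```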
GET /v1/models
List available models accessible through the OpenAI-compatible API.
Request:
curl http://localhost:8090/v1/models \
-H "Authorization: Bearer cortex-dev-key-001"
Response:
{
"object": "list",
"data": [
{
"id": "gpt-4",
"object": "model",
"owned_by": "cortex"
},
{
"id": "gpt-3.5-turbo",
"object": "model",
"owned_by": "cortex"
},
{
"id": "claude-3-opus",
"object": "model",
"owned_by": "cortex"
}
]
}
POST /v1/embeddings
Create embeddings for text using the OpenAI-compatible API. Supports both single string and array inputs.
Request:
curl -X POST http://localhost:8090/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer cortex-dev-key-001" \
-d '{
"model": "openai:text-embedding-3-small",
"input": "The quick brown fox jumps over the lazy dog"
}'
Request with Array Input:
curl -X POST http://localhost:8090/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer cortex-dev-key-001" \
-d '{
"model": "openai:text-embedding-3-small",
"input": ["First sentence", "Second sentence", "Third sentence"]
}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID with optional provider prefix (e.g., "openai:text-embedding-3-small" or "text-embedding-3-small") |
| `input` | string/array | Yes | Text to embed. Can be a single string, an array of strings, or an array of token arrays |
| `encoding_format` | string | No | Encoding format for returned embeddings: "float" (default) or "base64" |
| `dimensions` | integer | No | Number of output dimensions (for models that support flexible dimensionality) |
| `user` | string | No | End-user identifier for abuse detection |
Response:
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [0.0023064255, -0.009327292, -0.0028842222, ...],
"index": 0
}
],
"model": "text-embedding-3-small",
"usage": {
"prompt_tokens": 9,
"total_tokens": 9
}
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `object` | string | Object type (always "list") |
| `data` | array | Array of embedding objects |
| `data[].object` | string | Object type (always "embedding") |
| `data[].embedding` | array | The embedding vector (array of floats) |
| `data[].index` | integer | Index of the input item corresponding to this embedding |
| `model` | string | Model used for generating embeddings |
| `usage` | object | Token usage statistics |
| `usage.prompt_tokens` | integer | Number of tokens in the input |
| `usage.total_tokens` | integer | Total tokens used (same as prompt_tokens for embeddings) |
Supported Providers:
The embeddings endpoint is available for the following provider types:
| Provider | Support |
|---|---|
| OpenAI | ✅ Full support |
| Mistral | ✅ Full support |
| Azure | ✅ Full support |
| Together | ✅ Full support |
| Fireworks | ✅ Full support |
| Groq | ✅ Full support |
| DeepSeek | ✅ Full support |
| Custom | ✅ Full support (OpenAI-compatible) |
| Anthropic | ❌ Not supported |
Virtual Model Requirements:
Virtual models are supported for embeddings only when they have exactly one enabled candidate. This prevents mixing embeddings from different models (which would produce incompatible vector spaces). If a virtual model has zero or multiple enabled candidates, the request will return a 400 Bad Request error.
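The incompatibility is easy to see with cosine similarity, which is only meaningful when both vectors come from the same model's embedding space. A small sketch (pure Python, no external dependencies; the vectors are made-up illustrative values):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors.
    Only meaningful if both came from the same embedding model."""
    if len(a) != len(b):
        # Different models often differ in dimensionality outright
        raise ValueError("vectors come from incompatible embedding spaces")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Same-space vectors: comparison is well-defined
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
```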
Error Responses:
| Status | Error | Description |
|---|---|---|
| 400 | `invalid_request_error` | Empty model, empty input, multi-candidate virtual model |
| 401 | `authentication_error` | Missing or invalid API key |
| 404 | `invalid_request_error` | Model not found or unknown provider |
| 429 | `rate_limit_error` | Rate limit exceeded |
POST /v1/audio/transcriptions
Transcribe audio into text using the OpenAI-compatible API. Supports multiple audio formats and streaming output.
Request:
curl -X POST http://localhost:8090/v1/audio/transcriptions \
-H "Authorization: Bearer cortex-dev-key-001" \
-F "file=@audio.mp3" \
-F "model=openai:whisper-1"
Request Fields (multipart/form-data):
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID with optional provider prefix (e.g., "openai:whisper-1", "groq:whisper-large-v3") |
| `file` | file | Yes | Audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, flac, ogg, webm |
| `language` | string | No | ISO-639-1 language code (e.g., "en") to improve accuracy and latency |
| `prompt` | string | No | Optional text to guide transcription style or continue a previous segment |
| `response_format` | string | No | Output format: "json" (default), "text", "srt", "vtt", "verbose_json", "diarized_json" |
| `temperature` | float | No | Sampling temperature (0.0 to 1.0). Higher values increase randomness |
| `stream` | boolean | No | Enable SSE streaming output (default: false) |
| `include[]` | array | No | Additional fields to include (e.g., "logprobs" for gpt-4o-transcribe models) |
| `timestamp_granularities[]` | array | No | Timestamp granularities for verbose_json: "word", "segment" |
| `chunking_strategy` | string | No | Chunking strategy for diarization: "auto" or a JSON object |
| `known_speaker_names[]` | array | No | Known speaker names for diarization (max 4) |
| `known_speaker_references[]` | array | No | Base64 audio samples for speaker identification (max 4) |
Response (json format):
{
"text": "Hello, this is a transcription of the audio file."
}
Response (verbose_json format):
{
"task": "transcribe",
"language": "en",
"duration": 12.5,
"text": "Hello, this is a transcription.",
"words": [
{"word": "Hello", "start": 0.0, "end": 0.5},
{"word": "this", "start": 0.6, "end": 0.8}
],
"segments": [
{"id": 0, "start": 0.0, "end": 2.0, "text": "Hello, this is a transcription."}
]
}
Response (diarized_json format for speaker identification):
{
"text": "Speaker A: Hello. Speaker B: Hi there.",
"segments": [
{
"type": "transcript.text.segment",
"id": "seg_001",
"start": 0.0,
"end": 1.5,
"text": "Hello.",
"speaker": "A"
},
{
"type": "transcript.text.segment",
"id": "seg_002",
"start": 1.8,
"end": 3.0,
"text": "Hi there.",
"speaker": "B"
}
],
"usage": {
"type": "duration",
"seconds": 3.5
}
}
Streaming Response (stream=true):
When stream=true, the response is returned as SSE events:
event: transcript.text.delta
data: {"type": "transcript.text.delta", "text": "Hello"}
event: transcript.text.delta
data: {"type": "transcript.text.delta", "text": " world"}
event: transcript.text.done
data: {"type": "transcript.text.done", "text": "Hello world"}
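A client consuming this stream accumulates `transcript.text.delta` events until `transcript.text.done` arrives. A minimal Python sketch of that accumulation over raw SSE lines (event names are the ones shown above):

```python
import json

def collect_transcript(sse_lines: list[str]) -> str:
    """Accumulate text from transcript.text.delta SSE events;
    return the final text once transcript.text.done is seen."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip "event:" lines and blank keep-alives
        payload = json.loads(line[len("data: "):])
        if payload["type"] == "transcript.text.delta":
            parts.append(payload["text"])
        elif payload["type"] == "transcript.text.done":
            # the done event carries the full text; treat it as authoritative
            return payload["text"]
    return "".join(parts)

stream = [
    'event: transcript.text.delta',
    'data: {"type": "transcript.text.delta", "text": "Hello"}',
    'event: transcript.text.delta',
    'data: {"type": "transcript.text.delta", "text": " world"}',
    'event: transcript.text.done',
    'data: {"type": "transcript.text.done", "text": "Hello world"}',
]
print(collect_transcript(stream))  # → Hello world
```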
Response Fields:
| Field | Type | Description |
|---|---|---|
| `text` | string | The transcribed text |
| `usage` | object | Usage statistics (optional, for gpt-4o-transcribe models) |
| `usage.type` | string | Usage type: "tokens" or "duration" |
| `usage.input_tokens` | integer | Input tokens (for token-based usage) |
| `usage.output_tokens` | integer | Output tokens (for token-based usage) |
| `usage.total_tokens` | integer | Total tokens used |
| `usage.seconds` | float | Duration in seconds (for duration-based usage) |
| `logprobs` | array | Token log probabilities (when include[] contains "logprobs") |
Supported Providers:
| Provider | Support | Models |
|---|---|---|
| OpenAI | ✅ Full support | whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize |
| Groq | ✅ Full support | whisper-1, whisper-large-v3 |
| Azure OpenAI | ✅ Full support | whisper-1 |
| Custom | ✅ OpenAI-compatible | Any OpenAI-compatible transcription endpoint |
| Anthropic | ❌ Not supported | - |
File Size Limits:
- Maximum audio file size: 25 MB
- Files exceeding this limit will return a 400 Bad Request error
Error Responses:
| Status | Error | Description |
|---|---|---|
| 400 | `invalid_request_error` | Missing model/file, file too large, unsupported provider |
| 401 | `authentication_error` | Missing or invalid API key |
| 404 | `invalid_request_error` | Unknown provider or model |
| 429 | `rate_limit_error` | Rate limit exceeded |
| 502 | `server_error` | Upstream provider error |
POST /anthropic/v1/messages
Create a message using the Anthropic-compatible API.
Request:
curl -X POST http://localhost:8090/anthropic/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: cortex-dev-key-001" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-3-5-sonnet-20241022",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID or alias (e.g., "claude-3-5-sonnet-20241022") |
| `messages` | array | Yes | Array of message objects with role and content |
| `max_tokens` | integer | Yes | Maximum tokens to generate |
| `temperature` | float | No | Sampling temperature (0.0 to 1.0) |
| `system` | string | No | System prompt to set context |
| `stream` | boolean | No | Enable streaming responses |
| `tools` | array | No | Array of tool definitions |
| `metadata` | object | No | Metadata for the request |
Response:
{
"id": "msg_01ABC123",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "The capital of France is Paris."
}
],
"model": "claude-3-5-sonnet-20241022",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 15,
"output_tokens": 8
}
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique message ID |
| `type` | string | Object type (always "message") |
| `role` | string | Message role (always "assistant") |
| `content` | array | Array of content blocks |
| `content[].type` | string | Content type ("text" or "tool_use") |
| `content[].text` | string | Text content (for text blocks) |
| `model` | string | Model used for generation |
| `stop_reason` | string | Reason generation stopped ("end_turn", "max_tokens", "tool_use") |
| `usage` | object | Token usage statistics |
| `usage.input_tokens` | integer | Tokens in the input |
| `usage.output_tokens` | integer | Tokens in the output |
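Unlike the OpenAI format, an Anthropic-style response carries an array of content blocks rather than a single string, so clients typically concatenate the text blocks and skip tool_use blocks. A minimal Python sketch over the documented sample response:

```python
def message_text(response: dict) -> str:
    """Join all text content blocks from an Anthropic-style message,
    ignoring non-text blocks such as tool_use."""
    return "".join(
        block["text"]
        for block in response["content"]
        if block["type"] == "text"
    )

sample = {
    "id": "msg_01ABC123",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "The capital of France is Paris."}],
    "stop_reason": "end_turn",
}
print(message_text(sample))  # → The capital of France is Paris.
```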
GET /anthropic/v1/models
List available Anthropic models.
Request:
curl http://localhost:8090/anthropic/v1/models \
-H "x-api-key: cortex-dev-key-001"
Response:
{
"data": [
{
"id": "claude-3-5-sonnet-20241022",
"display_name": "Claude 3.5 Sonnet",
"created_at": "2024-10-22T00:00:00Z",
"type": "model"
},
{
"id": "claude-3-5-haiku-20241022",
"display_name": "Claude 3.5 Haiku",
"created_at": "2024-10-22T00:00:00Z",
"type": "model"
}
]
}
GET /api/providers/{name}/models
List available models for a specific provider.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Provider name (e.g., "openai", "anthropic", "groq") |
Request:
curl http://localhost:8090/api/providers/openai/models
Response:
{
"provider": "openai",
"models": [
{
"id": "gpt-4o",
"display_name": "GPT-4 Optimized",
"enabled": true
},
{
"id": "gpt-4o-mini",
"display_name": "GPT-4 Optimized Mini",
"enabled": true
},
{
"id": "gpt-3.5-turbo",
"display_name": "GPT-3.5 Turbo",
"enabled": true
}
],
"default_model": "gpt-4o"
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `provider` | string | Provider name |
| `models` | array | Array of available models |
| `models[].id` | string | Model ID with provider-specific routing pattern applied |
| `models[].display_name` | string | Human-readable model name |
| `models[].enabled` | boolean | Whether the model is currently enabled |
| `default_model` | string | Default model ID for this provider (from config or first available) |
Status Codes:
| Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - Provider name is required |
| 404 | Not Found - Provider not found |
| 500 | Internal Server Error - Failed to retrieve models |
Example with Provider-Specific Prefixes:
# Groq provider applies 'groq/' prefix
curl http://localhost:8090/api/providers/groq/models
{
"provider": "groq",
"models": [
{
"id": "groq/mixtral-8x7b-32768",
"display_name": "Mixtral 8x7B",
"enabled": true
},
{
"id": "groq/llama-3.1-70b-versatile",
"display_name": "LLaMA 3.1 70B",
"enabled": true
}
],
"default_model": "groq/mixtral-8x7b-32768"
}
Notes:
- Model IDs include provider-specific routing patterns (prefixes) as configured in provider_patterns
- The default_model is taken from the provider configuration, or defaults to the first available model
- All models returned are considered enabled and available for use
- Model display names come from the provider's API
GET /api/providers/{name}/auth
Get authentication information and status for a specific provider.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Provider name (e.g., "openai", "anthropic") |
Request:
curl http://localhost:8090/api/providers/openai/auth
Response:
{
"provider": "openai",
"auth_method": "api_key",
"api_keys": [
{
"masked": "sk-p****...****h7YZ",
"index": 0
}
],
"oauth": {
"configured": false,
"authenticated": false
}
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `provider` | string | Provider name |
| `auth_method` | string | Authentication method ("api_key", "oauth", or "auto") |
| `api_keys` | array | Array of masked API keys (if using API key auth) |
| `api_keys[].masked` | string | Masked API key showing the first and last four characters |
| `api_keys[].index` | integer | Index of the API key |
| `oauth` | object | OAuth configuration status |
| `oauth.configured` | boolean | Whether OAuth is configured for this provider |
| `oauth.authenticated` | boolean | Whether the OAuth token is valid and authenticated |
Status Codes:
| Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - Provider name is required |
| 404 | Not Found - Provider not found or not configured |
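The masked form shown above ("sk-p****...****h7YZ") keeps only the leading and trailing four characters of the key. A hypothetical sketch of such a masking helper (the exact format Cortex uses internally may differ; this reproduces the documented example):

```python
def mask_api_key(key: str) -> str:
    """Mask an API key, keeping only the first and last 4 characters.
    Hypothetical helper; the masking format is illustrative."""
    if len(key) <= 8:
        # Too short to mask meaningfully; hide it entirely
        return "****"
    return f"{key[:4]}****...****{key[-4:]}"

print(mask_api_key("sk-proj-1234567890abcdefh7YZ"))  # → sk-p****...****h7YZ
```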
OAuth endpoints manage OAuth 2.0 authentication for supported providers. See the OAuth Authentication Guide for complete documentation.
GET /oauth/{provider}/authorize
Initiates the OAuth 2.0 authorization flow by redirecting the user to the provider's authorization page.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | Provider name (e.g., "google", "anthropic") |
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `redirect_url` | string | No | URL to redirect to after successful authorization |
Example:
# Open in browser to start OAuth flow
http://localhost:8090/oauth/google/authorize
# With custom redirect URL
http://localhost:8090/oauth/google/authorize?redirect_url=http://localhost:3000/success
Response:
Redirects to the provider's authorization page (HTTP 302 redirect).
Notes:
- User must complete authorization in the browser
- Provider will redirect back to /oauth/{provider}/callback after authorization
- State token is automatically generated for CSRF protection
- PKCE code challenge is automatically generated for security
GET /oauth/{provider}/callback
Handles the OAuth callback from the provider. This endpoint is automatically called by the OAuth provider after user authorization.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | Provider name (e.g., "google", "anthropic") |
Query Parameters (from provider):
| Parameter | Type | Description |
|---|---|---|
| `code` | string | Authorization code from the provider |
| `state` | string | State token for CSRF validation |
| `error` | string | Error code (if authorization failed) |
| `error_description` | string | Error description (if authorization failed) |
Example Callback URL:
http://localhost:8090/oauth/google/callback?code=AUTH_CODE&state=STATE_TOKEN
Success Response:
{
"success": true,
"provider": "google",
"expires_at": "2024-01-15T10:30:00Z"
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `success` | boolean | Whether authentication was successful |
| `provider` | string | Provider name |
| `expires_at` | string | Token expiration time (ISO 8601 format) |
Error Response:
HTTP 400 Bad Request
OAuth error: access_denied - User denied access
Notes:
- Automatically exchanges authorization code for access token
- Stores encrypted token in configured storage backend
- Validates state token to prevent CSRF attacks
- Verifies PKCE code verifier
GET /oauth/{provider}/status
Returns the current OAuth authentication status for a provider.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | Provider name (e.g., "google", "anthropic") |
Example:
curl http://localhost:8090/oauth/google/status
Response:
{
"provider": "google",
"configured": true,
"authenticated": true,
"expires_at": "2024-01-15T10:30:00Z",
"scopes": [
"https://www.googleapis.com/auth/generative-language"
]
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `provider` | string | Provider name |
| `configured` | boolean | Whether OAuth is configured for this provider |
| `authenticated` | boolean | Whether a valid OAuth token exists |
| `expires_at` | string | Token expiration time (ISO 8601 format, omitted if not authenticated) |
| `scopes` | array | OAuth scopes granted (omitted if not authenticated) |
Status Codes:
| Code | Description |
|---|---|
| 200 | Success |
| 405 | Method not allowed (use GET) |
Example - Not Configured:
{
"provider": "anthropic",
"configured": false,
"authenticated": false
}
Example - Configured but Not Authenticated:
{
"provider": "openai",
"configured": true,
"authenticated": false
}
Example - OpenRouter (Permanent Token):
{
"provider": "openrouter",
"configured": true,
"authenticated": true,
"token_type": "permanent"
}
Note: OpenRouter returns a permanent API key that never expires, so expires_at is omitted.
POST /oauth/{provider}/refresh
Manually forces an OAuth token refresh. Normally, tokens are automatically refreshed before expiration.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | Provider name (e.g., "google", "anthropic") |
Example:
curl -X POST http://localhost:8090/oauth/google/refresh
Success Response:
{
"success": true,
"provider": "google",
"expires_at": "2024-01-15T11:30:00Z"
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `success` | boolean | Whether refresh was successful |
| `provider` | string | Provider name |
| `expires_at` | string | New token expiration time (ISO 8601 format) |
Error Response:
{
"error": "Failed to refresh token: refresh token expired"
}
Status Codes:
| Code | Description |
|---|---|
| 200 | Success - token refreshed |
| 400 | Bad Request - OAuth not configured or no refresh token |
| 405 | Method not allowed (use POST) |
| 500 | Internal Server Error - refresh failed |
Notes:
- Requires a valid refresh token to be stored
- Updates the stored token with new access token
- Refresh token may also be rotated (provider-dependent)
- Useful for testing token refresh logic
- Not applicable to OpenRouter (permanent tokens don't need refresh)
POST /oauth/qwen/device
Initiates the Device Code flow for Qwen authentication (RFC 8628).
Example:
curl -X POST http://localhost:8090/oauth/qwen/device
Success Response:
{
"device_code": "abc123...",
"user_code": "ABCD-1234",
"verification_uri": "https://login.aliyun.com/oauth/device",
"verification_uri_complete": "https://login.aliyun.com/oauth/device?user_code=ABCD-1234",
"expires_in": 900,
"interval": 5
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `device_code` | string | Device code for polling |
| `user_code` | string | User code (not needed - auto-handled) |
| `verification_uri` | string | URL for the user to visit |
| `verification_uri_complete` | string | Complete URL with the code pre-filled |
| `expires_in` | integer | How long the device code is valid (seconds) |
| `interval` | integer | How often to poll for the token (seconds) |
Status Codes:
| Code | Description |
|---|---|
| 200 | Success - device code issued |
| 400 | Bad Request - OAuth not configured |
| 405 | Method not allowed (use POST) |
| 500 | Internal Server Error - request failed |
Notes:
- A browser is automatically opened to verification_uri_complete
- The user simply clicks "Authorize" - no code entry is needed
- Polling for the token happens automatically
- This is a different UX than the Authorization Code flow
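The automatic polling follows the RFC 8628 shape: poll at the returned interval and give up once expires_in elapses. A sketch of that loop (the `poll` function is a caller-supplied stand-in for the real token request, and sleeping between attempts is elided so the sketch stays self-contained):

```python
def poll_for_token(poll, expires_in: int, interval: int):
    """Device Code flow polling loop (RFC 8628 shape).
    `poll` returns a token dict, or None while authorization is pending."""
    attempts = expires_in // interval  # total polls before the device code expires
    for _ in range(attempts):
        token = poll()
        if token is not None:
            return token
        # a real client would time.sleep(interval) here
    return None  # device code expired without authorization

# Simulate a user who authorizes on the third poll
responses = iter([None, None, {"access_token": "tok_123"}])
token = poll_for_token(lambda: next(responses), expires_in=900, interval=5)
print(token)  # → {'access_token': 'tok_123'}
```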
POST /oauth/qwen/device/poll
Polls for the access token after device authorization. This endpoint is called automatically by the system and is not intended for direct use.
Note: This is an internal endpoint that handles the polling loop for Device Code flow.
DELETE /oauth/{provider}/token
Deletes the stored OAuth token for a provider. This removes the token from local storage but does not revoke it with the provider.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | Provider name (e.g., "google", "anthropic") |
Example:
curl -X DELETE http://localhost:8090/oauth/google/token
Success Response:
{
"success": true,
"provider": "google",
"message": "Token revoked successfully"
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `success` | boolean | Whether revocation was successful |
| `provider` | string | Provider name |
| `message` | string | Success message |
Error Response:
HTTP 500 Internal Server Error
Failed to delete token: token not found for provider: google
Status Codes:
| Code | Description |
|---|---|
| 200 | Success - token deleted |
| 405 | Method not allowed (use DELETE) |
| 500 | Internal Server Error - deletion failed |
Notes:
- Only deletes the token from local storage
- Does NOT revoke the token with the OAuth provider
- User will need to re-authenticate to use OAuth again
- To fully revoke access, revoke the token in the provider's console
All OAuth endpoints implement security best practices:
PKCE (Proof Key for Code Exchange)
- Uses SHA-256 code challenge method
- Protects against authorization code interception
- Code verifier never transmitted until token exchange
State Tokens
- 32-byte random state tokens
- Validates state on callback
- Prevents CSRF attacks
- Expires after 10 minutes
Encrypted Token Storage
- AES-256-GCM encryption
- Unique nonce per encryption
- Configurable encryption key
- Secure file permissions (0600)
Automatic Token Refresh
- Refreshes before expiration
- Uses stored refresh token
- Transparent to API clients
- Logs refresh operations
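The S256 challenge method named above is defined by RFC 7636: the challenge is the base64url-encoded (unpadded) SHA-256 hash of the ASCII code verifier. A self-contained sketch of generating such a pair:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code verifier and its S256 challenge (RFC 7636).
    challenge = BASE64URL(SHA256(ASCII(verifier))), without '=' padding."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge

verifier, challenge = make_pkce_pair()
# Only the challenge is sent with the authorization request;
# the verifier stays secret until the token exchange.
```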
For detailed OAuth documentation, see the OAuth Authentication Guide.
GET /api/admin/inflight
Retrieve a real-time snapshot of currently in-flight inference requests being processed by the server. This endpoint provides visibility into active /v1/* requests (chat completions, embeddings, etc.) and is useful for monitoring and debugging purposes.
Request:
curl http://localhost:8090/api/admin/inflight \
-H "Authorization: Bearer cortex-dev-key-001"
Response:
{
"count": 2,
"requests": [
{
"id": 42,
"method": "POST",
"uri": "/v1/chat/completions",
"elapsed_secs": 15,
"idle_secs": 3,
"model": "gpt-4o",
"provider": "openai",
"api_key_name": "Development Key"
},
{
"id": 43,
"method": "POST",
"uri": "/v1/chat/completions",
"elapsed_secs": 8,
"idle_secs": 0,
"model": "claude-3-5-sonnet-20241022",
"provider": "anthropic",
"api_key_name": "Production Key"
}
]
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `count` | integer | Total number of in-flight inference requests |
| `requests` | array | Array of in-flight request snapshots |
| `requests[].id` | integer | Unique monotonic request ID assigned by the in-flight middleware |
| `requests[].method` | string | HTTP method (e.g., "POST", "GET") |
| `requests[].uri` | string | Request URI (e.g., "/v1/chat/completions") |
| `requests[].elapsed_secs` | integer | Seconds since the request was registered |
| `requests[].idle_secs` | integer | Seconds since the last body chunk was produced (0 for non-streaming requests) |
| `requests[].model` | string | Model being used (populated after dispatch resolution) |
| `requests[].provider` | string | Provider name (populated after dispatch resolution) |
| `requests[].api_key_name` | string | API key name (populated after auth resolution) |
Notes:
- Only /v1/* paths (inference endpoints) are included in the response
- Admin, health, OAuth, and other internal endpoints are excluded from the display
- The idle_secs field is useful for detecting stalled streaming requests
- Fields like model, provider, and api_key_name may be null if the request has not yet completed the dispatch/auth resolution phase
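A monitoring script can use idle_secs to flag streaming requests that have stopped producing output. A hypothetical sketch over the snapshot format above (the 30-second threshold is an arbitrary choice):

```python
def find_stalled(snapshot: dict, idle_threshold_secs: int = 30) -> list[dict]:
    """Return in-flight requests that have produced no body chunk
    for longer than idle_threshold_secs - likely stalled streams."""
    return [
        req for req in snapshot["requests"]
        if req["idle_secs"] > idle_threshold_secs
    ]

snapshot = {
    "count": 2,
    "requests": [
        {"id": 42, "uri": "/v1/chat/completions", "elapsed_secs": 120, "idle_secs": 45},
        {"id": 43, "uri": "/v1/chat/completions", "elapsed_secs": 8, "idle_secs": 0},
    ],
}
print([r["id"] for r in find_stalled(snapshot)])  # → [42]
```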
GET /api/config/caching
Get the caching configuration for prompt/response caching features.
Request:
curl http://localhost:8090/api/config/caching \
-H "Authorization: Bearer cortex-dev-key-001"
Response:
{
"auto_inject_anthropic_cache_control": true,
"response_cache_enabled": false,
"response_cache_ttl_secs": 300
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `auto_inject_anthropic_cache_control` | boolean | Automatically inject cache control headers for Anthropic API requests |
| `response_cache_enabled` | boolean | Enable caching of API responses |
| `response_cache_ttl_secs` | integer | Time-to-live for cached responses in seconds |
PUT /api/config/caching
Update the caching configuration.
Request:
curl -X PUT http://localhost:8090/api/config/caching \
-H "Content-Type: application/json" \
-H "Authorization: Bearer cortex-dev-key-001" \
-d '{
"auto_inject_anthropic_cache_control": true,
"response_cache_enabled": true,
"response_cache_ttl_secs": 600
}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `auto_inject_anthropic_cache_control` | boolean | No | Auto-inject cache control for Anthropic |
| `response_cache_enabled` | boolean | No | Enable response caching |
| `response_cache_ttl_secs` | integer | No | Cache TTL in seconds (must be positive) |
Response:
Returns the updated configuration (same format as GET).
GET /api/config/token-refresh
Get the token refresh configuration for OAuth token management.
Request:
curl http://localhost:8090/api/config/token-refresh \
-H "Authorization: Bearer cortex-dev-key-001"
Response:
{
"enabled": true,
"refresh_threshold_secs": 300,
"check_interval_secs": 60
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `enabled` | boolean | Whether background token refresh is enabled |
| `refresh_threshold_secs` | integer | Seconds before token expiry to trigger a refresh |
| `check_interval_secs` | integer | Interval in seconds between background refresh checks |
PUT /api/config/token-refresh
Update the token refresh configuration. Changes take effect immediately without requiring a server restart.
Request:
curl -X PUT http://localhost:8090/api/config/token-refresh \
-H "Content-Type: application/json" \
-H "Authorization: Bearer cortex-dev-key-001" \
-d '{
"enabled": true,
"refresh_threshold_secs": 300,
"check_interval_secs": 60
}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `enabled` | boolean | No | Enable/disable background token refresh |
| `refresh_threshold_secs` | integer | No | Seconds before expiry to trigger a refresh |
| `check_interval_secs` | integer | No | Interval between refresh checks in seconds |
Response:
Returns the updated configuration (same format as GET).
Notes:
- Disabling stops the background refresh task on its next cycle
- refresh_threshold_secs is used in the next refresh cycle to determine which tokens need refresh
- check_interval_secs is used as the next sleep interval between refresh cycles
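The background task's decision reduces to: refresh any token whose remaining lifetime is at or below refresh_threshold_secs. A sketch of that check (hypothetical helper; times are Unix seconds):

```python
def needs_refresh(expires_at: float, now: float, refresh_threshold_secs: int = 300) -> bool:
    """True if a token expires within refresh_threshold_secs of `now`
    (or has already expired) and should be refreshed this cycle."""
    return expires_at - now <= refresh_threshold_secs

# Token expiring in 4 minutes with a 5-minute threshold: refresh now
print(needs_refresh(expires_at=1_000_240, now=1_000_000))  # → True
# Token expiring in an hour: leave it for a later cycle
print(needs_refresh(expires_at=1_003_600, now=1_000_000))  # → False
```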
GET /usage
Retrieve usage records with optional filtering.
Request:
curl "http://localhost:8090/usage?api_key=Development%20Key&limit=10" \
-H "Authorization: Bearer cortex-dev-key-001"
Query Parameters:
| Parameter | Type | Description |
|---|---|---|
| `api_key` | string | Filter by API key name |
| `provider` | string | Filter by provider (e.g., "openai", "anthropic") |
| `model` | string | Filter by model ID |
| `start_time` | string | Filter by start time (RFC3339 format) |
| `end_time` | string | Filter by end time (RFC3339 format) |
| `limit` | integer | Maximum number of records to return (default: 100) |
Response:
[
{
"request_id": "req_123",
"timestamp": "2024-11-29T10:30:00Z",
"api_key_name": "Development Key",
"model": "gpt-4o",
"provider": "openai",
"endpoint": "/v1/chat/completions",
"input_tokens": 15,
"output_tokens": 8,
"total_tokens": 23,
"latency_ms": 1250,
"status": "success"
}
]
Record Fields:
| Field | Type | Description |
|---|---|---|
| `request_id` | string | Unique request identifier |
| `timestamp` | string | When the request occurred (ISO 8601) |
| `api_key_name` | string | Name of the API key used |
| `model` | string | Model used |
| `provider` | string | Provider used |
| `endpoint` | string | API endpoint called |
| `input_tokens` | integer | Input tokens consumed |
| `output_tokens` | integer | Output tokens generated |
| `total_tokens` | integer | Total tokens used |
| `latency_ms` | integer | Request latency in milliseconds |
| `status` | string | Request status ("success" or "error") |
| `error_code` | string | Error code (if status is "error") |
GET /usage/summary
Retrieve aggregated usage statistics.
Request:
curl "http://localhost:8090/usage/summary?start_time=2024-11-01T00:00:00Z" \
-H "Authorization: Bearer cortex-dev-key-001"
Query Parameters:
| Parameter | Type | Description |
|---|---|---|
| `api_key` | string | Filter by API key name |
| `provider` | string | Filter by provider |
| `model` | string | Filter by model ID |
| `start_time` | string | Filter by start time (RFC3339 format) |
| `end_time` | string | Filter by end time (RFC3339 format) |
Response:
{
"total_requests": 1250,
"total_tokens": 156000,
"total_input_tokens": 95000,
"total_output_tokens": 61000,
"avg_latency_ms": 1320.5,
"by_provider": {
"openai": 750,
"anthropic": 500
},
"by_model": {
"gpt-4o": 500,
"gpt-4o-mini": 250,
"claude-3-5-sonnet-20241022": 500
},
"by_api_key": {
"Development Key": 800,
"Production Key": 450
},
"success_count": 1230,
"error_count": 20
}
Summary Fields:
| Field | Type | Description |
|---|---|---|
| `total_requests` | integer | Total number of requests |
| `total_tokens` | integer | Total tokens consumed |
| `total_input_tokens` | integer | Total input tokens |
| `total_output_tokens` | integer | Total output tokens |
| `avg_latency_ms` | float | Average latency in milliseconds |
| `by_provider` | object | Request count by provider |
| `by_model` | object | Request count by model |
| `by_api_key` | object | Request count by API key |
| `success_count` | integer | Number of successful requests |
| `error_count` | integer | Number of failed requests |
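The summary is a straightforward aggregation over individual usage records. A sketch of how such a rollup could be computed client-side from GET /usage records (field names as documented above; partial reimplementation for illustration):

```python
from collections import Counter

def summarize(records: list[dict]) -> dict:
    """Aggregate usage records into (part of) the summary shape
    returned by GET /usage/summary."""
    total = len(records)
    return {
        "total_requests": total,
        "total_tokens": sum(r["total_tokens"] for r in records),
        "avg_latency_ms": sum(r["latency_ms"] for r in records) / total if total else 0.0,
        "by_provider": dict(Counter(r["provider"] for r in records)),
        "success_count": sum(1 for r in records if r["status"] == "success"),
        "error_count": sum(1 for r in records if r["status"] == "error"),
    }

records = [
    {"provider": "openai", "total_tokens": 23, "latency_ms": 1250, "status": "success"},
    {"provider": "anthropic", "total_tokens": 40, "latency_ms": 750, "status": "error"},
]
print(summarize(records)["by_provider"])  # → {'openai': 1, 'anthropic': 1}
```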
OpenAI-Compatible Error Format:
{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
Anthropic-Compatible Error Format:
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "Invalid API key"
  }
}
Status Codes:
| Status Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - Invalid request body or parameters |
| 401 | Unauthorized - Missing or invalid API key |
| 403 | Forbidden - API key lacks permission for the requested operation |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error - Server-side error |
| Error Type | Description |
|---|---|
| `invalid_request_error` | The request was malformed or missing required fields |
| `authentication_error` | Invalid or missing API key |
| `permission_error` | API key doesn't have permission for the requested resource |
| `rate_limit_error` | Rate limit exceeded |
| `model_not_found` | Requested model doesn't exist or isn't accessible |
| `provider_error` | Error from the underlying AI provider |
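Since the two endpoint families wrap errors slightly differently, a client talking to both can normalize them. A sketch that handles the two envelope shapes shown above (both nest a dict with type and message under "error"):

```python
def normalize_error(body: dict) -> tuple[str, str]:
    """Extract (error_type, message) from either error envelope:
    OpenAI-compatible:    {"error": {"type": ..., "message": ..., "code": ...}}
    Anthropic-compatible: {"type": "error", "error": {"type": ..., "message": ...}}"""
    err = body["error"]
    return err["type"], err["message"]

openai_style = {"error": {"message": "Invalid API key",
                          "type": "invalid_request_error",
                          "code": "invalid_api_key"}}
anthropic_style = {"type": "error",
                   "error": {"type": "invalid_request_error",
                             "message": "Invalid API key"}}
print(normalize_error(openai_style))     # → ('invalid_request_error', 'Invalid API key')
print(normalize_error(anthropic_style))  # → ('invalid_request_error', 'Invalid API key')
```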