Cortex provides OpenAI-compatible and Anthropic-compatible endpoints for unified access to multiple AI providers.
- Authentication
- OpenAI-Compatible Endpoints
- Anthropic-Compatible Endpoints
- Provider Management Endpoints
- OAuth Endpoints
- Admin Monitoring Endpoints
- Runtime Configuration Endpoints
- Usage Tracking Endpoints
- Error Handling
All API requests require authentication using an API key. Include your API key in the request headers:
OpenAI-Compatible Endpoints:
Authorization: Bearer YOUR_API_KEY
Anthropic-Compatible Endpoints:
x-api-key: YOUR_API_KEY
API keys are configured in cortex.yaml and can have specific permissions, rate limits, and access controls.
POST /v1/chat/completions
Create a chat completion using the OpenAI-compatible API.
Request:
curl -X POST http://localhost:8090/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer cortex-dev-key-001" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
],
"max_tokens": 100,
"temperature": 0.7
}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID or alias (e.g., "gpt-4o", "claude-3-5-sonnet-20241022") |
| `messages` | array | Yes | Array of message objects with role and content |
| `max_tokens` | integer | No | Maximum tokens to generate |
| `temperature` | float | No | Sampling temperature (0.0 to 2.0) |
| `stream` | boolean | No | Enable streaming responses |
| `tools` | array | No | Array of tool definitions for function calling |
| `tool_choice` | string/object | No | Control tool usage ("auto", "none", or a specific tool) |
Response:
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1699472000,
"model": "gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 15,
"completion_tokens": 8,
"total_tokens": 23
}
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique completion ID |
| `object` | string | Object type (always "chat.completion") |
| `created` | integer | Unix timestamp of creation |
| `model` | string | Model used for completion |
| `choices` | array | Array of completion choices |
| `choices[].index` | integer | Choice index |
| `choices[].message` | object | Generated message with role and content |
| `choices[].finish_reason` | string | Reason the completion finished ("stop", "length", "tool_calls") |
| `usage` | object | Token usage statistics |
| `usage.prompt_tokens` | integer | Tokens in the prompt |
| `usage.completion_tokens` | integer | Tokens in the completion |
| `usage.total_tokens` | integer | Total tokens used |
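Because the endpoint mirrors OpenAI's schema, a client usually only needs `choices[0].message.content` and the `usage` block. A minimal Python sketch of that extraction (the `sample` dict below is the documented example response; a real client would obtain it from an HTTP POST to `/v1/chat/completions`):

```python
def extract_completion(resp: dict) -> tuple[str, int]:
    """Pull the assistant text and total token count out of a
    chat.completion response body."""
    message = resp["choices"][0]["message"]
    usage = resp["usage"]
    # total_tokens is prompt_tokens + completion_tokens
    assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
    return message["content"], usage["total_tokens"]

sample = {
    "choices": [{"index": 0,
                 "message": {"role": "assistant",
                             "content": "The capital of France is Paris."},
                 "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 15, "completion_tokens": 8, "total_tokens": 23},
}

text, total = extract_completion(sample)  # → ("The capital of France is Paris.", 23)
```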
GET /v1/models
List available models accessible through the OpenAI-compatible API.
Request:
curl http://localhost:8090/v1/models \
-H "Authorization: Bearer cortex-dev-key-001"
Response:
{
"object": "list",
"data": [
{
"id": "gpt-4",
"object": "model",
"owned_by": "cortex"
},
{
"id": "gpt-3.5-turbo",
"object": "model",
"owned_by": "cortex"
},
{
"id": "claude-3-opus",
"object": "model",
"owned_by": "cortex"
}
]
}
POST /v1/embeddings
Create embeddings for text using the OpenAI-compatible API. Supports both single string and array inputs.
Request:
curl -X POST http://localhost:8090/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer cortex-dev-key-001" \
-d '{
"model": "openai:text-embedding-3-small",
"input": "The quick brown fox jumps over the lazy dog"
}'
Request with Array Input:
curl -X POST http://localhost:8090/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer cortex-dev-key-001" \
-d '{
"model": "openai:text-embedding-3-small",
"input": ["First sentence", "Second sentence", "Third sentence"]
}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID with optional provider prefix (e.g., "openai:text-embedding-3-small" or "text-embedding-3-small") |
| `input` | string/array | Yes | Text to embed. Can be a single string, an array of strings, or an array of token arrays |
| `encoding_format` | string | No | Encoding format for returned embeddings: "float" (default) or "base64" |
| `dimensions` | integer | No | Number of output dimensions (for models that support flexible dimensionality) |
| `user` | string | No | End-user identifier for abuse detection |
Response:
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [0.0023064255, -0.009327292, -0.0028842222, ...],
"index": 0
}
],
"model": "text-embedding-3-small",
"usage": {
"prompt_tokens": 9,
"total_tokens": 9
}
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `object` | string | Object type (always "list") |
| `data` | array | Array of embedding objects |
| `data[].object` | string | Object type (always "embedding") |
| `data[].embedding` | array | The embedding vector (array of floats) |
| `data[].index` | integer | Index of the input item corresponding to this embedding |
| `model` | string | Model used for generating embeddings |
| `usage` | object | Token usage statistics |
| `usage.prompt_tokens` | integer | Number of tokens in the input |
| `usage.total_tokens` | integer | Total tokens used (same as prompt_tokens for embeddings) |
Supported Providers:
The embeddings endpoint is available for the following provider types:
| Provider | Support |
|---|---|
| OpenAI | ✅ Full support |
| Mistral | ✅ Full support |
| Azure | ✅ Full support |
| Together | ✅ Full support |
| Fireworks | ✅ Full support |
| Groq | ✅ Full support |
| DeepSeek | ✅ Full support |
| Custom | ✅ Full support (OpenAI-compatible) |
| Anthropic | ❌ Not supported |
Virtual Model Requirements:
Virtual models are supported for embeddings only when they have exactly one enabled candidate. This prevents mixing embeddings from different models (which would produce incompatible vector spaces). If a virtual model has zero or multiple enabled candidates, the request will return a 400 Bad Request error.
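The incompatibility is easy to see with cosine similarity, which is only meaningful when both vectors come from the same model's embedding space. A small sketch (pure Python, no external dependencies; the vectors are made-up illustrative values):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors.
    Only meaningful if both came from the same embedding model."""
    if len(a) != len(b):
        # Different models often differ in dimensionality outright
        raise ValueError("vectors come from incompatible embedding spaces")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Same-space vectors: comparison is well-defined
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
```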
Error Responses:
| Status | Error | Description |
|---|---|---|
| 400 | `invalid_request_error` | Empty model, empty input, multi-candidate virtual model |
| 401 | `authentication_error` | Missing or invalid API key |
| 404 | `invalid_request_error` | Model not found or unknown provider |
| 429 | `rate_limit_error` | Rate limit exceeded |
POST /v1/audio/transcriptions
Transcribe audio into text using the OpenAI-compatible API. Supports multiple audio formats and streaming output.
Request:
curl -X POST http://localhost:8090/v1/audio/transcriptions \
-H "Authorization: Bearer cortex-dev-key-001" \
-F "file=@audio.mp3" \
-F "model=openai:whisper-1"
Request Fields (multipart/form-data):
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID with optional provider prefix (e.g., "openai:whisper-1", "groq:whisper-large-v3") |
| `file` | file | Yes | Audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, flac, ogg, webm |
| `language` | string | No | ISO-639-1 language code (e.g., "en") to improve accuracy and latency |
| `prompt` | string | No | Optional text to guide transcription style or continue a previous segment |
| `response_format` | string | No | Output format: "json" (default), "text", "srt", "vtt", "verbose_json", "diarized_json" |
| `temperature` | float | No | Sampling temperature (0.0 to 1.0). Higher values increase randomness |
| `stream` | boolean | No | Enable SSE streaming output (default: false) |
| `include[]` | array | No | Additional fields to include (e.g., "logprobs" for gpt-4o-transcribe models) |
| `timestamp_granularities[]` | array | No | Timestamp granularities for verbose_json: "word", "segment" |
| `chunking_strategy` | string | No | Chunking strategy for diarization: "auto" or a JSON object |
| `known_speaker_names[]` | array | No | Known speaker names for diarization (max 4) |
| `known_speaker_references[]` | array | No | Base64 audio samples for speaker identification (max 4) |
Response (json format):
{
"text": "Hello, this is a transcription of the audio file."
}
Response (verbose_json format):
{
"task": "transcribe",
"language": "en",
"duration": 12.5,
"text": "Hello, this is a transcription.",
"words": [
{"word": "Hello", "start": 0.0, "end": 0.5},
{"word": "this", "start": 0.6, "end": 0.8}
],
"segments": [
{"id": 0, "start": 0.0, "end": 2.0, "text": "Hello, this is a transcription."}
]
}
Response (diarized_json format for speaker identification):
{
"text": "Speaker A: Hello. Speaker B: Hi there.",
"segments": [
{
"type": "transcript.text.segment",
"id": "seg_001",
"start": 0.0,
"end": 1.5,
"text": "Hello.",
"speaker": "A"
},
{
"type": "transcript.text.segment",
"id": "seg_002",
"start": 1.8,
"end": 3.0,
"text": "Hi there.",
"speaker": "B"
}
],
"usage": {
"type": "duration",
"seconds": 3.5
}
}
Streaming Response (stream=true):
When stream=true, the response is returned as SSE events:
event: transcript.text.delta
data: {"type": "transcript.text.delta", "text": "Hello"}
event: transcript.text.delta
data: {"type": "transcript.text.delta", "text": " world"}
event: transcript.text.done
data: {"type": "transcript.text.done", "text": "Hello world"}
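A client consuming this stream accumulates `transcript.text.delta` events until `transcript.text.done` arrives. A minimal Python sketch of that accumulation over raw SSE lines (event names are the ones shown above):

```python
import json

def collect_transcript(sse_lines: list[str]) -> str:
    """Accumulate text from transcript.text.delta SSE events;
    return the final text once transcript.text.done is seen."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip "event:" lines and blank keep-alives
        payload = json.loads(line[len("data: "):])
        if payload["type"] == "transcript.text.delta":
            parts.append(payload["text"])
        elif payload["type"] == "transcript.text.done":
            # the done event carries the full text; treat it as authoritative
            return payload["text"]
    return "".join(parts)

stream = [
    'event: transcript.text.delta',
    'data: {"type": "transcript.text.delta", "text": "Hello"}',
    'event: transcript.text.delta',
    'data: {"type": "transcript.text.delta", "text": " world"}',
    'event: transcript.text.done',
    'data: {"type": "transcript.text.done", "text": "Hello world"}',
]
print(collect_transcript(stream))  # → Hello world
```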
Response Fields:
| Field | Type | Description |
|---|---|---|
| `text` | string | The transcribed text |
| `usage` | object | Usage statistics (optional, for gpt-4o-transcribe models) |
| `usage.type` | string | Usage type: "tokens" or "duration" |
| `usage.input_tokens` | integer | Input tokens (for token-based usage) |
| `usage.output_tokens` | integer | Output tokens (for token-based usage) |
| `usage.total_tokens` | integer | Total tokens used |
| `usage.seconds` | float | Duration in seconds (for duration-based usage) |
| `logprobs` | array | Token log probabilities (when include[] contains "logprobs") |
Supported Providers:
| Provider | Support | Models |
|---|---|---|
| OpenAI | ✅ Full support | whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize |
| Groq | ✅ Full support | whisper-1, whisper-large-v3 |
| Azure OpenAI | ✅ Full support | whisper-1 |
| Custom | ✅ OpenAI-compatible | Any OpenAI-compatible transcription endpoint |
| Anthropic | ❌ Not supported | - |
File Size Limits:
- Maximum audio file size: 25 MB
- Files exceeding this limit will return a 400 Bad Request error
Error Responses:
| Status | Error | Description |
|---|---|---|
| 400 | `invalid_request_error` | Missing model/file, file too large, unsupported provider |
| 401 | `authentication_error` | Missing or invalid API key |
| 404 | `invalid_request_error` | Unknown provider or model |
| 429 | `rate_limit_error` | Rate limit exceeded |
| 502 | `server_error` | Upstream provider error |
POST /anthropic/v1/messages
Create a message using the Anthropic-compatible API.
Request:
curl -X POST http://localhost:8090/anthropic/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: cortex-dev-key-001" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-3-5-sonnet-20241022",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID or alias (e.g., "claude-3-5-sonnet-20241022") |
| `messages` | array | Yes | Array of message objects with role and content |
| `max_tokens` | integer | Yes | Maximum tokens to generate |
| `temperature` | float | No | Sampling temperature (0.0 to 1.0) |
| `system` | string | No | System prompt to set context |
| `stream` | boolean | No | Enable streaming responses |
| `tools` | array | No | Array of tool definitions |
| `metadata` | object | No | Metadata for the request |
Response:
{
"id": "msg_01ABC123",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "The capital of France is Paris."
}
],
"model": "claude-3-5-sonnet-20241022",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 15,
"output_tokens": 8
}
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique message ID |
| `type` | string | Object type (always "message") |
| `role` | string | Message role (always "assistant") |
| `content` | array | Array of content blocks |
| `content[].type` | string | Content type ("text" or "tool_use") |
| `content[].text` | string | Text content (for text blocks) |
| `model` | string | Model used for generation |
| `stop_reason` | string | Reason generation stopped ("end_turn", "max_tokens", "tool_use") |
| `usage` | object | Token usage statistics |
| `usage.input_tokens` | integer | Tokens in the input |
| `usage.output_tokens` | integer | Tokens in the output |
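Unlike the OpenAI format, an Anthropic-style response carries an array of content blocks rather than a single string, so clients typically concatenate the text blocks and skip tool_use blocks. A minimal Python sketch over the documented sample response:

```python
def message_text(response: dict) -> str:
    """Join all text content blocks from an Anthropic-style message,
    ignoring non-text blocks such as tool_use."""
    return "".join(
        block["text"]
        for block in response["content"]
        if block["type"] == "text"
    )

sample = {
    "id": "msg_01ABC123",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "The capital of France is Paris."}],
    "stop_reason": "end_turn",
}
print(message_text(sample))  # → The capital of France is Paris.
```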
GET /anthropic/v1/models
List available Anthropic models.
Request:
curl http://localhost:8090/anthropic/v1/models \
-H "x-api-key: cortex-dev-key-001"
Response:
{
"data": [
{
"id": "claude-3-5-sonnet-20241022",
"display_name": "Claude 3.5 Sonnet",
"created_at": "2024-10-22T00:00:00Z",
"type": "model"
},
{
"id": "claude-3-5-haiku-20241022",
"display_name": "Claude 3.5 Haiku",
"created_at": "2024-10-22T00:00:00Z",
"type": "model"
}
]
}
GET /api/providers/{name}/models
List available models for a specific provider.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Provider name (e.g., "openai", "anthropic", "groq") |
Request:
curl http://localhost:8090/api/providers/openai/models
Response:
{
"provider": "openai",
"models": [
{
"id": "gpt-4o",
"display_name": "GPT-4 Optimized",
"enabled": true
},
{
"id": "gpt-4o-mini",
"display_name": "GPT-4 Optimized Mini",
"enabled": true
},
{
"id": "gpt-3.5-turbo",
"display_name": "GPT-3.5 Turbo",
"enabled": true
}
],
"default_model": "gpt-4o"
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `provider` | string | Provider name |
| `models` | array | Array of available models |
| `models[].id` | string | Model ID with provider-specific routing pattern applied |
| `models[].display_name` | string | Human-readable model name |
| `models[].enabled` | boolean | Whether the model is currently enabled |
| `default_model` | string | Default model ID for this provider (from config or first available) |
Status Codes:
| Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - Provider name is required |
| 404 | Not Found - Provider not found |
| 500 | Internal Server Error - Failed to retrieve models |
Example with Provider-Specific Prefixes:
# Groq provider applies 'groq/' prefix
curl http://localhost:8090/api/providers/groq/models
{
"provider": "groq",
"models": [
{
"id": "groq/mixtral-8x7b-32768",
"display_name": "Mixtral 8x7B",
"enabled": true
},
{
"id": "groq/llama-3.1-70b-versatile",
"display_name": "LLaMA 3.1 70B",
"enabled": true
}
],
"default_model": "groq/mixtral-8x7b-32768"
}
Notes:
- Model IDs include provider-specific routing patterns (prefixes) as configured in provider_patterns
- The default_model is taken from the provider configuration, or defaults to the first available model
- All models returned are considered enabled and available for use
- Model display names come from the provider's API
GET /api/providers/{name}/auth
Get authentication information and status for a specific provider.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Provider name (e.g., "openai", "anthropic") |
Request:
curl http://localhost:8090/api/providers/openai/auth
Response:
{
"provider": "openai",
"auth_method": "api_key",
"api_keys": [
{
"masked": "sk-p****...****h7YZ",
"index": 0
}
],
"oauth": {
"configured": false,
"authenticated": false
}
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `provider` | string | Provider name |
| `auth_method` | string | Authentication method ("api_key", "oauth", or "auto") |
| `api_keys` | array | Array of masked API keys (if using API key auth) |
| `api_keys[].masked` | string | Masked API key showing the first and last four characters |
| `api_keys[].index` | integer | Index of the API key |
| `oauth` | object | OAuth configuration status |
| `oauth.configured` | boolean | Whether OAuth is configured for this provider |
| `oauth.authenticated` | boolean | Whether the OAuth token is valid and authenticated |
Status Codes:
| Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - Provider name is required |
| 404 | Not Found - Provider not found or not configured |
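The masked form shown above ("sk-p****...****h7YZ") keeps only the leading and trailing four characters of the key. A hypothetical sketch of such a masking helper (the exact format Cortex uses internally may differ; this reproduces the documented example):

```python
def mask_api_key(key: str) -> str:
    """Mask an API key, keeping only the first and last 4 characters.
    Hypothetical helper; the masking format is illustrative."""
    if len(key) <= 8:
        # Too short to mask meaningfully; hide it entirely
        return "****"
    return f"{key[:4]}****...****{key[-4:]}"

print(mask_api_key("sk-proj-1234567890abcdefh7YZ"))  # → sk-p****...****h7YZ
```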
OAuth endpoints manage OAuth 2.0 authentication for supported providers. See the OAuth Authentication Guide for complete documentation.
GET /oauth/{provider}/authorize
Initiates the OAuth 2.0 authorization flow by redirecting the user to the provider's authorization page.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | Provider name (e.g., "google", "anthropic") |
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `redirect_url` | string | No | URL to redirect to after successful authorization |
Example:
# Open in browser to start OAuth flow
http://localhost:8090/oauth/google/authorize
# With custom redirect URL
http://localhost:8090/oauth/google/authorize?redirect_url=http://localhost:3000/success
Response:
Redirects to the provider's authorization page (HTTP 302 redirect).
Notes:
- User must complete authorization in the browser
- Provider will redirect back to /oauth/{provider}/callback after authorization
- State token is automatically generated for CSRF protection
- PKCE code challenge is automatically generated for security
GET /oauth/{provider}/callback
Handles the OAuth callback from the provider. This endpoint is automatically called by the OAuth provider after user authorization.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | Provider name (e.g., "google", "anthropic") |
Query Parameters (from provider):
| Parameter | Type | Description |
|---|---|---|
| `code` | string | Authorization code from the provider |
| `state` | string | State token for CSRF validation |
| `error` | string | Error code (if authorization failed) |
| `error_description` | string | Error description (if authorization failed) |
Example Callback URL:
http://localhost:8090/oauth/google/callback?code=AUTH_CODE&state=STATE_TOKEN
Success Response:
{
"success": true,
"provider": "google",
"expires_at": "2024-01-15T10:30:00Z"
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `success` | boolean | Whether authentication was successful |
| `provider` | string | Provider name |
| `expires_at` | string | Token expiration time (ISO 8601 format) |
Error Response:
HTTP 400 Bad Request
OAuth error: access_denied - User denied access
Notes:
- Automatically exchanges authorization code for access token
- Stores encrypted token in configured storage backend
- Validates state token to prevent CSRF attacks
- Verifies PKCE code verifier
GET /oauth/{provider}/status
Returns the current OAuth authentication status for a provider.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | Provider name (e.g., "google", "anthropic") |
Example:
curl http://localhost:8090/oauth/google/status
Response:
{
"provider": "google",
"configured": true,
"authenticated": true,
"expires_at": "2024-01-15T10:30:00Z",
"scopes": [
"https://www.googleapis.com/auth/generative-language"
]
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `provider` | string | Provider name |
| `configured` | boolean | Whether OAuth is configured for this provider |
| `authenticated` | boolean | Whether a valid OAuth token exists |
| `expires_at` | string | Token expiration time (ISO 8601 format, omitted if not authenticated) |
| `scopes` | array | OAuth scopes granted (omitted if not authenticated) |
Status Codes:
| Code | Description |
|---|---|
| 200 | Success |
| 405 | Method not allowed (use GET) |
Example - Not Configured:
{
"provider": "anthropic",
"configured": false,
"authenticated": false
}
Example - Configured but Not Authenticated:
{
"provider": "openai",
"configured": true,
"authenticated": false
}
Example - OpenRouter (Permanent Token):
{
"provider": "openrouter",
"configured": true,
"authenticated": true,
"token_type": "permanent"
}
Note: OpenRouter returns a permanent API key that never expires, so expires_at is omitted.
POST /oauth/{provider}/refresh
Manually forces an OAuth token refresh. Normally, tokens are automatically refreshed before expiration.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | Provider name (e.g., "google", "anthropic") |
Example:
curl -X POST http://localhost:8090/oauth/google/refresh
Success Response:
{
"success": true,
"provider": "google",
"expires_at": "2024-01-15T11:30:00Z"
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `success` | boolean | Whether refresh was successful |
| `provider` | string | Provider name |
| `expires_at` | string | New token expiration time (ISO 8601 format) |
Error Response:
{
"error": "Failed to refresh token: refresh token expired"
}
Status Codes:
| Code | Description |
|---|---|
| 200 | Success - token refreshed |
| 400 | Bad Request - OAuth not configured or no refresh token |
| 405 | Method not allowed (use POST) |
| 500 | Internal Server Error - refresh failed |
Notes:
- Requires a valid refresh token to be stored
- Updates the stored token with new access token
- Refresh token may also be rotated (provider-dependent)
- Useful for testing token refresh logic
- Not applicable to OpenRouter (permanent tokens don't need refresh)
POST /oauth/qwen/device
Initiates the Device Code flow for Qwen authentication (RFC 8628).
Example:
curl -X POST http://localhost:8090/oauth/qwen/device
Success Response:
{
"device_code": "abc123...",
"user_code": "ABCD-1234",
"verification_uri": "https://login.aliyun.com/oauth/device",
"verification_uri_complete": "https://login.aliyun.com/oauth/device?user_code=ABCD-1234",
"expires_in": 900,
"interval": 5
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `device_code` | string | Device code for polling |
| `user_code` | string | User code (not needed - auto-handled) |
| `verification_uri` | string | URL for the user to visit |
| `verification_uri_complete` | string | Complete URL with the code pre-filled |
| `expires_in` | integer | How long the device code is valid (seconds) |
| `interval` | integer | How often to poll for the token (seconds) |
Status Codes:
| Code | Description |
|---|---|
| 200 | Success - device code issued |
| 400 | Bad Request - OAuth not configured |
| 405 | Method not allowed (use POST) |
| 500 | Internal Server Error - request failed |
Notes:
- A browser is automatically opened to verification_uri_complete
- The user simply clicks "Authorize" - no code entry is needed
- Polling for the token happens automatically
- This is a different UX than the Authorization Code flow
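The automatic polling follows the RFC 8628 shape: poll at the returned interval and give up once expires_in elapses. A sketch of that loop (the `poll` function is a caller-supplied stand-in for the real token request, and sleeping between attempts is elided so the sketch stays self-contained):

```python
def poll_for_token(poll, expires_in: int, interval: int):
    """Device Code flow polling loop (RFC 8628 shape).
    `poll` returns a token dict, or None while authorization is pending."""
    attempts = expires_in // interval  # total polls before the device code expires
    for _ in range(attempts):
        token = poll()
        if token is not None:
            return token
        # a real client would time.sleep(interval) here
    return None  # device code expired without authorization

# Simulate a user who authorizes on the third poll
responses = iter([None, None, {"access_token": "tok_123"}])
token = poll_for_token(lambda: next(responses), expires_in=900, interval=5)
print(token)  # → {'access_token': 'tok_123'}
```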
POST /oauth/qwen/device/poll
Polls for the access token after device authorization. This endpoint is called automatically by the system and is not intended for direct use.
Note: This is an internal endpoint that handles the polling loop for Device Code flow.
DELETE /oauth/{provider}/token
Deletes the stored OAuth token for a provider. This removes the token from local storage but does not revoke it with the provider.
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | Provider name (e.g., "google", "anthropic") |
Example:
curl -X DELETE http://localhost:8090/oauth/google/token
Success Response:
{
"success": true,
"provider": "google",
"message": "Token revoked successfully"
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `success` | boolean | Whether revocation was successful |
| `provider` | string | Provider name |
| `message` | string | Success message |
Error Response:
HTTP 500 Internal Server Error
Failed to delete token: token not found for provider: google
Status Codes:
| Code | Description |
|---|---|
| 200 | Success - token deleted |
| 405 | Method not allowed (use DELETE) |
| 500 | Internal Server Error - deletion failed |
Notes:
- Only deletes the token from local storage
- Does NOT revoke the token with the OAuth provider
- User will need to re-authenticate to use OAuth again
- To fully revoke access, revoke the token in the provider's console
All OAuth endpoints implement security best practices:
PKCE (Proof Key for Code Exchange)
- Uses SHA-256 code challenge method
- Protects against authorization code interception
- Code verifier never transmitted until token exchange
State Tokens
- 32-byte random state tokens
- Validates state on callback
- Prevents CSRF attacks
- Expires after 10 minutes
Encrypted Token Storage
- AES-256-GCM encryption
- Unique nonce per encryption
- Configurable encryption key
- Secure file permissions (0600)
Automatic Token Refresh
- Refreshes before expiration
- Uses stored refresh token
- Transparent to API clients
- Logs refresh operations
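The S256 challenge method named above is defined by RFC 7636: the challenge is the base64url-encoded (unpadded) SHA-256 hash of the ASCII code verifier. A self-contained sketch of generating such a pair:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code verifier and its S256 challenge (RFC 7636).
    challenge = BASE64URL(SHA256(ASCII(verifier))), without '=' padding."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge

verifier, challenge = make_pkce_pair()
# Only the challenge is sent with the authorization request;
# the verifier stays secret until the token exchange.
```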
For detailed OAuth documentation, see the OAuth Authentication Guide.
GET /api/admin/inflight
Retrieve a real-time snapshot of currently in-flight inference requests being processed by the server. This endpoint provides visibility into active /v1/* requests (chat completions, embeddings, etc.) and is useful for monitoring and debugging purposes.
Request:
curl http://localhost:8090/api/admin/inflight \
-H "Authorization: Bearer cortex-dev-key-001"
Response:
{
"count": 2,
"requests": [
{
"id": 42,
"method": "POST",
"uri": "/v1/chat/completions",
"elapsed_secs": 15,
"idle_secs": 3,
"model": "gpt-4o",
"provider": "openai",
"api_key_name": "Development Key"
},
{
"id": 43,
"method": "POST",
"uri": "/v1/chat/completions",
"elapsed_secs": 8,
"idle_secs": 0,
"model": "claude-3-5-sonnet-20241022",
"provider": "anthropic",
"api_key_name": "Production Key"
}
]
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `count` | integer | Total number of in-flight inference requests |
| `requests` | array | Array of in-flight request snapshots |
| `requests[].id` | integer | Unique monotonic request ID assigned by the in-flight middleware |
| `requests[].method` | string | HTTP method (e.g., "POST", "GET") |
| `requests[].uri` | string | Request URI (e.g., "/v1/chat/completions") |
| `requests[].elapsed_secs` | integer | Seconds since the request was registered |
| `requests[].idle_secs` | integer | Seconds since the last body chunk was produced (0 for non-streaming requests) |
| `requests[].model` | string | Model being used (populated after dispatch resolution) |
| `requests[].provider` | string | Provider name (populated after dispatch resolution) |
| `requests[].api_key_name` | string | API key name (populated after auth resolution) |
Notes:
- Only /v1/* paths (inference endpoints) are included in the response
- Admin, health, OAuth, and other internal endpoints are excluded from the display
- The idle_secs field is useful for detecting stalled streaming requests
- Fields like model, provider, and api_key_name may be null if the request has not yet completed the dispatch/auth resolution phase
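A monitoring script can use idle_secs to flag streaming requests that have stopped producing output. A hypothetical sketch over the snapshot format above (the 30-second threshold is an arbitrary choice):

```python
def find_stalled(snapshot: dict, idle_threshold_secs: int = 30) -> list[dict]:
    """Return in-flight requests that have produced no body chunk
    for longer than idle_threshold_secs - likely stalled streams."""
    return [
        req for req in snapshot["requests"]
        if req["idle_secs"] > idle_threshold_secs
    ]

snapshot = {
    "count": 2,
    "requests": [
        {"id": 42, "uri": "/v1/chat/completions", "elapsed_secs": 120, "idle_secs": 45},
        {"id": 43, "uri": "/v1/chat/completions", "elapsed_secs": 8, "idle_secs": 0},
    ],
}
print([r["id"] for r in find_stalled(snapshot)])  # → [42]
```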
GET /api/config/caching
Get the caching configuration for prompt/response caching features.
Request:
curl http://localhost:8090/api/config/caching \
-H "Authorization: Bearer cortex-dev-key-001"
Response:
{
"auto_inject_anthropic_cache_control": true,
"response_cache_enabled": false,
"response_cache_ttl_secs": 300
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `auto_inject_anthropic_cache_control` | boolean | Automatically inject cache control headers for Anthropic API requests |
| `response_cache_enabled` | boolean | Enable caching of API responses |
| `response_cache_ttl_secs` | integer | Time-to-live for cached responses in seconds |
PUT /api/config/caching
Update the caching configuration.
Request:
curl -X PUT http://localhost:8090/api/config/caching \
-H "Content-Type: application/json" \
-H "Authorization: Bearer cortex-dev-key-001" \
-d '{
"auto_inject_anthropic_cache_control": true,
"response_cache_enabled": true,
"response_cache_ttl_secs": 600
}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `auto_inject_anthropic_cache_control` | boolean | No | Auto-inject cache control for Anthropic |
| `response_cache_enabled` | boolean | No | Enable response caching |
| `response_cache_ttl_secs` | integer | No | Cache TTL in seconds (must be positive) |
Response:
Returns the updated configuration (same format as GET).
GET /api/config/token-refresh
Get the token refresh configuration for OAuth token management.
Request:
curl http://localhost:8090/api/config/token-refresh \
-H "Authorization: Bearer cortex-dev-key-001"
Response:
{
"enabled": true,
"refresh_threshold_secs": 300,
"check_interval_secs": 60
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `enabled` | boolean | Whether background token refresh is enabled |
| `refresh_threshold_secs` | integer | Seconds before token expiry to trigger a refresh |
| `check_interval_secs` | integer | Interval in seconds between background refresh checks |
PUT /api/config/token-refresh
Update the token refresh configuration. Changes take effect immediately without requiring a server restart.
Request:
curl -X PUT http://localhost:8090/api/config/token-refresh \
-H "Content-Type: application/json" \
-H "Authorization: Bearer cortex-dev-key-001" \
-d '{
"enabled": true,
"refresh_threshold_secs": 300,
"check_interval_secs": 60
}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `enabled` | boolean | No | Enable/disable background token refresh |
| `refresh_threshold_secs` | integer | No | Seconds before expiry to trigger a refresh |
| `check_interval_secs` | integer | No | Interval between refresh checks in seconds |
Response:
Returns the updated configuration (same format as GET).
Notes:
- Disabling stops the background refresh task on its next cycle
- refresh_threshold_secs is used in the next refresh cycle to determine which tokens need refresh
- check_interval_secs is used as the next sleep interval between refresh cycles
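The background task's decision reduces to: refresh any token whose remaining lifetime is at or below refresh_threshold_secs. A sketch of that check (hypothetical helper; times are Unix seconds):

```python
def needs_refresh(expires_at: float, now: float, refresh_threshold_secs: int = 300) -> bool:
    """True if a token expires within refresh_threshold_secs of `now`
    (or has already expired) and should be refreshed this cycle."""
    return expires_at - now <= refresh_threshold_secs

# Token expiring in 4 minutes with a 5-minute threshold: refresh now
print(needs_refresh(expires_at=1_000_240, now=1_000_000))  # → True
# Token expiring in an hour: leave it for a later cycle
print(needs_refresh(expires_at=1_003_600, now=1_000_000))  # → False
```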
GET /usage
Retrieve usage records with optional filtering.
Request:
curl "http://localhost:8090/usage?api_key=Development%20Key&limit=10" \
-H "Authorization: Bearer cortex-dev-key-001"
Query Parameters:
| Parameter | Type | Description |
|---|---|---|
| `api_key` | string | Filter by API key name |
| `provider` | string | Filter by provider (e.g., "openai", "anthropic") |
| `model` | string | Filter by model ID |
| `start_time` | string | Filter by start time (RFC3339 format) |
| `end_time` | string | Filter by end time (RFC3339 format) |
| `limit` | integer | Maximum number of records to return (default: 100) |
Response:
[
{
"request_id": "req_123",
"timestamp": "2024-11-29T10:30:00Z",
"api_key_name": "Development Key",
"model": "gpt-4o",
"provider": "openai",
"endpoint": "/v1/chat/completions",
"input_tokens": 15,
"output_tokens": 8,
"total_tokens": 23,
"latency_ms": 1250,
"status": "success"
}
]
Record Fields:
| Field | Type | Description |
|---|---|---|
| `request_id` | string | Unique request identifier |
| `timestamp` | string | When the request occurred (ISO 8601) |
| `api_key_name` | string | Name of the API key used |
| `model` | string | Model used |
| `provider` | string | Provider used |
| `endpoint` | string | API endpoint called |
| `input_tokens` | integer | Input tokens consumed |
| `output_tokens` | integer | Output tokens generated |
| `total_tokens` | integer | Total tokens used |
| `latency_ms` | integer | Request latency in milliseconds |
| `status` | string | Request status ("success" or "error") |
| `error_code` | string | Error code (if status is "error") |
GET /usage/summary
Retrieve aggregated usage statistics.
Request:
curl "http://localhost:8090/usage/summary?start_time=2024-11-01T00:00:00Z" \
-H "Authorization: Bearer cortex-dev-key-001"
Query Parameters:
| Parameter | Type | Description |
|---|---|---|
| `api_key` | string | Filter by API key name |
| `provider` | string | Filter by provider |
| `model` | string | Filter by model ID |
| `start_time` | string | Filter by start time (RFC3339 format) |
| `end_time` | string | Filter by end time (RFC3339 format) |
Response:
{
"total_requests": 1250,
"total_tokens": 156000,
"total_input_tokens": 95000,
"total_output_tokens": 61000,
"avg_latency_ms": 1320.5,
"by_provider": {
"openai": 750,
"anthropic": 500
},
"by_model": {
"gpt-4o": 500,
"gpt-4o-mini": 250,
"claude-3-5-sonnet-20241022": 500
},
"by_api_key": {
"Development Key": 800,
"Production Key": 450
},
"success_count": 1230,
"error_count": 20
}
Summary Fields:
| Field | Type | Description |
|---|---|---|
| `total_requests` | integer | Total number of requests |
| `total_tokens` | integer | Total tokens consumed |
| `total_input_tokens` | integer | Total input tokens |
| `total_output_tokens` | integer | Total output tokens |
| `avg_latency_ms` | float | Average latency in milliseconds |
| `by_provider` | object | Request count by provider |
| `by_model` | object | Request count by model |
| `by_api_key` | object | Request count by API key |
| `success_count` | integer | Number of successful requests |
| `error_count` | integer | Number of failed requests |
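The summary is a straightforward aggregation over individual usage records. A sketch of how such a rollup could be computed client-side from GET /usage records (field names as documented above; partial reimplementation for illustration):

```python
from collections import Counter

def summarize(records: list[dict]) -> dict:
    """Aggregate usage records into (part of) the summary shape
    returned by GET /usage/summary."""
    total = len(records)
    return {
        "total_requests": total,
        "total_tokens": sum(r["total_tokens"] for r in records),
        "avg_latency_ms": sum(r["latency_ms"] for r in records) / total if total else 0.0,
        "by_provider": dict(Counter(r["provider"] for r in records)),
        "success_count": sum(1 for r in records if r["status"] == "success"),
        "error_count": sum(1 for r in records if r["status"] == "error"),
    }

records = [
    {"provider": "openai", "total_tokens": 23, "latency_ms": 1250, "status": "success"},
    {"provider": "anthropic", "total_tokens": 40, "latency_ms": 750, "status": "error"},
]
print(summarize(records)["by_provider"])  # → {'openai': 1, 'anthropic': 1}
```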
OpenAI-Compatible Error Format:
{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
Anthropic-Compatible Error Format:
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "Invalid API key"
  }
}
Status Codes:
| Status Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - Invalid request body or parameters |
| 401 | Unauthorized - Missing or invalid API key |
| 403 | Forbidden - API key lacks permission for the requested operation |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error - Server-side error |
| Error Type | Description |
|---|---|
| `invalid_request_error` | The request was malformed or missing required fields |
| `authentication_error` | Invalid or missing API key |
| `permission_error` | API key doesn't have permission for the requested resource |
| `rate_limit_error` | Rate limit exceeded |
| `model_not_found` | Requested model doesn't exist or isn't accessible |
| `provider_error` | Error from the underlying AI provider |
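Since the two endpoint families wrap errors slightly differently, a client talking to both can normalize them. A sketch that handles the two envelope shapes shown above (both nest a dict with type and message under "error"):

```python
def normalize_error(body: dict) -> tuple[str, str]:
    """Extract (error_type, message) from either error envelope:
    OpenAI-compatible:    {"error": {"type": ..., "message": ..., "code": ...}}
    Anthropic-compatible: {"type": "error", "error": {"type": ..., "message": ...}}"""
    err = body["error"]
    return err["type"], err["message"]

openai_style = {"error": {"message": "Invalid API key",
                          "type": "invalid_request_error",
                          "code": "invalid_api_key"}}
anthropic_style = {"type": "error",
                   "error": {"type": "invalid_request_error",
                             "message": "Invalid API key"}}
print(normalize_error(openai_style))     # → ('invalid_request_error', 'Invalid API key')
print(normalize_error(anthropic_style))  # → ('invalid_request_error', 'Invalid API key')
```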