Cortex API Documentation

Cortex provides OpenAI-compatible and Anthropic-compatible endpoints for unified access to multiple AI providers.

Authentication

All API requests require authentication using an API key. Include your API key in the request headers:

OpenAI-Compatible Endpoints:

Authorization: Bearer YOUR_API_KEY

Anthropic-Compatible Endpoints:

x-api-key: YOUR_API_KEY

API keys are configured in cortex.yaml and can have specific permissions, rate limits, and access controls.
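
The two header styles above can be captured in a small helper. This is an illustrative sketch, not part of Cortex; the `anthropic-version` value is the one used in the examples below.

```python
def auth_headers(api_key: str, style: str = "openai") -> dict:
    """Build auth headers for either endpoint family.

    style is "openai" (Bearer token) or "anthropic" (x-api-key).
    """
    if style == "openai":
        return {"Authorization": f"Bearer {api_key}"}
    if style == "anthropic":
        # Anthropic-compatible requests also carry an API version header.
        return {"x-api-key": api_key, "anthropic-version": "2023-06-01"}
    raise ValueError(f"unknown style: {style}")
```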

OpenAI-Compatible Endpoints

POST /v1/chat/completions

Create a chat completion using the OpenAI-compatible API.

Request:

curl -X POST http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer cortex-dev-key-001" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'

Request Body:

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID or alias (e.g., "gpt-4o", "claude-3-5-sonnet-20241022") |
| messages | array | Yes | Array of message objects with role and content |
| max_tokens | integer | No | Maximum tokens to generate |
| temperature | float | No | Sampling temperature (0.0 to 2.0) |
| stream | boolean | No | Enable streaming responses |
| tools | array | No | Array of tool definitions for function calling |
| tool_choice | string/object | No | Control tool usage ("auto", "none", or specific tool) |
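
A client-side validator for the fields in the table above can be sketched as follows. The function and its error strings are hypothetical; Cortex performs its own validation server-side.

```python
def validate_chat_request(body: dict) -> list[str]:
    """Return validation errors for a /v1/chat/completions body (sketch)."""
    errors = []
    if not body.get("model"):
        errors.append("model is required")
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        errors.append("messages must be a non-empty array")
    temp = body.get("temperature")
    if temp is not None and not (0.0 <= temp <= 2.0):
        errors.append("temperature must be between 0.0 and 2.0")
    return errors
```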

Response:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1699472000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 8,
    "total_tokens": 23
  }
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| id | string | Unique completion ID |
| object | string | Object type (always "chat.completion") |
| created | integer | Unix timestamp of creation |
| model | string | Model used for completion |
| choices | array | Array of completion choices |
| choices[].index | integer | Choice index |
| choices[].message | object | Generated message with role and content |
| choices[].finish_reason | string | Reason completion finished ("stop", "length", "tool_calls") |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Tokens in the prompt |
| usage.completion_tokens | integer | Tokens in the completion |
| usage.total_tokens | integer | Total tokens used |

GET /v1/models

List available models accessible through the OpenAI-compatible API.

Request:

curl http://localhost:8090/v1/models \
  -H "Authorization: Bearer cortex-dev-key-001"

Response:

{
  "object": "list",
  "data": [
    {
      "id": "gpt-4",
      "object": "model",
      "owned_by": "cortex"
    },
    {
      "id": "gpt-3.5-turbo",
      "object": "model",
      "owned_by": "cortex"
    },
    {
      "id": "claude-3-opus",
      "object": "model",
      "owned_by": "cortex"
    }
  ]
}

POST /v1/embeddings

Create embeddings for text using the OpenAI-compatible API. Supports both single string and array inputs.

Request:

curl -X POST http://localhost:8090/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer cortex-dev-key-001" \
  -d '{
    "model": "openai:text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

Request with Array Input:

curl -X POST http://localhost:8090/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer cortex-dev-key-001" \
  -d '{
    "model": "openai:text-embedding-3-small",
    "input": ["First sentence", "Second sentence", "Third sentence"]
  }'

Request Body:

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID with optional provider prefix (e.g., "openai:text-embedding-3-small" or "text-embedding-3-small") |
| input | string/array | Yes | Text to embed. Can be a single string, an array of strings, or an array of token arrays |
| encoding_format | string | No | Encoding format for returned embeddings: "float" (default) or "base64" |
| dimensions | integer | No | Number of output dimensions (for models that support flexible dimensionality) |
| user | string | No | End-user identifier for abuse detection |

Response:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0023064255, -0.009327292, -0.0028842222, ...],
      "index": 0
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| object | string | Object type (always "list") |
| data | array | Array of embedding objects |
| data[].object | string | Object type (always "embedding") |
| data[].embedding | array | The embedding vector (array of floats) |
| data[].index | integer | Index of the input item corresponding to this embedding |
| model | string | Model used for generating embeddings |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Number of tokens in the input |
| usage.total_tokens | integer | Total tokens used (same as prompt_tokens for embeddings) |
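
Embedding vectors from `data[].embedding` are typically compared with cosine similarity. A minimal sketch (assuming both vectors come from the same model, since different models produce incompatible vector spaces):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```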

Supported Providers:

The embeddings endpoint is available for the following provider types:

| Provider | Support |
|---|---|
| OpenAI | ✅ Full support |
| Mistral | ✅ Full support |
| Azure | ✅ Full support |
| Together | ✅ Full support |
| Fireworks | ✅ Full support |
| Groq | ✅ Full support |
| DeepSeek | ✅ Full support |
| Custom | ✅ Full support (OpenAI-compatible) |
| Anthropic | ❌ Not supported |

Virtual Model Requirements:

Virtual models are supported for embeddings only when they have exactly one enabled candidate. This prevents mixing embeddings from different models (which would produce incompatible vector spaces). If a virtual model has zero or multiple enabled candidates, the request will return a 400 Bad Request error.
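
The single-candidate rule above can be expressed as a small check. The function name and candidate shape are illustrative, not Cortex internals:

```python
def resolve_embedding_candidate(candidates: list[dict]) -> dict:
    """Pick the single enabled candidate of a virtual model for embeddings.

    Exactly one enabled candidate is allowed; anything else corresponds
    to a 400 Bad Request from the endpoint.
    """
    enabled = [c for c in candidates if c.get("enabled")]
    if len(enabled) != 1:
        raise ValueError(
            f"virtual model must have exactly one enabled candidate, got {len(enabled)}"
        )
    return enabled[0]
```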

Error Responses:

| Status | Error | Description |
|---|---|---|
| 400 | invalid_request_error | Empty model, empty input, or multi-candidate virtual model |
| 401 | authentication_error | Missing or invalid API key |
| 404 | invalid_request_error | Model not found or unknown provider |
| 429 | rate_limit_error | Rate limit exceeded |

POST /v1/audio/transcriptions

Transcribe audio into text using the OpenAI-compatible API. Supports multiple audio formats and streaming output.

Request:

curl -X POST http://localhost:8090/v1/audio/transcriptions \
  -H "Authorization: Bearer cortex-dev-key-001" \
  -F "file=@audio.mp3" \
  -F "model=openai:whisper-1"

Request Fields (multipart/form-data):

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID with optional provider prefix (e.g., "openai:whisper-1", "groq:whisper-large-v3") |
| file | file | Yes | Audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, flac, ogg, webm |
| language | string | No | ISO-639-1 language code (e.g., "en") to improve accuracy and latency |
| prompt | string | No | Optional text to guide transcription style or continue a previous segment |
| response_format | string | No | Output format: "json" (default), "text", "srt", "vtt", "verbose_json", "diarized_json" |
| temperature | float | No | Sampling temperature (0.0 to 1.0). Higher values increase randomness |
| stream | boolean | No | Enable SSE streaming output (default: false) |
| include[] | array | No | Additional fields to include (e.g., "logprobs" for gpt-4o-transcribe models) |
| timestamp_granularities[] | array | No | Timestamp granularities for verbose_json: "word", "segment" |
| chunking_strategy | string | No | Chunking strategy for diarization: "auto" or JSON object |
| known_speaker_names[] | array | No | Known speaker names for diarization (max 4) |
| known_speaker_references[] | array | No | Base64 audio samples for speaker identification (max 4) |

Response (json format):

{
  "text": "Hello, this is a transcription of the audio file."
}

Response (verbose_json format):

{
  "task": "transcribe",
  "language": "en",
  "duration": 12.5,
  "text": "Hello, this is a transcription.",
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.5},
    {"word": "this", "start": 0.6, "end": 0.8}
  ],
  "segments": [
    {"id": 0, "start": 0.0, "end": 2.0, "text": "Hello, this is a transcription."}
  ]
}

Response (diarized_json format for speaker identification):

{
  "text": "Speaker A: Hello. Speaker B: Hi there.",
  "segments": [
    {
      "type": "transcript.text.segment",
      "id": "seg_001",
      "start": 0.0,
      "end": 1.5,
      "text": "Hello.",
      "speaker": "A"
    },
    {
      "type": "transcript.text.segment",
      "id": "seg_002",
      "start": 1.8,
      "end": 3.0,
      "text": "Hi there.",
      "speaker": "B"
    }
  ],
  "usage": {
    "type": "duration",
    "seconds": 3.5
  }
}

Streaming Response (stream=true):

When stream=true, the response is returned as SSE events:

event: transcript.text.delta
data: {"type": "transcript.text.delta", "text": "Hello"}

event: transcript.text.delta
data: {"type": "transcript.text.delta", "text": " world"}

event: transcript.text.done
data: {"type": "transcript.text.done", "text": "Hello world"}

Response Fields:

| Field | Type | Description |
|---|---|---|
| text | string | The transcribed text |
| usage | object | Usage statistics (optional, for gpt-4o-transcribe models) |
| usage.type | string | Usage type: "tokens" or "duration" |
| usage.input_tokens | integer | Input tokens (for token-based usage) |
| usage.output_tokens | integer | Output tokens (for token-based usage) |
| usage.total_tokens | integer | Total tokens used |
| usage.seconds | float | Duration in seconds (for duration-based usage) |
| logprobs | array | Token log probabilities (when include[] contains "logprobs") |

Supported Providers:

| Provider | Support | Models |
|---|---|---|
| OpenAI | ✅ Full support | whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize |
| Groq | ✅ Full support | whisper-1, whisper-large-v3 |
| Azure OpenAI | ✅ Full support | whisper-1 |
| Custom | ✅ OpenAI-compatible | Any OpenAI-compatible transcription endpoint |
| Anthropic | ❌ Not supported | - |

File Size Limits:

  • Maximum audio file size: 25 MB
  • Files exceeding this limit will return a 400 Bad Request error

Error Responses:

| Status | Error | Description |
|---|---|---|
| 400 | invalid_request_error | Missing model/file, file too large, unsupported provider |
| 401 | authentication_error | Missing or invalid API key |
| 404 | invalid_request_error | Unknown provider or model |
| 429 | rate_limit_error | Rate limit exceeded |
| 502 | server_error | Upstream provider error |

Anthropic-Compatible Endpoints

POST /anthropic/v1/messages

Create a message using the Anthropic-compatible API.

Request:

curl -X POST http://localhost:8090/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: cortex-dev-key-001" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'

Request Body:

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID or alias (e.g., "claude-3-5-sonnet-20241022") |
| messages | array | Yes | Array of message objects with role and content |
| max_tokens | integer | Yes | Maximum tokens to generate |
| temperature | float | No | Sampling temperature (0.0 to 1.0) |
| system | string | No | System prompt to set context |
| stream | boolean | No | Enable streaming responses |
| tools | array | No | Array of tool definitions |
| metadata | object | No | Metadata for the request |

Response:

{
  "id": "msg_01ABC123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris."
    }
  ],
  "model": "claude-3-5-sonnet-20241022",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 8
  }
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| id | string | Unique message ID |
| type | string | Object type (always "message") |
| role | string | Message role (always "assistant") |
| content | array | Array of content blocks |
| content[].type | string | Content type ("text" or "tool_use") |
| content[].text | string | Text content (for text blocks) |
| model | string | Model used for generation |
| stop_reason | string | Reason generation stopped ("end_turn", "max_tokens", "tool_use") |
| usage | object | Token usage statistics |
| usage.input_tokens | integer | Tokens in the input |
| usage.output_tokens | integer | Tokens in the output |

GET /anthropic/v1/models

List available Anthropic models.

Request:

curl http://localhost:8090/anthropic/v1/models \
  -H "x-api-key: cortex-dev-key-001"

Response:

{
  "data": [
    {
      "id": "claude-3-5-sonnet-20241022",
      "display_name": "Claude 3.5 Sonnet",
      "created_at": "2024-10-22T00:00:00Z",
      "type": "model"
    },
    {
      "id": "claude-3-5-haiku-20241022",
      "display_name": "Claude 3.5 Haiku",
      "created_at": "2024-10-22T00:00:00Z",
      "type": "model"
    }
  ]
}

Provider Management Endpoints

GET /api/providers/{name}/models

List available models for a specific provider.

Path Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Provider name (e.g., "openai", "anthropic", "groq") |

Request:

curl http://localhost:8090/api/providers/openai/models

Response:

{
  "provider": "openai",
  "models": [
    {
      "id": "gpt-4o",
      "display_name": "GPT-4 Optimized",
      "enabled": true
    },
    {
      "id": "gpt-4o-mini",
      "display_name": "GPT-4 Optimized Mini",
      "enabled": true
    },
    {
      "id": "gpt-3.5-turbo",
      "display_name": "GPT-3.5 Turbo",
      "enabled": true
    }
  ],
  "default_model": "gpt-4o"
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| provider | string | Provider name |
| models | array | Array of available models |
| models[].id | string | Model ID with provider-specific routing pattern applied |
| models[].display_name | string | Human-readable model name |
| models[].enabled | boolean | Whether the model is currently enabled |
| default_model | string | Default model ID for this provider (from config or first available) |

Status Codes:

| Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - Provider name is required |
| 404 | Not Found - Provider not found |
| 500 | Internal Server Error - Failed to retrieve models |

Example with Provider-Specific Prefixes:

# Groq provider applies 'groq/' prefix
curl http://localhost:8090/api/providers/groq/models
{
  "provider": "groq",
  "models": [
    {
      "id": "groq/mixtral-8x7b-32768",
      "display_name": "Mixtral 8x7B",
      "enabled": true
    },
    {
      "id": "groq/llama-3.1-70b-versatile",
      "display_name": "LLaMA 3.1 70B",
      "enabled": true
    }
  ],
  "default_model": "groq/mixtral-8x7b-32768"
}

Notes:

  • Model IDs include provider-specific routing patterns (prefixes) as configured in provider_patterns
  • The default_model is taken from the provider configuration, or defaults to the first available model
  • All models returned are considered enabled and available for use
  • Model display names come from the provider's API

GET /api/providers/{name}/auth

Get authentication information and status for a specific provider.

Path Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Provider name (e.g., "openai", "anthropic") |

Request:

curl http://localhost:8090/api/providers/openai/auth

Response:

{
  "provider": "openai",
  "auth_method": "api_key",
  "api_keys": [
    {
      "masked": "sk-p****...****h7YZ",
      "index": 0
    }
  ],
  "oauth": {
    "configured": false,
    "authenticated": false
  }
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| provider | string | Provider name |
| auth_method | string | Authentication method ("api_key", "oauth", or "auto") |
| api_keys | array | Array of masked API keys (if using API key auth) |
| api_keys[].masked | string | Masked API key showing first and last 4 characters |
| api_keys[].index | integer | Index of the API key |
| oauth | object | OAuth configuration status |
| oauth.configured | boolean | Whether OAuth is configured for this provider |
| oauth.authenticated | boolean | Whether OAuth token is valid and authenticated |
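
The `api_keys[].masked` format (first and last 4 characters visible) can be reproduced with a small helper. The exact server-side masking logic is an assumption here; this sketch just matches the documented example shape:

```python
def mask_api_key(key: str) -> str:
    """Mask an API key, keeping the first and last 4 characters visible."""
    if len(key) <= 8:
        # Too short to safely reveal any characters.
        return "*" * len(key)
    return f"{key[:4]}****...****{key[-4:]}"
```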

Status Codes:

| Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - Provider name is required |
| 404 | Not Found - Provider not found or not configured |

OAuth Endpoints

OAuth endpoints manage OAuth 2.0 authentication for supported providers. See the OAuth Authentication Guide for complete documentation.

Start Authorization Flow

GET /oauth/{provider}/authorize

Initiates the OAuth 2.0 authorization flow by redirecting the user to the provider's authorization page.

Path Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Provider name (e.g., "google", "anthropic") |

Query Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| redirect_url | string | No | URL to redirect to after successful authorization |

Example:

# Open in browser to start OAuth flow
http://localhost:8090/oauth/google/authorize

# With custom redirect URL
http://localhost:8090/oauth/google/authorize?redirect_url=http://localhost:3000/success

Response:

Redirects to the provider's authorization page (HTTP 302 redirect).

Notes:

  • User must complete authorization in browser
  • Provider will redirect back to /oauth/{provider}/callback after authorization
  • State token is automatically generated for CSRF protection
  • PKCE code challenge is automatically generated for security

OAuth Callback Handler

GET /oauth/{provider}/callback

Handles the OAuth callback from the provider. This endpoint is automatically called by the OAuth provider after user authorization.

Path Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Provider name (e.g., "google", "anthropic") |

Query Parameters (from provider):

| Parameter | Type | Description |
|---|---|---|
| code | string | Authorization code from provider |
| state | string | State token for CSRF validation |
| error | string | Error code (if authorization failed) |
| error_description | string | Error description (if authorization failed) |

Example Callback URL:

http://localhost:8090/oauth/google/callback?code=AUTH_CODE&state=STATE_TOKEN

Success Response:

{
  "success": true,
  "provider": "google",
  "expires_at": "2024-01-15T10:30:00Z"
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether authentication was successful |
| provider | string | Provider name |
| expires_at | string | Token expiration time (ISO 8601 format) |

Error Response:

HTTP 400 Bad Request
OAuth error: access_denied - User denied access

Notes:

  • Automatically exchanges authorization code for access token
  • Stores encrypted token in configured storage backend
  • Validates state token to prevent CSRF attacks
  • Verifies PKCE code verifier

Check OAuth Status

GET /oauth/{provider}/status

Returns the current OAuth authentication status for a provider.

Path Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Provider name (e.g., "google", "anthropic") |

Example:

curl http://localhost:8090/oauth/google/status

Response:

{
  "provider": "google",
  "configured": true,
  "authenticated": true,
  "expires_at": "2024-01-15T10:30:00Z",
  "scopes": [
    "https://www.googleapis.com/auth/generative-language"
  ]
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| provider | string | Provider name |
| configured | boolean | Whether OAuth is configured for this provider |
| authenticated | boolean | Whether a valid OAuth token exists |
| expires_at | string | Token expiration time (ISO 8601 format, omitted if not authenticated) |
| scopes | array | OAuth scopes granted (omitted if not authenticated) |

Status Codes:

| Code | Description |
|---|---|
| 200 | Success |
| 405 | Method not allowed (use GET) |

Example - Not Configured:

{
  "provider": "anthropic",
  "configured": false,
  "authenticated": false
}

Example - Configured but Not Authenticated:

{
  "provider": "openai",
  "configured": true,
  "authenticated": false
}

Example - OpenRouter (Permanent Token):

{
  "provider": "openrouter",
  "configured": true,
  "authenticated": true,
  "token_type": "permanent"
}

Note: OpenRouter returns a permanent API key that never expires, so expires_at is omitted.


Refresh OAuth Token

POST /oauth/{provider}/refresh

Manually forces an OAuth token refresh. Normally, tokens are automatically refreshed before expiration.

Path Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Provider name (e.g., "google", "anthropic") |

Example:

curl -X POST http://localhost:8090/oauth/google/refresh

Success Response:

{
  "success": true,
  "provider": "google",
  "expires_at": "2024-01-15T11:30:00Z"
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether refresh was successful |
| provider | string | Provider name |
| expires_at | string | New token expiration time (ISO 8601 format) |

Error Response:

{
  "error": "Failed to refresh token: refresh token expired"
}

Status Codes:

| Code | Description |
|---|---|
| 200 | Success - token refreshed |
| 400 | Bad Request - OAuth not configured or no refresh token |
| 405 | Method not allowed (use POST) |
| 500 | Internal Server Error - refresh failed |

Notes:

  • Requires a valid refresh token to be stored
  • Updates the stored token with new access token
  • Refresh token may also be rotated (provider-dependent)
  • Useful for testing token refresh logic
  • Not applicable to OpenRouter (permanent tokens don't need refresh)

Device Code Authorization (Qwen Only)

POST /oauth/qwen/device

Initiates the Device Code flow for Qwen authentication (RFC 8628).

Example:

curl -X POST http://localhost:8090/oauth/qwen/device

Success Response:

{
  "device_code": "abc123...",
  "user_code": "ABCD-1234",
  "verification_uri": "https://login.aliyun.com/oauth/device",
  "verification_uri_complete": "https://login.aliyun.com/oauth/device?user_code=ABCD-1234",
  "expires_in": 900,
  "interval": 5
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| device_code | string | Device code for polling |
| user_code | string | User code (not needed - auto-handled) |
| verification_uri | string | URL for user to visit |
| verification_uri_complete | string | Complete URL with code pre-filled |
| expires_in | integer | How long the device code is valid (seconds) |
| interval | integer | How often to poll for token (seconds) |

Status Codes:

| Code | Description |
|---|---|
| 200 | Success - device code issued |
| 400 | Bad Request - OAuth not configured |
| 405 | Method not allowed (use POST) |
| 500 | Internal Server Error - request failed |

Notes:

  • Browser is automatically opened to verification_uri_complete
  • User simply clicks "Authorize" - no code entry needed
  • Polling for token happens automatically
  • Different UX than Authorization Code flow

Poll Device Code Token (Internal)

POST /oauth/qwen/device/poll

Polls for the access token after device authorization. This endpoint is called automatically by the system and is not intended for direct use.

Note: This is an internal endpoint that handles the polling loop for Device Code flow.


Revoke OAuth Token

DELETE /oauth/{provider}/token

Deletes the stored OAuth token for a provider. This removes the token from local storage but does not revoke it with the provider.

Path Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Provider name (e.g., "google", "anthropic") |

Example:

curl -X DELETE http://localhost:8090/oauth/google/token

Success Response:

{
  "success": true,
  "provider": "google",
  "message": "Token revoked successfully"
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether revocation was successful |
| provider | string | Provider name |
| message | string | Success message |

Error Response:

HTTP 500 Internal Server Error
Failed to delete token: token not found for provider: google

Status Codes:

| Code | Description |
|---|---|
| 200 | Success - token deleted |
| 405 | Method not allowed (use DELETE) |
| 500 | Internal Server Error - deletion failed |

Notes:

  • Only deletes the token from local storage
  • Does NOT revoke the token with the OAuth provider
  • User will need to re-authenticate to use OAuth again
  • To fully revoke access, revoke the token in the provider's console

OAuth Security

All OAuth endpoints implement security best practices:

PKCE (Proof Key for Code Exchange)

  • Uses SHA-256 code challenge method
  • Protects against authorization code interception
  • Code verifier never transmitted until token exchange
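
The S256 challenge method described above (RFC 7636) can be sketched in a few lines: the verifier is random, and the challenge is the base64url-encoded SHA-256 digest of the verifier with padding stripped. Illustrative only; Cortex's own implementation is internal:

```python
import base64
import hashlib
import secrets

def pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code verifier and its S256 code challenge."""
    # 32 random bytes -> 43-character base64url verifier (no padding).
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```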

State Tokens

  • 32-byte random state tokens
  • Validates state on callback
  • Prevents CSRF attacks
  • Expires after 10 minutes

Encrypted Token Storage

  • AES-256-GCM encryption
  • Unique nonce per encryption
  • Configurable encryption key
  • Secure file permissions (0600)

Automatic Token Refresh

  • Refreshes before expiration
  • Uses stored refresh token
  • Transparent to API clients
  • Logs refresh operations

For detailed OAuth documentation, see the OAuth Authentication Guide.

Admin Monitoring Endpoints

GET /api/admin/inflight

Retrieve a real-time snapshot of currently in-flight inference requests being processed by the server. This endpoint provides visibility into active /v1/* requests (chat completions, embeddings, etc.) and is useful for monitoring and debugging purposes.

Request:

curl http://localhost:8090/api/admin/inflight \
  -H "Authorization: Bearer cortex-dev-key-001"

Response:

{
  "count": 2,
  "requests": [
    {
      "id": 42,
      "method": "POST",
      "uri": "/v1/chat/completions",
      "elapsed_secs": 15,
      "idle_secs": 3,
      "model": "gpt-4o",
      "provider": "openai",
      "api_key_name": "Development Key"
    },
    {
      "id": 43,
      "method": "POST",
      "uri": "/v1/chat/completions",
      "elapsed_secs": 8,
      "idle_secs": 0,
      "model": "claude-3-5-sonnet-20241022",
      "provider": "anthropic",
      "api_key_name": "Production Key"
    }
  ]
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| count | integer | Total number of in-flight inference requests |
| requests | array | Array of in-flight request snapshots |
| requests[].id | integer | Unique monotonic request ID assigned by the in-flight middleware |
| requests[].method | string | HTTP method (e.g., "POST", "GET") |
| requests[].uri | string | Request URI (e.g., "/v1/chat/completions") |
| requests[].elapsed_secs | integer | Seconds since the request was registered |
| requests[].idle_secs | integer | Seconds since the last body chunk was produced (0 for non-streaming requests) |
| requests[].model | string | Model being used (populated after dispatch resolution) |
| requests[].provider | string | Provider name (populated after dispatch resolution) |
| requests[].api_key_name | string | API key name (populated after auth resolution) |

Notes:

  • Only /v1/* paths (inference endpoints) are included in the response
  • Admin, health, OAuth, and other internal endpoints are excluded from the display
  • The idle_secs field is useful for detecting stalled streaming requests
  • Fields like model, provider, and api_key_name may be null if the request has not yet completed the dispatch/auth resolution phase
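
Using `idle_secs` to flag stalled streams, as suggested above, can be sketched like this. The threshold value is an arbitrary example, not a Cortex default:

```python
def stalled_requests(snapshot: dict, idle_threshold_secs: int = 30) -> list[int]:
    """Return IDs of in-flight requests whose stream looks stalled.

    snapshot is the JSON body of GET /api/admin/inflight.
    """
    return [
        r["id"]
        for r in snapshot.get("requests", [])
        if r.get("idle_secs", 0) > idle_threshold_secs
    ]
```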

Runtime Configuration Endpoints

GET /api/config/caching

Get the caching configuration for prompt/response caching features.

Request:

curl http://localhost:8090/api/config/caching \
  -H "Authorization: Bearer cortex-dev-key-001"

Response:

{
  "auto_inject_anthropic_cache_control": true,
  "response_cache_enabled": false,
  "response_cache_ttl_secs": 300
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| auto_inject_anthropic_cache_control | boolean | Automatically inject cache control headers for Anthropic API requests |
| response_cache_enabled | boolean | Enable caching of API responses |
| response_cache_ttl_secs | integer | Time-to-live for cached responses in seconds |

PUT /api/config/caching

Update the caching configuration.

Request:

curl -X PUT http://localhost:8090/api/config/caching \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer cortex-dev-key-001" \
  -d '{
    "auto_inject_anthropic_cache_control": true,
    "response_cache_enabled": true,
    "response_cache_ttl_secs": 600
  }'

Request Body:

| Field | Type | Required | Description |
|---|---|---|---|
| auto_inject_anthropic_cache_control | boolean | No | Auto-inject cache control for Anthropic |
| response_cache_enabled | boolean | No | Enable response caching |
| response_cache_ttl_secs | integer | No | Cache TTL in seconds (must be positive) |

Response:

Returns the updated configuration (same format as GET).


GET /api/config/token-refresh

Get the token refresh configuration for OAuth token management.

Request:

curl http://localhost:8090/api/config/token-refresh \
  -H "Authorization: Bearer cortex-dev-key-001"

Response:

{
  "enabled": true,
  "refresh_threshold_secs": 300,
  "check_interval_secs": 60
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| enabled | boolean | Whether background token refresh is enabled |
| refresh_threshold_secs | integer | Seconds before token expiry to trigger refresh |
| check_interval_secs | integer | Interval in seconds between background refresh checks |

PUT /api/config/token-refresh

Update the token refresh configuration. Changes take effect immediately without requiring a server restart.

Request:

curl -X PUT http://localhost:8090/api/config/token-refresh \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer cortex-dev-key-001" \
  -d '{
    "enabled": true,
    "refresh_threshold_secs": 300,
    "check_interval_secs": 60
  }'

Request Body:

| Field | Type | Required | Description |
|---|---|---|---|
| enabled | boolean | No | Enable/disable background token refresh |
| refresh_threshold_secs | integer | No | Seconds before expiry to trigger refresh |
| check_interval_secs | integer | No | Interval between refresh checks in seconds |

Response:

Returns the updated configuration (same format as GET).

Notes:

  • Disabling stops the background refresh task on its next cycle
  • refresh_threshold_secs is used in the next refresh cycle to determine which tokens need refresh
  • check_interval_secs is used for the next sleep interval between refresh cycles
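
The refresh decision described by `refresh_threshold_secs` can be sketched as a pure function over Unix timestamps. This is an illustrative model of the rule, not Cortex's internal code:

```python
def needs_refresh(expires_at: float, now: float, threshold_secs: int = 300) -> bool:
    """Refresh when the token expires within the threshold (or already has)."""
    return (expires_at - now) <= threshold_secs
```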

Usage Tracking Endpoints

GET /usage

Retrieve usage records with optional filtering.

Request:

curl "http://localhost:8090/usage?api_key=Development%20Key&limit=10" \
  -H "Authorization: Bearer cortex-dev-key-001"

Query Parameters:

| Parameter | Type | Description |
|---|---|---|
| api_key | string | Filter by API key name |
| provider | string | Filter by provider (e.g., "openai", "anthropic") |
| model | string | Filter by model ID |
| start_time | string | Filter by start time (RFC3339 format) |
| end_time | string | Filter by end time (RFC3339 format) |
| limit | integer | Maximum number of records to return (default: 100) |

Response:

[
  {
    "request_id": "req_123",
    "timestamp": "2024-11-29T10:30:00Z",
    "api_key_name": "Development Key",
    "model": "gpt-4o",
    "provider": "openai",
    "endpoint": "/v1/chat/completions",
    "input_tokens": 15,
    "output_tokens": 8,
    "total_tokens": 23,
    "latency_ms": 1250,
    "status": "success"
  }
]

Record Fields:

| Field | Type | Description |
|---|---|---|
| request_id | string | Unique request identifier |
| timestamp | string | When the request occurred (ISO 8601) |
| api_key_name | string | Name of the API key used |
| model | string | Model used |
| provider | string | Provider used |
| endpoint | string | API endpoint called |
| input_tokens | integer | Input tokens consumed |
| output_tokens | integer | Output tokens generated |
| total_tokens | integer | Total tokens used |
| latency_ms | integer | Request latency in milliseconds |
| status | string | Request status ("success" or "error") |
| error_code | string | Error code (if status is "error") |

GET /usage/summary

Retrieve aggregated usage statistics.

Request:

curl "http://localhost:8090/usage/summary?start_time=2024-11-01T00:00:00Z" \
  -H "Authorization: Bearer cortex-dev-key-001"

Query Parameters:

| Parameter | Type | Description |
|---|---|---|
| api_key | string | Filter by API key name |
| provider | string | Filter by provider |
| model | string | Filter by model ID |
| start_time | string | Filter by start time (RFC3339 format) |
| end_time | string | Filter by end time (RFC3339 format) |

Response:

{
  "total_requests": 1250,
  "total_tokens": 156000,
  "total_input_tokens": 95000,
  "total_output_tokens": 61000,
  "avg_latency_ms": 1320.5,
  "by_provider": {
    "openai": 750,
    "anthropic": 500
  },
  "by_model": {
    "gpt-4o": 500,
    "gpt-4o-mini": 250,
    "claude-3-5-sonnet-20241022": 500
  },
  "by_api_key": {
    "Development Key": 800,
    "Production Key": 450
  },
  "success_count": 1230,
  "error_count": 20
}

Summary Fields:

| Field | Type | Description |
|---|---|---|
| total_requests | integer | Total number of requests |
| total_tokens | integer | Total tokens consumed |
| total_input_tokens | integer | Total input tokens |
| total_output_tokens | integer | Total output tokens |
| avg_latency_ms | float | Average latency in milliseconds |
| by_provider | object | Request count by provider |
| by_model | object | Request count by model |
| by_api_key | object | Request count by API key |
| success_count | integer | Number of successful requests |
| error_count | integer | Number of failed requests |
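
How `/usage` records roll up into a summary can be illustrated with a sketch that computes a subset of the fields above. Field names follow the documented record schema; the function itself is hypothetical:

```python
from collections import Counter

def summarize_usage(records: list[dict]) -> dict:
    """Aggregate usage records into a partial /usage/summary shape."""
    return {
        "total_requests": len(records),
        "total_tokens": sum(r.get("total_tokens", 0) for r in records),
        "by_provider": dict(Counter(r["provider"] for r in records)),
        "success_count": sum(r.get("status") == "success" for r in records),
        "error_count": sum(r.get("status") == "error" for r in records),
    }
```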

Error Handling

OpenAI-Compatible Error Format

{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

Anthropic-Compatible Error Format

{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "Invalid API key"
  }
}

Common HTTP Status Codes

| Status Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - Invalid request body or parameters |
| 401 | Unauthorized - Missing or invalid API key |
| 403 | Forbidden - API key lacks permission for the requested operation |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error - Server-side error |

Common Error Types

| Error Type | Description |
|---|---|
| invalid_request_error | The request was malformed or missing required fields |
| authentication_error | Invalid or missing API key |
| permission_error | API key doesn't have permission for the requested resource |
| rate_limit_error | Rate limit exceeded |
| model_not_found | Requested model doesn't exist or isn't accessible |
| provider_error | Error from the underlying AI provider |
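
Mapping between the two documented error formats can be sketched as follows. This is an illustrative converter, not how Cortex performs the mapping internally; the "api_error" fallback type is an assumption:

```python
def to_anthropic_error(openai_error: dict) -> dict:
    """Convert an OpenAI-format error body to the Anthropic format."""
    inner = openai_error.get("error", {})
    return {
        "type": "error",
        "error": {
            "type": inner.get("type", "api_error"),  # fallback is an assumption
            "message": inner.get("message", ""),
        },
    }
```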