Google Vertex AI Provider for Anthropic Claude

This document describes how to configure and use the Google Vertex AI provider in Cortex to access Anthropic Claude models through Google Cloud Platform infrastructure.

Overview

The Vertex AI provider lets Cortex access Anthropic Claude models through Google Cloud Platform's Vertex AI service. It provides:

  • Google Cloud IAM Integration: Use GCP authentication instead of Anthropic API keys
  • Regional Deployment: Access models from different GCP regions
  • Enterprise SLAs: Benefit from Google Cloud's enterprise-grade service agreements
  • GCP Billing: Unified billing through your Google Cloud account
  • OAuth Token Management: Automatic token refresh for service accounts and ADC

Prerequisites

  1. GCP Project: A Google Cloud Platform project with Vertex AI API enabled
  2. IAM Permissions: The roles/aiplatform.user role or equivalent permissions
  3. Authentication: One of the following:
    • Bearer token (OAuth2 access token)
    • Service account JSON key file
    • Application Default Credentials (ADC)

Configuration

Basic Configuration

Add the Vertex AI provider to your cortex.yaml:

providers:
  vertex-claude:
    type: anthropic-vertex
    description: "Claude via Google Vertex AI"
    supports_streaming: true
    supports_tool_calling: true
    provider_config:
      project_id: "your-gcp-project-id"
      region: "us-east5"
      auth_type: "bearer_token"
      bearer_token: "${GCP_BEARER_TOKEN}"

Configuration Fields

| Field                | Required | Default  | Description                                                                |
|----------------------|----------|----------|----------------------------------------------------------------------------|
| project_id           | Yes      | -        | Your GCP project ID                                                        |
| region               | No       | us-east5 | GCP region for Vertex AI                                                   |
| auth_type            | No       | adc      | Authentication method                                                      |
| bearer_token         | No       | -        | OAuth2 bearer token (required if auth_type=bearer_token)                   |
| service_account_file | No       | -        | Path to service account JSON key (required if auth_type=service_account)   |
| service_account_json | No       | -        | Raw service account JSON (alternative to file path)                        |
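
The defaults and conditional requirements in the table can be checked before a configuration is used. The following is a minimal sketch, not Cortex's actual validation code: `VertexProviderConfig` and `validate_config` are hypothetical names, and the error messages (other than "missing required project_id", which appears under Troubleshooting) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# The three auth_type values described in this document.
VALID_AUTH_TYPES = {"bearer_token", "service_account", "adc"}


@dataclass
class VertexProviderConfig:
    """Hypothetical in-memory form of provider_config, with the table's defaults."""
    project_id: str
    region: str = "us-east5"
    auth_type: str = "adc"
    bearer_token: Optional[str] = None
    service_account_file: Optional[str] = None
    service_account_json: Optional[str] = None


def validate_config(raw: dict) -> VertexProviderConfig:
    """Apply defaults and enforce the conditional requirements from the table."""
    if not raw.get("project_id"):
        raise ValueError("missing required project_id")
    cfg = VertexProviderConfig(**raw)
    if cfg.auth_type not in VALID_AUTH_TYPES:
        raise ValueError(f"unknown auth_type: {cfg.auth_type!r}")
    if cfg.auth_type == "bearer_token" and not cfg.bearer_token:
        raise ValueError("bearer_token is required when auth_type=bearer_token")
    if cfg.auth_type == "service_account" and not (
        cfg.service_account_file or cfg.service_account_json
    ):
        raise ValueError(
            "service_account_file or service_account_json is required "
            "when auth_type=service_account"
        )
    return cfg
```

Calling `validate_config({"project_id": "my-gcp-project"})` yields a configuration with region `us-east5` and auth_type `adc`, matching the defaults above.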

Supported Regions

  • us-east5 (Default)
  • us-central1
  • europe-west1
  • asia-southeast1
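
On Vertex AI, Claude models are served under the `anthropic` publisher at a region-specific endpoint, so project_id and region both end up in the request URL. The sketch below shows that URL shape as documented by Google Cloud; whether Cortex builds its URLs exactly this way is an assumption, and `vertex_endpoint` is a hypothetical helper.

```python
# Regions listed in this document as supported by the provider.
SUPPORTED_REGIONS = ("us-east5", "us-central1", "europe-west1", "asia-southeast1")


def vertex_endpoint(project_id: str, model: str, region: str = "us-east5",
                    stream: bool = False) -> str:
    """Build the Vertex AI URL for an Anthropic publisher model (sketch)."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"unsupported region: {region!r}")
    # Vertex AI uses streamRawPredict for streaming and rawPredict otherwise.
    method = "streamRawPredict" if stream else "rawPredict"
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{region}/"
        f"publishers/anthropic/models/{model}:{method}"
    )
```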

Authentication Methods

1. Bearer Token (OAuth2)

Use a GCP OAuth2 access token (typically obtained via gcloud auth print-access-token):

providers:
  vertex-claude:
    type: anthropic-vertex
    provider_config:
      project_id: "my-gcp-project"
      region: "us-east5"
      auth_type: "bearer_token"
      bearer_token: "${GCP_BEARER_TOKEN}"

Set the environment variable:

export GCP_BEARER_TOKEN=$(gcloud auth print-access-token)

Note: Bearer tokens expire after ~1 hour. For production, use service accounts or ADC.

2. Service Account Key File

Use a service account JSON key file:

providers:
  vertex-claude:
    type: anthropic-vertex
    provider_config:
      project_id: "my-gcp-project"
      region: "us-east5"
      auth_type: "service_account"
      service_account_file: "/path/to/service-account-key.json"

The provider will automatically refresh tokens as needed.
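
Automatic refresh usually means caching the access token and fetching a new one shortly before it expires. The sketch below illustrates that pattern only; it is not Cortex's implementation. `CachedToken`, the `fetch` callback, and the 60-second safety margin are all assumptions made for the example.

```python
import time
from typing import Callable, Optional, Tuple


class CachedToken:
    """Caches an access token and refetches it shortly before expiry."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        skew_seconds: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch        # returns (token, expiry as epoch seconds)
        self._skew = skew_seconds  # refresh this many seconds before expiry
        self._clock = clock        # injectable so the logic can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, calling fetch only when the cached one is stale."""
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            self._token, self._expires_at = self._fetch()
        return self._token
```

In a real provider, `fetch` would exchange the service account key for an OAuth2 access token; here it is left as a callback so that the caching logic stands on its own.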

3. Application Default Credentials (ADC)

Use the default GCP credentials configured on your system:

providers:
  vertex-claude:
    type: anthropic-vertex
    provider_config:
      project_id: "my-gcp-project"
      region: "us-east5"
      auth_type: "adc"

ADC looks for credentials in the following order:

  1. The key file named by the GOOGLE_APPLICATION_CREDENTIALS environment variable
  2. The credentials file created by gcloud auth application-default login
  3. GCE metadata server (when running on Google Cloud)
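
The lookup order above can be sketched with the standard library alone. This is an illustration of the search order, not a replacement for Google's auth libraries: `adc_source` is a hypothetical helper, and the well-known file path shown is the Linux/macOS location (it differs on Windows and when CLOUDSDK_CONFIG is set).

```python
from pathlib import Path
from typing import Mapping


def adc_source(env: Mapping[str, str], home: Path) -> str:
    """Report which credential source ADC would try first (sketch only)."""
    # 1. An explicit key file named by the environment variable wins.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return f"env:{explicit}"
    # 2. Otherwise, the file written by `gcloud auth application-default login`.
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    if well_known.is_file():
        return f"gcloud:{well_known}"
    # 3. Otherwise, fall back to the metadata server on Google Cloud.
    return "metadata-server"
```

Calling `adc_source(os.environ, Path.home())` on a workstation shows which of the three sources would apply there.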

Model Mapping

The Vertex AI provider automatically transforms Anthropic model IDs to Vertex AI format:

| Anthropic Model ID         | Vertex AI Model ID            |
|----------------------------|-------------------------------|
| claude-3-5-sonnet-20241022 | claude-3-5-sonnet-v2@20241022 |
| claude-3-opus-20240229     | claude-3-opus@20240229        |
| claude-3-sonnet-20240229   | claude-3-sonnet@20240229      |
| claude-3-haiku-20240307    | claude-3-haiku@20240307       |

You can use Anthropic model IDs in your requests; the provider handles the transformation automatically.
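
The mapping can be expressed as a lookup table with a fallback rule. The table entries below come from this document; the fallback (rewriting a trailing `-YYYYMMDD` to `@YYYYMMDD`) and the function name `to_vertex_model_id` are assumptions for illustration, and Cortex may handle unlisted models differently.

```python
import re

# Explicit mappings taken from the table in this section.
VERTEX_MODEL_IDS = {
    "claude-3-5-sonnet-20241022": "claude-3-5-sonnet-v2@20241022",
    "claude-3-opus-20240229": "claude-3-opus@20240229",
    "claude-3-sonnet-20240229": "claude-3-sonnet@20240229",
    "claude-3-haiku-20240307": "claude-3-haiku@20240307",
}

# Matches a trailing eight-digit date such as "-20240307".
_DATE_SUFFIX = re.compile(r"-(\d{8})$")


def to_vertex_model_id(model: str) -> str:
    """Translate an Anthropic model ID to its Vertex AI form (sketch)."""
    if model in VERTEX_MODEL_IDS:
        return VERTEX_MODEL_IDS[model]
    if "@" in model:
        # Already in Vertex AI form; pass it through unchanged.
        return model
    # Assumed fallback: swap the final "-<date>" for "@<date>".
    return _DATE_SUFFIX.sub(r"@\1", model)
```

The explicit table matters because not every ID follows the date-suffix rule: claude-3-5-sonnet-20241022 maps to a `-v2` name that a pure pattern rewrite would miss.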

Example Requests

Using OpenAI-Compatible API

curl -X POST http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-cortex-api-key" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [
      {
        "role": "user",
        "content": "Hello from Vertex AI!"
      }
    ],
    "max_tokens": 1024
  }'

Using Anthropic API Format

curl -X POST http://localhost:8090/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-cortex-api-key" \
  -d '{
    "model": "claude-3-opus-20240229",
    "messages": [
      {
        "role": "user",
        "content": "Hello from Vertex AI!"
      }
    ],
    "max_tokens": 1024
  }'
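
The same OpenAI-compatible request can be built from Python with only the standard library. This sketch mirrors the first curl example above; the base URL and API key are the placeholders used there, and `build_chat_request` is a hypothetical helper rather than part of any Cortex client.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str,
    api_key: str,
    model: str,
    prompt: str,
    max_tokens: int = 1024,
) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible endpoint."""
    body = {
        "model": model,  # an Anthropic model ID; the provider maps it for Vertex AI
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends the request, which requires a running Cortex instance. For the Anthropic-format endpoint, the path becomes /anthropic/v1/messages and the key goes in an x-api-key header instead, as in the second curl example.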

Database Storage

When you migrate from YAML, the provider configuration is automatically stored in the providers table of the database with:

  • type: anthropic-vertex
  • provider_config_json: Contains the Vertex AI configuration (project_id, region, auth settings)

The bearer_token value is encrypted in the database for security.

Environment Variables

You can use environment variable expansion in your configuration:

providers:
  vertex-claude:
    type: anthropic-vertex
    provider_config:
      project_id: "${GCP_PROJECT_ID}"
      region: "${GCP_REGION}"
      auth_type: "bearer_token"
      bearer_token: "${GCP_BEARER_TOKEN}"
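
Expansion of `${VAR}` placeholders can be pictured as a simple substitution over each string value. The sketch below is illustrative only: `expand_env` is a hypothetical function, and raising an error for an unset variable is an assumption (Cortex may instead substitute an empty string or leave the placeholder in place).

```python
import re
from typing import Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Mapping[str, str]) -> str:
    """Replace every ${NAME} in value with env[NAME] (sketch)."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            # Assumed behavior: fail loudly rather than silently use "".
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _VAR.sub(replace, value)
```

With `os.environ` passed as `env`, the string `"${GCP_REGION}"` would expand to whatever GCP_REGION is set to in the process environment.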

Troubleshooting

Error: "missing required project_id"

Ensure your configuration includes project_id in provider_config:

provider_config:
  project_id: "your-gcp-project-id"

Error: "failed to create vertex middleware"

Check that:

  1. Your project_id is correct
  2. Your region is one of the supported regions
  3. Your authentication configuration is valid

Error: "authentication failed"

Verify your authentication:

  • For bearer tokens: Check that the token is valid and not expired
  • For service accounts: Verify the JSON key file path and permissions
  • For ADC: Ensure you have run gcloud auth application-default login

Model Not Available in Region

Some models may not be available in all regions. Try switching to a different region:

provider_config:
  region: "us-east5"  # Try us-east5 for the widest model support

Performance Considerations

  • Token Refresh: Service account and ADC authentication automatically refresh tokens
  • Regional Latency: Choose a region close to your users for lowest latency
  • Streaming: Streaming is fully supported and recommended for interactive applications

Security Best Practices

  1. Use Service Accounts: Prefer service accounts over bearer tokens for production
  2. Rotate Keys: Regularly rotate service account keys
  3. Least Privilege: Grant only roles/aiplatform.user or minimal required permissions
  4. Encrypt Sensitive Data: Bearer tokens are automatically encrypted in the database
  5. Environment Variables: Use environment variables for sensitive configuration

See Also