
feat: add Vertex AI support for all providers #135

Merged
Kamilbenkirane merged 10 commits into main from feat/vertex-ai-support on Feb 9, 2026

Kamilbenkirane (Member) commented Feb 6, 2026

Part of #118

Summary

  • Route requests through Vertex AI when GoogleADC auth is provided
  • Supports Google, Anthropic, Mistral, and DeepSeek providers across text, images, and videos modalities
  • Adds _make_poll_request pattern for long-running operations (Veo uses fetchPredictOperation on Vertex)
  • Bug fixes found during Vertex testing: error handler hardening, Gemini image role fix, DeepSeek usage parser fix
  • Unit tests for all Vertex URL routing + integration tests per modality
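To make the _make_poll_request bullet concrete, here is a minimal sketch of how the status-check request might differ between the two backends. It is not the code in this PR: `PollRequest`, `make_poll_request`, and the `vertex_model_url` parameter are hypothetical names, and the native (non-Vertex) URL shape is an assumption. Only the use of fetchPredictOperation on Vertex comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PollRequest:
    """Plain description of one status-check HTTP call (hypothetical type)."""

    method: str
    url: str
    json_body: Optional[dict] = None


def make_poll_request(
    operation_name: str, vertex_model_url: Optional[str] = None
) -> PollRequest:
    """Build the status-check request for a long-running operation.

    vertex_model_url is a hypothetical parameter: the full Vertex model URL
    without a method suffix. When it is None, the native API is assumed.
    """
    if vertex_model_url is not None:
        # Vertex: POST the operation name to the model's fetchPredictOperation method.
        return PollRequest(
            method="POST",
            url=f"{vertex_model_url}:fetchPredictOperation",
            json_body={"operationName": operation_name},
        )
    # Native API (assumed shape): GET the operation resource directly.
    return PollRequest(
        method="GET",
        url=f"https://generativelanguage.googleapis.com/v1beta/{operation_name}",
    )
```

Returning a plain request description instead of performing the call keeps the backend-specific part small and easy to unit test without network access.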

Status

| Modality | Status |
|---|---|
| Text (generate + stream) | ✅ Mistral, DeepSeek, Google working. Anthropic hitting rate limits on my GCP; needs validation |
| Images | ✅ Imagen + Gemini working |
| Videos (Veo) | 🚧 Polling fixed (fetchPredictOperation), inline video parsing (bytesBase64Encoded) not yet handled |
| Audio | ✅ Cloud TTS already works, integration test not yet added |
| Embeddings | 🚧 Routing implemented, integration test not yet added |

Remaining

  • Veo Vertex: handle inline bytesBase64Encoded response (or pass storageUri)
  • Audio Vertex integration test
  • Embeddings Vertex integration test
  • Validate Anthropic on Vertex (rate limit issue on my GCP — @XinyueZ can you check?)

Test plan

  • uv run pytest tests/unit_tests/test_vertex_routing.py — 39 tests pass
  • uv run pytest tests/integration_tests/text/ -m integration — Mistral, DeepSeek, Google pass
  • uv run pytest tests/integration_tests/images/ -m integration — Imagen + Gemini pass
  • uv run pytest tests/integration_tests/videos/test_generate.py::test_vertex_generate -m integration
  • uv run pytest tests/integration_tests/ -m integration — full suite

🤖 Generated with Claude Code

Route requests through Vertex AI when GoogleADC auth is provided.
Supports Google, Anthropic, Mistral, and DeepSeek providers across
text, images, and videos modalities. Includes Veo polling fix
(fetchPredictOperation), error handler hardening, Gemini image
role fix, and DeepSeek usage parser fix.

WIP: Veo Vertex inline video (bytesBase64Encoded) parsing not yet
handled — needs base64 decoding or storageUri in request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Kamilbenkirane (Member, Author)

@XinyueZ Here's the branch with Vertex AI support.

Works for Mistral, DeepSeek, Google across text + images. Still hitting rate limits for Claude on my GCP — please check if it works on yours.

Videos (Veo) polling is fixed but inline response parsing still WIP. Audio already works via Cloud TTS. Embeddings routing is implemented but not yet integration-tested.

Should be finished Monday.

claude bot commented Feb 6, 2026

Code Review: PR #135 - Vertex AI Support for All Providers

Summary

This PR adds comprehensive Vertex AI support, allowing requests to be routed through Google Cloud's Vertex AI when GoogleADC auth is provided. The implementation is well-architected and follows the existing patterns in the codebase.


✅ Strengths

1. Excellent Test Coverage

  • 557 lines of comprehensive unit tests in test_vertex_routing.py
  • Tests cover all providers (Google, Anthropic, Mistral, DeepSeek)
  • Tests verify URL routing, error handling, and edge cases (no project_id, different locations)
  • Integration tests for text, images, and videos modalities
  • Good use of mocks to avoid real API calls in unit tests

2. Consistent Pattern Implementation

  • The _build_url() and _get_vertex_endpoint() pattern is consistently applied across all providers
  • Clean separation between API key auth (provider's native endpoints) and GoogleADC auth (Vertex endpoints)
  • Follows the template pattern shown in templates/providers/ with the _make_poll_request addition

3. Good Error Handling

  • Proper validation: raises ValueError when project_id is missing for Vertex AI
  • Error handler hardening mentioned in PR description (lines 295-307 in client.py)
  • Comprehensive exception handling in _handle_error_response with fallback to response text
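For readers who have not opened client.py, the "fallback to response text" behaviour can be sketched roughly as follows. This is an illustration only, not the body of _handle_error_response: the function name `extract_error_message`, the `max_len` cap, and the assumed `{"error": {"message": ...}}` payload shape are all hypothetical.

```python
import json


def extract_error_message(status_code: int, body_text: str, max_len: int = 500) -> str:
    """Return a readable message from an HTTP error body (hypothetical helper)."""
    message = None
    try:
        payload = json.loads(body_text)
    except (json.JSONDecodeError, TypeError):
        payload = None  # Not JSON at all, e.g. an HTML gateway error page.

    # Assumption: an error payload may arrive wrapped in a one-element list.
    if isinstance(payload, list) and payload:
        payload = payload[0]

    if isinstance(payload, dict):
        error = payload.get("error")
        if isinstance(error, dict):
            message = error.get("message")
        elif isinstance(error, str):
            message = error

    # Fall back to the raw body when no structured message was found.
    if not isinstance(message, str) or not message:
        message = body_text or "<empty response body>"

    # Cap the length so a large raw body does not flood logs.
    return f"HTTP {status_code}: {message[:max_len]}"
```

Checking types at every step is what "hardening" would mean in a sketch like this: an unexpected payload shape degrades to the raw text instead of raising a second exception inside the error handler.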

4. Documentation and Code Quality

  • Clear docstrings explaining auth-based endpoint selection
  • Well-commented code with inline explanations
  • Type hints throughout
  • Clear commit message with co-authorship

🔍 Issues & Concerns

Security: Hardcoded Token Literals in Tests

Location: tests/unit_tests/test_vertex_routing.py:29, 44, 57, 104

mock_creds.token = "fake-token"  # nosec B105

While # nosec B105 suppresses Bandit warnings, these are still hardcoded string literals. Consider using constants or fixtures:

TEST_FAKE_TOKEN = "fake-token"  # nosec B105
mock_creds.token = TEST_FAKE_TOKEN

Priority: Medium (test code only, but sets a precedent)


Bug: Gemini Image Role Fix Not Visible

Issue: The PR description mentions "Gemini image role fix" but the only change in src/celeste/modalities/images/providers/google/gemini.py is:

- return {
+ return {  # Fixed role placement

Question: What was the actual bug? The change appears cosmetic. If there was a role-related fix, it's not evident in the diff.

Action Required: Clarify what the "Gemini image role fix" addressed.


Bug: DeepSeek Usage Parser - Where's the Fix?

Issue: PR description mentions "DeepSeek usage parser fix" but src/celeste/providers/deepseek/chat/client.py shows no changes to usage parsing logic. The map_usage_fields method correctly handles nested prompt_tokens_details and completion_tokens_details.

Question: What was broken in the DeepSeek usage parser?

Action Required: Clarify or add a test case demonstrating the fix.


Incomplete: Veo Inline Video Parsing

Location: src/celeste/providers/google/veo/client.py:171-191

The PR description explicitly marks this as WIP:

Veo Vertex: handle inline bytesBase64Encoded response (or pass storageUri)

Current implementation only handles:

  • Vertex: response.videos[0]
  • AI Studio: response.generateVideoResponse.generatedSamples[0].video

Missing: Base64-encoded inline video handling for Vertex responses

Recommendation: Either:

  1. Add a TODO comment in the code at lines 178-183
  2. Open a follow-up issue to track this
  3. Add a test that's marked as @pytest.mark.skip with reason "Inline video parsing not yet implemented"

Performance: Veo Polling Interval

Location: src/celeste/providers/google/veo/config.py:23

POLL_INTERVAL = 10  # seconds

Concern: 10-second intervals may be too aggressive for long-running video generation operations (Veo can take 1-3 minutes). This could lead to excessive API calls.

Recommendation: Consider exponential backoff or increase to 15-20 seconds initial interval.
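A minimal sketch of what the exponential-backoff option could look like, assuming the current 10-second value as the starting interval. The names `backoff_intervals` and `poll_until_done`, the growth factor, the cap, and the timeout are illustrative choices, not values from the Veo client.

```python
import time
from typing import Callable, Iterator


def backoff_intervals(
    initial: float = 10.0, factor: float = 1.5, maximum: float = 60.0
) -> Iterator[float]:
    """Yield sleep intervals that grow geometrically up to a cap."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay *= factor


def poll_until_done(
    check: Callable[[], bool],
    timeout: float = 600.0,
    initial: float = 10.0,
    factor: float = 1.5,
    maximum: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call check() until it returns True; return the number of calls made.

    Raises TimeoutError when the next sleep would exceed the time budget.
    """
    waited = 0.0
    attempts = 0
    for delay in backoff_intervals(initial, factor, maximum):
        attempts += 1
        if check():
            return attempts
        if waited + delay > timeout:
            raise TimeoutError(f"operation not done after {waited:.0f}s")
        sleep(delay)
        waited += delay
    raise AssertionError("unreachable: backoff_intervals never ends")
```

Passing `sleep` as a parameter lets a unit test record the intervals instead of actually waiting.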


Code Quality: Inconsistent isinstance() Patterns

Location: Multiple client files

Some files check isinstance(self.auth, GoogleADC) while the template suggests this should be a universal pattern. However, the Cloud TTS client has additional logic:

# src/celeste/providers/google/cloud_tts/client.py (line 540-557 in tests)
if not isinstance(self.auth, GoogleADC):
    object.__setattr__(self, "auth", GoogleADC())

Question: Should all Google providers auto-initialize GoogleADC() when auth is not provided, or is this specific to Cloud TTS?

Recommendation: Document this pattern decision in code comments or contributing guidelines.


Missing: Embeddings & Audio Integration Tests

Status in PR:

  • ✅ Text (generate + stream) - Mistral, DeepSeek, Google working
  • ✅ Images - Imagen + Gemini working
  • 🚧 Videos (Veo) - Polling fixed, inline parsing pending
  • ✅ Audio - Cloud TTS already works, integration test not yet added
  • 🚧 Embeddings - Routing implemented, integration test not yet added

Recommendation: Add integration tests or mark these as known gaps in the PR description with follow-up issues.


📝 Minor Issues

1. Type Safety: Mock Object Attributes

Location: tests/unit_tests/test_vertex_routing.py:104-105

object.__setattr__(auth, "_credentials", MagicMock(valid=True, token="fake"))  # nosec B106
object.__setattr__(auth, "_project", "adc-fallback-project")

Using object.__setattr__() to bypass Pydantic's frozen model validation is a code smell. Consider:

  • Using Pydantic's model_construct() for test objects
  • Creating a test-specific factory that properly constructs auth objects

2. Documentation: Vertex Endpoint Patterns

The Vertex endpoint patterns are complex:

  • Google models: /publishers/google/models/{model_id}:predict
  • Anthropic: /publishers/anthropic/models/{model_id}:rawPredict
  • Mistral: /publishers/mistralai/models/{model_id}:rawPredict
  • DeepSeek: /endpoints/openapi/chat/completions (no model in URL)

Recommendation: Add a comment in the config files explaining why different publishers use different endpoint patterns (rawPredict vs predict vs OpenAI-compatible).
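One way such a comment could read, sketched as a lookup table. The four path templates are the ones listed above; the `VERTEX_ENDPOINTS` name, the `vertex_endpoint` helper, and the explanatory comments are my own reading of the Vertex conventions, so treat them as assumptions to verify against the config files.

```python
# Hypothetical table; the real config modules may be organised differently.
VERTEX_ENDPOINTS = {
    # Google first-party models: predict, with an instances/predictions envelope.
    "google": "/publishers/google/models/{model_id}:predict",
    # Partner models: rawPredict passes the provider's native payload through.
    "anthropic": "/publishers/anthropic/models/{model_id}:rawPredict",
    "mistral": "/publishers/mistralai/models/{model_id}:rawPredict",
    # OpenAI-compatible endpoint: the model is named in the request body,
    # so the path carries no model segment.
    "deepseek": "/endpoints/openapi/chat/completions",
}


def vertex_endpoint(provider: str, model_id: str) -> str:
    """Return the Vertex path suffix for a provider and model."""
    try:
        template = VERTEX_ENDPOINTS[provider]
    except KeyError:
        raise ValueError(f"no Vertex endpoint known for provider {provider!r}") from None
    # str.format ignores unused keyword arguments, so the DeepSeek
    # template (no placeholder) is returned unchanged.
    return template.format(model_id=model_id)
```

Note that the Mistral publisher segment is `mistralai`, not `mistral`, which is an easy detail to miss when the table lives only in scattered constants.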


3. Magic Numbers: Test Token Limits

Location: tests/integration_tests/text/test_generate.py:24

TEST_MAX_TOKENS = 200

Consider moving this to a shared test fixture or conftest.py if it's reused across multiple test files.


🔒 Security Considerations

Positive Security Aspects:

  1. No credentials in code: All auth goes through GoogleADC
  2. Proper use of SecretStr: API keys use Pydantic's SecretStr
  3. No hardcoded project IDs: All project IDs come from ADC or user input
  4. GCS URL handling: Proper validation when converting gs:// to https:// (line 232-233 in veo/client.py)
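As an illustration of the gs:// to https:// point, a validating conversion could look like the sketch below. The function name `gcs_uri_to_https` is hypothetical and this is not the code in veo/client.py; it only assumes the public `storage.googleapis.com/<bucket>/<object>` URL form.

```python
from urllib.parse import quote


def gcs_uri_to_https(uri: str) -> str:
    """Convert gs://bucket/object into an https://storage.googleapis.com URL."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")

    # Split "bucket/path/to/object" at the first slash only.
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"gs:// URI must include a bucket and an object: {uri!r}")

    # Percent-encode the object name; quote() keeps "/" separators intact.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"
```

Rejecting anything that does not start with `gs://` means an unexpected scheme is surfaced as an error rather than rewritten into a URL the client then fetches.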

⚠️ Security Review Points:

  1. Error messages: Line 301 in client.py falls back to response.text - ensure this doesn't leak sensitive data in error responses
  2. Token expiration: GoogleADC credentials refresh is handled by Google's auth library - verify this works correctly with long-running operations (Veo polling)

🎯 Performance Considerations

  1. HTTP Connection Pooling: Uses shared http_client via get_http_client() - good for connection reuse
  2. Polling efficiency: See Veo polling interval concern above
  3. No unnecessary requests: URL building is lazy (only when requests are made)

📋 Test Plan Validation

From PR description:

  • ✅ Unit tests: 39 tests for Vertex routing
  • ✅ Integration tests: Mistral, DeepSeek, Google text generation pass
  • ✅ Integration tests: Imagen + Gemini images pass
  • ⏳ Veo integration test pending (inline video parsing)
  • ⏳ Anthropic rate limit issue (needs external validation)
  • ⏳ Full integration suite not run yet

Recommendation: Run the full integration suite before merge, or document known failing tests.


🚀 Recommendations for Merge

Before Merging:

  1. Can merge as-is IF: The "Gemini image role fix" and "DeepSeek usage parser fix" are clarified as non-functional changes or already working correctly
  2. Should fix: Add TODO comments for Veo inline video parsing at lines 178-183
  3. Should add: Integration tests for Audio and Embeddings OR open follow-up issues

After Merging:

  1. Create follow-up issues for:
    • Veo inline bytesBase64Encoded handling
    • Anthropic rate limit validation
    • Audio integration tests
    • Embeddings integration tests
  2. Consider adding Vertex AI documentation to the main README
  3. Update api_references.md with Vertex AI endpoint documentation

🎉 Overall Assessment

Rating: ⭐⭐⭐⭐½ (4.5/5)

This is high-quality work with excellent test coverage and consistent implementation. The architecture is sound, the code is clean, and it follows established patterns. The main concerns are:

  1. Incomplete features (Veo inline video) that should be documented
  2. Missing integration tests for Audio/Embeddings
  3. Minor clarifications needed on the "fixes" mentioned in PR description

Recommendation: Approve with minor changes requested

Great work on adding comprehensive Vertex AI support! 🎊


Reviewed with: Claude Code + Claude Sonnet 4.5

Seluj78 (Contributor) left a comment

LGTM so far

Kamilbenkirane and others added 6 commits February 7, 2026 23:38
…auth

The google-auth package is optional under [gcp], but unit tests in
test_vertex_routing.py and Vertex integration tests need it importable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… mismatch

Vertex Veo responses use videoGcsUri (not uri/gcsUri) and can return
inline base64 instead of a GCS URL. Normalize the key and decode
inline responses directly into VideoArtifact.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
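The normalization described in the commit above might look roughly like this. It is a sketch under assumptions: `parse_veo_video` and `URI_KEYS` are hypothetical names, and each video entry is assumed to be a dict carrying either one of the URI keys or bytesBase64Encoded. The decoding is done by hand here only to keep the example self-contained; a later commit in this PR hands the base64 string to the Artifact validator instead.

```python
import base64
from typing import Optional, Tuple

# Key variants named in the commit message, checked in this order.
URI_KEYS = ("videoGcsUri", "gcsUri", "uri")


def parse_veo_video(video: dict) -> Tuple[Optional[str], Optional[bytes]]:
    """Return (uri, data) for one video entry; exactly one element is set."""
    # Prefer a storage URI under whichever key the backend used.
    for key in URI_KEYS:
        uri = video.get(key)
        if uri:
            return uri, None

    # Otherwise expect the video inline as base64 text.
    encoded = video.get("bytesBase64Encoded")
    if encoded:
        return None, base64.b64decode(encoded)

    raise ValueError("video entry has neither a URI nor inline base64 data")
```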
…dpoint

Vertex embeddings uses :predict with instances format, not :embedContent.
Build correct request body in _init_request when auth is GoogleADC, and
parse predictions response format in _parse_content. Add integration test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
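To illustrate the ":predict with instances format" wording in the commit above, here is a sketch of the request and response shapes. The helper names are hypothetical, and the `content` field and the `predictions[].embeddings.values` path reflect my understanding of the Vertex text-embeddings format rather than code from this PR, so check them against the merged _make_request and _parse_content.

```python
from typing import List


def build_vertex_embeddings_body(texts: List[str]) -> dict:
    """Wrap input texts in the instances envelope used by :predict (assumed shape)."""
    return {"instances": [{"content": text} for text in texts]}


def parse_vertex_embeddings(response: dict) -> List[List[float]]:
    """Extract one vector per prediction from a :predict response (assumed shape)."""
    return [
        prediction["embeddings"]["values"]
        for prediction in response.get("predictions", [])
    ]
```

Usage, with a hand-written response standing in for the API:

```python
body = build_vertex_embeddings_body(["hello", "world"])
fake_response = {"predictions": [{"embeddings": {"values": [0.1, 0.2]}}]}
vectors = parse_vertex_embeddings(fake_response)
```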
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…exEndpoint, update templates

- Move isinstance(self.auth, GoogleADC) check from modality _init_request() to
  provider mixin _make_request() for embeddings, keeping auth logic in provider layer
- Fix misplaced class docstring in GoogleEmbeddingsClient mixin
- Rename VertexEndpoint to VertexGenerateContentEndpoint for consistency with
  VertexImagenEndpoint, VertexEmbeddingsEndpoint, etc.
- Add Vertex AI routing patterns (commented) to provider templates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Kamilbenkirane Kamilbenkirane marked this pull request as ready for review February 9, 2026 11:19
Kamilbenkirane and others added 3 commits February 9, 2026 12:53
Move duplicated project_id validation, base URL resolution, and endpoint
formatting from 7 provider _build_url() methods into GoogleADC.build_url().
Also remove manual base64.b64decode from video client (Artifact validator
handles it).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
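The refactor in the commit above (validation, base URL resolution, and endpoint formatting in one place) can be pictured with the following sketch. `AdcAuthSketch` is a stand-in, not the real GoogleADC class: its fields, the "global" default location, and the host naming rule are assumptions. Only the ValueError on a missing project_id is taken from the review earlier in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdcAuthSketch:
    """Hypothetical stand-in for GoogleADC, reduced to URL building."""

    project_id: Optional[str] = None
    location: str = "global"  # Assumed default.

    def build_url(self, endpoint: str, **params: str) -> str:
        """Validate the project, resolve the base URL, and format the endpoint."""
        if not self.project_id:
            raise ValueError("project_id is required for Vertex AI")

        # Assumption: regional hosts are prefixed with the location,
        # while the global location uses the bare host.
        if self.location == "global":
            host = "aiplatform.googleapis.com"
        else:
            host = f"{self.location}-aiplatform.googleapis.com"

        base = f"https://{host}/v1/projects/{self.project_id}/locations/{self.location}"
        return base + endpoint.format(**params)
```

With this shape, each provider's _build_url() only has to supply its endpoint template and model ID, which is the duplication the commit says it removed.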
…images

Same pattern as the video client fix - pass base64 string directly to
ImageArtifact(data=...) instead of manual base64.b64decode().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Same pattern as Gemini images and Veo video fixes - pass base64 string
directly to ImageArtifact(data=...) instead of manual base64.b64decode().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Kamilbenkirane Kamilbenkirane merged commit a6ef5cf into main Feb 9, 2026
10 of 11 checks passed
@Kamilbenkirane Kamilbenkirane mentioned this pull request Feb 9, 2026
@Kamilbenkirane Kamilbenkirane mentioned this pull request Feb 24, 2026

2 participants