feat: standardize image handling across all providers #62

Pratham-Mishra04 · 2025-06-08T20:24:52Z

Unified Image Content Handling Across Providers

This PR implements a standardized approach to handling image content across all LLM providers. It adds two key utility functions:

NormalizeImageContent - Automatically detects content type from URL patterns and normalizes media types
ImageDataURLBuilder - Formats base64 image data according to provider requirements

The implementation:

Adds constants for image content types in the schema
Ensures proper handling of both URL and base64 image formats
Adapts each provider to use the new utilities:
- Anthropic: Uses raw base64 data without prefix
- OpenAI: Uses full data URI format
- Bedrock: Properly formats images for Claude models
- Cohere: Adds vision support with proper image formatting
- Vertex: Enables image support

Tests have been updated to verify image handling across providers, including both URL and base64 formats.

coderabbitai · 2025-06-08T20:24:59Z

Summary by CodeRabbit

New Features
- Enhanced support for image content, including improved detection, normalization, and formatting of image data (URLs and base64) across providers.
- Expanded test coverage for image-related features with additional models and providers now included in image tests.
Bug Fixes
- Improved consistency in handling image content types and media types to ensure reliable image processing in chat and vision-capable models.
Refactor
- Streamlined internal image content handling for greater reliability and maintainability.

Summary by CodeRabbit

New Features
- Enhanced support for image content in chat messages, including improved handling and normalization of image data for various providers (Anthropic, Bedrock, Cohere, OpenAI).
- Added support for vision-capable models, such as "c4ai-aya-vision-8b" for Cohere.
Bug Fixes
- Prevented unintended changes to original image data during processing by using copies for normalization and formatting.
Tests
- Enabled image and base image test scenarios for Azure, OpenAI, and Vertex providers.
- Updated image test setup to use new image content type constants.
Refactor
- Updated internal methods and data structures for more robust image content handling and type safety.

Walkthrough

This update introduces a robust normalization and formatting pipeline for image content across multiple providers. The ImageContent struct is enhanced with type safety, normalization, and data URL formatting methods. Provider implementations (Anthropic, Bedrock, Cohere, OpenAI) now copy and normalize image content before use, ensuring immutability and consistent API payloads. Related tests and internal APIs are updated accordingly.

Changes

File(s)	Change Summary
core/schemas/bifrost.go	Adds `ImageContentType` type and constants; updates `ImageContent` struct fields; adds JSON omitempty tags.
core/providers/anthropic.go	Copies and normalizes image content before processing; updates image source map logic; avoids mutating original data.
core/providers/bedrock.go	Copies and formats image content before constructing BedrockAnthropicImageMessage; normalizes media type and URL usage.
core/providers/cohere.go	Adds preliminary image content support; introduces `processImageContent` helper to normalize and format image data.
core/providers/openai.go	Copies and normalizes image content before formatting as data URI in chat request preparation.
core/bifrost.go	Renames some exported methods to unexported; updates internal calls and comments accordingly.
core/providers/utils.go	Adds image content normalization and formatting functions; introduces regex and media type normalization utilities.
core/tests/account.go	Adds `"c4ai-aya-vision-8b"` model to Cohere provider's key mapping.
core/tests/azure_test.go	Enables base image setup flag in Azure provider tests.
core/tests/openai_test.go	Enables image and base image test flags for OpenAI provider tests.
core/tests/tests.go	Updates image content type assignments in tests to use new constants instead of string pointers.
core/tests/vertex_test.go	Enables image and base image test flags for Vertex provider tests.

Sequence Diagram(s)

sequenceDiagram
    participant UserMessage
    participant Provider
    participant ImageContent
    participant APIRequest

    UserMessage->>Provider: Send message (may include image)
    Provider->>ImageContent: Copy and Normalize()
    ImageContent->>ImageContent: Infer type, normalize media type
    Provider->>ImageContent: FormatDataURL(includePrefix)
    ImageContent->>Provider: Return formatted data
    Provider->>APIRequest: Build payload with normalized image data
    APIRequest->>Provider: Send to external API

Possibly related PRs

feat: filter out empty API keys and improve tool calling support #55: Refactors prepareAnthropicChatRequest and improves image content handling and tool message merging in core/providers/anthropic.go.
feat: Refactor message schema and enhance multi-modal support across AI providers #54: Refactors Anthropic provider to adopt a new message schema and enhance multi-modal support, including image content processing.

Suggested reviewers

danpiths
akshaydeo

Poem

A bunny hopped through code today,
Tidying images along the way.
With types and formats now pristine,
Each provider’s logic squeaky clean.
From Cohere to OpenAI’s den,
Our tests now pass—again and again!
🐇✨

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ef4bbd and 43c207e.

📒 Files selected for processing (12)

core/bifrost.go (5 hunks)
core/providers/anthropic.go (3 hunks)
core/providers/bedrock.go (1 hunks)
core/providers/cohere.go (3 hunks)
core/providers/openai.go (2 hunks)
core/providers/utils.go (3 hunks)
core/schemas/bifrost.go (1 hunks)
core/tests/account.go (1 hunks)
core/tests/azure_test.go (1 hunks)
core/tests/openai_test.go (1 hunks)
core/tests/tests.go (2 hunks)
core/tests/vertex_test.go (1 hunks)

🧰 Additional context used

🧠 Learnings (3)

core/tests/tests.go (1)

Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#64
File: transports/bifrost-http/integrations/genai/types.go:383-395
Timestamp: 2025-06-09T16:26:05.727Z
Learning: In the Bifrost schema (core/schemas/bifrost.go), only UserMessage and ToolMessage structs have ImageContent fields. AssistantMessage does not have an ImageContent field and cannot contain images. The schema design intentionally limits image content to user and tool messages only.

core/providers/bedrock.go (1)

Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#55
File: core/providers/anthropic.go:526-550
Timestamp: 2025-06-04T09:29:46.287Z
Learning: In core/providers/anthropic.go, the content field in formattedMessages is always of type []interface{} because it's explicitly constructed that way upstream in the prepareAnthropicChatRequest function. Defensive type casting for multiple types is not needed since the type is guaranteed by the construction logic.

core/providers/anthropic.go (2)

Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#55
File: core/providers/anthropic.go:526-550
Timestamp: 2025-06-04T09:29:46.287Z
Learning: In core/providers/anthropic.go, the content field in formattedMessages is always of type []interface{} because it's explicitly constructed that way upstream in the prepareAnthropicChatRequest function. Defensive type casting for multiple types is not needed since the type is guaranteed by the construction logic.

Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#64
File: transports/bifrost-http/integrations/genai/types.go:383-395
Timestamp: 2025-06-09T16:26:05.727Z
Learning: In the Bifrost schema (core/schemas/bifrost.go), only UserMessage and ToolMessage structs have ImageContent fields. AssistantMessage does not have an ImageContent field and cannot contain images. The schema design intentionally limits image content to user and tool messages only.

🧬 Code Graph Analysis (5)

core/tests/tests.go (1)

core/schemas/bifrost.go (2)

ImageContentTypeURL (177-177)

ImageContentTypeBase64 (176-176)

core/providers/bedrock.go (1)

core/providers/utils.go (1)

FormatImageContent (391-445)

core/bifrost.go (2)

core/schemas/bifrost.go (1)

ModelProvider (31-31)

core/schemas/provider.go (1)

Provider (142-149)

core/providers/cohere.go (2)

core/schemas/bifrost.go (2)

UserMessage (155-157)

ImageContent (181-186)

core/providers/utils.go (1)

FormatImageContent (391-445)

core/providers/utils.go (1)

core/schemas/bifrost.go (3)

ImageContent (181-186)

ImageContentTypeBase64 (176-176)

ImageContentTypeURL (177-177)

🪛 golangci-lint (1.64.8)

core/providers/cohere.go

468-468: func processImageContent is unused

(unused)

🔇 Additional comments (11)

core/tests/vertex_test.go (1)

24-25: LGTM: Image testing enabled for Vertex provider.

The change to enable image and base image testing aligns with the PR's goal of standardizing image handling across all providers.

core/tests/azure_test.go (1)

25-25: LGTM: Base image testing enabled for Azure provider.

Completing the image testing setup by enabling base image tests, consistent with the standardization effort across providers.

core/tests/account.go (1)

80-80: LGTM: Vision model added for Cohere provider.

Adding the "c4ai-aya-vision-8b" model provides necessary vision capabilities for testing Cohere's image handling functionality as part of the standardization effort.

core/tests/openai_test.go (1)

25-26: LGTM: Comprehensive image testing enabled for OpenAI provider.

Enabling both image and base image testing completes the test coverage for OpenAI's image handling capabilities, consistent with the standardization across all providers.

core/tests/tests.go (1)

192-192: LGTM! Type safety improvement aligns with schema changes.

The updates correctly replace string pointer assignments with the new typed constants ImageContentTypeURL and ImageContentTypeBase64, improving type safety and consistency with the updated schema.

Also applies to: 228-228

core/providers/openai.go (1)

229-229: LGTM! Correct implementation of image content formatting.

The code properly formats image content once using FormatImageContent with includePrefix=true (appropriate for OpenAI's data URI requirements) and consistently uses the formatted content throughout. This avoids redundant formatting calls and ensures normalized image handling.

Also applies to: 244-244, 248-250

core/providers/bedrock.go (1)

497-497: LGTM! Correct implementation following the established pattern.

The code properly formats image content once using FormatImageContent with includePrefix=false (appropriate for Bedrock's raw base64 requirements) and correctly processes the media type by removing the "image/" prefix. This follows the same pattern as other providers and avoids redundant formatting calls.

Also applies to: 503-512

core/schemas/bifrost.go (2)

173-178: LGTM! Well-designed type safety improvement.

The introduction of ImageContentType as a string alias with clearly named constants improves type safety and eliminates the need for string pointer comparisons. This provides better compile-time guarantees and aligns with Go's type safety principles.

182-186: LGTM! Improved struct design with better JSON handling.

The updates to ImageContent struct are well-implemented:

Using ImageContentType instead of *string for the Type field eliminates nil pointer concerns

Adding omitempty tags to MediaType and Detail improves JSON serialization by excluding empty optional fields

The changes support the standardized image handling across providers mentioned in the PR objectives

core/bifrost.go (1)

218-220: LGTM! Good encapsulation improvement.

The method visibility changes from exported to unexported are appropriate for internal methods that don't need to be part of the public API. All callers have been consistently updated.

Also applies to: 301-301, 414-417, 515-515, 689-689

core/providers/anthropic.go (1)

377-408: Consistent defensive programming pattern applied.

The implementation correctly creates copies of image content in both branches, ensuring the original ImageContent objects are not modified during processing. This follows the established pattern from the buildAnthropicImageSourceMap function.

✨ Finishing Touches

📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

Pratham-Mishra04 · 2025-06-08T20:25:09Z

feat: standardize image handling across all providers #62 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac54c86 and 5ad6001.

📒 Files selected for processing (13)

core/providers/anthropic.go (1 hunks)
core/providers/bedrock.go (1 hunks)
core/providers/cohere.go (2 hunks)
core/providers/openai.go (1 hunks)
core/providers/utils.go (2 hunks)
core/schemas/bifrost.go (1 hunks)
core/tests/account.go (1 hunks)
core/tests/azure_test.go (1 hunks)
core/tests/bedrock_test.go (1 hunks)
core/tests/cohere_test.go (1 hunks)
core/tests/openai_test.go (1 hunks)
core/tests/tests.go (2 hunks)
core/tests/vertex_test.go (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (6)

core/providers/openai.go (1)

core/providers/utils.go (2)

NormalizeImageContent (348-373)

ImageDataURLBuilder (381-427)

core/tests/tests.go (1)

core/schemas/bifrost.go (2)

ImageContentTypeURL (177-177)

ImageContentTypeBase64 (176-176)

core/providers/utils.go (1)

core/schemas/bifrost.go (3)

ImageContent (181-190)

ImageContentTypeBase64 (176-176)

ImageContentTypeURL (177-177)

core/providers/bedrock.go (1)

core/providers/utils.go (2)

NormalizeImageContent (348-373)

ImageDataURLBuilder (381-427)

core/providers/cohere.go (2)

core/schemas/bifrost.go (2)

UserMessage (155-157)

ImageContent (181-190)

core/providers/utils.go (2)

NormalizeImageContent (348-373)

ImageDataURLBuilder (381-427)

core/providers/anthropic.go (2)

core/providers/utils.go (2)

NormalizeImageContent (348-373)

ImageDataURLBuilder (381-427)

core/schemas/bifrost.go (2)

ImageContentTypeBase64 (176-176)

ImageContentTypeURL (177-177)

🔇 Additional comments (19)

core/tests/cohere_test.go (1)

21-21: Model change aligns with vision capabilities.

The switch to "c4ai-aya-vision-8b" model is appropriate for testing vision functionality as part of the image handling standardization effort.

However, consider enabling image testing (SetupImage/SetupBaseImage) to ensure the new image handling utilities work correctly with Cohere's vision model.

core/tests/bedrock_test.go (1)

28-28: Good enablement of image testing for Bedrock.

Enabling SetupImage testing while maintaining the informative comment about URL limitations provides comprehensive test coverage for Bedrock's image handling capabilities.

core/tests/azure_test.go (1)

25-25: Complete image testing coverage for Azure.

Enabling SetupBaseImage alongside the existing SetupImage provides comprehensive test coverage for both URL and base64 image formats, ensuring the standardized image handling works correctly with Azure.

core/tests/vertex_test.go (1)

24-25: Comprehensive image testing enabled for Vertex.

Enabling both SetupImage and SetupBaseImage provides complete test coverage for Vertex's image handling capabilities, ensuring the standardized image utilities work correctly with both URL and base64 formats.

core/tests/openai_test.go (1)

25-26: LGTM! Enabling image test scenarios for OpenAI.

The changes correctly enable image testing for the OpenAI provider, which supports vision capabilities. This aligns with the PR's objective to standardize and test image handling across all providers.

core/tests/account.go (1)

80-80: Good addition of vision-capable model for Cohere.

Adding the "c4ai-aya-vision-8b" model supports the enhanced image handling capabilities introduced in this PR. This ensures Cohere provider can be properly tested with image content.

core/tests/tests.go (2)

192-192: Excellent type safety improvement.

Replacing the string literal with the typed constant schemas.ImageContentTypeURL improves type safety and aligns with the new schema definitions. The removal of bifrost.Ptr() indicates the field is no longer a pointer type.

228-228: Consistent use of typed constants for base64 content.

The change to schemas.ImageContentTypeBase64 maintains consistency with the URL content type change above and improves type safety.

core/providers/openai.go (1)

229-232: Well-implemented image content standardization for OpenAI.

The implementation correctly applies the new image processing utilities:

NormalizeImageContent handles type inference and media type normalization

ImageDataURLBuilder with includePrefix=true ensures OpenAI receives the required full data URI format

The processing order is logical - normalize first, then format according to provider requirements. This aligns perfectly with the PR's objective to standardize image handling while respecting provider-specific formatting needs.

core/providers/bedrock.go (2)

497-500: LGTM! Proper image content normalization for Bedrock.

The implementation correctly uses the new utility functions to normalize image content and format it for Bedrock's requirements (raw base64 data without prefix).

508-509: Good defensive programming with prefix removal.

The TrimPrefix call correctly handles cases where the media type might already include the "image/" prefix, ensuring clean format strings for Bedrock's API.
core/schemas/bifrost.go (2)

173-178: Excellent type safety improvement with ImageContentType enum.

The new enum type prevents string typos and makes the API more self-documenting. The constants clearly define the supported image content types.

182-186:
✅ Verification successful

Breaking change: Type field no longer nullable.

The change from *string to ImageContentType improves type safety but is a breaking change. The field is now required rather than optional. The added documentation clearly explains the expected format for base64 data.

Since this is a breaking change, verify that all existing usages have been updated:

🏁 Script executed:
#!/bin/bash
# Description: Search for any remaining usage of pointer assignment to ImageContent.Type field
# Expected: No results, as Type field is no longer a pointer

# Search for code that might be assigning pointers to Type field
rg -A 3 -B 3 "\.Type\s*=" --type go

# Also search for any nil assignments or checks on Type field  
rg -A 3 -B 3 "\.Type.*nil" --type go
Length of output: 7105
Approve: ImageContent.Type is now non-nullable and all usages are updated

All references to the ImageContent.Type field have been converted from a pointer to the ImageContentType enum, and no lingering pointer assignments or nil checks remain:

In core/providers/utils.go, imgContent.Type is inferred and set to schemas.ImageContentTypeBase64 or schemas.ImageContentTypeURL.

In core/providers/anthropic.go, imgCopy.Type is compared against schemas.ImageContentTypeBase64 and schemas.ImageContentTypeURL.

No further changes are required.
core/providers/cohere.go (2)

238-269: Well-structured image handling for chat history.

The implementation correctly handles vision-capable messages by creating content arrays with separate text and image objects. Using ImageDataURLBuilder with true parameter is appropriate for Cohere's API which expects full data URI format.

282-313: Consistent image handling for the last message.

The logic mirrors the chat history implementation, ensuring consistent behavior. The content array structure properly separates text and image data as expected by vision-capable models.

core/providers/anthropic.go (2)

322-337: Excellent defensive programming with image content normalization.

Creating a copy of the input prevents unintended side effects, and the conditional application of ImageDataURLBuilder only for base64 content is smart. The false parameter correctly produces raw base64 data as required by Anthropic's API.

338-351: Clean source map construction for different image types.

The implementation correctly handles both URL and base64 image types, setting appropriate fields in the source map. The type casting to string for the enum value is necessary and correct.

core/providers/utils.go (2)

10-10: LGTM: New import for regex support.

The regexp import is correctly added to support the new image content processing functions.

315-344: LGTM: Clean media type normalization logic.

The function correctly handles common image formats and provides sensible defaults. The switch statement covers the most common image types and the fallback logic is appropriate.

core/providers/utils.go

coderabbitai

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ad6001 and 87688e5.

📒 Files selected for processing (13)

core/providers/anthropic.go (1 hunks)
core/providers/bedrock.go (1 hunks)
core/providers/cohere.go (2 hunks)
core/providers/openai.go (1 hunks)
core/providers/utils.go (3 hunks)
core/schemas/bifrost.go (1 hunks)
core/tests/account.go (1 hunks)
core/tests/azure_test.go (1 hunks)
core/tests/bedrock_test.go (1 hunks)
core/tests/cohere_test.go (1 hunks)
core/tests/openai_test.go (1 hunks)
core/tests/tests.go (2 hunks)
core/tests/vertex_test.go (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (6)

core/providers/openai.go (1)

core/providers/utils.go (2)

NormalizeImageContent (352-376)

FormatImageDataURL (384-427)

core/providers/bedrock.go (1)

core/providers/utils.go (2)

NormalizeImageContent (352-376)

FormatImageDataURL (384-427)

core/providers/cohere.go (2)

core/schemas/bifrost.go (2)

UserMessage (155-157)

ImageContent (181-186)

core/providers/utils.go (2)

NormalizeImageContent (352-376)

FormatImageDataURL (384-427)

core/tests/tests.go (1)

core/schemas/bifrost.go (2)

ImageContentTypeURL (177-177)

ImageContentTypeBase64 (176-176)

core/providers/anthropic.go (2)

core/providers/utils.go (2)

NormalizeImageContent (352-376)

FormatImageDataURL (384-427)

core/schemas/bifrost.go (2)

ImageContentTypeBase64 (176-176)

ImageContentTypeURL (177-177)

core/providers/utils.go (1)

core/schemas/bifrost.go (3)

ImageContent (181-186)

ImageContentTypeBase64 (176-176)

ImageContentTypeURL (177-177)

🔇 Additional comments (15)

core/schemas/bifrost.go (2)

173-178: Excellent type safety improvement for image content handling.

The introduction of ImageContentType as a string alias with well-defined constants enhances type safety and prevents invalid values. This standardization aligns perfectly with the PR's goal of unifying image handling across providers.

182-185: Good structural improvements to ImageContent.

The changes improve the struct design:

Changing Type from *string to ImageContentType (non-pointer) ensures the field is always present with a valid value

Adding omitempty tags to optional fields keeps JSON responses clean

Note: This is a breaking change from the previous *string type, but it's justified for better type safety and API consistency.

core/tests/azure_test.go (1)

25-25: Good consistency with other providers' image testing.

Enabling SetupBaseImage: true for Azure completes the image capability testing alongside URL-based images, maintaining consistency with the standardized image handling approach across providers.

core/tests/vertex_test.go (1)

24-25: LGTM! Image testing enabled for Vertex provider.

The changes correctly enable both URL and base64 image test scenarios for the Vertex provider, aligning with the PR's objective to standardize image handling across all providers.

core/tests/openai_test.go (1)

25-26: LGTM! Image testing enabled for OpenAI provider.

The changes correctly enable both URL and base64 image test scenarios for the OpenAI provider, consistent with the standardization effort across all providers.

core/tests/account.go (1)

80-80: LGTM! Vision-capable model added for Cohere.

The addition of "c4ai-aya-vision-8b" to Cohere's supported models list appropriately supports the vision capabilities being standardized across providers.

core/tests/tests.go (1)

192-192: Excellent improvement! Using typed constants instead of magic strings.

The replacement of raw string literals with schemas.ImageContentTypeURL and schemas.ImageContentTypeBase64 constants improves type safety and maintainability. This aligns perfectly with the standardization effort outlined in the PR objectives.

Also applies to: 228-228

core/providers/openai.go (1)

229-232: LGTM: Correct implementation of image content standardization for OpenAI.

The implementation properly normalizes image content and formats it with the full data URI prefix (includePrefix=true), which aligns with OpenAI's API requirements.
core/providers/bedrock.go (2)

497-500: LGTM: Correct image normalization and formatting for Bedrock.

The implementation properly normalizes image content and formats it as raw base64 data (includePrefix=false), which is required by Bedrock's API.

508-509:
✅ Verification successful

Verify that normalized media type includes the full format string.

The comment states that normalizeMediaType ensures the full format, so the "image/" prefix is trimmed. Please verify that the utility function actually provides the complete media type string (e.g., "image/jpeg" not just "jpeg").

🏁 Script executed:
#!/bin/bash
# Verify that normalizeMediaType function provides full media type strings
ast-grep --pattern 'func normalizeMediaType($$$) $$$'
Length of output: 1413
Confirmed normalizeMediaType outputs full media types
The normalizeMediaType utility always returns a full type string prefixed with “image/” (defaulting to “image/jpeg” or mapping known formats), so trimming that prefix here correctly yields just the subtype.
core/providers/utils.go (5)

10-10: LGTM: Import addition is appropriate.

The regexp import is correctly added to support the new image content utilities.

29-31: LGTM: Performance optimization implemented.

The precompiled regex at package level addresses the performance concern from previous reviews. This avoids recompiling the regex on every function call.

319-348: LGTM: Comprehensive media type normalization.

The function handles common image formats well and provides a sensible default. The logic for adding the "image/" prefix is sound and the switch statement is appropriate for this number of cases.

350-376: LGTM: Robust type inference logic.

The function correctly handles different URL patterns for type inference:

Data URI format detection using the precompiled regex

HTTP/HTTPS URL detection for regular URLs

Fallback to base64 for other cases

The media type normalization is properly integrated.

378-427: LGTM: Well-implemented data URL formatting.

The function correctly handles both scenarios:

URLs that already have data URI format (extracts and rebuilds)

Raw base64 data (constructs data URI)

The includePrefix parameter provides the flexibility needed for different provider requirements. The documentation clearly states that the function modifies the input struct in place.

core/tests/cohere_test.go

core/tests/bedrock_test.go

core/providers/cohere.go

core/providers/anthropic.go

core/providers/openai.go

coderabbitai

Actionable comments posted: 5

🔭 Outside diff range comments (1)

core/providers/anthropic.go (1)
387-397: 🧹 Nitpick (assertive)

Unnecessary double copy of ImageContent

messageImageContent is copied once here and copied again inside buildAnthropicImageSourceMap.
Keeping a single copy is sufficient and marginally cheaper:
-   // Create a copy to avoid modifying the original
-   messageImageContent = *msg.UserMessage.ImageContent
+   messageImageContent := msg.UserMessage.ImageContent
(or drop the outer copy altogether and let buildAnthropicImageSourceMap handle it).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ce822c and b507c49.

📒 Files selected for processing (12)

core/bifrost.go (5 hunks)
core/providers/anthropic.go (3 hunks)
core/providers/bedrock.go (1 hunks)
core/providers/cohere.go (3 hunks)
core/providers/openai.go (1 hunks)
core/providers/utils.go (3 hunks)
core/schemas/bifrost.go (2 hunks)
core/tests/account.go (1 hunks)
core/tests/azure_test.go (1 hunks)
core/tests/openai_test.go (1 hunks)
core/tests/tests.go (2 hunks)
core/tests/vertex_test.go (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

core/providers/bedrock.go (1)

Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#55
File: core/providers/anthropic.go:526-550
Timestamp: 2025-06-04T09:29:46.287Z
Learning: In core/providers/anthropic.go, the content field in formattedMessages is always of type []interface{} because it's explicitly constructed that way upstream in the prepareAnthropicChatRequest function. Defensive type casting for multiple types is not needed since the type is guaranteed by the construction logic.

🧬 Code Graph Analysis (7)

core/providers/cohere.go (1)

core/schemas/bifrost.go (2)

UserMessage (161-163)

ImageContent (187-192)

core/providers/openai.go (1)

core/schemas/bifrost.go (3)

UserMessage (161-163)

ImageContent (187-192)

ToolMessage (166-169)

core/tests/tests.go (1)

core/schemas/bifrost.go (2)

ImageContentTypeURL (183-183)

ImageContentTypeBase64 (182-182)

core/providers/bedrock.go (1)

core/schemas/bifrost.go (3)

UserMessage (161-163)

ImageContent (187-192)

ToolMessage (166-169)

core/providers/utils.go (1)

core/schemas/bifrost.go (3)

ImageContent (187-192)

ImageContentTypeBase64 (182-182)

ImageContentTypeURL (183-183)

core/bifrost.go (2)

core/schemas/bifrost.go (1)

ModelProvider (37-37)

core/schemas/provider.go (1)

Provider (142-149)

core/providers/anthropic.go (1)

core/schemas/bifrost.go (5)

ImageContentTypeBase64 (182-182)

ImageContentTypeURL (183-183)

UserMessage (161-163)

ImageContent (187-192)

ToolMessage (166-169)

🔇 Additional comments (14)

core/tests/vertex_test.go (1)

24-25: LGTM! Image testing enabled for Vertex provider.

The changes correctly enable both image and base image testing scenarios for the Vertex provider, which aligns with the PR objective to standardize image handling across all providers.

core/tests/azure_test.go (1)

25-25: LGTM! Base image testing enabled for Azure provider.

The change appropriately enables base image testing scenarios for Azure, complementing the already enabled image testing. This ensures comprehensive image handling coverage for the Azure provider.

core/tests/openai_test.go (1)

25-26: LGTM! Image testing enabled for OpenAI provider.

The changes correctly enable both image and base image testing scenarios for the OpenAI provider. The gpt-4o-mini model supports vision capabilities, making these configuration changes appropriate for testing the standardized image handling functionality.
core/tests/account.go (1)

80-80:
✅ Verification successful

Verify the Cohere vision model name.

The addition of "c4ai-aya-vision-8b" aligns with the PR objective to add vision support for Cohere. However, please verify that this model identifier is correct and officially supported by Cohere's API.

🌐 Web query:
What are the current available models for Cohere API, specifically any vision-capable models like "c4ai-aya-vision-8b"?
💡 Result:

Current Models Available on the Cohere API

Cohere provides a range of large language models and multimodal models through their API, optimized for different use cases such as multilingual understanding, text generation, search, and image-related tasks. Here are the primary models currently available, especially focusing on vision-capable models:

Aya Model Family

The Aya family includes both purely text-based and multimodal (vision) models:

Model Name Parameters Modality Context Length Max Output Tokens Description

c4ai-aya-expanse-8b 8B Text 8k 4k Multilingual, optimized for performance across 23 languages

c4ai-aya-expanse-32b 32B Text 128k 4k Larger multilingual model with enhanced performance

c4ai-aya-vision-8b 8B Text, Images 16k 4k Multimodal (state-of-the-art vision + language), covers 23 languages, low latency

c4ai-aya-vision-32b 32B Text, Images 16k 4k Multimodal, larger variant focused on best-in-class multilingual and multimodal performance

Aya Vision models (c4ai-aya-vision-8b and c4ai-aya-vision-32b) are explicitly designed for both text and image (multimodal) tasks, such as image captioning, answering questions about photos, translating text, and generating summaries in 23 languages. They are regarded as best-in-class on various benchmarks, with the 32B model often outperforming much larger competing models[1][9].

Additional Multimodal Models

Embed v3.0: This is a specialized AI search model capable of generating embeddings from both text and images, thus considered multimodal in its capabilities. While it's focused on embedding generation rather than generation or chat, it supports both text and image inputs[2][3][6][8].

Command Models (Primarily Text-Based)

Command A 03-2025: The latest flagship chat model, focused on text and agentic tasks. It does not include vision capabilities but supports advanced multilingual and agent applications, with up to 256,000 token context length[5].

Model Availability and Access

Aya Vision models ("c4ai-aya-vision-8b" and "c4ai-aya-vision-32b") are available through the Cohere API as chat models supporting multimodal (image + text) use cases[1][7][10].

Cohere offers public, research, and sometimes restricted licensing for these models; note that commercial use may be limited for some open models like Aya Vision, which is released under a Creative Commons license with restrictions on commercial use[9].

Summary Table

Model Name Vision-Capable Multilingual Parameters Text Image Typical Use Case

c4ai-aya-vision-8b Yes Yes 8B Yes Yes Multimodal chat, research, QA

c4ai-aya-vision-32b Yes Yes 32B Yes Yes Multimodal, high performance

c4ai-aya-expanse-8b No Yes 8B Yes No Text, multilingual tasks

c4ai-aya-expanse-32b No Yes 32B Yes No Text, advanced multilingual

Command A 03-2025 No Yes N/A Yes No Chat, agent, enterprise

Embed v3.0 multimodal Yes (embeddings) No N/A Yes Yes Embeddings from text and images

Conclusion

The Cohere API currently supports several cutting-edge models, including the c4ai-aya-vision-8b and c4ai-aya-vision-32b as vision-capable (multimodal) models for both text and image inputs. These models are state-of-the-art for multilingual, multimodal tasks and excel in various benchmarks. If your use case involves vision-related AI along with text, these Aya Vision models are the primary offerings from Cohere for such capabilities[1][9][10].

Citations:

1: https://docs.cohere.com/v2/docs/models

2: https://docs.cohere.com/v2/changelog/2025-05-14-oci-models-release-notes

3: https://docs.cohere.com/changelog

4: https://github.blog/changelog/2025-05-08-upcoming-deprecations-for-cohere-models-in-github-models/

5: https://docs.oracle.com/en-us/iaas/Content/generative-ai/cohere-command-a-03-2025.htm

6: https://docs.cohere.com/v2/changelog

7: https://tongfamily.com/2025/01/04/ai-openwebui-missing-mac-manual-with-ollama-and-comfyui-notes/

8: https://docs.cohere.com/v1/changelog

9: https://techcrunch.com/2025/03/04/cohere-claims-its-new-aya-vision-ai-model-is-best-in-class/

10: https://github.com/lobehub/lobe-chat/blob/main/locales/en-US/models.json
Model c4ai-aya-vision-8b Confirmed

The Cohere API officially supports c4ai-aya-vision-8b as a multimodal (text + image) model. You can safely keep this addition in core/tests/account.go (line 80).
core/tests/tests.go (1)

192-192: LGTM! Improved type safety with predefined constants.

The replacement of raw string literals with the new predefined constants schemas.ImageContentTypeURL and schemas.ImageContentTypeBase64 enhances type safety and aligns with the updated ImageContent struct definition.

Also applies to: 228-228

core/providers/openai.go (1)

224-234: LGTM! Proper image content standardization for OpenAI.

The implementation correctly:

Creates copies to avoid mutating original image content

Applies normalization for type inference and media type handling

Formats as full data URI (required by OpenAI API)

This follows the established pattern for image content standardization across providers.

core/providers/bedrock.go (2)

492-502: LGTM! Consistent image content handling for Bedrock.

The implementation properly:

Creates copies to prevent original data mutation

Normalizes content with type inference and media type handling

Formats with raw base64 data (appropriate for Bedrock/Claude)

This aligns with the standardization pattern across all providers.

510-511: Good documentation of media type handling.

The comment correctly explains that the "image/" prefix removal is safe because normalization ensures the full format string is available.

core/bifrost.go (3)

218-220: LGTM! Improved encapsulation by making method unexported.

Renaming SelectKeyFromProviderForModel to selectKeyFromProviderForModel correctly encapsulates this internal implementation detail.

301-301: LGTM! Consistent usage of renamed method.

All callers correctly updated to use the new unexported method name selectKeyFromProviderForModel.

Also applies to: 515-515, 689-689

414-417: LGTM! Improved encapsulation by making method unexported.

Renaming GetProviderQueue to getProviderQueue correctly encapsulates this internal implementation detail.

core/providers/cohere.go (3)

238-262: LGTM! Well-implemented vision support for chat history.

The implementation correctly:

Checks for image content in user messages

Creates content arrays with both text and image data

Uses the helper function to process image content consistently

Falls back to simple message format for non-vision scenarios

This follows the established pattern for vision support across providers.

275-299: LGTM! Consistent vision support for current message.

The implementation mirrors the chat history handling and correctly processes both text and image content for the current message using the same helper function.

463-485: Excellent refactoring with proper image content standardization.

The processImageContent helper function:

Eliminates code duplication (addressing previous review feedback)

Creates copies to prevent original data mutation

Applies standardized normalization and formatting

Returns properly formatted map for Cohere API

This follows the established pattern for image content handling across all providers.

core/providers/utils.go

core/schemas/bifrost.go

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (3)

core/schemas/bifrost.go (2)
194-227: Address duplication concerns from previous review

The regex and normalizeMediaType function are duplicated in other packages as noted in previous reviews.

Consider exporting these (DataURIRegex, NormalizeMediaType) to eliminate cross-package duplication.

302-303: Use idiomatic heap allocation for pointer assignment

Taking the address of the local variable normalizedMediaType works but is non-idiomatic.
-normalizedMediaType := finalMediaType
-ic.MediaType = &normalizedMediaType
+ic.MediaType = schemas.StrPtr(finalMediaType)
This approach is clearer and more consistent with Go idioms.
core/providers/anthropic.go (1)

322-350: LGTM: Excellent defensive programming with proper normalization.

The implementation correctly creates a copy of the image content before normalization and formatting, preventing unintended mutations of the original data. The conditional formatting for base64 content and the use of raw base64 data (includePrefix=false) aligns perfectly with Anthropic's API requirements.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0cd860a and 05ced39.

📒 Files selected for processing (11)

core/bifrost.go (5 hunks)
core/providers/anthropic.go (3 hunks)
core/providers/bedrock.go (1 hunks)
core/providers/cohere.go (3 hunks)
core/providers/openai.go (1 hunks)
core/schemas/bifrost.go (2 hunks)
core/tests/account.go (1 hunks)
core/tests/azure_test.go (1 hunks)
core/tests/openai_test.go (1 hunks)
core/tests/tests.go (2 hunks)
core/tests/vertex_test.go (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

core/providers/bedrock.go (1)

Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#55
File: core/providers/anthropic.go:526-550
Timestamp: 2025-06-04T09:29:46.287Z
Learning: In core/providers/anthropic.go, the content field in formattedMessages is always of type []interface{} because it's explicitly constructed that way upstream in the prepareAnthropicChatRequest function. Defensive type casting for multiple types is not needed since the type is guaranteed by the construction logic.

🧬 Code Graph Analysis (6)

core/tests/tests.go (1)

core/schemas/bifrost.go (2)

ImageContentTypeURL (183-183)

ImageContentTypeBase64 (182-182)

core/providers/openai.go (1)

core/schemas/bifrost.go (3)

UserMessage (161-163)

ImageContent (187-192)

ToolMessage (166-169)

core/bifrost.go (2)

core/schemas/bifrost.go (1)

ModelProvider (37-37)

core/schemas/provider.go (1)

Provider (142-149)

core/providers/cohere.go (1)

core/schemas/bifrost.go (2)

UserMessage (161-163)

ImageContent (187-192)

core/providers/anthropic.go (1)

core/schemas/bifrost.go (5)

ImageContentTypeBase64 (182-182)

ImageContentTypeURL (183-183)

UserMessage (161-163)

ImageContent (187-192)

ToolMessage (166-169)

core/providers/bedrock.go (1)

core/schemas/bifrost.go (3)

UserMessage (161-163)

ImageContent (187-192)

ToolMessage (166-169)

🪛 golangci-lint (1.64.8)

core/providers/cohere.go

468-468: func processImageContent is unused

(unused)

🔇 Additional comments (24)

core/schemas/bifrost.go (6)

4-8: LGTM: Required imports for new functionality

The added imports (fmt, regexp, strings) are necessary for the new image content normalization and formatting functionality.

179-184: LGTM: Improved type safety with string alias

The introduction of ImageContentType as a string alias with explicit constants enhances type safety and makes the code more self-documenting compared to raw string literals.

188-192: LGTM: Enhanced struct definition with type safety

The changes improve the struct definition:

Using ImageContentType for the Type field enhances type safety

Adding omitempty to MediaType and Detail fields produces cleaner JSON output

194-196: LGTM: Efficient precompiled regex

Using a precompiled regex variable is a performance best practice. The pattern correctly matches data URI format (data:mediatype;base64,data).

229-258: LGTM: Robust type inference logic

The Normalize method provides solid logic for:

Inferring content type from URL patterns (data URI, HTTP/HTTPS, or raw base64)

Normalizing media types when provided

Proper nil checking

The implementation correctly handles edge cases and follows good defensive programming practices.

260-313: LGTM: Comprehensive data URL formatting logic

The FormatDataURL method handles complex scenarios well:

Detects existing data URI format vs raw base64

Preserves explicit MediaType field preference over URL-embedded media type

Provides flexible output formatting (with/without prefix)

Ensures MediaType field is always normalized and set

The logic is sound and handles edge cases appropriately.

core/tests/vertex_test.go (1)

24-25: LGTM: Enable image testing for Vertex provider

Enabling SetupImage and SetupBaseImage tests aligns with the PR objective to standardize image handling across all providers. This ensures the new image normalization functionality is properly tested for Vertex.

core/tests/azure_test.go (1)

25-25: LGTM: Enable base image testing for Azure provider

Enabling SetupBaseImage ensures the new standardized image handling is tested for the Azure provider, consistent with similar changes across other provider test files.

core/tests/account.go (1)

80-80: LGTM: Add vision-capable model for Cohere

Adding "c4ai-aya-vision-8b" to the Cohere model list supports the PR's image handling standardization by providing a vision-capable model for testing image content processing with the Cohere provider.

core/tests/openai_test.go (1)

25-26: LGTM! Test configuration correctly enables image support.

The flags are appropriately set to enable both URL and base64 image testing for OpenAI, aligning with the standardized image handling implementation.

core/tests/tests.go (2)

192-192: Excellent type safety improvement.

Replacing the raw string pointer with the typed constant schemas.ImageContentTypeURL improves type safety and prevents potential typos.

228-228: Consistent type safety improvement.

Using schemas.ImageContentTypeBase64 maintains consistency with the URL type and provides the same type safety benefits.

core/providers/openai.go (2)

224-228: Good defensive copying to ensure immutability.

Creating copies of the ImageContent struct prevents mutation of the original data, which is essential for maintaining data integrity across the system.

231-234: Proper normalization and formatting sequence.

The sequence of Normalize() followed by FormatDataURL(true) correctly standardizes the image content and formats it with the full data URI format required by OpenAI's API.

core/providers/bedrock.go (3)

492-496: Consistent defensive copying pattern.

The implementation follows the same immutability pattern as other providers by creating copies of the ImageContent struct before processing.

499-502: Appropriate normalization and formatting for Bedrock.

The normalization followed by FormatDataURL(false) correctly prepares the image content for Bedrock's API, which requires raw base64 data rather than full data URIs.

510-510: Smart handling of media type prefix.

Trimming the "image/" prefix while relying on the normalization step to provide the full format string is a clean approach that leverages the standardized normalization logic.

core/bifrost.go (4)

218-220: LGTM: Good encapsulation improvement.

Making this method private is appropriate since it's an internal utility method that doesn't need to be exposed outside the package.

301-301: LGTM: Correctly updated method call.

The method call has been properly updated to use the new unexported method name.

414-417: LGTM: Proper encapsulation of internal method.

Converting this to an unexported method improves the package's API surface by hiding internal implementation details.

515-515: LGTM: Consistent method call updates.

Both call sites have been correctly updated to use the new unexported method name.

Also applies to: 689-689

core/providers/cohere.go (2)

238-263: LGTM: Well-structured vision support preparation.

The implementation correctly prepares for vision support by creating content arrays when image content is present, with appropriate fallback to simple message strings for text-only content.

276-301: LGTM: Consistent implementation pattern.

The last message handling follows the same pattern as the chat history processing, maintaining consistency in the codebase.

core/providers/anthropic.go (1)

390-394: LGTM: Consistent copy pattern application.

The code correctly creates copies of ImageContent structs before passing them to processing functions, maintaining consistency with the defensive programming approach established in buildAnthropicImageSourceMap.

Also applies to: 413-417

core/providers/cohere.go

core/schemas/bifrost.go

core/providers/cohere.go

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (7)

core/providers/utils.go (4)

29-31: Remove duplicate dataURIRegex declaration.

This regex is already declared in core/schemas/bifrost.go, creating unnecessary duplication and potential inconsistency risks.

319-348: Remove duplicate normalizeMediaType function.

This function is duplicated from core/schemas/bifrost.go. Use the canonical implementation to avoid code drift.

350-380: Replace with canonical ImageContent.Normalize method.

This function duplicates receiver method logic that already exists on the ImageContent struct in the schema layer.

382-445: Replace with canonical ImageContent receiver methods.

This function replicates logic that should be implemented as receiver methods on the ImageContent struct. Replace with thin wrapper calling the canonical methods to avoid maintenance issues.

core/providers/anthropic.go (1)

322-341: Excellent implementation with defensive programming practices.

The implementation correctly normalizes and formats image content while creating a copy to avoid modifying the original data. The conditional formatting for base64 content only and the use of raw base64 data (includePrefix=false) aligns perfectly with Anthropic's API requirements.
core/providers/cohere.go (2)
276-301: Duplicate logic - same vision preparation pattern as chat history.

This code block duplicates the same vision preparation logic from the chat history processing section (lines 238-263). Both sections prepare for vision support but don't actually use it.

Apply the same simplification approach as suggested for the chat history section to maintain consistency.

465-481: 💡 Verification agent
🧩 Analysis chain

Remove unused function or implement its usage.

The static analysis correctly identifies this function as unused. Since the image processing calls are commented out in the main logic, this helper function serves no purpose currently.

Choose one of these approaches:

Remove the function until vision support is actually implemented

Keep but document clearly that it's prepared for future use

If keeping the function, add clear documentation:
+// processImageContent processes image content for future Cohere vision API support.
+// NOTE: Currently unused as Cohere v1 does not support image content.
+// This function is prepared for when Cohere adds vision capabilities.
 func processImageContent(imageContent *schemas.ImageContent) map[string]interface{} {
Based on the past review comment from @Pratham-Mishra04 about removing safe copy creation for resource optimization, consider removing this function entirely until actually needed.

🏁 Script executed:
#!/bin/bash
# Search for any references to processImageContent in the repository
rg -n "processImageContent"
Length of output: 538
Address unused processImageContent helper

The processImageContent function is only referenced in commented-out code and isn’t invoked at runtime:

• core/providers/cohere.go:253 – commented-out call
• core/providers/cohere.go:291 – commented-out call
• core/providers/cohere.go:465–481 – function definition

Choose one approach to clean this up:

Remove the entire function until vision support is actually implemented.

Keep it but add clear documentation noting it’s prepared for future use:
+ // processImageContent processes image content for future Cohere vision API support.
+ // NOTE: Currently unused because Cohere v1 does not support images.
+ // This helper is retained to avoid rework when vision capabilities arrive.
  func processImageContent(imageContent *schemas.ImageContent) map[string]interface{} {
      …
  }
Refer back to @Pratham-Mishra04’s suggestion about safe-copy removal when you reintroduce this logic.

🧰 Tools

🪛 golangci-lint (1.64.8)

468-468: func processImageContent is unused

(unused)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 05ced39 and 7ef4bbd.

📒 Files selected for processing (12)

core/bifrost.go (5 hunks)
core/providers/anthropic.go (3 hunks)
core/providers/bedrock.go (1 hunks)
core/providers/cohere.go (3 hunks)
core/providers/openai.go (2 hunks)
core/providers/utils.go (3 hunks)
core/schemas/bifrost.go (1 hunks)
core/tests/account.go (1 hunks)
core/tests/azure_test.go (1 hunks)
core/tests/openai_test.go (1 hunks)
core/tests/tests.go (2 hunks)
core/tests/vertex_test.go (1 hunks)

🧰 Additional context used

🧠 Learnings (3)

core/tests/tests.go (1)

Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#64
File: transports/bifrost-http/integrations/genai/types.go:383-395
Timestamp: 2025-06-09T16:26:05.727Z
Learning: In the Bifrost schema (core/schemas/bifrost.go), only UserMessage and ToolMessage structs have ImageContent fields. AssistantMessage does not have an ImageContent field and cannot contain images. The schema design intentionally limits image content to user and tool messages only.

core/providers/anthropic.go (1)

Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#64
File: transports/bifrost-http/integrations/genai/types.go:383-395
Timestamp: 2025-06-09T16:26:05.727Z
Learning: In the Bifrost schema (core/schemas/bifrost.go), only UserMessage and ToolMessage structs have ImageContent fields. AssistantMessage does not have an ImageContent field and cannot contain images. The schema design intentionally limits image content to user and tool messages only.

core/providers/bedrock.go (2)

Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#55
File: core/providers/anthropic.go:526-550
Timestamp: 2025-06-04T09:29:46.287Z
Learning: In core/providers/anthropic.go, the content field in formattedMessages is always of type []interface{} because it's explicitly constructed that way upstream in the prepareAnthropicChatRequest function. Defensive type casting for multiple types is not needed since the type is guaranteed by the construction logic.

Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#55
File: core/providers/bedrock.go:443-468
Timestamp: 2025-06-04T09:32:15.826Z
Learning: In core/providers/bedrock.go, for tool call result messages (ModelChatMessageRoleTool), the Content field represents the actual tool call output. A tool result message should only be created when msg.Content is non-nil, as there's no semantic meaning to a tool result without output content.

🧬 Code Graph Analysis (7)

core/providers/openai.go (1)

core/providers/utils.go (1)

FormatImageContent (391-445)

core/tests/tests.go (1)

core/schemas/bifrost.go (2)

ImageContentTypeURL (177-177)

ImageContentTypeBase64 (176-176)

core/providers/anthropic.go (2)

core/providers/utils.go (1)

FormatImageContent (391-445)

core/schemas/bifrost.go (4)

ImageContentTypeURL (177-177)

UserMessage (155-157)

ImageContent (181-186)

ToolMessage (160-163)

core/bifrost.go (2)

core/schemas/bifrost.go (1)

ModelProvider (31-31)

core/schemas/provider.go (1)

Provider (142-149)

core/providers/cohere.go (2)

core/schemas/bifrost.go (2)

UserMessage (155-157)

ImageContent (181-186)

core/providers/utils.go (1)

FormatImageContent (391-445)

core/providers/bedrock.go (2)

core/providers/utils.go (1)

FormatImageContent (391-445)

core/schemas/bifrost.go (2)

ToolMessage (160-163)

ImageContent (181-186)

core/providers/utils.go (1)

core/schemas/bifrost.go (3)

ImageContent (181-186)

ImageContentTypeBase64 (176-176)

ImageContentTypeURL (177-177)

🪛 golangci-lint (1.64.8)

core/providers/cohere.go

468-468: func processImageContent is unused

(unused)

🔇 Additional comments (15)

core/tests/vertex_test.go (1)

24-25: LGTM! Enhanced test coverage for image handling.

Enabling image test setups for Vertex provider aligns with the PR's goal of standardizing image handling across all providers.

core/tests/azure_test.go (1)

25-25: LGTM! Enhanced base image test coverage.

Enabling base image setup for Azure provider supports comprehensive testing of the new standardized image handling capabilities.

core/tests/account.go (1)

80-80: LGTM! Added vision model support.

Adding the "c4ai-aya-vision-8b" model expands Cohere's vision capabilities, supporting the PR's image standardization efforts.

core/tests/openai_test.go (1)

25-26: LGTM! Image testing enabled for OpenAI provider.

The changes correctly enable both URL and base64 image testing for the OpenAI provider, aligning with the standardized image handling implementation.

core/tests/tests.go (2)

192-192: Type safety improvement with ImageContentType constants.

Good change replacing the raw string pointer with the typed constant schemas.ImageContentTypeURL, which improves type safety and aligns with the schema updates.

228-228: Consistent use of typed constants for base64 images.

Correctly uses schemas.ImageContentTypeBase64 constant instead of a raw string pointer, maintaining consistency with the URL image test above.

core/providers/openai.go (2)

229-229: Excellent implementation of immutable image content formatting.

The use of FormatImageContent(&messageImageContent, true) with includePrefix=true correctly creates a formatted copy for OpenAI's data URI requirements while preserving the original data. This addresses the mutation concerns raised in previous reviews.

244-244: Consistent use of formatted image content.

Correctly uses the formattedImgContent copy for both URL and detail fields, ensuring that OpenAI receives properly formatted image data without mutating the original content.

Also applies to: 248-249

core/schemas/bifrost.go (2)

173-178: Well-designed typed constants for image content.

The introduction of ImageContentType as a string alias with explicit constants improves type safety and prevents invalid values. This is a much better approach than using raw string pointers.

182-185: Improved ImageContent struct with better type safety.

The changes enhance the struct by:

Using the new ImageContentType instead of *string for better type safety

Adding omitempty tags to optional fields (MediaType and Detail)

Making the Type field a required value type rather than optional pointer

These changes align perfectly with the image content standardization effort.

core/providers/bedrock.go (1)

498-516: Excellent implementation of standardized image handling.

The image content formatting and field usage is implemented correctly, ensuring consistent image data handling across providers. The media type extraction with prefix trimming and the use of formatted URL for bytes source follows the expected pattern.

core/bifrost.go (3)

218-220: Good encapsulation by making internal methods private.

Converting SelectKeyFromProviderForModel to selectKeyFromProviderForModel properly encapsulates this internal utility method. The comment has been updated accordingly.

414-417: Consistent encapsulation pattern applied.

Converting GetProviderQueue to getProviderQueue maintains consistency with the other method privatization and properly encapsulates internal queue management logic.

301-301: All method calls updated correctly.

All calling sites have been properly updated to use the new unexported method names, maintaining functional consistency.

Also applies to: 515-515, 689-689

core/providers/anthropic.go (1)

377-408: Consistent copy pattern prevents side effects.

Creating copies of ImageContent structs before processing ensures that the original message data remains unmodified, which is a best practice for preventing unintended mutations. The pattern is applied consistently for both UserMessage and ToolMessage scenarios.

core/providers/bedrock.go

core/providers/cohere.go

Pratham-Mishra04 requested review from akshaydeo and danpiths June 8, 2025 20:24

coderabbitai bot requested changes Jun 8, 2025

View reviewed changes

core/providers/utils.go Outdated Show resolved Hide resolved

core/providers/utils.go Outdated Show resolved Hide resolved

Pratham-Mishra04 force-pushed the 06-09-feat_image_input_normalisation_added_across_all_providers branch from 5ad6001 to 87688e5 Compare June 8, 2025 20:37

coderabbitai bot requested changes Jun 8, 2025

View reviewed changes

core/tests/cohere_test.go Outdated Show resolved Hide resolved

core/tests/bedrock_test.go Outdated Show resolved Hide resolved

core/providers/cohere.go Show resolved Hide resolved

core/providers/anthropic.go Show resolved Hide resolved

Pratham-Mishra04 force-pushed the 06-09-feat_image_input_normalisation_added_across_all_providers branch 2 times, most recently from 8ca5048 to 5ce822c Compare June 8, 2025 20:52

coderabbitai bot approved these changes Jun 8, 2025

View reviewed changes

Pratham-Mishra04 marked this pull request as ready for review June 8, 2025 20:59

akshaydeo reviewed Jun 9, 2025

View reviewed changes

core/providers/openai.go Outdated Show resolved Hide resolved

Pratham-Mishra04 force-pushed the 06-09-feat_image_input_normalisation_added_across_all_providers branch from 5ce822c to b507c49 Compare June 9, 2025 04:44

Pratham-Mishra04 requested a review from akshaydeo June 9, 2025 04:44

coderabbitai bot requested changes Jun 9, 2025

View reviewed changes

core/providers/utils.go Show resolved Hide resolved

core/providers/utils.go Outdated Show resolved Hide resolved

core/providers/utils.go Show resolved Hide resolved

core/schemas/bifrost.go Outdated Show resolved Hide resolved

core/schemas/bifrost.go Outdated Show resolved Hide resolved

Pratham-Mishra04 force-pushed the 06-09-feat_image_input_normalisation_added_across_all_providers branch from b507c49 to 0cd860a Compare June 9, 2025 05:18

coderabbitai bot approved these changes Jun 9, 2025

View reviewed changes

Pratham-Mishra04 force-pushed the 06-09-feat_image_input_normalisation_added_across_all_providers branch from 0cd860a to 05ced39 Compare June 9, 2025 05:29

coderabbitai bot requested changes Jun 9, 2025

View reviewed changes

core/providers/cohere.go Show resolved Hide resolved

coderabbitai bot approved these changes Jun 9, 2025

View reviewed changes

akshaydeo reviewed Jun 9, 2025

View reviewed changes

core/schemas/bifrost.go Outdated Show resolved Hide resolved

Pratham-Mishra04 commented Jun 9, 2025

View reviewed changes

core/providers/cohere.go Outdated Show resolved Hide resolved

Pratham-Mishra04 force-pushed the 06-09-feat_image_input_normalisation_added_across_all_providers branch from 05ced39 to 7ef4bbd Compare June 9, 2025 18:38

Pratham-Mishra04 requested a review from akshaydeo June 9, 2025 18:38

coderabbitai bot requested changes Jun 9, 2025

View reviewed changes

core/providers/bedrock.go Outdated Show resolved Hide resolved

core/providers/cohere.go Show resolved Hide resolved

feat: image input normalisation added across all providers

43c207e

Pratham-Mishra04 force-pushed the 06-09-feat_image_input_normalisation_added_across_all_providers branch from 7ef4bbd to 43c207e Compare June 9, 2025 19:27

coderabbitai bot approved these changes Jun 9, 2025

View reviewed changes

akshaydeo merged commit 41b446a into main Jun 10, 2025
2 checks passed

This was referenced Jun 11, 2025

feat: refactor message content structure to support multi-modal content blocks #74

Merged

bug: fixed provider level minor message parsing bugs #77

Merged

akshaydeo deleted the 06-09-feat_image_input_normalisation_added_across_all_providers branch August 31, 2025 17:30

Model Name	Parameters	Modality	Context Length	Max Output Tokens	Description
c4ai-aya-expanse-8b	8B	Text	8k	4k	Multilingual, optimized for performance across 23 languages
c4ai-aya-expanse-32b	32B	Text	128k	4k	Larger multilingual model with enhanced performance
c4ai-aya-vision-8b	8B	Text, Images	16k	4k	Multimodal (state-of-the-art vision + language), covers 23 languages, low latency
c4ai-aya-vision-32b	32B	Text, Images	16k	4k	Multimodal, larger variant focused on best-in-class multilingual and multimodal performance

Model Name	Vision-Capable	Multilingual	Parameters	Text	Image	Typical Use Case
c4ai-aya-vision-8b	Yes	Yes	8B	Yes	Yes	Multimodal chat, research, QA
c4ai-aya-vision-32b	Yes	Yes	32B	Yes	Yes	Multimodal, high performance
c4ai-aya-expanse-8b	No	Yes	8B	Yes	No	Text, multilingual tasks
c4ai-aya-expanse-32b	No	Yes	32B	Yes	No	Text, advanced multilingual
Command A 03-2025	No	Yes	N/A	Yes	No	Chat, agent, enterprise
Embed v3.0 multimodal	Yes (embeddings)	No	N/A	Yes	Yes	Embeddings from text and images

feat: standardize image handling across all providers #62

feat: standardize image handling across all providers #62

Uh oh!

Conversation

Pratham-Mishra04 commented Jun 8, 2025

Unified Image Content Handling Across Providers

Uh oh!

coderabbitai bot commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Suggested reviewers

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

Pratham-Mishra04 commented Jun 8, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Current Models Available on the Cohere API

Aya Model Family

Additional Multimodal Models

Command Models (Primarily Text-Based)

Model Availability and Access

Summary Table

Conclusion

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot commented Jun 8, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)