-
Notifications
You must be signed in to change notification settings - Fork 60
feat: standardize image handling across all providers #62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: standardize image handling across all providers #62
Conversation
Summary by CodeRabbit
Summary by CodeRabbit
WalkthroughThis update introduces a robust normalization and formatting pipeline for image content across multiple providers. The Changes
Sequence Diagram(s)sequenceDiagram
participant UserMessage
participant Provider
participant ImageContent
participant APIRequest
UserMessage->>Provider: Send message (may include image)
Provider->>ImageContent: Copy and Normalize()
ImageContent->>ImageContent: Infer type, normalize media type
Provider->>ImageContent: FormatDataURL(includePrefix)
ImageContent->>Provider: Return formatted data
Provider->>APIRequest: Build payload with normalized image data
APIRequest->>Provider: Send to external API
Possibly related PRs
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (12)
🧰 Additional context used🧠 Learnings (3)core/tests/tests.go (1)
core/providers/bedrock.go (1)
core/providers/anthropic.go (2)
🧬 Code Graph Analysis (5)core/tests/tests.go (1)
core/providers/bedrock.go (1)
core/bifrost.go (2)
core/providers/cohere.go (2)
core/providers/utils.go (1)
🪛 golangci-lint (1.64.8)core/providers/cohere.go468-468: func (unused) 🔇 Additional comments (11)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (13)
core/providers/anthropic.go
(1 hunks)core/providers/bedrock.go
(1 hunks)core/providers/cohere.go
(2 hunks)core/providers/openai.go
(1 hunks)core/providers/utils.go
(2 hunks)core/schemas/bifrost.go
(1 hunks)core/tests/account.go
(1 hunks)core/tests/azure_test.go
(1 hunks)core/tests/bedrock_test.go
(1 hunks)core/tests/cohere_test.go
(1 hunks)core/tests/openai_test.go
(1 hunks)core/tests/tests.go
(2 hunks)core/tests/vertex_test.go
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (6)
core/providers/openai.go (1)
core/providers/utils.go (2)
NormalizeImageContent
(348-373)ImageDataURLBuilder
(381-427)
core/tests/tests.go (1)
core/schemas/bifrost.go (2)
ImageContentTypeURL
(177-177)ImageContentTypeBase64
(176-176)
core/providers/utils.go (1)
core/schemas/bifrost.go (3)
ImageContent
(181-190)ImageContentTypeBase64
(176-176)ImageContentTypeURL
(177-177)
core/providers/bedrock.go (1)
core/providers/utils.go (2)
NormalizeImageContent
(348-373)ImageDataURLBuilder
(381-427)
core/providers/cohere.go (2)
core/schemas/bifrost.go (2)
UserMessage
(155-157)ImageContent
(181-190)core/providers/utils.go (2)
NormalizeImageContent
(348-373)ImageDataURLBuilder
(381-427)
core/providers/anthropic.go (2)
core/providers/utils.go (2)
NormalizeImageContent
(348-373)ImageDataURLBuilder
(381-427)core/schemas/bifrost.go (2)
ImageContentTypeBase64
(176-176)ImageContentTypeURL
(177-177)
🔇 Additional comments (19)
core/tests/cohere_test.go (1)
21-21
: Model change aligns with vision capabilities.The switch to "c4ai-aya-vision-8b" model is appropriate for testing vision functionality as part of the image handling standardization effort.
However, consider enabling image testing (SetupImage/SetupBaseImage) to ensure the new image handling utilities work correctly with Cohere's vision model.
core/tests/bedrock_test.go (1)
28-28
: Good enablement of image testing for Bedrock.Enabling
SetupImage
testing while maintaining the informative comment about URL limitations provides comprehensive test coverage for Bedrock's image handling capabilities.core/tests/azure_test.go (1)
25-25
: Complete image testing coverage for Azure.Enabling
SetupBaseImage
alongside the existingSetupImage
provides comprehensive test coverage for both URL and base64 image formats, ensuring the standardized image handling works correctly with Azure.core/tests/vertex_test.go (1)
24-25
: Comprehensive image testing enabled for Vertex.Enabling both
SetupImage
andSetupBaseImage
provides complete test coverage for Vertex's image handling capabilities, ensuring the standardized image utilities work correctly with both URL and base64 formats.core/tests/openai_test.go (1)
25-26
: LGTM! Enabling image test scenarios for OpenAI.The changes correctly enable image testing for the OpenAI provider, which supports vision capabilities. This aligns with the PR's objective to standardize and test image handling across all providers.
core/tests/account.go (1)
80-80
: Good addition of vision-capable model for Cohere.Adding the "c4ai-aya-vision-8b" model supports the enhanced image handling capabilities introduced in this PR. This ensures Cohere provider can be properly tested with image content.
core/tests/tests.go (2)
192-192
: Excellent type safety improvement.Replacing the string literal with the typed constant
schemas.ImageContentTypeURL
improves type safety and aligns with the new schema definitions. The removal ofbifrost.Ptr()
indicates the field is no longer a pointer type.
228-228
: Consistent use of typed constants for base64 content.The change to
schemas.ImageContentTypeBase64
maintains consistency with the URL content type change above and improves type safety.core/providers/openai.go (1)
229-232
: Well-implemented image content standardization for OpenAI.The implementation correctly applies the new image processing utilities:
NormalizeImageContent
handles type inference and media type normalizationImageDataURLBuilder
withincludePrefix=true
ensures OpenAI receives the required full data URI formatThe processing order is logical - normalize first, then format according to provider requirements. This aligns perfectly with the PR's objective to standardize image handling while respecting provider-specific formatting needs.
core/providers/bedrock.go (2)
497-500
: LGTM! Proper image content normalization for Bedrock.The implementation correctly uses the new utility functions to normalize image content and format it for Bedrock's requirements (raw base64 data without prefix).
508-509
: Good defensive programming with prefix removal.The
TrimPrefix
call correctly handles cases where the media type might already include the "image/" prefix, ensuring clean format strings for Bedrock's API.core/schemas/bifrost.go (2)
173-178
: Excellent type safety improvement with ImageContentType enum.The new enum type prevents string typos and makes the API more self-documenting. The constants clearly define the supported image content types.
182-186
:✅ Verification successful
Breaking change: Type field no longer nullable.
The change from
*string
toImageContentType
improves type safety but is a breaking change. The field is now required rather than optional. The added documentation clearly explains the expected format for base64 data.Since this is a breaking change, verify that all existing usages have been updated:
🏁 Script executed:
#!/bin/bash # Description: Search for any remaining usage of pointer assignment to ImageContent.Type field # Expected: No results, as Type field is no longer a pointer # Search for code that might be assigning pointers to Type field rg -A 3 -B 3 "\.Type\s*=" --type go # Also search for any nil assignments or checks on Type field rg -A 3 -B 3 "\.Type.*nil" --type goLength of output: 7105
Approve:
ImageContent.Type
is now non-nullable and all usages are updatedAll references to the
ImageContent.Type
field have been converted from a pointer to theImageContentType
enum, and no lingering pointer assignments ornil
checks remain:
- In core/providers/utils.go,
imgContent.Type
is inferred and set toschemas.ImageContentTypeBase64
orschemas.ImageContentTypeURL
.- In core/providers/anthropic.go,
imgCopy.Type
is compared againstschemas.ImageContentTypeBase64
andschemas.ImageContentTypeURL
.No further changes are required.
core/providers/cohere.go (2)
238-269
: Well-structured image handling for chat history.The implementation correctly handles vision-capable messages by creating content arrays with separate text and image objects. Using
ImageDataURLBuilder
withtrue
parameter is appropriate for Cohere's API which expects full data URI format.
282-313
: Consistent image handling for the last message.The logic mirrors the chat history implementation, ensuring consistent behavior. The content array structure properly separates text and image data as expected by vision-capable models.
core/providers/anthropic.go (2)
322-337
: Excellent defensive programming with image content normalization.Creating a copy of the input prevents unintended side effects, and the conditional application of
ImageDataURLBuilder
only for base64 content is smart. Thefalse
parameter correctly produces raw base64 data as required by Anthropic's API.
338-351
: Clean source map construction for different image types.The implementation correctly handles both URL and base64 image types, setting appropriate fields in the source map. The type casting to string for the enum value is necessary and correct.
core/providers/utils.go (2)
10-10
: LGTM: New import for regex support.The
regexp
import is correctly added to support the new image content processing functions.
315-344
: LGTM: Clean media type normalization logic.The function correctly handles common image formats and provides sensible defaults. The switch statement covers the most common image types and the fallback logic is appropriate.
5ad6001
to
87688e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (13)
core/providers/anthropic.go
(1 hunks)core/providers/bedrock.go
(1 hunks)core/providers/cohere.go
(2 hunks)core/providers/openai.go
(1 hunks)core/providers/utils.go
(3 hunks)core/schemas/bifrost.go
(1 hunks)core/tests/account.go
(1 hunks)core/tests/azure_test.go
(1 hunks)core/tests/bedrock_test.go
(1 hunks)core/tests/cohere_test.go
(1 hunks)core/tests/openai_test.go
(1 hunks)core/tests/tests.go
(2 hunks)core/tests/vertex_test.go
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (6)
core/providers/openai.go (1)
core/providers/utils.go (2)
NormalizeImageContent
(352-376)FormatImageDataURL
(384-427)
core/providers/bedrock.go (1)
core/providers/utils.go (2)
NormalizeImageContent
(352-376)FormatImageDataURL
(384-427)
core/providers/cohere.go (2)
core/schemas/bifrost.go (2)
UserMessage
(155-157)ImageContent
(181-186)core/providers/utils.go (2)
NormalizeImageContent
(352-376)FormatImageDataURL
(384-427)
core/tests/tests.go (1)
core/schemas/bifrost.go (2)
ImageContentTypeURL
(177-177)ImageContentTypeBase64
(176-176)
core/providers/anthropic.go (2)
core/providers/utils.go (2)
NormalizeImageContent
(352-376)FormatImageDataURL
(384-427)core/schemas/bifrost.go (2)
ImageContentTypeBase64
(176-176)ImageContentTypeURL
(177-177)
core/providers/utils.go (1)
core/schemas/bifrost.go (3)
ImageContent
(181-186)ImageContentTypeBase64
(176-176)ImageContentTypeURL
(177-177)
🔇 Additional comments (15)
core/schemas/bifrost.go (2)
173-178
: Excellent type safety improvement for image content handling.The introduction of
ImageContentType
as a string alias with well-defined constants enhances type safety and prevents invalid values. This standardization aligns perfectly with the PR's goal of unifying image handling across providers.
182-185
: Good structural improvements to ImageContent.The changes improve the struct design:
- Changing
Type
from*string
toImageContentType
(non-pointer) ensures the field is always present with a valid value- Adding
omitempty
tags to optional fields keeps JSON responses cleanNote: This is a breaking change from the previous
*string
type, but it's justified for better type safety and API consistency.core/tests/azure_test.go (1)
25-25
: Good consistency with other providers' image testing.Enabling
SetupBaseImage: true
for Azure completes the image capability testing alongside URL-based images, maintaining consistency with the standardized image handling approach across providers.core/tests/vertex_test.go (1)
24-25
: LGTM! Image testing enabled for Vertex provider.The changes correctly enable both URL and base64 image test scenarios for the Vertex provider, aligning with the PR's objective to standardize image handling across all providers.
core/tests/openai_test.go (1)
25-26
: LGTM! Image testing enabled for OpenAI provider.The changes correctly enable both URL and base64 image test scenarios for the OpenAI provider, consistent with the standardization effort across all providers.
core/tests/account.go (1)
80-80
: LGTM! Vision-capable model added for Cohere.The addition of "c4ai-aya-vision-8b" to Cohere's supported models list appropriately supports the vision capabilities being standardized across providers.
core/tests/tests.go (1)
192-192
: Excellent improvement! Using typed constants instead of magic strings.The replacement of raw string literals with
schemas.ImageContentTypeURL
andschemas.ImageContentTypeBase64
constants improves type safety and maintainability. This aligns perfectly with the standardization effort outlined in the PR objectives.Also applies to: 228-228
core/providers/openai.go (1)
229-232
: LGTM: Correct implementation of image content standardization for OpenAI.The implementation properly normalizes image content and formats it with the full data URI prefix (includePrefix=true), which aligns with OpenAI's API requirements.
core/providers/bedrock.go (2)
497-500
: LGTM: Correct image normalization and formatting for Bedrock.The implementation properly normalizes image content and formats it as raw base64 data (includePrefix=false), which is required by Bedrock's API.
508-509
:✅ Verification successful
Verify that normalized media type includes the full format string.
The comment states that
normalizeMediaType
ensures the full format, so the "image/" prefix is trimmed. Please verify that the utility function actually provides the complete media type string (e.g., "image/jpeg" not just "jpeg").
🏁 Script executed:
#!/bin/bash # Verify that normalizeMediaType function provides full media type strings ast-grep --pattern 'func normalizeMediaType($$$) $$$'Length of output: 1413
Confirmed normalizeMediaType outputs full media types
ThenormalizeMediaType
utility always returns a full type string prefixed with “image/” (defaulting to “image/jpeg” or mapping known formats), so trimming that prefix here correctly yields just the subtype.core/providers/utils.go (5)
10-10
: LGTM: Import addition is appropriate.The
regexp
import is correctly added to support the new image content utilities.
29-31
: LGTM: Performance optimization implemented.The precompiled regex at package level addresses the performance concern from previous reviews. This avoids recompiling the regex on every function call.
319-348
: LGTM: Comprehensive media type normalization.The function handles common image formats well and provides a sensible default. The logic for adding the "image/" prefix is sound and the switch statement is appropriate for this number of cases.
350-376
: LGTM: Robust type inference logic.The function correctly handles different URL patterns for type inference:
- Data URI format detection using the precompiled regex
- HTTP/HTTPS URL detection for regular URLs
- Fallback to base64 for other cases
The media type normalization is properly integrated.
378-427
: LGTM: Well-implemented data URL formatting.The function correctly handles both scenarios:
- URLs that already have data URI format (extracts and rebuilds)
- Raw base64 data (constructs data URI)
The
includePrefix
parameter provides the flexibility needed for different provider requirements. The documentation clearly states that the function modifies the input struct in place.
8ca5048
to
5ce822c
Compare
5ce822c
to
b507c49
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🔭 Outside diff range comments (1)
core/providers/anthropic.go (1)
387-397
: 🧹 Nitpick (assertive)Unnecessary double copy of
ImageContent
messageImageContent
is copied once here and copied again insidebuildAnthropicImageSourceMap
.
Keeping a single copy is sufficient and marginally cheaper:- // Create a copy to avoid modifying the original - messageImageContent = *msg.UserMessage.ImageContent + messageImageContent := msg.UserMessage.ImageContent(or drop the outer copy altogether and let
buildAnthropicImageSourceMap
handle it).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (12)
core/bifrost.go
(5 hunks)core/providers/anthropic.go
(3 hunks)core/providers/bedrock.go
(1 hunks)core/providers/cohere.go
(3 hunks)core/providers/openai.go
(1 hunks)core/providers/utils.go
(3 hunks)core/schemas/bifrost.go
(2 hunks)core/tests/account.go
(1 hunks)core/tests/azure_test.go
(1 hunks)core/tests/openai_test.go
(1 hunks)core/tests/tests.go
(2 hunks)core/tests/vertex_test.go
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
core/providers/bedrock.go (1)
Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#55
File: core/providers/anthropic.go:526-550
Timestamp: 2025-06-04T09:29:46.287Z
Learning: In core/providers/anthropic.go, the content field in formattedMessages is always of type []interface{} because it's explicitly constructed that way upstream in the prepareAnthropicChatRequest function. Defensive type casting for multiple types is not needed since the type is guaranteed by the construction logic.
🧬 Code Graph Analysis (7)
core/providers/cohere.go (1)
core/schemas/bifrost.go (2)
UserMessage
(161-163)ImageContent
(187-192)
core/providers/openai.go (1)
core/schemas/bifrost.go (3)
UserMessage
(161-163)ImageContent
(187-192)ToolMessage
(166-169)
core/tests/tests.go (1)
core/schemas/bifrost.go (2)
ImageContentTypeURL
(183-183)ImageContentTypeBase64
(182-182)
core/providers/bedrock.go (1)
core/schemas/bifrost.go (3)
UserMessage
(161-163)ImageContent
(187-192)ToolMessage
(166-169)
core/providers/utils.go (1)
core/schemas/bifrost.go (3)
ImageContent
(187-192)ImageContentTypeBase64
(182-182)ImageContentTypeURL
(183-183)
core/bifrost.go (2)
core/schemas/bifrost.go (1)
ModelProvider
(37-37)core/schemas/provider.go (1)
Provider
(142-149)
core/providers/anthropic.go (1)
core/schemas/bifrost.go (5)
ImageContentTypeBase64
(182-182)ImageContentTypeURL
(183-183)UserMessage
(161-163)ImageContent
(187-192)ToolMessage
(166-169)
🔇 Additional comments (14)
core/tests/vertex_test.go (1)
24-25
: LGTM! Image testing enabled for Vertex provider.The changes correctly enable both image and base image testing scenarios for the Vertex provider, which aligns with the PR objective to standardize image handling across all providers.
core/tests/azure_test.go (1)
25-25
: LGTM! Base image testing enabled for Azure provider.The change appropriately enables base image testing scenarios for Azure, complementing the already enabled image testing. This ensures comprehensive image handling coverage for the Azure provider.
core/tests/openai_test.go (1)
25-26
: LGTM! Image testing enabled for OpenAI provider.The changes correctly enable both image and base image testing scenarios for the OpenAI provider. The gpt-4o-mini model supports vision capabilities, making these configuration changes appropriate for testing the standardized image handling functionality.
core/tests/account.go (1)
80-80
:✅ Verification successful
Verify the Cohere vision model name.
The addition of
"c4ai-aya-vision-8b"
aligns with the PR objective to add vision support for Cohere. However, please verify that this model identifier is correct and officially supported by Cohere's API.
🌐 Web query:
What are the current available models for Cohere API, specifically any vision-capable models like "c4ai-aya-vision-8b"?
💡 Result:
Current Models Available on the Cohere API
Cohere provides a range of large language models and multimodal models through their API, optimized for different use cases such as multilingual understanding, text generation, search, and image-related tasks. Here are the primary models currently available, especially focusing on vision-capable models:
Aya Model Family
The Aya family includes both purely text-based and multimodal (vision) models:
Model Name Parameters Modality Context Length Max Output Tokens Description c4ai-aya-expanse-8b 8B Text 8k 4k Multilingual, optimized for performance across 23 languages c4ai-aya-expanse-32b 32B Text 128k 4k Larger multilingual model with enhanced performance c4ai-aya-vision-8b 8B Text, Images 16k 4k Multimodal (state-of-the-art vision + language), covers 23 languages, low latency c4ai-aya-vision-32b 32B Text, Images 16k 4k Multimodal, larger variant focused on best-in-class multilingual and multimodal performance
- Aya Vision models (c4ai-aya-vision-8b and c4ai-aya-vision-32b) are explicitly designed for both text and image (multimodal) tasks, such as image captioning, answering questions about photos, translating text, and generating summaries in 23 languages. They are regarded as best-in-class on various benchmarks, with the 32B model often outperforming much larger competing models[1][9].
Additional Multimodal Models
- Embed v3.0: This is a specialized AI search model capable of generating embeddings from both text and images, thus considered multimodal in its capabilities. While it's focused on embedding generation rather than generation or chat, it supports both text and image inputs[2][3][6][8].
Command Models (Primarily Text-Based)
- Command A 03-2025: The latest flagship chat model, focused on text and agentic tasks. It does not include vision capabilities but supports advanced multilingual and agent applications, with up to 256,000 token context length[5].
Model Availability and Access
- Aya Vision models ("c4ai-aya-vision-8b" and "c4ai-aya-vision-32b") are available through the Cohere API as chat models supporting multimodal (image + text) use cases[1][7][10].
- Cohere offers public, research, and sometimes restricted licensing for these models; note that commercial use may be limited for some open models like Aya Vision, which is released under a Creative Commons license with restrictions on commercial use[9].
Summary Table
Model Name Vision-Capable Multilingual Parameters Text Image Typical Use Case c4ai-aya-vision-8b Yes Yes 8B Yes Yes Multimodal chat, research, QA c4ai-aya-vision-32b Yes Yes 32B Yes Yes Multimodal, high performance c4ai-aya-expanse-8b No Yes 8B Yes No Text, multilingual tasks c4ai-aya-expanse-32b No Yes 32B Yes No Text, advanced multilingual Command A 03-2025 No Yes N/A Yes No Chat, agent, enterprise Embed v3.0 multimodal Yes (embeddings) No N/A Yes Yes Embeddings from text and images Conclusion
The Cohere API currently supports several cutting-edge models, including the c4ai-aya-vision-8b and c4ai-aya-vision-32b as vision-capable (multimodal) models for both text and image inputs. These models are state-of-the-art for multilingual, multimodal tasks and excel in various benchmarks. If your use case involves vision-related AI along with text, these Aya Vision models are the primary offerings from Cohere for such capabilities[1][9][10].
Citations:
- 1: https://docs.cohere.com/v2/docs/models
- 2: https://docs.cohere.com/v2/changelog/2025-05-14-oci-models-release-notes
- 3: https://docs.cohere.com/changelog
- 4: https://github.blog/changelog/2025-05-08-upcoming-deprecations-for-cohere-models-in-github-models/
- 5: https://docs.oracle.com/en-us/iaas/Content/generative-ai/cohere-command-a-03-2025.htm
- 6: https://docs.cohere.com/v2/changelog
- 7: https://tongfamily.com/2025/01/04/ai-openwebui-missing-mac-manual-with-ollama-and-comfyui-notes/
- 8: https://docs.cohere.com/v1/changelog
- 9: https://techcrunch.com/2025/03/04/cohere-claims-its-new-aya-vision-ai-model-is-best-in-class/
- 10: https://github.com/lobehub/lobe-chat/blob/main/locales/en-US/models.json
Model c4ai-aya-vision-8b Confirmed
The Cohere API officially supports
c4ai-aya-vision-8b
as a multimodal (text + image) model. You can safely keep this addition incore/tests/account.go
(line 80).core/tests/tests.go (1)
192-192
: LGTM! Improved type safety with predefined constants.The replacement of raw string literals with the new predefined constants
schemas.ImageContentTypeURL
andschemas.ImageContentTypeBase64
enhances type safety and aligns with the updatedImageContent
struct definition.Also applies to: 228-228
core/providers/openai.go (1)
224-234
: LGTM! Proper image content standardization for OpenAI.The implementation correctly:
- Creates copies to avoid mutating original image content
- Applies normalization for type inference and media type handling
- Formats as full data URI (required by OpenAI API)
This follows the established pattern for image content standardization across providers.
core/providers/bedrock.go (2)
492-502
: LGTM! Consistent image content handling for Bedrock.The implementation properly:
- Creates copies to prevent original data mutation
- Normalizes content with type inference and media type handling
- Formats with raw base64 data (appropriate for Bedrock/Claude)
This aligns with the standardization pattern across all providers.
510-511
: Good documentation of media type handling.The comment correctly explains that the "image/" prefix removal is safe because normalization ensures the full format string is available.
core/bifrost.go (3)
218-220
: LGTM! Improved encapsulation by making method unexported.Renaming
SelectKeyFromProviderForModel
toselectKeyFromProviderForModel
correctly encapsulates this internal implementation detail.
301-301
: LGTM! Consistent usage of renamed method.All callers correctly updated to use the new unexported method name
selectKeyFromProviderForModel
.Also applies to: 515-515, 689-689
414-417
: LGTM! Improved encapsulation by making method unexported.Renaming
GetProviderQueue
togetProviderQueue
correctly encapsulates this internal implementation detail.core/providers/cohere.go (3)
238-262
: LGTM! Well-implemented vision support for chat history.The implementation correctly:
- Checks for image content in user messages
- Creates content arrays with both text and image data
- Uses the helper function to process image content consistently
- Falls back to simple message format for non-vision scenarios
This follows the established pattern for vision support across providers.
275-299
: LGTM! Consistent vision support for current message.The implementation mirrors the chat history handling and correctly processes both text and image content for the current message using the same helper function.
463-485
: Excellent refactoring with proper image content standardization.The
processImageContent
helper function:
- Eliminates code duplication (addressing previous review feedback)
- Creates copies to prevent original data mutation
- Applies standardized normalization and formatting
- Returns properly formatted map for Cohere API
This follows the established pattern for image content handling across all providers.
b507c49
to
0cd860a
Compare
0cd860a
to
05ced39
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
core/schemas/bifrost.go (2)
194-227
: Address duplication concerns from previous reviewThe regex and
normalizeMediaType
function are duplicated in other packages as noted in previous reviews.Consider exporting these (
DataURIRegex
,NormalizeMediaType
) to eliminate cross-package duplication.
302-303
: Use idiomatic heap allocation for pointer assignmentTaking the address of the local variable
normalizedMediaType
works but is non-idiomatic.-normalizedMediaType := finalMediaType -ic.MediaType = &normalizedMediaType +ic.MediaType = schemas.StrPtr(finalMediaType)This approach is clearer and more consistent with Go idioms.
core/providers/anthropic.go (1)
322-350
: LGTM: Excellent defensive programming with proper normalization.The implementation correctly creates a copy of the image content before normalization and formatting, preventing unintended mutations of the original data. The conditional formatting for base64 content and the use of raw base64 data (includePrefix=false) aligns perfectly with Anthropic's API requirements.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (11)
core/bifrost.go
(5 hunks)core/providers/anthropic.go
(3 hunks)core/providers/bedrock.go
(1 hunks)core/providers/cohere.go
(3 hunks)core/providers/openai.go
(1 hunks)core/schemas/bifrost.go
(2 hunks)core/tests/account.go
(1 hunks)core/tests/azure_test.go
(1 hunks)core/tests/openai_test.go
(1 hunks)core/tests/tests.go
(2 hunks)core/tests/vertex_test.go
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
core/providers/bedrock.go (1)
Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#55
File: core/providers/anthropic.go:526-550
Timestamp: 2025-06-04T09:29:46.287Z
Learning: In core/providers/anthropic.go, the content field in formattedMessages is always of type []interface{} because it's explicitly constructed that way upstream in the prepareAnthropicChatRequest function. Defensive type casting for multiple types is not needed since the type is guaranteed by the construction logic.
🧬 Code Graph Analysis (6)
core/tests/tests.go (1)
core/schemas/bifrost.go (2)
ImageContentTypeURL
(183-183)ImageContentTypeBase64
(182-182)
core/providers/openai.go (1)
core/schemas/bifrost.go (3)
UserMessage
(161-163)ImageContent
(187-192)ToolMessage
(166-169)
core/bifrost.go (2)
core/schemas/bifrost.go (1)
ModelProvider
(37-37)core/schemas/provider.go (1)
Provider
(142-149)
core/providers/cohere.go (1)
core/schemas/bifrost.go (2)
UserMessage
(161-163)ImageContent
(187-192)
core/providers/anthropic.go (1)
core/schemas/bifrost.go (5)
ImageContentTypeBase64
(182-182)ImageContentTypeURL
(183-183)UserMessage
(161-163)ImageContent
(187-192)ToolMessage
(166-169)
core/providers/bedrock.go (1)
core/schemas/bifrost.go (3)
UserMessage
(161-163)ImageContent
(187-192)ToolMessage
(166-169)
🪛 golangci-lint (1.64.8)
core/providers/cohere.go
468-468: func processImageContent
is unused
(unused)
🔇 Additional comments (24)
core/schemas/bifrost.go (6)
4-8
: LGTM: Required imports for new functionalityThe added imports (fmt, regexp, strings) are necessary for the new image content normalization and formatting functionality.
179-184
: LGTM: Improved type safety with string aliasThe introduction of
ImageContentType
as a string alias with explicit constants enhances type safety and makes the code more self-documenting compared to raw string literals.
188-192
: LGTM: Enhanced struct definition with type safetyThe changes improve the struct definition:
- Using
ImageContentType
for theType
field enhances type safety- Adding
omitempty
toMediaType
andDetail
fields produces cleaner JSON output
194-196
: LGTM: Efficient precompiled regexUsing a precompiled regex variable is a performance best practice. The pattern correctly matches data URI format (data:mediatype;base64,data).
229-258
: LGTM: Robust type inference logicThe
Normalize
method provides solid logic for:
- Inferring content type from URL patterns (data URI, HTTP/HTTPS, or raw base64)
- Normalizing media types when provided
- Proper nil checking
The implementation correctly handles edge cases and follows good defensive programming practices.
260-313
: LGTM: Comprehensive data URL formatting logicThe
FormatDataURL
method handles complex scenarios well:
- Detects existing data URI format vs raw base64
- Preserves explicit MediaType field preference over URL-embedded media type
- Provides flexible output formatting (with/without prefix)
- Ensures MediaType field is always normalized and set
The logic is sound and handles edge cases appropriately.
core/tests/vertex_test.go (1)
24-25
: LGTM: Enable image testing for Vertex providerEnabling
SetupImage
andSetupBaseImage
tests aligns with the PR objective to standardize image handling across all providers. This ensures the new image normalization functionality is properly tested for Vertex.core/tests/azure_test.go (1)
25-25
: LGTM: Enable base image testing for Azure providerEnabling
SetupBaseImage
ensures the new standardized image handling is tested for the Azure provider, consistent with similar changes across other provider test files.core/tests/account.go (1)
80-80
: LGTM: Add vision-capable model for CohereAdding
"c4ai-aya-vision-8b"
to the Cohere model list supports the PR's image handling standardization by providing a vision-capable model for testing image content processing with the Cohere provider.core/tests/openai_test.go (1)
25-26
: LGTM! Test configuration correctly enables image support.The flags are appropriately set to enable both URL and base64 image testing for OpenAI, aligning with the standardized image handling implementation.
core/tests/tests.go (2)
192-192
: Excellent type safety improvement.Replacing the raw string pointer with the typed constant
schemas.ImageContentTypeURL
improves type safety and prevents potential typos.
228-228
: Consistent type safety improvement.Using
schemas.ImageContentTypeBase64
maintains consistency with the URL type and provides the same type safety benefits.core/providers/openai.go (2)
224-228
: Good defensive copying to ensure immutability.Creating copies of the
ImageContent
struct prevents mutation of the original data, which is essential for maintaining data integrity across the system.
231-234
: Proper normalization and formatting sequence.The sequence of
Normalize()
followed byFormatDataURL(true)
correctly standardizes the image content and formats it with the full data URI format required by OpenAI's API.core/providers/bedrock.go (3)
492-496
: Consistent defensive copying pattern.The implementation follows the same immutability pattern as other providers by creating copies of the
ImageContent
struct before processing.
499-502
: Appropriate normalization and formatting for Bedrock.The normalization followed by
FormatDataURL(false)
correctly prepares the image content for Bedrock's API, which requires raw base64 data rather than full data URIs.
510-510
: Smart handling of media type prefix.Trimming the "image/" prefix while relying on the normalization step to provide the full format string is a clean approach that leverages the standardized normalization logic.
core/bifrost.go (4)
218-220
: LGTM: Good encapsulation improvement.Making this method private is appropriate since it's an internal utility method that doesn't need to be exposed outside the package.
301-301
: LGTM: Correctly updated method call.The method call has been properly updated to use the new unexported method name.
414-417
: LGTM: Proper encapsulation of internal method.Converting this to an unexported method improves the package's API surface by hiding internal implementation details.
515-515
: LGTM: Consistent method call updates.Both call sites have been correctly updated to use the new unexported method name.
Also applies to: 689-689
core/providers/cohere.go (2)
238-263
: LGTM: Well-structured vision support preparation.The implementation correctly prepares for vision support by creating content arrays when image content is present, with appropriate fallback to simple message strings for text-only content.
276-301
: LGTM: Consistent implementation pattern.The last message handling follows the same pattern as the chat history processing, maintaining consistency in the codebase.
core/providers/anthropic.go (1)
390-394
: LGTM: Consistent copy pattern application.The code correctly creates copies of ImageContent structs before passing them to processing functions, maintaining consistency with the defensive programming approach established in
buildAnthropicImageSourceMap
.Also applies to: 413-417
05ced39
to
7ef4bbd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (7)
core/providers/utils.go (4)
29-31
: Remove duplicate dataURIRegex declaration.This regex is already declared in
core/schemas/bifrost.go
, creating unnecessary duplication and potential inconsistency risks.
319-348
: Remove duplicate normalizeMediaType function.This function is duplicated from
core/schemas/bifrost.go
. Use the canonical implementation to avoid code drift.
350-380
: Replace with canonical ImageContent.Normalize method.This function duplicates receiver method logic that already exists on the
ImageContent
struct in the schema layer.
382-445
: Replace with canonical ImageContent receiver methods.This function replicates logic that should be implemented as receiver methods on the
ImageContent
struct. Replace with thin wrapper calling the canonical methods to avoid maintenance issues.core/providers/anthropic.go (1)
322-341
: Excellent implementation with defensive programming practices.The implementation correctly normalizes and formats image content while creating a copy to avoid modifying the original data. The conditional formatting for base64 content only and the use of raw base64 data (includePrefix=false) aligns perfectly with Anthropic's API requirements.
core/providers/cohere.go (2)
276-301
: Duplicate logic - same vision preparation pattern as chat history.This code block duplicates the same vision preparation logic from the chat history processing section (lines 238-263). Both sections prepare for vision support but don't actually use it.
Apply the same simplification approach as suggested for the chat history section to maintain consistency.
465-481
: 💡 Verification agent🧩 Analysis chain
Remove unused function or implement its usage.
The static analysis correctly identifies this function as unused. Since the image processing calls are commented out in the main logic, this helper function serves no purpose currently.
Choose one of these approaches:
- Remove the function until vision support is actually implemented
- Keep but document clearly that it's prepared for future use
If keeping the function, add clear documentation:
+// processImageContent processes image content for future Cohere vision API support. +// NOTE: Currently unused as Cohere v1 does not support image content. +// This function is prepared for when Cohere adds vision capabilities. func processImageContent(imageContent *schemas.ImageContent) map[string]interface{} {Based on the past review comment from @Pratham-Mishra04 about removing safe copy creation for resource optimization, consider removing this function entirely until actually needed.
🏁 Script executed:
#!/bin/bash # Search for any references to processImageContent in the repository rg -n "processImageContent"Length of output: 538
Address unused
processImageContent
helperThe
processImageContent
function is only referenced in commented-out code and isn’t invoked at runtime:• core/providers/cohere.go:253 – commented-out call
• core/providers/cohere.go:291 – commented-out call
• core/providers/cohere.go:465–481 – function definitionChoose one approach to clean this up:
- Remove the entire function until vision support is actually implemented.
- Keep it but add clear documentation noting it’s prepared for future use:
+ // processImageContent processes image content for future Cohere vision API support. + // NOTE: Currently unused because Cohere v1 does not support images. + // This helper is retained to avoid rework when vision capabilities arrive. func processImageContent(imageContent *schemas.ImageContent) map[string]interface{} { … }Refer back to @Pratham-Mishra04’s suggestion about safe-copy removal when you reintroduce this logic.
🧰 Tools
🪛 golangci-lint (1.64.8)
468-468: func
processImageContent
is unused(unused)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (12)
core/bifrost.go
(5 hunks)core/providers/anthropic.go
(3 hunks)core/providers/bedrock.go
(1 hunks)core/providers/cohere.go
(3 hunks)core/providers/openai.go
(2 hunks)core/providers/utils.go
(3 hunks)core/schemas/bifrost.go
(1 hunks)core/tests/account.go
(1 hunks)core/tests/azure_test.go
(1 hunks)core/tests/openai_test.go
(1 hunks)core/tests/tests.go
(2 hunks)core/tests/vertex_test.go
(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
core/tests/tests.go (1)
Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#64
File: transports/bifrost-http/integrations/genai/types.go:383-395
Timestamp: 2025-06-09T16:26:05.727Z
Learning: In the Bifrost schema (core/schemas/bifrost.go), only UserMessage and ToolMessage structs have ImageContent fields. AssistantMessage does not have an ImageContent field and cannot contain images. The schema design intentionally limits image content to user and tool messages only.
core/providers/anthropic.go (1)
Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#64
File: transports/bifrost-http/integrations/genai/types.go:383-395
Timestamp: 2025-06-09T16:26:05.727Z
Learning: In the Bifrost schema (core/schemas/bifrost.go), only UserMessage and ToolMessage structs have ImageContent fields. AssistantMessage does not have an ImageContent field and cannot contain images. The schema design intentionally limits image content to user and tool messages only.
core/providers/bedrock.go (2)
Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#55
File: core/providers/anthropic.go:526-550
Timestamp: 2025-06-04T09:29:46.287Z
Learning: In core/providers/anthropic.go, the content field in formattedMessages is always of type []interface{} because it's explicitly constructed that way upstream in the prepareAnthropicChatRequest function. Defensive type casting for multiple types is not needed since the type is guaranteed by the construction logic.
Learnt from: Pratham-Mishra04
PR: maximhq/bifrost#55
File: core/providers/bedrock.go:443-468
Timestamp: 2025-06-04T09:32:15.826Z
Learning: In core/providers/bedrock.go, for tool call result messages (ModelChatMessageRoleTool), the Content field represents the actual tool call output. A tool result message should only be created when msg.Content is non-nil, as there's no semantic meaning to a tool result without output content.
🧬 Code Graph Analysis (7)
core/providers/openai.go (1)
core/providers/utils.go (1)
FormatImageContent
(391-445)
core/tests/tests.go (1)
core/schemas/bifrost.go (2)
ImageContentTypeURL
(177-177)ImageContentTypeBase64
(176-176)
core/providers/anthropic.go (2)
core/providers/utils.go (1)
FormatImageContent
(391-445)core/schemas/bifrost.go (4)
ImageContentTypeURL
(177-177)UserMessage
(155-157)ImageContent
(181-186)ToolMessage
(160-163)
core/bifrost.go (2)
core/schemas/bifrost.go (1)
ModelProvider
(31-31)core/schemas/provider.go (1)
Provider
(142-149)
core/providers/cohere.go (2)
core/schemas/bifrost.go (2)
UserMessage
(155-157)ImageContent
(181-186)core/providers/utils.go (1)
FormatImageContent
(391-445)
core/providers/bedrock.go (2)
core/providers/utils.go (1)
FormatImageContent
(391-445)core/schemas/bifrost.go (2)
ToolMessage
(160-163)ImageContent
(181-186)
core/providers/utils.go (1)
core/schemas/bifrost.go (3)
ImageContent
(181-186)ImageContentTypeBase64
(176-176)ImageContentTypeURL
(177-177)
🪛 golangci-lint (1.64.8)
core/providers/cohere.go
468-468: func processImageContent
is unused
(unused)
🔇 Additional comments (15)
core/tests/vertex_test.go (1)
24-25
: LGTM! Enhanced test coverage for image handling.Enabling image test setups for Vertex provider aligns with the PR's goal of standardizing image handling across all providers.
core/tests/azure_test.go (1)
25-25
: LGTM! Enhanced base image test coverage.Enabling base image setup for Azure provider supports comprehensive testing of the new standardized image handling capabilities.
core/tests/account.go (1)
80-80
: LGTM! Added vision model support.Adding the "c4ai-aya-vision-8b" model expands Cohere's vision capabilities, supporting the PR's image standardization efforts.
core/tests/openai_test.go (1)
25-26
: LGTM! Image testing enabled for OpenAI provider.The changes correctly enable both URL and base64 image testing for the OpenAI provider, aligning with the standardized image handling implementation.
core/tests/tests.go (2)
192-192
: Type safety improvement with ImageContentType constants.Good change replacing the raw string pointer with the typed constant
schemas.ImageContentTypeURL
, which improves type safety and aligns with the schema updates.
228-228
: Consistent use of typed constants for base64 images.Correctly uses
schemas.ImageContentTypeBase64
constant instead of a raw string pointer, maintaining consistency with the URL image test above.core/providers/openai.go (2)
229-229
: Excellent implementation of immutable image content formatting.The use of
FormatImageContent(&messageImageContent, true)
withincludePrefix=true
correctly creates a formatted copy for OpenAI's data URI requirements while preserving the original data. This addresses the mutation concerns raised in previous reviews.
244-244
: Consistent use of formatted image content.Correctly uses the
formattedImgContent
copy for both URL and detail fields, ensuring that OpenAI receives properly formatted image data without mutating the original content.Also applies to: 248-249
core/schemas/bifrost.go (2)
173-178
: Well-designed typed constants for image content.The introduction of
ImageContentType
as a string alias with explicit constants improves type safety and prevents invalid values. This is a much better approach than using raw string pointers.
182-185
: Improved ImageContent struct with better type safety.The changes enhance the struct by:
- Using the new
ImageContentType
instead of*string
for better type safety- Adding
omitempty
tags to optional fields (MediaType
andDetail
)- Making the
Type
field a required value type rather than optional pointerThese changes align perfectly with the image content standardization effort.
core/providers/bedrock.go (1)
498-516
: Excellent implementation of standardized image handling.The image content formatting and field usage is implemented correctly, ensuring consistent image data handling across providers. The media type extraction with prefix trimming and the use of formatted URL for bytes source follows the expected pattern.
core/bifrost.go (3)
218-220
: Good encapsulation by making internal methods private.Converting
SelectKeyFromProviderForModel
toselectKeyFromProviderForModel
properly encapsulates this internal utility method. The comment has been updated accordingly.
414-417
: Consistent encapsulation pattern applied.Converting
GetProviderQueue
togetProviderQueue
maintains consistency with the other method privatization and properly encapsulates internal queue management logic.
301-301
: All method calls updated correctly.All calling sites have been properly updated to use the new unexported method names, maintaining functional consistency.
Also applies to: 515-515, 689-689
core/providers/anthropic.go (1)
377-408
: Consistent copy pattern prevents side effects.Creating copies of
ImageContent
structs before processing ensures that the original message data remains unmodified, which is a best practice for preventing unintended mutations. The pattern is applied consistently for both UserMessage and ToolMessage scenarios.
7ef4bbd
to
43c207e
Compare
Unified Image Content Handling Across Providers
This PR implements a standardized approach to handling image content across all LLM providers. It adds two key utility functions:
NormalizeImageContent
- Automatically detects content type from URL patterns and normalizes media typesImageDataURLBuilder
- Formats base64 image data according to provider requirementsThe implementation:
Tests have been updated to verify image handling across providers, including both URL and base64 formats.