Implement multimodal request support for Gemini API (#2) #3

Open
wants to merge 1 commit into
base: main

simon-stratagile

This pull request introduces multimodal support for the Google Gemini API within the ChatAIze.GenerativeCS library, addressing issue #2. Users can now send requests combining text with various file types (PDF, DOC, TXT, images, audio, video).

Key Changes Implemented:

  • Gemini File Service (FileService.cs, IFileService.cs):

    • Added a new service to handle interactions with the Gemini Files API.
    • Supports uploading files (via path or stream), retrieving file metadata, listing uploaded files, and deleting files.
    • Follows the resumable upload protocol as per Gemini API documentation.
    • File service and interface names (FileService, IFileService) are aligned with existing provider naming conventions (e.g., ChatCompletion.cs).
  • Enhanced Chat Message Structure (ChatMessage.cs, ChatContentPart.cs):

    • ChatMessage.cs now uses an ICollection<IChatContentPart> Parts property to hold different content types within a single message.
    • Introduced IChatContentPart interface and concrete TextPart and FileDataPart classes.
    • FileDataPart encapsulates FileDataSource (MIME type and file URI) for referencing uploaded files.
    • The existing ChatMessage.Content property has been marked [Obsolete] and now acts as a getter/setter for the first TextPart in the Parts collection to maintain backward compatibility.
  • Updated Gemini Chat Provider (ChatCompletion.cs):

    • The CreateChatCompletionRequest method now iterates through message.Parts.
    • Correctly serializes TextPart and FileDataPart (including mime_type and file_uri) into the JSON payload for the Gemini API's generateContent endpoint.
    • Ensures an empty text part is added if a message has no other content parts, as required by the Gemini API.
    • Obsolete warnings raised by the backward-compatibility fallback to ChatMessage.Content have been suppressed with #pragma warning disable.
  • Client and DI Integration (GeminiClient.cs, GeminiClientExtension.cs):

    • GeminiClient.cs now instantiates and exposes an IFileService through a public Files property.
    • Dependency injection in GeminiClientExtension.cs now registers IFileService as a singleton resolved from the GeminiClient.Files property, so the injected service is the same instance the client exposes.
  • Model Updates (Models/Gemini/):

    • Added GeminiFile.cs, GeminiFileUploadRequest.cs, GeminiListFilesResponse.cs to represent data structures for the Gemini Files API.
    • Addressed nullable warnings (CS8618) in these models by using the required modifier for non-nullable properties expected from the API and initializing collections.
  • Documentation & Packaging:

    • Updated README.md with a new section explaining how to use the multimodal features, including accessing IFileService, uploading files, and sending chat messages with file references.
    • Incremented the library version in ChatAIze.GenerativeCS.csproj to 0.15.0.
    • Updated package description and tags in the .csproj file to reflect the new multimodal capabilities.
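
As a rough illustration of the ChatCompletion.cs change described above, the sketch below maps a message's parts onto the parts array of a generateContent request, including the empty-text fallback. The record shapes of TextPart and FileDataPart and the helper PartsSerializationSketch.ToPartsJson are hypothetical stand-ins and are not taken from this PR's code. Only the JSON field names (text, file_data, mime_type, file_uri) follow the Gemini REST API.

```csharp
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical stand-ins for the types named in this PR; the real definitions may differ.
public interface IChatContentPart { }
public sealed record TextPart(string Text) : IChatContentPart;
public sealed record FileDataPart(string MimeType, string FileUri) : IChatContentPart;

public static class PartsSerializationSketch
{
    // Builds the "parts" array for a single message.
    public static JsonArray ToPartsJson(IEnumerable<IChatContentPart> parts)
    {
        var array = new JsonArray();

        foreach (var part in parts)
        {
            switch (part)
            {
                case TextPart text:
                    array.Add(new JsonObject { ["text"] = text.Text });
                    break;

                case FileDataPart file:
                    // References a file that was previously uploaded through the Files API.
                    array.Add(new JsonObject
                    {
                        ["file_data"] = new JsonObject
                        {
                            ["mime_type"] = file.MimeType,
                            ["file_uri"] = file.FileUri
                        }
                    });
                    break;
            }
        }

        // Fall back to a single empty text part when the message has no content parts.
        if (array.Count == 0)
        {
            array.Add(new JsonObject { ["text"] = "" });
        }

        return array;
    }
}
```

Calling PartsSerializationSketch.ToPartsJson(parts).ToJsonString() on a list holding one TextPart and one FileDataPart gives a quick way to eyeball the payload shape before it is sent.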

How to Test:

  1. Obtain an instance of GeminiClient.
  2. Access the file service via geminiClient.Files.
  3. Upload a supported file (e.g., PDF, PNG) using geminiClient.Files.UploadFileAsync(...).
  4. Create a Chat object and add a ChatMessage.
  5. Add a TextPart and a FileDataPart to the ChatMessage.Parts collection, building the FileDataPart from the MimeType and Uri of the uploaded file.
  6. Call geminiClient.CompleteAsync(chat) and check that the model's response reflects the content of the uploaded file.
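
The steps above might look roughly like the following when written out. This is a sketch under several assumptions, not code copied from the PR: the namespaces, the API-key constructor, the path-based UploadFileAsync overload, the TextPart and FileDataPart constructor shapes, and the Chat.Messages collection are all guesses, and the README section added in this PR is the authoritative reference. Only the member names GeminiClient, Files, UploadFileAsync, Parts, MimeType, Uri, and CompleteAsync come from the description above.

```csharp
using System;
using ChatAIze.GenerativeCS.Clients; // assumed namespace
using ChatAIze.GenerativeCS.Models;  // assumed namespace

// Step 1: obtain a client (assumption: constructed from an API key).
var geminiClient = new GeminiClient("<GEMINI API KEY>");

// Step 2: access the file service exposed by the client.
var fileService = geminiClient.Files;

// Step 3: upload a local file (assumption: a path overload whose result
// exposes MimeType and Uri).
var uploaded = await fileService.UploadFileAsync("report.pdf");

// Steps 4-5: build a message with a text part and a file reference
// (assumption: these constructor shapes; the PR only states that
// FileDataPart wraps a FileDataSource with a MIME type and file URI).
var message = new ChatMessage();
message.Parts.Add(new TextPart("Summarize the attached report."));
message.Parts.Add(new FileDataPart(uploaded.MimeType, uploaded.Uri));

var chat = new Chat();
chat.Messages.Add(message); // assumption: Chat exposes a Messages collection

// Step 6: the response should draw on the uploaded file's content.
var response = await geminiClient.CompleteAsync(chat);
Console.WriteLine(response);
```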

Future Considerations (Not in this PR):

  • Adding higher-level convenience wrappers in GeminiClient.cs to simplify the process of sending a message with a local file (e.g., a method that handles both upload and message creation).

This implementation adheres to the existing coding patterns and architectural style of the library.

Fixes #2

Development

Successfully merging this pull request may close these issues.

Implement Multimodal Request Support for Gemini API