Implement Gemini Vision #551

Closed
@abrichr

Description

Feature request

We would like to implement Gemini 1.5 and/or Gemini Vision in openadapt.adapters.gemini.

Related: #565

Gemini 1.5: https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#sundar-note

Gemini Vision: image
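A minimal sketch of what such an adapter could look like, assuming the `google-generativeai` Python package and an API key in a `GOOGLE_API_KEY` environment variable. The function name `prompt`, its parameters, and the default model name are illustrative assumptions, not an existing OpenAdapt interface; the available model identifiers may also differ by account and date.

```python
import os
from typing import Any, Optional, Sequence

# Assumption: the vision-capable model name; adjust to whatever is available.
DEFAULT_MODEL_NAME = "gemini-pro-vision"


def prompt(
    text: str,
    images: Sequence[Any] = (),
    model: Optional[Any] = None,
    model_name: str = DEFAULT_MODEL_NAME,
) -> str:
    """Send a text prompt plus optional images and return the text reply.

    `images` is expected to hold PIL images when talking to the real API.
    `model` can be injected (e.g. a stub in tests); otherwise a client is
    created lazily so importing this module does not require the package.
    """
    if model is None:
        # Imported lazily: only needed when calling the real service.
        import google.generativeai as genai

        genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
        model = genai.GenerativeModel(model_name)

    # Multimodal requests are passed as a single list of parts.
    contents = [text, *images]
    response = model.generate_content(contents)
    return response.text
```

With the package installed and a key set, a call might look like `prompt("Describe this screenshot.", [PIL.Image.open("screenshot.png")])`. Accepting an injected `model` keeps the function testable without network access.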

Motivation

https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#sundar-note

Gemini 1.5 supports a context window of up to 1 million tokens.

https://www.linkedin.com/feed/update/urn:li:activity:7140972956314247168/

Gemini Pro Vision (multimodal) works well and is available to everyone! I did a quick test and the results I got were similar to GPT-4 Vision.
Descriptions are accurate. Colors and directions of objects are correct, something LLaVA unfortunately did not get right.
⚡ The good thing is that you can already use Gemini Pro today, at 1 request per second compared to 100 requests per day for GPT-4 Vision.
Google has enough GPUs to serve the world apparently 😏
