
Add image inference support #8954


Merged
merged 2 commits into from
Apr 22, 2025



gsiddh commented Apr 22, 2025

Adding image inference support.
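On-device image input depends on the session being created with `image` among its expected input types (the Prompt API's `expectedInputs` option). As a minimal sketch, assuming simplified part shapes modeled on the SDK's `TextPart` and `InlineDataPart`, the hypothetical helper `expectedInputsFor` derives that list from a request's parts. It is illustrative only, not the implementation in this PR.

```typescript
// Simplified part shapes, modeled on the SDK's TextPart and InlineDataPart.
interface TextPart {
  text: string;
}
interface InlineDataPart {
  inlineData: { mimeType: string; data: string }; // data is base64-encoded
}
type Part = TextPart | InlineDataPart;

// Modeled on the Prompt API's expectedInputs entries, e.g. { type: 'image' }.
interface ExpectedInput {
  type: 'text' | 'image';
}

// Hypothetical helper: collects the input types a request needs, in first-seen
// order, so they can be declared when the on-device session is created.
// Inline data with a non-image MIME type is ignored in this sketch.
function expectedInputsFor(parts: Part[]): ExpectedInput[] {
  const types = new Set<ExpectedInput['type']>();
  for (const part of parts) {
    if ('text' in part) {
      types.add('text');
    } else if (part.inlineData.mimeType.startsWith('image/')) {
      types.add('image');
    }
  }
  return Array.from(types, (type) => ({ type }));
}

const parts: Part[] = [
  { text: 'Describe this picture.' },
  { inlineData: { mimeType: 'image/png', data: 'iVBORw0KGgo=' } }
];
console.log(JSON.stringify(expectedInputsFor(parts)));
// prints [{"type":"text"},{"type":"image"}]
```

Declaring the types from the request itself keeps text-only requests from asking the browser for image support they do not use.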

gsiddh requested a review from a team as a code owner April 22, 2025 03:19

changeset-bot bot commented Apr 22, 2025

⚠️ No Changeset found

Latest commit: 8cdb35c

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types



Vertex AI Mock Responses Check ⚠️

A newer major version of the mock responses for Vertex AI unit tests is available. update_vertexai_responses.sh should be updated to clone the latest version of the responses: v10.0


google-oss-bot commented Apr 22, 2025

Size Report 1

Affected Products

  • @firebase/vertexai

    Type                 | Base (a46fa4a) | Merge (d2a41fc) | Diff
    browser              | 38.9 kB        | 39.5 kB         | +566 B (+1.5%)
    main                 | 39.9 kB        | 40.5 kB         | +566 B (+1.4%)
    module               | 38.9 kB        | 39.5 kB         | +566 B (+1.5%)
  • firebase

    Type                 | Base (a46fa4a) | Merge (d2a41fc) | Diff
    firebase-vertexai.js | 31.2 kB        | 31.6 kB         | +429 B (+1.4%)

Test Logs

  1. https://storage.googleapis.com/firebase-sdk-metric-reports/qyhWYZTP7I.html
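The Diff percentages are consistent with dividing the exact byte delta by the base size. A minimal sketch that recomputes them from the table, assuming 1 kB = 1000 B (under that assumption the rounded base figures reproduce every reported percentage):

```typescript
// Rows from the Size Report above. Base sizes are the rounded kB figures in
// bytes, assuming 1 kB = 1000 B; diffs are the exact byte deltas as reported.
interface SizeRow {
  name: string;
  baseBytes: number;
  diffBytes: number;
}

const rows: SizeRow[] = [
  { name: 'browser', baseBytes: 38900, diffBytes: 566 },
  { name: 'main', baseBytes: 39900, diffBytes: 566 },
  { name: 'module', baseBytes: 38900, diffBytes: 566 },
  { name: 'firebase-vertexai.js', baseBytes: 31200, diffBytes: 429 }
];

// Growth relative to the base build, as a percentage with one decimal place.
// Multiplying by 100 first means only the final division is rounded.
function percentChange(baseBytes: number, diffBytes: number): string {
  return ((diffBytes * 100) / baseBytes).toFixed(1);
}

for (const row of rows) {
  console.log(`${row.name}: +${row.diffBytes} B (+${percentChange(row.baseBytes, row.diffBytes)}%)`);
}
// browser: +566 B (+1.5%)
// main: +566 B (+1.4%)
// module: +566 B (+1.5%)
// firebase-vertexai.js: +429 B (+1.4%)
```

The same function reproduces the Size Analysis figures as well (417 B on 22.0 kB gives 1.9%, on 41.0 kB gives 1.0%).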


google-oss-bot commented Apr 22, 2025

Size Analysis Report 1

Affected Products

  • @firebase/vertexai

    • getGenerativeModel

      Size

      Type               | Base (a46fa4a) | Merge (d2a41fc) | Diff
      size               | 22.0 kB        | 22.4 kB         | +417 B (+1.9%)
      size-with-ext-deps | 41.0 kB        | 41.4 kB         | +417 B (+1.0%)

Test Logs

  1. https://storage.googleapis.com/firebase-sdk-metric-reports/clm2afjJgF.html

gsiddh force-pushed the vaihi-image-inf branch 3 times, most recently from 734a1ca to f815c59, April 22, 2025 03:53
gsiddh changed the title from "first pass" to "Add image inference support" Apr 22, 2025
const promptOutput = 'hi';
const promptStub = stub(languageModel, 'prompt').resolves(promptOutput);
const onDeviceParams = {
  systemPrompt: 'be yourself'
};


I think a more relevant test would be to assert onDeviceParams.expectedInputs is threaded through, since that's required for image input to work on-device.
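The suggested test can be sketched as follows: a fake session factory records the options it is called with, and the test asserts that `expectedInputs` reaches it. Everything here (`OnDeviceParams`, `CreateSession`, `createOnDeviceSession`) is hypothetical and self-contained, standing in for the SDK's adapter and the sinon stub used in the PR's tests; only the `expectedInputs: [{ type: 'image' }]` shape follows the Prompt API explainer.

```typescript
// Hypothetical shapes, loosely modeled on the Prompt API's session-creation options.
interface ExpectedInput {
  type: 'text' | 'image';
}
interface OnDeviceParams {
  systemPrompt?: string;
  expectedInputs?: ExpectedInput[];
}
interface Session {
  prompt(input: string): Promise<string>;
}
type CreateSession = (options: OnDeviceParams) => Promise<Session>;

// Hypothetical adapter step: creates the on-device session from the caller's params.
async function createOnDeviceSession(
  create: CreateSession,
  onDeviceParams: OnDeviceParams
): Promise<Session> {
  return create(onDeviceParams);
}

// Test: the fake factory records every options object it receives, standing in
// for a stub on the real session-creation call.
async function testExpectedInputsThreadedThrough(): Promise<void> {
  const received: OnDeviceParams[] = [];
  const fakeCreate: CreateSession = async (options) => {
    received.push(options);
    return { prompt: async () => 'hi' };
  };
  const onDeviceParams: OnDeviceParams = {
    systemPrompt: 'be yourself',
    expectedInputs: [{ type: 'image' }]
  };

  await createOnDeviceSession(fakeCreate, onDeviceParams);

  // Fails if the adapter drops or rewrites expectedInputs before creating the session.
  const types = received[0]?.expectedInputs?.map((input) => input.type) ?? [];
  if (received.length !== 1 || !types.some((t) => t === 'image')) {
    throw new Error('expectedInputs was not passed through to session creation');
  }
}
```

Asserting on the creation options, rather than on the prompt output, targets the part that image input actually depends on.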

Author


Oh yes, thanks for suggesting that. I meant to add it but forgot. Adding it now.

Author


Addressing this in the next commit in this PR.


erikeldridge left a comment


Shipit! 🚀

gsiddh merged commit 43a69d5 into vaihi-exp Apr 22, 2025
40 of 46 checks passed
gsiddh deleted the vaihi-image-inf branch April 22, 2025 15:49
gsiddh added a commit that referenced this pull request Apr 22, 2025
* Adding image based input for inference

* adding image as input to create language model object
gsiddh added another commit with the same message that referenced this pull request Apr 22, 2025
gsiddh pushed a commit that referenced this pull request Apr 23, 2025
Fix languageCode parameter in action_code_url (#8912)

* Fix languageCode parameter in action_code_url

* Add changeset

Vaihi add langmodel types. (#8927)

* Adding LanguageModel types. These are based off https://github.com/webmachinelearning/prompt-api?tab=readme-ov-file#full-api-surface-in-web-idl

* Adding LanguageModel types.

* Remove bunch of exports

* yarn formatted

* after lint

Define HybridParams (#8935)

Co-authored-by: Erik Eldridge <erikeldridge@google.com>

Adding smoke test for new hybrid params (#8937)

* Adding smoke test for new hybrid params

* Use the existing name of the model params input

---------

Co-authored-by: Erik Eldridge <erikeldridge@google.com>

Moving to in-cloud naming (#8938)

Co-authored-by: Erik Eldridge <erikeldridge@google.com>

Moving to string type for the inference mode (#8941)

Define ChromeAdapter class (#8942)

Co-authored-by: Erik Eldridge <erikeldridge@google.com>

VinF Hybrid Inference: Implement ChromeAdapter (rebased) (#8943)

Adding count token impl (#8950)

VinF Hybrid Inference #4: ChromeAdapter in stream methods (rebased) (#8949)

Define values for Availability enum (#8951)

VinF Hybrid Inference: narrow Chrome input type (#8953)

Add image inference support (#8954)

* Adding image based input for inference

* adding image as input to create language model object

disable count tokens api for on-device inference (#8962)

VinF Hybrid Inference: throw if only_on_device and model is unavailable (#8965)
gsiddh pushed 6 more commits with the same message that referenced this pull request Apr 23, 2025

3 participants