I'm trying to get a simple image-text similarity example working with CLIP, and I'm not sure how to do it, or whether it's currently supported in Transformers.js outside of the zero-shot image classification pipeline.
Is there a code example somewhere to get me started? Here's what I have so far:
```js
import { AutoModel, AutoTokenizer } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.1.1';

let tokenizer = await AutoTokenizer.from_pretrained('Xenova/clip-vit-base-patch16');
let model = await AutoModel.from_pretrained('Xenova/clip-vit-base-patch16');

let inputIds = await tokenizer(["cat", "astronaut"]);
let image = await fetch("https://i.imgur.com/fYhUGoY.jpg").then(r => r.blob());

// how to process the image, and how to pass the image and inputIds to `model`?
```
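Based on the Python API, my guess at the intended pattern is the sketch below: read the image with `RawImage`, let an `AutoProcessor` turn it into `pixel_values`, tokenize the text separately, and spread both sets of inputs into a single model call. I haven't confirmed any of this against 2.1.1 — in particular, I'm assuming `CLIPModel` and `RawImage` are exported at the package root and that the tokenizer accepts `{ padding: true, truncation: true }`:

```js
import { AutoTokenizer, AutoProcessor, CLIPModel, RawImage } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.1.1';

// Assumption: CLIPModel and RawImage are exported at the package root in this version.
let tokenizer = await AutoTokenizer.from_pretrained('Xenova/clip-vit-base-patch16');
let processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
let model = await CLIPModel.from_pretrained('Xenova/clip-vit-base-patch16');

// Text side: tokenize both prompts together (padding so they batch).
let text_inputs = await tokenizer(['a photo of a cat', 'a photo of an astronaut'], { padding: true, truncation: true });

// Image side: read the image into a RawImage and let the processor compute pixel_values.
let image = await RawImage.read('https://i.imgur.com/fYhUGoY.jpg');
let image_inputs = await processor(image);

// Run CLIP once with both the text tensors and the pixel values.
let output = await model({ ...text_inputs, ...image_inputs });
```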
I also tried this:
```js
import { AutoModel, AutoTokenizer, AutoProcessor } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.1.1';

let model = await AutoModel.from_pretrained('Xenova/clip-vit-base-patch16');
let processor = await AutoProcessor.from_pretrained("Xenova/clip-vit-base-patch16");

let inputs = await processor({ text: ["a photo of a cat", "a photo of an astronaut"], images: ["https://i.imgur.com/fYhUGoY.jpg"] });
let outputs = await model(inputs);
```
But it seems that `processor` expects an array of images, or something? The above code throws an error saying that an `.rgb()` method should exist on the input.
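My guess is that, unlike the Python `CLIPProcessor`, the JS processor only handles the image side (the `.rgb()` method it's looking for seems to be something a `RawImage` would have), and the text is meant to go through the tokenizer separately, with both sets of inputs spread into the model call as in the sketch above. If that's right, here's how I'd expect to read off the similarity scores — again unverified, with the output field names assumed to mirror the Python `CLIPOutput`:

```js
// Continuing from the sketch above (hypothetical field names mirroring the Python CLIPOutput):
// logits_per_image should have shape [num_images, num_texts].
let logits = Array.from(output.logits_per_image.data);

// Plain-JS softmax over the prompts to turn the logits into probabilities.
let max = Math.max(...logits);
let exps = logits.map(x => Math.exp(x - max));
let sum = exps.reduce((a, b) => a + b, 0);
let probs = exps.map(x => x / sum);

console.log(probs); // how strongly the image matches "a photo of a cat" vs "a photo of an astronaut"
```

Alternatively, if the output also exposes `text_embeds` and `image_embeds`, I assume cosine similarity between those would work for plain embedding-based retrieval. Is either of these the intended usage, or is there a supported way to do this outside the zero-shot pipeline?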