Commit 2c92943

[version] Update to 3.1.0

* Formatting
* [version] Update to 3.1.0
* Fix jina clip processor
* Fix typo
* Support partial model inputs for `JinaCLIPModel`
* Increase model load test time (avoid timeouts)

1 parent e848907 commit 2c92943

14 files changed: +128 −16 lines

README.md (2 additions, 2 deletions)

@@ -47,7 +47,7 @@ npm i @huggingface/transformers
 Alternatively, you can use it in vanilla JS, without any bundler, by using a CDN or static hosting. For example, using [ES Modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules), you can import the library with:
 ```html
 <script type="module">
-import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.2';
+import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.1.0';
 </script>
 ```

@@ -155,7 +155,7 @@ Check out the Transformers.js [template](https://huggingface.co/new-space?templa
 
-By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.2/dist/), which should work out-of-the-box. You can customize this as follows:
+By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.1.0/dist/), which should work out-of-the-box. You can customize this as follows:
 
 ### Settings

docs/snippets/2_installation.snippet (1 addition, 1 deletion)

@@ -7,6 +7,6 @@ npm i @huggingface/transformers
 Alternatively, you can use it in vanilla JS, without any bundler, by using a CDN or static hosting. For example, using [ES Modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules), you can import the library with:
 ```html
 <script type="module">
-import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.2';
+import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.1.0';
 </script>
 ```

docs/snippets/4_custom-usage.snippet (1 addition, 1 deletion)

@@ -1,6 +1,6 @@
 
 
-By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.2/dist/), which should work out-of-the-box. You can customize this as follows:
+By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.1.0/dist/), which should work out-of-the-box. You can customize this as follows:
 
 ### Settings
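
For context, the "Settings" customization these snippets point to goes through the library's exported `env` object. A minimal sketch, assuming a self-hosted setup (the paths below are placeholders and are not part of this commit):

```js
import { env } from '@huggingface/transformers';

// Load models from a local directory instead of the Hugging Face Hub
// (placeholder path; adjust for your setup).
env.allowRemoteModels = false;
env.localModelPath = '/path/to/models/';

// Serve the ONNX Runtime WASM binaries yourself rather than relying on the CDN default.
env.backends.onnx.wasm.wasmPaths = '/path/to/wasm/files/';
```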

package-lock.json (2 additions, 2 deletions)

Generated file; not rendered by default.

package.json (1 addition, 1 deletion)

@@ -1,6 +1,6 @@
 {
   "name": "@huggingface/transformers",
-  "version": "3.0.2",
+  "version": "3.1.0",
   "description": "State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!",
   "main": "./src/transformers.js",
   "types": "./types/transformers.d.ts",

src/env.js (1 addition, 1 deletion)

@@ -26,7 +26,7 @@ import fs from 'fs';
 import path from 'path';
 import url from 'url';
 
-const VERSION = '3.0.2';
+const VERSION = '3.1.0';
 
 // Check if various APIs are available (depends on environment)
 const IS_BROWSER_ENV = typeof self !== 'undefined';

src/models.js (37 additions, 1 deletion)

@@ -3759,7 +3759,43 @@ export class ChineseCLIPModel extends ChineseCLIPPreTrainedModel { }
 // JinaCLIP models
 export class JinaCLIPPreTrainedModel extends PreTrainedModel { }
 
-export class JinaCLIPModel extends JinaCLIPPreTrainedModel { }
+export class JinaCLIPModel extends JinaCLIPPreTrainedModel {
+    async forward(model_inputs) {
+        const missing_text_inputs = !model_inputs.input_ids;
+        const missing_image_inputs = !model_inputs.pixel_values;
+
+        if (missing_text_inputs && missing_image_inputs) {
+            throw new Error('Either `input_ids` or `pixel_values` should be provided.');
+        }
+
+        // If either `input_ids` or `pixel_values` aren't passed, we need to create dummy input since the model requires a value to be specified.
+        if (missing_text_inputs) {
+            // NOTE: We cannot pass zero-dimension tensor as input for input_ids.
+            // Fortunately, the majority of time is spent in the vision encoder, so this shouldn't significantly impact performance.
+            model_inputs.input_ids = ones([model_inputs.pixel_values.dims[0], 1]);
+        }
+
+        if (missing_image_inputs) {
+            // NOTE: Since we create a zero-sized tensor, this does not increase computation time.
+            // @ts-ignore
+            const { image_size } = this.config.vision_config;
+            model_inputs.pixel_values = full([0, 3, image_size, image_size], 0.0); // (pass zero-dimension tensor)
+        }
+
+        const { text_embeddings, image_embeddings, l2norm_text_embeddings, l2norm_image_embeddings } = await super.forward(model_inputs);
+
+        const result = {};
+        if (!missing_text_inputs) {
+            result.text_embeddings = text_embeddings;
+            result.l2norm_text_embeddings = l2norm_text_embeddings;
+        }
+        if (!missing_image_inputs) {
+            result.image_embeddings = image_embeddings;
+            result.l2norm_image_embeddings = l2norm_image_embeddings;
+        }
+        return result
+    }
+}
 
 export class JinaCLIPTextModel extends JinaCLIPPreTrainedModel {
     /** @type {typeof PreTrainedModel.from_pretrained} */
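
The override above is what "Support partial model inputs for `JinaCLIPModel`" refers to: the model can now be called with only one modality present. A minimal sketch of a text-only call, assuming the usual `AutoTokenizer`/`AutoModel` loading path and that the model instance is invoked directly as a function (this example is not part of the commit; the model ID matches the one used in the tests further down):

```js
import { AutoTokenizer, AutoModel } from '@huggingface/transformers';

const model_id = 'jinaai/jina-clip-v2';
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await AutoModel.from_pretrained(model_id);

// Only `input_ids` is supplied; forward() substitutes a zero-sized `pixel_values`
// tensor internally, so only the text outputs come back.
const text_inputs = tokenizer(['A photo of a tiger'], { padding: true, truncation: true });
const { text_embeddings, l2norm_text_embeddings } = await model(text_inputs);
```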
Lines changed: 23 additions & 2 deletions

@@ -1,5 +1,26 @@
-import {
+import {
     ImageProcessor,
 } from "../../base/image_processors_utils.js";
 
-export class JinaCLIPImageProcessor extends ImageProcessor {}
+export class JinaCLIPImageProcessor extends ImageProcessor {
+    constructor(config) {
+        // JinaCLIPImageProcessor uses a custom preprocessor_config.json, so we configure it here
+        const { resize_mode, fill_color, interpolation, size, ...other } = config;
+
+        const new_size = resize_mode === 'squash'
+            ? { width: size, height: size }
+            : resize_mode === 'shortest'
+                ? { shortest_edge: size }
+                : { longest_edge: size };
+
+        const resample = interpolation === 'bicubic' ? 3 : 2;
+        super({
+            ...other,
+            size: new_size,
+            resample,
+            do_center_crop: true,
+            crop_size: size,
+            do_normalize: true,
+        });
+    }
+}
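
As a quick illustration of the translation above, the three `resize_mode` values map onto the standard `size` config as follows (a standalone restatement of the constructor logic for illustration only, not code from the commit):

```js
// Same mapping as in JinaCLIPImageProcessor's constructor, pulled out as a helper.
const toSize = (resize_mode, size) =>
    resize_mode === 'squash' ? { width: size, height: size }
        : resize_mode === 'shortest' ? { shortest_edge: size }
            : { longest_edge: size };

console.log(toSize('squash', 512));   // { width: 512, height: 512 }
console.log(toSize('shortest', 512)); // { shortest_edge: 512 }
console.log(toSize('longest', 512));  // { longest_edge: 512 }
```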
src/models/jina_clip/processing_jina_clip.js (new file: 24 additions)

@@ -0,0 +1,24 @@
+
+import { Processor } from "../../base/processing_utils.js";
+import { AutoImageProcessor } from "../auto/image_processing_auto.js";
+import { AutoTokenizer } from "../../tokenizers.js";
+
+export class JinaCLIPProcessor extends Processor {
+    static tokenizer_class = AutoTokenizer
+    static image_processor_class = AutoImageProcessor
+
+    async _call(text=null, images=null, kwargs = {}) {
+
+        if (!text && !images){
+            throw new Error('Either text or images must be provided');
+        }
+
+        const text_inputs = text ? this.tokenizer(text, kwargs) : {};
+        const image_inputs = images ? await this.image_processor(images, kwargs) : {};
+
+        return {
+            ...text_inputs,
+            ...image_inputs,
+        }
+    }
+}
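
A minimal usage sketch for the new processor, assuming it is reachable through `AutoProcessor` for this model ID and that text and images can be passed independently (not part of the commit; the image URL is a placeholder):

```js
import { AutoProcessor, RawImage } from '@huggingface/transformers';

const processor = await AutoProcessor.from_pretrained('jinaai/jina-clip-v2');

// Text-only: returns tokenizer outputs (e.g. input_ids) and no pixel_values.
const text_inputs = await processor('A photo of a tiger');

// Image-only: returns image-processor outputs (pixel_values) and no input_ids.
const image = await RawImage.fromURL('https://picsum.photos/512'); // placeholder image
const image_inputs = await processor(null, image);

// Both: the two result objects are merged into one set of model inputs.
const combined = await processor('A photo of a tiger', image);
```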

src/models/processors.js (1 addition)

@@ -1,6 +1,7 @@
 export * from './florence2/processing_florence2.js';
 export * from './mgp_str/processing_mgp_str.js';
 export * from './janus/processing_janus.js';
+export * from './jina_clip/processing_jina_clip.js';
 export * from './owlvit/processing_owlvit.js';
 export * from './pyannote/processing_pyannote.js';
 export * from './qwen2_vl/processing_qwen2_vl.js';

tests/init.js (1 addition, 1 deletion)

@@ -57,7 +57,7 @@ export function init() {
   registerBackend("test", onnxruntimeBackend, Number.POSITIVE_INFINITY);
 }
 
-export const MAX_MODEL_LOAD_TIME = 10_000; // 10 seconds
+export const MAX_MODEL_LOAD_TIME = 15_000; // 15 seconds
 export const MAX_TEST_EXECUTION_TIME = 30_000; // 30 seconds
 export const MAX_MODEL_DISPOSE_TIME = 1_000; // 1 second
tests/processors.test.js (23 additions)

@@ -10,6 +10,7 @@ env.useFSCache = false;
 const sum = (array) => Number(array.reduce((a, b) => a + b, array instanceof BigInt64Array ? 0n : 0));
 const avg = (array) => sum(array) / array.length;
 
+/** @type {Map<string, RawImage>} */
 const IMAGE_CACHE = new Map();
 const load_image = async (url) => {
   const cached = IMAGE_CACHE.get(url);

@@ -40,6 +41,7 @@ const MODELS = {
   nougat: "Xenova/nougat-small",
   owlvit: "Xenova/owlvit-base-patch32",
   clip: "Xenova/clip-vit-base-patch16",
+  jina_clip: "jinaai/jina-clip-v2",
   vitmatte: "Xenova/vitmatte-small-distinctions-646",
   dinov2: "Xenova/dinov2-small-imagenet1k-1-layer",
   // efficientnet: 'Xenova/efficientnet-b0',

@@ -490,6 +492,27 @@ describe("Processors", () => {
     MAX_TEST_EXECUTION_TIME,
   );
 
+  // JinaCLIPImageProcessor
+  //  - custom config overrides
+  it(
+    MODELS.jina_clip,
+    async () => {
+      const processor = await AutoImageProcessor.from_pretrained(MODELS.jina_clip);
+
+      {
+        const image = await load_image(TEST_IMAGES.tiger);
+        const { pixel_values, original_sizes, reshaped_input_sizes } = await processor(image);
+
+        compare(pixel_values.dims, [1, 3, 512, 512]);
+        compare(avg(pixel_values.data), -0.06637834757566452);
+
+        compare(original_sizes, [[408, 612]]);
+        compare(reshaped_input_sizes, [[512, 512]]);
+      }
+    },
+    MAX_TEST_EXECUTION_TIME,
+  );
+
   // VitMatteImageProcessor
   //  - tests custom overrides
   //  - tests multiple inputs

tests/utils/tensor.test.js (5 additions, 1 deletion)

@@ -69,7 +69,11 @@ describe("Tensor operations", () => {
   });
 
   it("should return a crop", async () => {
-    const t1 = new Tensor("float32", Array.from({ length: 28 }, (_, i) => i + 1), [4, 7]);
+    const t1 = new Tensor(
+      "float32",
+      Array.from({ length: 28 }, (_, i) => i + 1),
+      [4, 7],
+    );
     const t2 = t1.slice([1, -1], [1, -1]);
 
     const target = new Tensor("float32", [9, 10, 11, 12, 13, 16, 17, 18, 19, 20], [2, 5]);

tests/utils/utils.test.js (6 additions, 3 deletions)

@@ -65,14 +65,14 @@ describe("Utilities", () => {
     const [width, height, channels] = [2, 2, 3];
     const data = Uint8Array.from({ length: width * height * channels }, (_, i) => i % 5);
     const tiny_image = new RawImage(data, width, height, channels);
-
+
     let image;
     beforeAll(async () => {
       image = await RawImage.fromURL("https://picsum.photos/300/200");
     });
 
     it("Can split image into separate channels", async () => {
-      const image_data = tiny_image.split().map(x => x.data);
+      const image_data = tiny_image.split().map((x) => x.data);
 
       const target = [
         new Uint8Array([0, 3, 1, 4]), // Reds

@@ -84,7 +84,10 @@ describe("Utilities", () => {
     });
 
     it("Can splits channels for grayscale", async () => {
-      const image_data = tiny_image.grayscale().split().map(x => x.data);
+      const image_data = tiny_image
+        .grayscale()
+        .split()
+        .map((x) => x.data);
       const target = [new Uint8Array([1, 3, 2, 1])];
 
       compare(image_data, target);
