[Feat] Q3VL: Exclude calibration dataset from testing#271
Victor49152 wants to merge 3 commits into mlcommons:main from
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Code Review
This pull request introduces a list of calibration sample indices to be excluded during generation of the Shopify product catalogue dataset. The review feedback suggests using a set for the index collection to get O(1) average-case lookups, and recommends filtering by unique identifiers instead of absolute indices so the exclusion remains robust across different dataset splits.
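A minimal sketch of the set-based exclusion the review suggests. CALIBRATION_SAMPLE_INDEX is the constant named in the PR, but the example indices and the filter_samples helper here are hypothetical, not the actual implementation:

```python
# Hypothetical sketch: storing the excluded calibration indices in a set
# gives O(1) average-case membership checks, versus O(n) for a list.
CALIBRATION_SAMPLE_INDEX = {2, 5}  # example indices, not the real ones

def filter_samples(samples):
    # Keep every sample whose absolute position is not reserved for calibration.
    return [s for i, s in enumerate(samples) if i not in CALIBRATION_SAMPLE_INDEX]

kept = filter_samples(list(range(10)))  # → [0, 1, 3, 4, 6, 7, 8, 9]
```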
src/inference_endpoint/dataset_manager/predefined/shopify_product_catalogue/__init__.py
if i in CALIBRATION_SAMPLE_INDEX:
    continue
Filtering by absolute indices is fragile because it depends on the specific dataset split and ordering. If generate is called with a different split (e.g., ['test'] instead of the default ['train', 'test']), the index i will no longer correspond to the same samples, and the calibration set may not be correctly excluded. Consider using a unique identifier for filtering if available, or adding a check to ensure the exclusion only applies to the expected split.
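A sketch of the identifier-based alternative suggested above. The CALIBRATION_SAMPLE_IDS name, the "id" field, and the sample data are all hypothetical, assuming the dataset exposes some stable per-sample identifier:

```python
# Hypothetical sketch: filter by a stable per-sample identifier rather than
# the absolute position, so the exclusion survives a different split or
# ordering of the dataset.
CALIBRATION_SAMPLE_IDS = {"prod-0001", "prod-0003"}  # example IDs, not real ones

def filter_by_id(samples):
    # Each sample is assumed to carry a unique "id" field.
    return [s for s in samples if s["id"] not in CALIBRATION_SAMPLE_IDS]

samples = [{"id": f"prod-{i:04d}"} for i in range(5)]
kept = filter_by_id(samples)
```

With this shape, calling generate() with ['test'] instead of ['train', 'test'] would still exclude exactly the calibration samples, since the filter no longer depends on enumeration order.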
The dataset is fixed for this task, and the predefined dataset will always be pulled the same way; these splits are not configurable from any configs. So this should be OK for now.
Who is the caller of ShopifyProductCatalogue.generate()?
Looking at the code alone, split is a parameter of ShopifyProductCatalogue.generate(), which means it could change if the caller chooses to pass an argument for split. Taking this into consideration, a design that might make more sense is to have calibration_sample_indices also be a parameter of ShopifyProductCatalogue.generate(), with CALIBRATION_SAMPLE_INDEX (or DEFAULT_CALIBRATION_SAMPLE_INDEX) as its default value. When defining the v6.1 round benchmark settings, you could then note in the readme or the yaml that neither split nor calibration_sample_indices should be set to anything other than None or the default values.
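The design proposed above could look roughly like this. Everything except the class name and the DEFAULT_CALIBRATION_SAMPLE_INDEX naming idea is hypothetical; the sample loading is stubbed for illustration:

```python
# Hypothetical sketch: the exclusion list becomes a parameter of generate(),
# defaulting to the module-level constant, so callers who override `split`
# can (and must) decide how calibration samples are excluded.
DEFAULT_CALIBRATION_SAMPLE_INDEX = frozenset({2, 5})  # example indices

class ShopifyProductCatalogue:
    def generate(self, split=("train", "test"),
                 calibration_sample_indices=DEFAULT_CALIBRATION_SAMPLE_INDEX):
        # Real code would load samples according to `split`; stubbed here.
        samples = list(range(8))
        return [s for i, s in enumerate(samples)
                if i not in calibration_sample_indices]
```

Using a frozenset default avoids the mutable-default-argument pitfall while keeping fast membership checks.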
The funny part is that, at least in the current setup, the individual kwargs of this subclass are not configurable by the user when calling a predefined dataset. That means unless the user hacks the source code, the kwargs for generate() always take their default values. That's why I didn't bother.
Your idea is absolutely a better design in general, though.
fef312e to 7026abb
@Victor49152 can you add some context on why this is needed? Is the calibration dataset determined by the working group? We might want to rethink the design, as this could be useful for other datasets as well.
Usually the convention is that we don't need to exclude the calibration dataset from the inference dataset; we simply use ~10% of it to generate the quantization dataset.
Based on our experience, model accuracy is not very sensitive to the choice of calibration samples; besides, it's only 20/48289, which is a very small portion. However, we received an email from Anton Lokhmotov two weeks ago stating that it's a convention to set aside the calibration dataset from the validation dataset for fairness. It's better to follow the same convention for VLM, and I'm trying to address this ask.
What does this PR do?
Exclude the calibration dataset from perf/acc testing, as required for the v6.1 round.
Type of change
Related issues
Checklist