fix: Remove pkl support for reading datasets #233
Conversation
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Pull request overview
This PR removes pickle (.pkl) dataset loading support to reduce the risk of arbitrary code execution, migrating tests, docs, and templates to use JSONL instead and adding a JSONL test dataset artifact.
Changes:
- Replace `.pkl` references with `.jsonl` across unit/integration tests and documentation.
- Remove `.pkl` loader support from `DatasetFormat` / the dataset loading implementation.
- Add a JSONL dummy dataset and update the OpenOrca predefined dataset flow to cache as JSONL.
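For context on why JSONL is the safer target format: it is one JSON object per line, so loading it never executes code. A minimal sketch of reading such a file (the `read_jsonl` helper here is illustrative, not the project's actual loader):

```python
import json
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Parse a JSON Lines file into a list of records (one dict per non-empty line)."""
    with path.open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Example: write and read back a tiny two-row dataset.
sample = Path("dummy.jsonl")
sample.write_text('{"prompt": "hello"}\n{"prompt": "world"}\n', encoding="utf-8")
records = read_jsonl(sample)
```

Unlike `pickle.load`, a malformed or malicious line here can at worst raise `json.JSONDecodeError`.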
Reviewed changes
Copilot reviewed 27 out of 31 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/dataset_manager/test_data_loader.py | Updates dataset loader unit tests from pickle reader fixtures to JSONL. |
| tests/unit/config/test_yaml_loader.py | Updates YAML loader tests to reference .jsonl dataset paths. |
| tests/unit/config/test_schema.py | Updates schema tests to use .jsonl dataset paths. |
| tests/unit/commands/test_util_commands.py | Updates CLI validation tests to use .jsonl dataset paths. |
| tests/unit/commands/test_benchmark.py | Updates benchmark tests to use .jsonl dataset strings/paths. |
| tests/integration/test_server_roundtrip.py | Switches integration fixture usage from pickle dataset path to JSONL dataset path. |
| tests/integration/test_end_to_end_oracle.py | Switches end-to-end oracle tests to use JSONL dataset path fixture. |
| tests/integration/commands/test_benchmark_command.py | Updates integration benchmark command tests to use JSONL dataset fixture. |
| tests/datasets/Readme.md | Updates test dataset documentation to JSONL naming/format. |
| tests/datasets/dummy_1k.jsonl | Adds a JSONL dummy dataset for local CLI/testing. |
| tests/conftest.py | Renames dataset fixtures from pickle to JSONL and updates oracle server fixture wiring. |
| src/inference_endpoint/dataset_manager/README.md | Removes .pkl from the documented supported formats list. |
| src/inference_endpoint/dataset_manager/predefined/open_orca/__init__.py | Converts upstream OpenOrca pickle artifact into a local JSONL cache. |
| src/inference_endpoint/dataset_manager/factory.py | Updates factory documentation to remove pickle support mention. |
| src/inference_endpoint/dataset_manager/dataset.py | Removes pickle loader and related enum entries; updates docs/comments accordingly. |
| src/inference_endpoint/config/utils.py | Updates dataset string parsing examples away from .pkl. |
| src/inference_endpoint/config/templates/submission_template.yaml | Updates template dataset paths and formats to JSONL. |
| src/inference_endpoint/config/templates/offline_template.yaml | Updates offline template dataset path/format to JSONL. |
| src/inference_endpoint/config/templates/eval_template.yaml | Updates eval template dataset path/format to JSONL. |
| src/inference_endpoint/config/templates/concurrency_template.yaml | Updates concurrency template dataset path/format to JSONL. |
| scripts/create_dummy_dataset.py | Updates dummy dataset generator to output JSONL instead of pickle. |
| README.md | Updates CLI usage examples to reference the JSONL dummy dataset. |
| examples/02_ServerBenchmarking/README.md | Updates example command to use JSONL dummy dataset. |
| docs/LOCAL_TESTING.md | Updates local testing docs to use JSONL datasets and revised supported formats text. |
| docs/DEVELOPMENT.md | Updates development docs to reference JSONL test datasets. |
| docs/CLI_QUICK_REFERENCE.md | Updates CLI reference examples and defaults to JSONL datasets. |
| docs/CLI_DESIGN.md | Updates CLI design examples to reference JSONL datasets. |
| AGENTS.md | Updates architecture/testing docs to reflect JSONL-based test data and supported formats. |
Comments suppressed due to low confidence (2)
docs/CLI_QUICK_REFERENCE.md:59
- The CLI example uses `format=jsonl`, but the code path that casts `Dataset.format` into `DatasetFormat` expects enum values like `.jsonl` (leading dot) or `huggingface`. As written, this example will raise a `ValueError` at runtime when creating the loader; update the docs (and templates/tests) to use `.jsonl`, or omit `format` and rely on extension auto-detection.
```bash
--dataset data.jsonl # simple path
--dataset acc:eval.jsonl # accuracy dataset
--dataset data.csv,samples=500,parser.prompt=article # with options
--dataset perf:data.jsonl,format=jsonl,parser.prompt=text # explicit format + remap
```
**src/inference_endpoint/dataset_manager/factory.py:38**
* The factory docstring lists supported formats, but it omits formats that are actually implemented (`csv` and `json` loaders exist in `dataset.py`). Please update this list so it stays aligned with the real loader registry and doesn't mislead users.
```python
class DataLoaderFactory:
    """Factory for creating dataset loaders based on format.

    Supports:
    - parquet: Parquet format (ParquetReader)
    - jsonl: JSON Lines format (JsonlReader)
    - hf: HuggingFace datasets (HFDataLoader)
    """
```
---
Code Review
This pull request migrates the project's dataset format from pickle (.pkl) to JSONL (.jsonl). Key changes include removing pickle-related loaders from the DatasetManager, updating the dummy dataset generation script, and comprehensively revising documentation, examples, and test suites to use the new format. Review feedback identifies several documentation inconsistencies, such as the continued listing of the removed npy format, outdated docstring references for JSON structures, and incorrect class names in the DataLoaderFactory documentation.
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Pull request overview
Copilot reviewed 27 out of 31 changed files in this pull request and generated 3 comments.
/gemini review
Code Review
This pull request replaces the use of Pickle-based datasets with JSONL across the entire project, including documentation, configuration templates, and test suites. The PickleListLoader has been removed from the dataset manager, and a conversion utility was implemented for the OpenOrca dataset to migrate existing pickle artifacts to JSONL. Feedback suggests addressing a security vulnerability in the OpenOrca conversion logic, where pd.read_pickle is still used, by implementing integrity verification such as checksums for the downloaded files to prevent arbitrary code execution.
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Pull request overview
Copilot reviewed 27 out of 31 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
tests/integration/test_server_roundtrip.py:33
- The test docstring still says it is testing the "PickleReader", but the fixture and loader now use a JSONL dataset path. Please update the docstring to avoid confusion while debugging failures.
"""
Test the PickleReader by performing a roundtrip request through a mock HTTP Oracle server.
Pull request overview
Copilot reviewed 27 out of 31 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (2)
tests/integration/test_server_roundtrip.py:33
- The test docstring still says "PickleReader" even though this test now uses the JSONL dataset fixture. Please update the wording to avoid confusion about which formats are supported and what this test is validating.
"""
Test the PickleReader by performing a roundtrip request through a mock HTTP Oracle server.
src/inference_endpoint/dataset_manager/dataset.py:57
- Pickle support was removed from DatasetFormat, but there’s no unit test asserting that loading a .pkl dataset now fails with a clear error. Adding a regression test (e.g., Dataset.load_from_file('x.pkl') raising ValueError) would help ensure pickle inputs are consistently rejected.
```python
class DatasetFormat(Enum):
    """Enum defining possible supported formats for accuracy datasets to be saved. The value of the enum
    defines the file extension to be saved as.
    """

    CSV = ".csv"
    """Comma-separated values file with a column header."""

    PARQUET = ".parquet"
    """Apache Parquet file."""

    JSON = ".json"
    """JSON file containing a list of records (dictionaries), where keys are column names."""

    JSONL = ".jsonl"
    """JSON Lines file. Each line is a JSON object where the keys are the column names. It is assumed that
    every row has the same keys."""

    HF = "huggingface"
    """HuggingFace dataset."""
```
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
viraatc left a comment
Review Council — Multi-AI Code Review
Reviewed by: Claude (Codex unavailable due to sandbox restrictions) | Depth: thorough
Found 1 new issue across 1 file (25 existing inline comments were already on this PR — all other findings were already covered):
- 0 critical/high
- 0 medium
- 1 low
| # | File | Line | Severity | Category | Summary |
|---|---|---|---|---|---|
| 1 | tests/integration/test_server_roundtrip.py | 32 | low | design | Stale "PickleReader" docstring not updated to match fixture rename |
Additional note: Testing gap
After removing DatasetFormat.PICKLE, .PANDAS_DF, and .NUMPY_ARRAY from the enum, there is no negative test verifying that these formats are now rejected (e.g., that DatasetFormat(".pkl") raises ValueError). Consider adding a parametrized test to confirm removed formats are properly rejected — this guards against accidental re-introduction.
What was already well-covered by existing reviewers
The 25 existing comments from Copilot, Gemini, and nv-alicheng thoroughly covered:
- Double SHA-256 verification in
open_orca/__init__.py(Copilot, nv-alicheng) - Security implications of residual
pd.read_pickle()usage (Copilot, Gemini) - Atomic write pattern explanation (nv-alicheng, author)
- Factory docstring accuracy (Gemini, Copilot)
- Variable naming with format-specific prefixes (nv-alicheng)
Overall this is a clean, well-structured PR that correctly removes pickle support from the dataset pipeline while preserving a carefully guarded conversion path for the upstream OpenOrca artifact.
viraatc left a comment
Review Council — Supplementary Findings
4 additional issues found by the Claude agent (completing the thorough review):
- 0 critical
- 3 medium
- 1 low
Note: `open_orca/__init__.py:142` has a dead `except subprocess.CalledProcessError` clause (unreachable because `check=False`), but the line is outside the diff so it cannot be posted inline.
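The unreachable-except point can be demonstrated directly: with `check=False`, `subprocess.run` reports failure only via the return code and never raises `CalledProcessError`. The command below is an illustrative stand-in, not the project's actual download call:

```python
import subprocess
import sys

# With check=False, a non-zero exit status does NOT raise CalledProcessError;
# the failure is only visible on the returned object's returncode attribute.
result = subprocess.run(
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    check=False,
)
assert result.returncode == 3  # reaching this line proves no exception was raised
```

An `except subprocess.CalledProcessError` around such a call is therefore dead code; either pass `check=True` or branch on `result.returncode`.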
Review Council — Codex Addendum
Codex review completed (run separately due to sandbox restrictions). Codex found 2 issues:
Cross-reference with Claude findings
Updated severity summary (all reviewers combined)
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Pull request overview
Copilot reviewed 27 out of 31 changed files in this pull request and generated no new comments.
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Pull request overview
Copilot reviewed 27 out of 31 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
src/inference_endpoint/dataset_manager/predefined/open_orca/__init__.py:138
- This code downloads a shell script from GitHub and (later) executes it. Even with a pinned commit hash, executing remote code at runtime is a high-risk supply-chain vector; consider vendoring the script, or verifying its integrity (e.g., hardcoded SHA-256) before execution, or replacing it with a native downloader implementation.
```python
COMMIT_HASH = "27da4421877f2831eeb615b43ee5098c4b70be7e"
downloader_url = f"https://raw.githubusercontent.com/mlcommons/r2-downloader/{COMMIT_HASH}/mlc-r2-downloader.sh"
download_dir = cache_dir
script_path = cache_dir / "mlc-r2-downloader.sh"
r = requests.get(downloader_url, timeout=30)
r.raise_for_status()
script_path.write_bytes(r.content)
script_path.chmod(0o755)
```
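A hedged sketch of the integrity check the reviewers suggest: pin an expected SHA-256 digest alongside the commit hash and refuse to write or execute the script when the digest does not match. The `EXPECTED_SHA256` constant below is a placeholder, not the real hash of the downloader script:

```python
import hashlib

# Placeholder value; in real code this would be the known digest of the
# pinned mlc-r2-downloader.sh revision.
EXPECTED_SHA256 = "0" * 64


def verify_sha256(content: bytes, expected_hex: str) -> None:
    """Raise ValueError if the content's SHA-256 digest differs from the expected one."""
    actual = hashlib.sha256(content).hexdigest()
    if actual != expected_hex:
        raise ValueError(f"checksum mismatch: expected {expected_hex}, got {actual}")


# Usage in the download path would look roughly like:
#   r = requests.get(downloader_url, timeout=30)
#   r.raise_for_status()
#   verify_sha256(r.content, EXPECTED_SHA256)  # before write_bytes / chmod / execution
```

Vendoring the script into the repository achieves the same goal with less moving machinery, at the cost of tracking upstream updates manually.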
tests/integration/test_server_roundtrip.py:33
- The docstring still says this test validates the “PickleReader”, but the fixture and dataset path were renamed to JSONL. Please update the wording so the test description matches the new dataset reader behavior.
"""
Test the PickleReader by performing a roundtrip request through a mock HTTP Oracle server.
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Pull request overview
Copilot reviewed 29 out of 33 changed files in this pull request and generated 1 comment.
What does this PR do?
Pickle support can enable arbitrary code execution, hence we would like to discourage it.
Note that open-orca only exists in pickle format from MLC, so we convert it internally to JSONL.
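The one-time conversion idea can be sketched with the standard library; per the review threads, the real code uses `pd.read_pickle` on the trusted upstream artifact, and the record shape below is illustrative only:

```python
import json
import pickle
from pathlib import Path


def convert_pickle_to_jsonl(pkl_path: Path, jsonl_path: Path) -> None:
    """One-time conversion: unpickle a trusted list of records, then cache as JSONL.

    Only ever call this on the verified upstream artifact; all subsequent
    reads go through the safe JSONL loader instead of pickle.
    """
    records = pickle.loads(pkl_path.read_bytes())
    with jsonl_path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


# Demo with an illustrative record shape.
src, dst = Path("orca.pkl"), Path("orca.jsonl")
src.write_bytes(pickle.dumps([{"question": "2+2?", "answer": "4"}]))
convert_pickle_to_jsonl(src, dst)
```

After the cache exists, the pickle artifact never needs to be touched again, which is what confines the `pd.read_pickle` risk to this single guarded path.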
Type of change
Related issues
closes #184
Testing
Checklist