
@finbarrtimbers (Collaborator) commented on Oct 29, 2025

Previously, the dataset_transformation tests only ran when someone invoked them manually.


Note

Runs dataset_transformation unit tests in CI with a new test suite and updated workflow, and factors dataset config construction into a reusable helper.

  • CI:
    • Use 8-Core-XL-Runner-Ubuntu-Latest and pass HF_TOKEN when running pytest in .github/workflows/tests.yml.
  • Tests:
    • Add open_instruct/test_dataset_transformation.py with unit tests for tokenizer equality, config-hash differences, and cached dataset parity (using small splits and temp caches); a rough sketch of this style of test appears after this list.
  • Refactor:
    • Extract load_dataset_configs(...) and reuse it in get_cached_dataset_tulu_with_statistics(...).
    • Remove embedded test code and __main__ block from open_instruct/dataset_transformation.py.
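
The new test file itself is not reproduced in this conversation; below is a minimal, self-contained sketch of the config-hash style of test described in the Tests bullet. The `DatasetConfig` dataclass and `config_hash` helper are hypothetical stand-ins written for illustration, not the actual open_instruct APIs.

```python
import hashlib
import json
import unittest
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical stand-in for a dataset configuration object."""
    dataset_name: str
    split: str
    max_seq_length: int


def config_hash(config: DatasetConfig) -> str:
    """Deterministically hash a config by serializing its fields in sorted order."""
    payload = json.dumps(asdict(config), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TestConfigHashing(unittest.TestCase):
    def test_identical_configs_hash_equal(self):
        a = DatasetConfig("allenai/tulu-3-sft-mixture", "train[:16]", 2048)
        b = DatasetConfig("allenai/tulu-3-sft-mixture", "train[:16]", 2048)
        self.assertEqual(config_hash(a), config_hash(b))

    def test_different_configs_hash_differently(self):
        a = DatasetConfig("allenai/tulu-3-sft-mixture", "train[:16]", 2048)
        b = DatasetConfig("allenai/tulu-3-sft-mixture", "train[:16]", 4096)
        self.assertNotEqual(config_hash(a), config_hash(b))


if __name__ == "__main__":
    unittest.main()
```

The real suite additionally exercises tokenizer equality and cached-dataset parity against small splits and temporary caches, per the summary above.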

Written by Cursor Bugbot for commit 4c7ef47.

finbarrtimbers and others added 30 commits October 21, 2025 09:47
Added detailed logging throughout the vLLM generation pipeline to diagnose why the benchmark script hangs:
- Log prompt submission and queue sizes in benchmark_generators.py
- Log actor_manager should_stop status and engine ready status
- Log _prefetch_worker iterations and request processing in vllm_utils.py
- Log async task creation, completion accumulation, and result queue operations

This will help identify where in the pipeline the hang occurs (prompt queue, async tasks, completion queue, or results queue).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
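
The diff itself is not shown here; the following is a minimal sketch of the kind of queue-depth instrumentation the commit above describes, using a hypothetical `submit_prompts` helper and standard-library queues in place of the real benchmark plumbing.

```python
import logging
import queue

logger = logging.getLogger(__name__)


def submit_prompts(prompts, prompt_queue: queue.Queue, results_queue: queue.Queue) -> None:
    """Submit prompts one at a time, logging queue depths to localize a stall."""
    for i, prompt in enumerate(prompts):
        prompt_queue.put(prompt)
        logger.info(
            "submitted prompt %d/%d; prompt_queue=%d results_queue=%d",
            i + 1,
            len(prompts),
            prompt_queue.qsize(),
            results_queue.qsize(),
        )
```

Logging both queue sizes on every submission makes it easier to see whether prompts are piling up unconsumed (a stalled prefetch worker) or being consumed without results ever arriving (a stalled results path).
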
Added detailed logging at critical points in the vLLM engine initialization
pipeline to diagnose where the script hangs:

- benchmark_generators.py: Log ActorManager creation and vLLM engine setup
- vllm_utils.py create_vllm_engines: Log engine creation loop progress
- LLMRayActor.__init__: Log each initialization step
- _setup_and_start_async_engine: Log thread creation and startup sequence

This will help identify whether the hang occurs during Ray actor creation,
actor initialization, or vLLM AsyncLLMEngine startup.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
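
A minimal sketch of step-by-step initialization logging in the spirit of the commit above; `log_step` is a hypothetical helper, and the usage comments reference the real call sites only loosely.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def log_step(name: str):
    """Log when an initialization step starts, finishes, and how long it took."""
    logger.info("starting: %s", name)
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info("finished: %s (%.1fs)", name, time.perf_counter() - start)


# Example usage inside a hypothetical actor __init__:
#     with log_step("create vLLM engines"):
#         engines = create_vllm_engines(...)
#     with log_step("start async engine thread"):
#         self._setup_and_start_async_engine(...)
```

If the process hangs, the last "starting:" line without a matching "finished:" line points at the step that never returned.
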
The script was hanging because two placement groups were being created:
1. One in setup_vllm_engines() that allocated the GPU
2. Another in create_vllm_engines() that waited forever for the same GPU

The placement group created in setup_vllm_engines was only passed to
create_vllm_engines when single_gpu_mode=True, causing create_vllm_engines
to create a second placement group when single_gpu_mode=False.

Fix: Always pass the placement group to create_vllm_engines, preventing
the duplicate allocation attempt.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
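
A minimal sketch of the shape of the fix, with heavily simplified stand-ins for `setup_vllm_engines` and `create_vllm_engines` (the real functions take many more arguments); it assumes a running Ray cluster with at least `num_engines` GPUs.

```python
from typing import List, Optional

import ray
from ray.util.placement_group import PlacementGroup, placement_group


def create_vllm_engines(num_engines: int, pg: Optional[PlacementGroup] = None) -> List[str]:
    """Hypothetical stand-in: reuse the caller's placement group when one is given."""
    if pg is None:
        # This branch is the failure mode described above: a second placement
        # group waiting forever on GPUs the caller has already reserved.
        pg = placement_group([{"GPU": 1, "CPU": 1}] * num_engines, strategy="PACK")
        ray.get(pg.ready())
    return [f"engine-{i}" for i in range(num_engines)]  # engines would be scheduled into pg


def setup_vllm_engines(num_engines: int, single_gpu_mode: bool = False) -> List[str]:
    """Create one placement group and forward it unconditionally (the fix)."""
    # single_gpu_mode no longer gates whether pg is forwarded.
    pg = placement_group([{"GPU": 1, "CPU": 1}] * num_engines, strategy="PACK")
    ray.get(pg.ready())
    return create_vllm_engines(num_engines, pg=pg)
```
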
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request effectively refactors the testing for dataset_transformation.py by moving ad-hoc tests into a proper unittest suite, which is a great improvement for CI and maintainability. The changes also include significant enhancements to the benchmarking scripts, making them more configurable and robust. The updates to benchmark_generators.py and the new shell scripts for launching benchmarks are well-structured. I've found one minor semantic issue in how epoch_number is being set, which I've commented on. Overall, these are solid improvements to the project's testing and benchmarking infrastructure.

@hamishivi (Collaborator) left a comment

Minor nit on naming but otherwise g2g!

@finbarrtimbers finbarrtimbers added this pull request to the merge queue Nov 3, 2025
@hamishivi hamishivi removed this pull request from the merge queue due to a manual request Nov 3, 2025
finbarrtimbers and others added 2 commits November 3, 2025 10:05
Co-authored-by: Hamish Ivison <hamishivi@gmail.com>
Co-authored-by: Hamish Ivison <hamishivi@gmail.com>
@finbarrtimbers finbarrtimbers added this pull request to the merge queue Nov 3, 2025
Merged via the queue into main with commit 456d6a1 Nov 3, 2025
4 checks passed