[Bug] Examples in a GRPO training batch are identical #1542

Open
@wizeng23

What happened?

When printing a batch of examples during GRPO training, every prompt in the batch appears to be identical, and this happens on every batch inspected.

"prompts": [
    [{"content": "Show me the count of 'h' in 'rhizotic'.", "role": "user"}],
    [{"content": "Show me the count of 'h' in 'rhizotic'.", "role": "user"}],
    [{"content": "Show me the count of 'h' in 'rhizotic'.", "role": "user"}],
]
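
One way to quantify the symptom is to count how many distinct conversations a batch contains. The helper below is a minimal sketch, not part of Oumi or TRL: the name count_distinct_prompts is hypothetical, and it assumes each prompt is a list of JSON-serializable message dicts, as in the output above. Since GRPO samples several completions per prompt, some repetition within a group is expected; the count helps tell "one prompt repeated across the whole batch" apart from "several groups of repeated prompts".

```python
import json
from collections import Counter


def count_distinct_prompts(prompts):
    """Count how often each conversation appears in a batch of prompts.

    Hypothetical helper for debugging; assumes every prompt is a list of
    JSON-serializable message dicts.
    """
    # Serialize each conversation so the list-of-dicts structure becomes hashable.
    keys = [json.dumps(p, sort_keys=True) for p in prompts]
    return Counter(keys)


# The batch reported in this issue.
batch = [
    [{"content": "Show me the count of 'h' in 'rhizotic'.", "role": "user"}],
    [{"content": "Show me the count of 'h' in 'rhizotic'.", "role": "user"}],
    [{"content": "Show me the count of 'h' in 'rhizotic'.", "role": "user"}],
]
counts = count_distinct_prompts(batch)
print(f"{len(counts)} distinct prompt(s) in a batch of {len(batch)}")
# prints "1 distinct prompt(s) in a batch of 3"
```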

Steps to reproduce the bug

  1. Modify _count_letters() to raise a ValueError that includes its arguments, so the batch it receives shows up in the traceback
  2. Run configs/examples/grpo_letter_counting/gcp_job.yaml with an editable install of Oumi
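
Step 1 can be sketched roughly as follows. This is an illustration under assumptions, not the actual Oumi code: the function name _count_letters_debug is hypothetical, and the signature assumes the trainer calls the reward function with prompts and completions as keyword arguments and passes any other dataset columns as extra keyword arguments. The real _count_letters() signature may differ.

```python
from typing import Any


def _count_letters_debug(
    prompts: list[Any], completions: list[Any], **kwargs: Any
) -> list[float]:
    """Hypothetical stand-in for the modified reward function (assumed signature)."""
    # Fail on the first call so the error message shows what the trainer passed in.
    raise ValueError(
        f"prompts={prompts!r}\n"
        f"completions={completions!r}\n"
        f"extra kwargs={sorted(kwargs)!r}"
    )


# Local usage example with made-up completions, mimicking one trainer call.
sample_prompts = [
    [{"content": "Show me the count of 'h' in 'rhizotic'.", "role": "user"}],
    [{"content": "Show me the count of 'h' in 'rhizotic'.", "role": "user"}],
]
sample_completions = [
    [{"content": "1", "role": "assistant"}],
    [{"content": "2", "role": "assistant"}],
]
try:
    _count_letters_debug(prompts=sample_prompts, completions=sample_completions)
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```

Raising instead of logging stops the job at the first reward computation, which keeps the output short when the run is launched remotely.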

System Info

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   Oumi environment information:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

┌────────────────┬────────────────────────────┐
│ Oumi version   │ 0.1.9.dev1+gbc574954       │
│ Python version │ 3.11.11                    │
│ Platform       │ macOS-14.5-arm64-arm-64bit │
└────────────────┴────────────────────────────┘

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   Installed dependencies:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ PACKAGE          ┃ VERSION         ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ accelerate       │ 1.2.1           │
│ aiohttp          │ 3.11.13         │
│ bitsandbytes     │ <not installed> │
│ datasets         │ 3.2.0           │
│ diffusers        │ <not installed> │
│ einops           │ 0.8.0           │
│ jsonlines        │ 4.0.0           │
│ liger-kernel     │ <not installed> │
│ llama-cpp-python │ 0.3.6           │
│ lm-eval          │ 0.4.7           │
│ numpy            │ 1.26.4          │
│ nvidia-ml-py     │ 12.560.30       │
│ omegaconf        │ 2.4.0.dev3      │
│ open_clip_torch  │ <not installed> │
│ pandas           │ 2.2.3           │
│ peft             │ 0.14.0          │
│ pexpect          │ 4.8.0           │
│ pillow           │ 10.3.0          │
│ pydantic         │ 2.9.2           │
│ responses        │ 0.25.6          │
│ sglang           │ <not installed> │
│ skypilot         │ 0.7.0           │
│ tensorboard      │ 2.18.0          │
│ timm             │ <not installed> │
│ torch            │ 2.5.1           │
│ torchdata        │ 0.9.0           │
│ torchvision      │ 0.20.1          │
│ tqdm             │ 4.67.1          │
│ transformers     │ 4.48.3          │
│ trl              │ 0.14.0          │
│ typer            │ 0.15.2          │
│ vllm             │ <not installed> │
│ wandb            │ 0.19.8          │
└──────────────────┴─────────────────┘

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   Environment variables:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ VARIABLE                        ┃ VALUE     ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ ACCELERATE_DYNAMO_BACKEND       │ <not set> │
│ ACCELERATE_DYNAMO_MODE          │ <not set> │
│ ACCELERATE_DYNAMO_USE_DYNAMIC   │ <not set> │
│ ACCELERATE_DYNAMO_USE_FULLGRAPH │ <not set> │
│ ACCELERATE_USE_FSDP             │ <not set> │
│ CUDA_VISIBLE_DEVICES            │ <not set> │
│ LOCAL_RANK                      │ <not set> │
│ LOCAL_WORLD_SIZE                │ <not set> │
│ OUMI_EXTRA_DEPS_FILE            │ <not set> │
│ OUMI_SLURM_CONNECTIONS          │ <not set> │
│ OUMI_USE_SPOT_VM                │ spot      │
│ RANK                            │ <not set> │
│ WORLD_SIZE                      │ <not set> │
└─────────────────────────────────┴───────────┘

Metadata

Labels

bug (Something isn't working)
