Labels: 🏋 GRPO (Related to GRPO)
Description
CI fails for the slow tests: https://github.com/huggingface/trl/actions/runs/18836157728/job/53737521966
RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
FAILED tests/slow/test_grpo_slow.py::TestGRPOTrainerSlow::test_training_with_transformers_paged[trl-internal-testing/tiny-LlamaForCausalLM-3.2] - RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
FAILED tests/slow/test_grpo_slow.py::TestGRPOTrainerSlow::test_training_with_transformers_paged[trl-internal-testing/tiny-MistralForCausalLM-0.2] - RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
Stacktrace:
> trainer.train()
tests/slow/test_grpo_slow.py:199:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.venv/lib/python3.11/site-packages/transformers/trainer.py:2325: in train
return inner_training_loop(
.venv/lib/python3.11/site-packages/transformers/trainer.py:2674: in _inner_training_loop
tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.11/site-packages/transformers/trainer.py:4014: in training_step
inputs = self._prepare_inputs(inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
trl/extras/profiling.py:98: in wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
trl/trainer/grpo_trainer.py:1033: in _prepare_inputs
generation_batch = self._generate_and_score_completions(generation_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
trl/trainer/grpo_trainer.py:1401: in _generate_and_score_completions
self._generate(prompts)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <trl.trainer.grpo_trainer.GRPOTrainer object at 0x7fb5d8a5c4d0>
prompts = ["Although that way may not be obvious at first unless you're", "Although that way may not be obvious at first unless you're", "Although that way may not be obvious at first unless you're"]
    def _generate(self, prompts: list):
        device = self.accelerator.device
        mode = "train" if self.model.training else "eval"
        prompt_ids, completion_ids, logprobs, extra_fields = self._generate_single_turn(prompts)
        # Get completion length per sequence, used for logging
        prompt_lengths = torch.tensor([len(ids) for ids in prompt_ids], device=device)
        completion_lengths = torch.tensor([len(ids) for ids in completion_ids], device=device)
        agg_prompt_lengths = self.accelerator.gather(prompt_lengths)
        agg_completion_lengths = self.accelerator.gather(completion_lengths)
        total_prompt_tokens = agg_prompt_lengths.sum()
        total_completion_tokens = agg_completion_lengths.sum()  # = num_items_in_batch, required for the DAPO loss
        # Log the metrics
        if mode == "train":
            self.state.num_input_tokens_seen += (total_prompt_tokens + total_completion_tokens).item()
            self._metrics[mode]["num_tokens"] = [self.state.num_input_tokens_seen]
        # Log completion lengths, mean, min, max
        self._metrics[mode]["completions/mean_length"].append(agg_completion_lengths.float().mean().item())
>       self._metrics[mode]["completions/min_length"].append(agg_completion_lengths.float().min().item())
                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
trl/trainer/grpo_trainer.py:1359: RuntimeError
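The root cause appears to be that `agg_completion_lengths` can end up empty (`numel() == 0`), and calling `.min()` (no `dim` argument) on an empty tensor raises exactly this RuntimeError. A minimal sketch of the failure and a possible guard is below; `safe_min_length` is a hypothetical helper for illustration, not the actual fix in trl:

```python
import torch

# Reproduce the failure: min() without a dim argument on an empty tensor
# raises "Expected reduction dim to be specified for input.numel() == 0".
agg_completion_lengths = torch.tensor([], dtype=torch.float)
try:
    agg_completion_lengths.min()
    raised = False
except RuntimeError as e:
    raised = True
    message = str(e)

def safe_min_length(lengths: torch.Tensor) -> float:
    """Hypothetical guard: return the min length, or 0.0 if nothing was gathered."""
    if lengths.numel() == 0:
        return 0.0
    return lengths.float().min().item()

print(raised)                                   # True
print(safe_min_length(agg_completion_lengths))  # 0.0
print(safe_min_length(torch.tensor([3, 7])))    # 3.0
```

Note that `.sum()` and `.mean()` do not raise on empty tensors (`sum` returns 0, `mean` returns NaN), which is why the traceback only surfaces at the `min_length` logging line.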