
Packing in DPOTrainer #2469

Open
zhc7 opened this issue Dec 13, 2024 · 2 comments
Labels
🏋 DPO Related to DPO ✨ enhancement New feature or request

Comments


zhc7 commented Dec 13, 2024

Feature request

Packing could be supported in `DPOTrainer`.

Motivation

Sequences with highly varied lengths waste a lot of compute when padded together. Issue #1274 mentioned this, but I don't think its conclusion is correct: packing doesn't conflict with pairwise datasets. You just need to unpack the sequences after the forward pass, and you can easily identify the start and end of each sequence from the positions of 0 in `position_ids`.
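The boundary-detection idea can be sketched in isolation. The tensors below are made up for illustration (three sequences of lengths 3, 2, and 4 packed into one row):

```python
import torch

# Packed batch: each 0 in position_ids marks the start of a new sequence.
position_ids = torch.tensor([[0, 1, 2, 0, 1, 0, 1, 2, 3]])
per_token_logps = torch.arange(9, dtype=torch.float).unsqueeze(0)

# Sequence start offsets within the single packed row.
starts = (position_ids == 0).nonzero()[:, 1]
# Append the row length so each sequence has an end boundary.
bounds = torch.cat([starts, torch.tensor([position_ids.shape[1]])])

# Sum per-token log-probs within each [start, end) span.
seq_logps = torch.stack([
    per_token_logps[0, bounds[i]:bounds[i + 1]].sum()
    for i in range(len(starts))
])
print(seq_logps)  # tensor([ 3.,  7., 26.])
```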

Your contribution

I already have a version of the DPO trainer that can handle packing:

    def _forward_one(self, model: nn.Module, batch: dict, name: str):
        # Compute per-sequence log-probs for one side ("chosen" or "rejected")
        # of a packed batch.
        batch = dict(batch)  # shallow copy so the pops below don't mutate the caller's dict
        targets = batch.pop("targets")
        loss_masks = batch.pop("loss_masks").to(torch.bool)
        out = model(**batch)
        logits = out["logits"]
        targets = targets.clone()
        targets[~loss_masks] = 0  # dummy token; we'll ignore the losses on these tokens later
        per_token_logps = torch.gather(logits.log_softmax(-1), dim=2, index=targets.unsqueeze(2)).squeeze(2)
        per_token_logps[~loss_masks] = 0
        # Unpack: all sequences are packed into a single row, and each 0 in
        # position_ids marks the start of a new sequence.
        assert batch["position_ids"].shape[0] == 1
        starts = (batch["position_ids"] == 0).nonzero()[:, 1]
        seqs = []
        for i in range(len(starts) - 1):
            seqs.append(per_token_logps[0, starts[i]: starts[i + 1]].sum())
        seqs.append(per_token_logps[0, starts[-1]:].sum())
        all_logps = torch.stack(seqs)
        return {
            f"{name}_logps": all_logps,
            f"mean_{name}_logits": logits[loss_masks].mean(),
        }

    def concatenated_forward(self, model: nn.Module, batch: dict):
        ret = {}
        for name in ("chosen", "rejected"):
            ret = {**ret, **self._forward_one(model, batch[name], name)}
        return ret

This doesn't slot directly into the existing logic of `DPOTrainer`, but it shows that packing is possible.
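For context, the per-sequence log-probs returned by `concatenated_forward` are what the standard DPO objective consumes. A minimal sketch of that loss (the `beta` value and the tensors below are illustrative, not TRL's actual implementation):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # DPO objective: -log sigmoid(beta * (policy margin - reference margin)),
    # where each margin is chosen_logps - rejected_logps.
    logits = ((policy_chosen_logps - policy_rejected_logps)
              - (ref_chosen_logps - ref_rejected_logps))
    return -F.logsigmoid(beta * logits).mean()

# Toy per-sequence log-probs for a single pair.
loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-14.0]),
                torch.tensor([-11.0]), torch.tensor([-12.0]))
print(loss)  # ~0.5544: -log sigmoid(0.1 * (4 - 1))
```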

qgallouedec (Member) commented Dec 13, 2024

What you're describing sounds closer to padding-free than packing. We have a (currently draft) PR for this: #2437.
Can you confirm that this is what you're describing?


At this point I'm not even sure that packing for DPO makes sense. How do you ensure that you have as many chosen as rejected sequences? How do you ensure they match? How do you handle partial sequences?

@qgallouedec qgallouedec added ✨ enhancement New feature or request 🏋 DPO Related to DPO labels Dec 13, 2024
zhc7 (Author) commented Dec 13, 2024

Hi, thank you for your response. I looked into the link you provided, and I think we are talking about the same thing. I took the word "packing" from https://huggingface.co/blog/packing-with-FA2. "Packing" there means concatenating a fixed batch size of samples into one sequence and using `position_ids` to mark the boundaries, rather than packing to a fixed length, so the problems you mentioned don't arise. I've also briefly read https://huggingface.co/blog/mayank-mishra/padding-free-transformer, and I think the ideas are the same, though I'm not sure how the latter is implemented. Maybe they are the same thing under different names :)

I briefly went through the PR. I see it threads `position_ids` through the whole process, so I think we are talking about the same thing.
