What you're describing sounds closer to padding-free than packing. We have a (currently draft) PR for this: #2437.
Can you confirm that this is what you're describing?
At this point I'm not even sure that packing for DPO makes sense. How do you ensure that you have as many chosen as rejected sequences? How do you ensure they match? How do you handle partial sequences?
Hi, thank you for your response. I looked into the link you provided, and I think we are talking about the same thing. I took the word "packing" from https://huggingface.co/blog/packing-with-FA2. "Packing" there means concatenating a fixed batch size of samples into one sequence and using position_ids to mark the boundaries, rather than packing to a fixed length. So the problems you mentioned won't occur: every sample stays whole, and the chosen and rejected sequences stay in the same batch. I've also briefly read the blog post at https://huggingface.co/blog/mayank-mishra/padding-free-transformer, and I think the ideas are the same, but I'm not sure how the latter is implemented. Maybe they are the same thing with different names :)
I briefly went through the PR. I see it adds position_ids throughout the whole process, so I guess we are talking about the same thing.
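To make the terminology concrete, here is a minimal sketch of "packing" in the sense used above: a fixed number of samples concatenated into one sequence, with position_ids restarting at 0 for each sample. It uses plain Python lists for illustration; the function name `pack_sequences` and the token ids are hypothetical and are not taken from TRL or from the linked PR.

```python
from typing import List, Tuple


def pack_sequences(sequences: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate whole sequences and restart position ids at 0 for each one."""
    input_ids: List[int] = []
    position_ids: List[int] = []
    for seq in sequences:
        input_ids.extend(seq)
        # Positions restart at 0, so every 0 marks the start of a sequence.
        position_ids.extend(range(len(seq)))
    return input_ids, position_ids


# Hypothetical token ids for one chosen/rejected pair of different lengths.
chosen = [101, 7, 8, 9]
rejected = [101, 7, 5]
input_ids, position_ids = pack_sequences([chosen, rejected])
print(position_ids)
```

No padding tokens are added, and no sample is split, since the packed length is simply the sum of the sample lengths.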
Feature request
Packing can be supported in the DPO trainer.
Motivation
Sequences with high variation in length that are padded together waste a lot of resources. Issue #1274 mentioned this, but I don't think the conclusion is correct. Packing won't conflict with pairwise datasets. You just need to unpack the sequence after the forward pass. You can easily identify the start and the end of each sequence by the positions of 0 in position_ids.
Your contribution
Actually, I already have a version of the DPO trainer that can deal with packing:
This does not directly fit into the complex logic of the DPO trainer, but the idea is that it is possible to do packing.
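As an illustration of the unpacking step described under Motivation (this is not the trainer code referred to above, only a sketch), the following splits per-token values from a packed forward pass back into per-sequence chunks by looking for 0 in position_ids. It assumes plain Python lists, non-empty sequences, and that chosen and rejected sequences were packed in alternating order; the helper names and the log-probability values are hypothetical.

```python
from typing import List


def sequence_starts(position_ids: List[int]) -> List[int]:
    """Indices where a new sequence begins (position id equals 0)."""
    return [i for i, p in enumerate(position_ids) if p == 0]


def unpack(values: List[float], position_ids: List[int]) -> List[List[float]]:
    """Split per-token values into one chunk per packed sequence."""
    starts = sequence_starts(position_ids)
    # Each sequence ends where the next one starts; the last ends at the total length.
    ends = starts[1:] + [len(position_ids)]
    return [values[s:e] for s, e in zip(starts, ends)]


# Hypothetical per-token log-probabilities for a packed chosen/rejected pair.
position_ids = [0, 1, 2, 3, 0, 1, 2]
token_logps = [-0.1, -0.2, -0.3, -0.4, -0.5, -0.6, -0.7]

chunks = unpack(token_logps, position_ids)
seq_logps = [sum(chunk) for chunk in chunks]

# Assumption: sequences were packed as chosen, rejected, chosen, rejected, ...
chosen_logps = seq_logps[0::2]
rejected_logps = seq_logps[1::2]
```

With tensors the same boundaries could be found by comparing position_ids to 0, but the exact integration with the existing trainer logic is left open here.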