
add arg padding_free to DataCollatorForCompletionOnlyLM #1887

Merged
merged 12 commits into huggingface:main on Aug 26, 2024

Conversation

@RhuiDih (Contributor) commented Jul 30, 2024

Add the argument DataCollatorForCompletionOnlyLM(padding_free=True) to make use of huggingface/transformers#31629.
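For context, padding-free collation concatenates the variable-length examples of a batch into a single flattened sequence and emits position_ids that restart at each example boundary, instead of padding every example to the longest length and emitting an attention_mask. A minimal conceptual sketch of that idea (not TRL's actual implementation; the helper name and plain-list tensors are illustrative assumptions):

```python
# Conceptual sketch of padding-free collation (NOT TRL's actual code).
# Instead of padding each example and building an attention_mask, all
# examples are concatenated into one flat sequence, and position_ids
# restart at 0 at each example boundary so the attention kernel can
# recover the sequence boundaries.

def padding_free_collate(examples):
    input_ids, position_ids = [], []
    for ex in examples:
        ids = ex["input_ids"]
        input_ids.extend(ids)
        position_ids.extend(range(len(ids)))
    # Note: no "attention_mask" key in the returned batch.
    return {"input_ids": [input_ids], "position_ids": [position_ids]}

batch = padding_free_collate([
    {"input_ids": [5, 6, 7]},
    {"input_ids": [8, 9]},
])
# batch["position_ids"][0] == [0, 1, 2, 0, 1]
```

The restart in position_ids (the second 0 above) is what marks the start of the second example once the padding and mask are gone.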

@ArthurZucker

cc @kashif this is linked to huggingface/transformers#31629 which will be in the next transformers release!

@kashif (Collaborator) commented Jul 31, 2024

thanks @ArthurZucker

@pcuenca (Member) commented Aug 22, 2024

Gentle ping; for reference, this post covers the support in transformers: https://huggingface.co/blog/packing-with-FA2


Review thread on trl/trainer/utils.py (outdated, resolved)
@kashif kashif marked this pull request as ready for review August 22, 2024 07:05
@lewtun (Member) left a comment

Thanks for adding this neat extension to the collator! LGTM, aside from the unit tests, which should use the assert methods from unittest:

batch = collator(tokenized_instruction)
batch_paddingfree = collator_paddingfree(tokenized_instruction)

assert "attention_mask" not in batch_paddingfree
Review comment (Member):

Can you please replace these assert statements with assert methods from unittest, e.g. self.assertTrue() for this case

batch = collator(tokenized_instruction)
batch_paddingfree = collator_paddingfree(tokenized_instruction)

assert "attention_mask" not in batch_paddingfree
Review comment (Collaborator):

Suggested change:
- assert "attention_mask" not in batch_paddingfree
+ self.assertNotIn("attention_mask", batch_paddingfree)
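For anyone applying the same review feedback elsewhere, here is a minimal standalone sketch of the unittest-style assertion (the batch data is made up for illustration; this is not TRL's actual test suite):

```python
import unittest

class TestPaddingFreeCollator(unittest.TestCase):
    def test_no_attention_mask(self):
        # Hypothetical stand-in for the collator's padding-free output.
        batch_paddingfree = {"input_ids": [[1, 2, 3]], "position_ids": [[0, 1, 2]]}
        # unittest-style replacement for a bare
        # `assert "attention_mask" not in batch_paddingfree`
        self.assertNotIn("attention_mask", batch_paddingfree)

# Run the test case programmatically.
suite = unittest.TestLoader().loadTestsFromTestCase(TestPaddingFreeCollator)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Using self.assertNotIn instead of a bare assert gives a descriptive failure message and plays well with unittest's reporting, which is the point of the review request.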

Reply from @RhuiDih (author):

@kashif @lewtun thanks for the suggestion, just pushed the update

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
@kashif kashif merged commit fe41acd into huggingface:main Aug 26, 2024
9 checks passed
6 participants