Add magic method to our TF models to convert datasets with column inference #17160

Rocketknight1 · 2022-05-10T14:09:02Z

Left to do:

- Add docstring
- Figure out what type to set for dataset (since datasets isn't imported)
- Set default values for batch_size / shuffle?
- Do we need to document this anywhere besides the docstring?
- Anything else I forgot? (Reviewers please yell at me)

HuggingFaceDocBuilderDev · 2022-05-10T14:24:59Z

The documentation is not available anymore as the PR was closed or merged.

Rocketknight1 · 2022-05-17T16:58:54Z

Hey all, this is the method in transformers that I moved all the column inference code to! I also need some advice:

- How do we handle an input type of Dataset when datasets may not be installed?
- Should this be moved to a utility function that takes model as an argument rather than a method on TFPreTrainedModel?
- Should I set default values for any arguments to reduce the amount of typing users have to do?
- How should I make sure users find out about this (besides revamping the examples and notebooks once it's merged)?
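For the first question, one common pattern is to import the type only under `TYPE_CHECKING` and check for the package at call time. The sketch below is a minimal illustration of that pattern, not the code that was merged: `require_datasets` and `describe_dataset` are hypothetical names invented here, and the only assumption about `datasets` is that it exposes a `Dataset` class.

```python
import importlib.util
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime,
    # so this import cannot fail when `datasets` is missing.
    from datasets import Dataset


def require_datasets(caller: str) -> None:
    """Hypothetical guard: raise a clear error if `datasets` is not installed."""
    if importlib.util.find_spec("datasets") is None:
        raise ImportError(f"{caller} requires the `datasets` library to be installed.")


def describe_dataset(dataset: "Dataset") -> str:
    """Hypothetical helper. The annotation is a string, so it is not resolved at runtime."""
    return f"{type(dataset).__name__} with {len(dataset)} rows"
```

With this layout the module imports cleanly whether or not `datasets` is present, and the missing-dependency error only surfaces when a caller actually invokes the guarded code path.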

gante

In general looks good -- the biggest question to settle is the import requirement of datasets

src/transformers/modeling_tf_utils.py

tests/test_modeling_tf_common.py

gante · 2022-05-19T10:52:15Z

> Should this be moved to a utility function that takes model as an argument rather than a method on TFPreTrainedModel?

All our models inherit from TFPreTrainedModel, so it should be fine

> Should I set default values for any arguments to reduce the amount of typing users have to do?

I'd keep the same defaults as in datasets, for a consistent experience. But I'm not the most experienced in this domain :D

> How should I make sure users find out about this (besides revamping the examples and notebooks once it's merged)?

We are getting to the point where we have a lot of content to announce (these changes, the metrics working correctly, generate and its updates, new models, ...), maybe we can start a once-a-week TF communication of some sort!

sgugger

Thanks for working on this! Let's polish the type annotations/imports and it should be good to go.

src/transformers/dependency_versions_table.py

src/transformers/modeling_tf_utils.py

LysandreJik

Looks quite cool!

Rocketknight1 · 2022-05-23T17:28:49Z

Quick update: I think this is ready to merge, but I've only really tested it with the updated to_tf_dataset() method in datasets, which hasn't been merged yet (but is due very soon!). As such, I don't want to merge it until that's in, because there could be edge case issues with the old method that I haven't seen.

Rocketknight1 · 2022-06-06T14:53:36Z

I've merged the to_tf_dataset update so I'm going to merge this one too - though I think it will be a silent 'soft launch' until there's a new release of datasets, to avoid any unforeseen problems. Since this code only adds the new method, it shouldn't disrupt any existing workflows before it's ready to be used.
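For readers skimming the thread, here is a rough sketch of what "column inference" could look like. It assumes the idea is to keep only the dataset columns whose names match arguments of the model's `call` method and to drop the rest. `ToyModel` and `infer_model_columns` are hypothetical names for illustration, not the merged implementation.

```python
import inspect


class ToyModel:
    """Hypothetical stand-in for a model; only its `call` signature matters here."""

    def call(self, input_ids=None, attention_mask=None, labels=None, training=False):
        return None


def infer_model_columns(model, column_names):
    """Split dataset columns into those `model.call` accepts and those it does not."""
    accepted = set(inspect.signature(model.call).parameters)
    keep = [name for name in column_names if name in accepted]
    drop = [name for name in column_names if name not in accepted]
    return keep, drop


keep, drop = infer_model_columns(
    ToyModel(), ["input_ids", "attention_mask", "labels", "text", "idx"]
)
print(keep)  # ['input_ids', 'attention_mask', 'labels']
print(drop)  # ['text', 'idx']
```

Matching on the call signature means users would not have to list columns by hand, which is the typing-reduction goal raised earlier in the thread.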

…erence (huggingface#17160)

* Add method to call to_tf_dataset() with column inference
* Add test for dataset creation
* Add a default arg for data collator
* Fix test
* Fix call with non-dev version of datasets
* Test correct column removal too
* make fixup
* More tests to make sure we remove unwanted columns
* Fix test to avoid predicting on unbuilt models
* Fix test to avoid predicting on unbuilt models
* Fix test to remove unwanted head mask columns from inputs
* Stop pushing your debug breakpoints to the main repo of the $2bn company you work for
* Skip the test in convnext because no grouped conv support
* Drop bools from the dataset dict
* Make style
* Skip the training test for models whose input dicts don't give us labels
* Skip transformerXL in the test because it doesn't return a simple loss
* Skip TFTapas because of some odd NaN losses
* make style
* make fixup
* Add docstring
* fixup
* Update src/transformers/modeling_tf_utils.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove breakpoint from tests
* Fix assert, add requires_backends
* Protect tokenizer import with if TYPE_CHECKING
* make fixup
* Add noqa, more fixup
* More rearranging for ~* aesthetics *~
* Adding defaults for shuffle and batch_size to match to_tf_dataset()
* Update src/transformers/modeling_tf_utils.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Rocketknight1 force-pushed the magic_tf_dataset_method branch from f76056b to 037e848 Compare May 12, 2022 15:12

Rocketknight1 marked this pull request as ready for review May 12, 2022 17:34

Rocketknight1 requested review from sgugger, gante and LysandreJik May 17, 2022 16:56

gante reviewed May 19, 2022

src/transformers/modeling_tf_utils.py

src/transformers/modeling_tf_utils.py Outdated

tests/test_modeling_tf_common.py Outdated

sgugger reviewed May 19, 2022

Rocketknight1 added 20 commits May 19, 2022 14:31

Add method to call to_tf_dataset() with column inference

1ccd726

Add test for dataset creation

2c71f84

Add a default arg for data collator

919cf82

Fix test

0066598

Fix call with non-dev version of datasets

b40fa6e

Test correct column removal too

f5f667d

make fixup

258392b

More tests to make sure we remove unwanted columns

ae4be4a

Fix test to avoid predicting on unbuilt models

673e23d

Fix test to avoid predicting on unbuilt models

0ee6e1d

Fix test to remove unwanted head mask columns from inputs

2313b3a

Stop pushing your debug breakpoints to the main repo of the $2bn company you work for

221ae78

Skip the test in convnext because no grouped conv support

1506182

Drop bools from the dataset dict

a9010b1

Make style

0a29747

Skip the training test for models whose input dicts don't give us labels

5b35ff4

Skip transformerXL in the test because it doesn't return a simple loss

a1b6e92

Skip TFTapas because of some odd NaN losses

33812ea

make style

fb19ea9

make fixup

a7f6a85

Rocketknight1 and others added 9 commits May 19, 2022 14:31

Add docstring

0787a45

fixup

24c0a66

Update src/transformers/modeling_tf_utils.py

fc89e11

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/modeling_tf_utils.py

0d4553d

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/modeling_tf_utils.py

0311172

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/modeling_tf_utils.py

a7b1f60

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/modeling_tf_utils.py

f01f8c2

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Remove breakpoint from tests

24cc6a6

Fix assert, add requires_backends

e381daf

Rocketknight1 force-pushed the magic_tf_dataset_method branch from e3661b4 to e381daf Compare May 19, 2022 13:32

Rocketknight1 added 5 commits May 19, 2022 14:43

Protect tokenizer import with if TYPE_CHECKING

0f473db

make fixup

753c0c5

Add noqa, more fixup

b23efa1

More rearranging for ~* aesthetics *~

163b4f7

Adding defaults for shuffle and batch_size to match to_tf_dataset()

e29f98f

sgugger reviewed May 19, 2022

src/transformers/modeling_tf_utils.py Outdated

Update src/transformers/modeling_tf_utils.py

726cb39

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

gante approved these changes May 19, 2022

sgugger approved these changes May 19, 2022

LysandreJik approved these changes May 19, 2022

Rocketknight1 merged commit 19a8a30 into main Jun 6, 2022

Rocketknight1 deleted the magic_tf_dataset_method branch June 6, 2022 14:53
