Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add (M)Luke model training for Token Classification in the examples #14880

Merged
merged 24 commits into from
Jan 31, 2022

Conversation

jplu
Copy link
Contributor

@jplu jplu commented Dec 22, 2021

What does this PR do?

This PR adds the possibility to train the (M)Luke model for a Token Classification task with the accelerate package. It also adds a tiny update to give the possibility to train over multiple datasets configuration, for example being able to concatenate multiple languages of the XTREME PAN-X dataset and train the model over it.

One can easily test with the command:

accelerate launch run_ner_no_trainer.py --model_name_or_path studio-ousia/mluke-base --dataset_name xtreme --dataset_config_name PAN-X.fr,PAN-X.en --output_dir /data/mluke/ --task_name ner --return_entity_level_metrics

/cc @sgugger @LysandreJik

@sgugger
Copy link
Collaborator

sgugger commented Dec 22, 2021

Hi there, thanks a lot for you PR! It will be great to be able to fully use LUKE and mLUKE for token classification!

Now the issue is that we try to keep each example pretty simple so that users can easily tweak and customize them. Adding this in the run_ner_no_trainer example adds a lot of complexity so how about we make this into its own example instead? It could go into a research project of its own.
Same for the data collator. It's very specific to LUKE so it should be added in a module file in the same folder as the example instead of adding this to the main lib directly, I think.

@jplu
Copy link
Contributor Author

jplu commented Dec 22, 2021

Sure! I will rethink how to properly reorder the things and will let you know once pushed!

@NielsRogge
Copy link
Contributor

Awesome, thanks a lot for adding this! Also cc'ing the original authors, @ikuyamada @Ryou0634.

I completely agree with Sylvain here, adding it to the existing run_ner.py script would make it confusing for people that want to leverage BERT/RoBERTa/etc.

@jplu
Copy link
Contributor Author

jplu commented Dec 23, 2021

@sgugger I moved everything into a dedicated folder that is only focus on Luke.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@NielsRogge
Copy link
Contributor

Hi, let's finish this PR and merge it!

@jplu are you able to rebase with master?

@LysandreJik
Copy link
Member

Hey @jplu, thanks for your PR!

I'd just move it to examples/research_projects to have a specific LUKE example, vs having a luke folder in the examples folder :)

@jplu
Copy link
Contributor Author

jplu commented Jan 25, 2022

Sorry for the late reply. I will update accordingly to what you asked ASAP.

@jplu
Copy link
Contributor Author

jplu commented Jan 26, 2022

Ok, done on my side!

Comment on lines 56 to 58
# You should update this to your particular problem to have better documentation of `model_type`
MODEL_CONFIG_CLASSES = list(MODEL_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# You should update this to your particular problem to have better documentation of `model_type`
MODEL_CONFIG_CLASSES = list(MODEL_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)
# You should update this to your particular problem to have better documentation of `model_type`
MODEL_CONFIG_CLASSES = list(MODEL_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)

To be removed (see comments below).

Copy link
Contributor

@NielsRogge NielsRogge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a README that briefly explains what the script is about and how you can run a basic version of it?

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for reworking your PR! I've left a couple of suggestions :-)

examples/research_projects/luke/run_luke_ner_no_trainer.py Outdated Show resolved Hide resolved
examples/research_projects/luke/run_luke_ner_no_trainer.py Outdated Show resolved Hide resolved
src/transformers/models/auto/modeling_auto.py Outdated Show resolved Hide resolved
src/transformers/models/luke/modeling_luke.py Outdated Show resolved Hide resolved
@jplu
Copy link
Contributor Author

jplu commented Jan 31, 2022

Done! Let me know if something is missing.

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for amending the PR in this direction! :-)

@sgugger sgugger merged commit aa19f47 into huggingface:master Jan 31, 2022
@jplu jplu deleted the mluke-training branch June 13, 2023 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants