Add (M)Luke model training for Token Classification in the examples #14880
Hi there, thanks a lot for your PR! It will be great to be able to fully use LUKE and mLUKE for token classification! The issue is that we try to keep each example fairly simple so that users can easily tweak and customize it. Adding this in the…
Sure! I will rethink how to reorder things properly and will let you know once it's pushed!
Awesome, thanks a lot for adding this! Also cc'ing the original authors, @ikuyamada @Ryou0634. I completely agree with Sylvain here: adding it to the existing…
@sgugger I moved everything into a dedicated folder that focuses only on LUKE.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, let's finish this PR and merge it! @jplu, are you able to rebase on master?
Hey @jplu, thanks for your PR! I'd just move it to…
Sorry for the late reply. I will make the changes you asked for ASAP.
OK, done on my side!
# You should update this to your particular problem to have better documentation of `model_type`
MODEL_CONFIG_CLASSES = list(MODEL_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)
To be removed (see comments below).
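For context, the reviewed lines collect every configuration class registered in `MODEL_MAPPING` (which maps configuration classes to model classes) and extract each class's `model_type` string. In the generic example scripts that tuple typically serves as the `choices` of a `--model_type` argument. The sketch below reproduces the pattern with hypothetical stub classes and a stub mapping instead of the real `transformers.MODEL_MAPPING`, so it runs without any library installed.

```python
import argparse


class LukeConfigStub:
    """Hypothetical stand-in for a transformers config class."""
    model_type = "luke"


class BertConfigStub:
    """Hypothetical stand-in for a transformers config class."""
    model_type = "bert"


# Hypothetical stand-in for MODEL_MAPPING: config class -> model class.
MODEL_MAPPING = {LukeConfigStub: object, BertConfigStub: object}

# Same two lines as in the reviewed snippet.
MODEL_CONFIG_CLASSES = list(MODEL_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)

# The tuple restricts which values --model_type accepts.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--model_type",
    type=str,
    default=None,
    choices=MODEL_TYPES,
    help="Model type to use if training from scratch.",
)
args = parser.parse_args(["--model_type", "luke"])
print(MODEL_TYPES)  # ('luke', 'bert')
```

A script that only ever loads LUKE models does not need this generic list.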
Can you also add a README that briefly explains what the script is about and how you can run a basic version of it?
Thanks a lot for reworking your PR! I've left a couple of suggestions :-)
Done! Let me know if anything is missing.
Thanks a lot for amending the PR in this direction! :-)
What does this PR do?
This PR makes it possible to train (m)LUKE on a token classification task with the `accelerate` library. It also adds a small change that allows training on multiple dataset configurations at once, for example concatenating several languages of the XTREME PAN-X dataset and training the model on the combined data.
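The multi-configuration option can be pictured as "load the same split for each configuration, then concatenate". The sketch below is only an illustration under stated assumptions: the comma-separated argument format, the helper names `parse_config_names` and `load_and_concatenate`, and the in-memory `FAKE_PANX` data are all hypothetical, and the PR's actual argument syntax is not shown here.

```python
from typing import Callable, Dict, List

# One example: word-level tokens and their integer NER tag ids.
Example = Dict[str, list]


def parse_config_names(value: str) -> List[str]:
    """Split a hypothetical comma-separated list of dataset configuration names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def load_and_concatenate(
    config_names: List[str], load_split: Callable[[str], List[Example]]
) -> List[Example]:
    """Load one split per configuration and concatenate them in the given order."""
    combined: List[Example] = []
    for name in config_names:
        combined.extend(load_split(name))
    return combined


# Hypothetical in-memory stand-in for two PAN-X training splits (tag ids are illustrative).
FAKE_PANX = {
    "PAN-X.en": [{"tokens": ["Paris", "is", "nice"], "ner_tags": [5, 0, 0]}],
    "PAN-X.fr": [{"tokens": ["Paris", "est", "belle"], "ner_tags": [5, 0, 0]}],
}

names = parse_config_names("PAN-X.en, PAN-X.fr")
train = load_and_concatenate(names, lambda name: FAKE_PANX[name])
print(names)       # ['PAN-X.en', 'PAN-X.fr']
print(len(train))  # 2
```

With 🤗 Datasets, the stub loader would roughly correspond to `load_dataset("xtreme", name, split="train")` and the list concatenation to `datasets.concatenate_datasets`, which expects all inputs to share the same features; the PAN-X language configurations use a common label set, so they can be combined this way.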
You can test it with the following command:
/cc @sgugger @LysandreJik