Add community notebook for T5 sentiment span extraction #4700

Merged 1 commit on Jun 2, 2020

Conversation

enzoampil
Contributor

This example notebook aims to increase the coverage of T5 fine-tuning examples, addressing #4426.

The notebook presents a high-level overview of T5, its significance for the future of NLP in practice, and a thoroughly commented tutorial on fine-tuning T5 for sentiment span extraction using an extractive Q&A format.

I recently presented this in a webinar published on YouTube.

@codecov-commenter

codecov-commenter commented Jun 1, 2020

Codecov Report

Merging #4700 into master will decrease coverage by 1.41%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #4700      +/-   ##
==========================================
- Coverage   77.14%   75.72%   -1.42%     
==========================================
  Files         128      128              
  Lines       21070    21070              
==========================================
- Hits        16255    15956     -299     
- Misses       4815     5114     +299     
| Impacted Files | Coverage Δ |
|---|---|
| src/transformers/modeling_longformer.py | 18.70% <0.00%> (-74.83%) ⬇️ |
| src/transformers/configuration_longformer.py | 76.92% <0.00%> (-23.08%) ⬇️ |
| src/transformers/modeling_t5.py | 77.18% <0.00%> (-6.35%) ⬇️ |
| src/transformers/file_utils.py | 73.44% <0.00%> (-0.42%) ⬇️ |
| src/transformers/modeling_bert.py | 86.77% <0.00%> (-0.19%) ⬇️ |
| src/transformers/modeling_utils.py | 90.17% <0.00%> (+0.23%) ⬆️ |
| src/transformers/modeling_openai.py | 81.78% <0.00%> (+1.37%) ⬆️ |
| src/transformers/modeling_gpt2.py | 86.21% <0.00%> (+14.10%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0866669...34c9a46.

@LysandreJik
Member

Nice webinar, and cool notebook :)

@patrickvonplaten, do you want to take a look?

@enzoampil
Contributor Author

Thanks @LysandreJik ! 😄

@patrickvonplaten
Contributor

Awesome notebook @enzoampil!

LGTM for merge!

Which dataset do you use exactly to fine-tune T5 here?

patrickvonplaten merged commit d3ef14f into huggingface:master on Jun 2, 2020
enzoampil deleted the add_t5_example branch on June 2, 2020 09:39
@enzoampil
Contributor Author

Thanks @patrickvonplaten ! 😄

The dataset comes from an ongoing Kaggle competition called Tweet Sentiment Extraction.

The objective is to extract the span of a tweet that indicates its sentiment.

Example input:

sentiment: negative
tweet:  How did we just get paid and still be broke as hell?! No shopping spree for me today

Example output:

broke as hell?!
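
For readers who want to see how such an example could be turned into a text-to-text pair, here is a minimal sketch in plain Python. It is not taken from the notebook: the function name `build_example`, the dictionary keys, and the exact `sentiment: ... tweet: ...` prompt layout are assumptions made for illustration, and the notebook's own extractive Q&A formatting may differ.

```python
def build_example(sentiment: str, tweet: str, selected_text: str) -> dict:
    """Format one (sentiment, tweet, span) triple as a source/target pair.

    Hypothetical helper: the prompt layout below mirrors the example in
    this thread, not necessarily the notebook's exact format.
    """
    # Collapse repeated whitespace so the span lookup is not thrown off
    # by stray leading or doubled spaces in the raw tweet.
    tweet = " ".join(tweet.split())
    selected_text = " ".join(selected_text.split())

    # The task is extractive, so the target must occur inside the tweet.
    start = tweet.find(selected_text)
    if start == -1:
        raise ValueError("selected_text must be a span of the tweet")

    return {
        "source": f"sentiment: {sentiment} tweet: {tweet}",
        "target": selected_text,
        "span": (start, start + len(selected_text)),  # character offsets
    }


example = build_example(
    "negative",
    " How did we just get paid and still be broke as hell?! "
    "No shopping spree for me today",
    "broke as hell?!",
)
print(example["source"])
print(example["target"])
```

The `source` string would be what the model reads and `target` what it is trained to generate; the character offsets are kept only as a sanity check that the target really is a span of the tweet.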

@enzoampil
Contributor Author

I was thinking about contributing this to the nlp library, but I'm not sure if Kaggle has policies regarding uploading their datasets to other public sources ...

@patrickvonplaten
Contributor

I see! Yeah, no worries - I don't think we currently handle dataset processing from ongoing Kaggle competition links.
