Update task summary #21067

Merged
merged 14 commits into from
Feb 2, 2023

Conversation

stevhliu
Member

@stevhliu stevhliu commented Jan 9, 2023

This is the second part of updating the task summary to be more conceptual. Part 1 gave a brief introduction and background to the tasks Transformers can solve; this PR is a bit more advanced and digs deeper into explaining how Transformer models solve these tasks.

To-do:

  • Add computer vision section
  • Add NLP section

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jan 9, 2023

The documentation is not available anymore as the PR was closed or merged.

@stevhliu
Member Author

Ok, I'm finally finished with the first draft (it took a bit longer to learn some models I wasn't familiar with)! I'd appreciate a general review of the scope of this page to make sure we're aligned (i.e., are some sections too in-depth, are some not explained well enough?). Thanks in advance @sgugger @MKhalusova ! 🥹

Afterward, I'll ping one of our audio and computer vision experts for a more in-depth review of those sections 🙂

Collaborator

@sgugger sgugger left a comment

The structure looks good to me, thanks for adding this tutorial. It would indeed be valuable to have a vision expert and an audio expert go through the tutorial.

Contributor

@MKhalusova MKhalusova left a comment

This is a massive effort and will likely become a super useful doc once it's merged! Thank you for writing it! I feel like we should set expectations for the reader expertise level for all the sections and provide links to resources where they can learn the basics (just like the link to the course). Most sections require some familiarity with the subject, and it is common for folks to have expertise in some modality but not in all of them.

@stevhliu
Member Author

Thanks for the feedback, I added some images to go along with the text!

@NielsRogge, would you mind reviewing the computer vision section? This guide is a high-level overview, and the goal is to help users understand how a certain task is solved by a model. Please feel free to let me know if it's too detailed, not detailed enough, or if I got something wrong! Also, if you know of a good beginner's resource for computer vision we can link to, that'd be great as well to set expectations for the reader. Thanks! 👍

@sanchit-gandhi, if you could do the same with the audio section, that'd be awesome. Thank you! 👍

Contributor

@sanchit-gandhi sanchit-gandhi left a comment

Very cool! Would maybe just go one level higher on the pre-training details for W2V2 (as these are pretty conceptually difficult 😅)

I also think it's fine to expand a bit on how the classification heads work. This is the key point here, so it may be worth doubling down and making sure our description of it is very clear and precise.

Also, our convention has been to write the model as Wav2Vec2, which differs from Facebook's wav2vec 2.0 and the wav2vec2 naming used here! Perhaps we could stay consistent with our current docs and update to Wav2Vec2?

@stevhliu stevhliu changed the title [WIP] Update task summary Update task summary Jan 30, 2023
@stevhliu stevhliu marked this pull request as ready for review January 30, 2023 16:47
Contributor

@MKhalusova MKhalusova left a comment

Great work! Kudos for undertaking this effort!

Collaborator

@sgugger sgugger left a comment

Thanks for your work on this. In a follow-up PR it would be great to add a section on multimodal models, to explain how CLIP works for instance.

Contributor

@NielsRogge NielsRogge left a comment

Thanks for the write-up!

@stevhliu stevhliu merged commit fbee829 into huggingface:main Feb 2, 2023
@stevhliu stevhliu deleted the update-task-summary-2 branch February 2, 2023 19:41
Shubhamai pushed a commit to Shubhamai/transformers that referenced this pull request Feb 6, 2023
* first draft of audio section

* make style

* first draft of computer vision section

* add convnext and encoder tasks

* finish up nlp tasks

* minor edits

* add arch images, more edits

* fix image links

* apply sanchit feedback

* model naming convention

* apply niels vit feedback

* replace detr for segmentation with mask2former

* apply feedback

* apply feedback
miyu386 pushed a commit to miyu386/transformers that referenced this pull request Feb 9, 2023
ArthurZucker pushed a commit to ArthurZucker/transformers that referenced this pull request Mar 2, 2023

6 participants