Refactor model summary #21408
Conversation
The documentation is not available anymore as the PR was closed or merged.
I don't think a model summary with sections per modality can work. I'm okay with removing the specifics of each model currently in the model summary (make sure they are present in the corresponding model pages, however, as we don't want to lose anything), but I think we need a better structure, with h2 sections for modalities, then h3 sections for the different kinds of models (in NLP: encoder/decoder/encoder-decoder; in CV: transformer/convnet, etc.).
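For illustration, a minimal sketch of the heading layout this suggestion implies (the section names below are placeholders drawn from the examples above, not final ones):

```md
<!-- sketch only: placeholder section names -->
## Natural language processing
### Encoder models
### Decoder models
### Encoder-decoder models

## Computer vision
### Transformer-based models
### Convolutional networks
```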
I'm on the fence about both the original model summary and the proposed new version. My main issue is that it is not entirely clear to me what audience these documents target and what they aim to achieve.
The original document looks like a look-up reference for some of the models (only the popular ones). It gives a very short TL;DR of each model's doc plus useful links, such as the model doc, the checkpoints, a demo on Spaces, and the original paper.
If I know what model I am interested in, its model doc is much more useful, as it has all of the information.
If I want to learn about the difference between encoders and decoders, the information is in the course.
If I want to compare two different models for the same task, I have to jump back and forth in the doc, and while I may learn some differences in how they work internally, what if I'm interested in other aspects such as benchmarks, model size, how recent it is, etc.?
The proposed doc, though it is still called a "summary of the models", is actually more like a "history of the models". This is a great read for understanding how the approaches have evolved, but it is no longer a lookup for comparing models. The navigation is different, and as a reader, I have to read the whole thing to understand the model evolution.
So my question is, what are we aiming to achieve with this doc?
Are we trying to educate on model evolution?
Are we creating a place where one can compare models on several aspects?
Another potential goal may be to make new models discoverable.
This page ranks #25 among Transformers doc pages by pageviews in Google Analytics, which is relatively high. It may indicate either that folks find it useful or that it is heavily linked. It would be great to understand what users expect from this doc, since they clearly come to it.
Good point, this’ll work better and allow me to include convnets more naturally! Great questions @MKhalusova, let me try and clarify (and also refine the purpose of the doc while doing so)! 🙂
The audience is a beginner or someone who is coming from a different modality (say, from NLP to CV), and the goal is to provide a high-level conceptual overview of the different model types available in each modality.
For sure, the model docs fulfill the role of providing all the nitty-gritty information. But sometimes, this can be too much detail, and you can't really make connections between models or understand why you should use one over the other because you're lacking context. The model summary doc tries to go up a level and give users an introductory overview instead of all the technical details. If they’re interested in learning more, they can follow the links to the specific model doc page.
The course only has very general information about encoders and decoders. For example, it doesn’t tell you how BERT and DeBERTa are different.
Yeah the structure I have now is not the best! 😅 But I think @sgugger's suggestion will improve this quite a bit, where it’ll be more readable, and related sections will be more localized, so you don’t have to jump around as much. The goal though is not to give users all the technical details about a model (size, performance, etc.).
It can be difficult to approach Transformers when there are so many X-former variants. This doc hopes to provide users with a beginner-friendly guide to them so they can make connections and be like oh wait, this CV model is just like an NLP model, and it's just the input that's different. I think we also want to give more context about the models in terms of design decisions and constraints (e.g., Swin does x, unlike ViT because y). In a nutshell, I suppose it's to give users the bigger picture of the Transformer model landscape and give them a mental framework to categorize and think about Transformer models.
I think we can boost the impact of this doc even more by addressing the issues you raise above. An embedded Space at the top of the doc that lets users discover models based on certain parameters (longer sequences, tasks, memory efficiency, multilinguality, etc.) would be very useful and would guide users toward selecting a model for their use case. I can look into this as a next step! 🙂
Updated the structure to be:
If this looks good to everyone, I'll go ahead and fill out the rest of the sections!
Force-pushed from cc00c96 to 7b5e143.
Ok, I think this is ready for review now, time to call in the experts! The goal of the doc is to provide a high-level overview of the model types in each modality, so users have more context and can start making connections. @sayakpaul, would you mind reviewing the computer vision and maybe the multimodal sections? @sanchit-gandhi, could you please take a look at the audio section (I promise it's way shorter this time 😅)? Thank you both so much! 👏
Thanks for doing this. Only a few minor nits.
Great structure in this version and a nice flow of information. Excellent summary!
Thanks a lot!
Before merging, please make sure that all the summaries of models you are removing are included in their respective doc pages.
I kept most of the model summary info that wasn't redundant with what was already on the model doc pages (for example, the original GPT); these have been added under the Tips section. I've also split the attention mechanisms out onto their own page, which I'll expand later in a separate PR with additional attention types.
Thanks!
* first draft of model summary
* restructure docs
* finish first draft
* ✨ minor reviews and edits
* apply feedbacks
* save important info, create new page for attention
* add attention doc to toctree
* ✨ few more minor fixes
Hi @stevhliu! Last week I was reading a Medium post that linked to an older version of the docs, before this PR. It links to: http://nlp.seas.harvard.edu/2018/04/03/attention.html. Do you remember if you moved this annotated transformer link to another page in the Hugging Face docs? I'm a computer scientist, and out of the hundreds of articles I've skimmed in the past week, this was the most promising link to begin diving into in earnest. :) I understand if this page wasn't the best fit for it, but on the off chance it was removed from Hugging Face entirely, I think it's a good resource to add back somewhere.
Hey @gamepad-coder, this page isn't from Hugging Face but from Harvard. Clicking the link works fine for me; does it not work for you?
Hi @LysandreJik ~ Apologies, my previous screenshot shows the code before and after. Here's a screenshot of the page before and after this PR:
[Screenshots: Before PR / After PR]
Main ask: the text containing this link was removed from the docs by this PR, and I just want to make sure the link exists somewhere else on Hugging Face. Although I do have the link now, I want to make sure others can find this resource too.
Reason: from what I can tell, Hugging Face is becoming what OpenAI was meant to be from the beginning -- a resource to make AI accessible to humanity and non-gated. While the original Attention Is All You Need paper is still linked after this PR (see the "original Transformer" link in the after screenshot, and on the current docs), to an absolute beginner I think the Harvard walkthrough is a much better resource -- and to anyone wanting to practice AI with Python, I think it is a much better primer than the original academic paper alone.
Hi, the Annotated Transformer blog post is indeed an excellent and iconic resource! Would you like to open a PR to include it in the docs?
Hi @stevhliu! Yes, I can do that -- do you think this page would still be a good fit for the link?
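Something like this could work, for instance (a rough sketch only; the exact page, placement, and wording are just placeholders):

```md
<!-- illustrative suggestion; placement and phrasing are placeholders -->
For a code-first, line-by-line walkthrough of the original Transformer, see the
[Annotated Transformer](http://nlp.seas.harvard.edu/2018/04/03/attention.html) from Harvard NLP.
```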
✅ Created a PR here ^ Let me know if you'd like me to update anything.
This PR refactors the model summary:
Would love to hear what you think about the direction of this doc! @sgugger @MKhalusova @LysandreJik