
Refactor model summary #21408

Merged 8 commits into main · Feb 15, 2023
Conversation

@stevhliu stevhliu commented Feb 1, 2023

This PR refactors the model summary:

  • updates the doc with speech/audio, computer vision, and multimodal models (picked based on the ones with the most doc views; this can be refined to show or hide other models)
  • embeds a timeline of when models were released to provide a visual reference
  • provides structure and narrative, instead of a list, to discuss the high-level differences between models (users can compare models and see trends and progression across the larger model landscape)
  • removes the attention section, which will get its own page (and possibly be expanded with more attention types) in the conceptual guides section

Would love to hear what you think about the direction of this doc! @sgugger @MKhalusova @LysandreJik

@HuggingFaceDocBuilderDev commented Feb 1, 2023

The documentation is not available anymore as the PR was closed or merged.

@sgugger (Collaborator) commented Feb 2, 2023

I don't think a model summary with sections per modality can work. I'm okay with removing the specifics of each model currently in the model summary (make sure they are present in the corresponding model pages, however, as we don't want to lose anything), but I think we need a better structure, with h2 sections for modalities and then h3 sections for the different kinds of models (in NLP, encoder/decoder/encoder-decoder; in CV, transformer/convnet, etc.).

@MKhalusova (Contributor) left a comment

I'm on the fence about both the original model summary and the proposed new version. My main issue is that it is not entirely clear to me what audience these documents target and what they aim to achieve.
The original document looks like a look-up reference for some of the models (only the popular ones). For each model, it gives a very short TLDR plus useful links: the model doc, checkpoints, a demo on Spaces, and the original paper.
If I know what model I am interested in, its model doc is much more useful, as it has all of the information.
If I want to learn about the difference between encoders and decoders, the information is in the course.
If I want to compare two different models for the same task, I have to jump up and down in the doc and may learn some differences in how they work internally, but what if I’m interested in other aspects such as benchmarks, size of the model, how recent it is, etc.?

The proposed doc, though it is still called a “summary of the models”, is actually more like a “history of the models”. This is a great read for understanding how the approaches have evolved, but it is no longer a lookup for comparing models. The navigation is different, and as a reader, I have to read the whole thing to understand the model evolution.

So my question is, what are we aiming to achieve with this doc?
Are we trying to educate on model evolution?
Are we creating a place where one can compare models on several aspects?

Another potential goal may be to make new models discoverable.

This page ranks #25 among Transformers doc pages by pageviews in Google Analytics, which is relatively high. That may indicate either that folks find it useful or that it is highly linked. It would be great to understand what users expect from this doc, since they clearly come to it.

@stevhliu (Member, Author) commented Feb 3, 2023

but I think we need a better structure, with h2 sections for modalities and then h3 sections for the different kinds of models

Good point, this’ll work better and allow me to include convnets more naturally!

Great questions @MKhalusova, let me try and clarify (and also refine the purpose of the doc while doing so)! 🙂

My main issue is that it is not entirely clear to me what audience these documents target and what they aim to achieve.

The audience is a beginner or someone who is coming from a different modality (say like, from NLP to CV), and the goal is to provide a high-level conceptual overview of the different model types available in each modality.

If I know what model I am interested in, its model doc is much more useful, as it has all of the information.

For sure, the model docs fulfill the role of providing all the nitty-gritty information. But sometimes, this can be too much detail, and you can't really make connections between models or understand why you should use one over the other because you're lacking context. The model summary doc tries to go up a level and give users an introductory overview instead of all the technical details. If they’re interested in learning more, they can follow the links to the specific model doc page.

If I want to learn about the difference between encoders and decoders, the information is in the course.

The course only has very general information about encoders and decoders. For example, it doesn’t tell you how BERT and DeBERTa are different.

If I want to compare two different models for the same task, I have to jump up and down in the doc and may learn some differences in how they work internally, but what if I’m interested in other aspects such as benchmarks, size of the model, how recent it is, etc.?

Yeah the structure I have now is not the best! 😅 But I think @sgugger's suggestion will improve this quite a bit, where it’ll be more readable, and related sections will be more localized, so you don’t have to jump around as much. The goal though is not to give users all the technical details about a model (size, performance, etc.).

So my question is, what are we aiming to achieve with this doc?

It can be difficult to approach Transformers when there are so many X-former variants. This doc hopes to provide users with a beginner-friendly guide to them so they can make connections and be like oh wait, this CV model is just like an NLP model, and it's just the input that's different. I think we also want to give more context about the models in terms of design decisions and constraints (e.g., Swin does x, unlike ViT because y). In a nutshell, I suppose it's to give users the bigger picture of the Transformer model landscape and give them a mental framework to categorize and think about Transformer models.

but what if I’m interested in other aspects such as benchmarks
Are we creating a place where one can compare models on several aspects?

I think we can boost the impact of this doc even more by addressing those issues you raise above. An embedded Space at the top of the doc that lets users discover models based on certain parameters (longer sequences, tasks, memory-efficiency, multilinguality, etc.) would be very useful and guide users toward selecting a model for their use-case. I can look into this as a next step! 🙂

@stevhliu (Member, Author) commented Feb 6, 2023

Updated the structure to be:

```
## Computer vision

### Encoder
### ConvNet

## NLP

### Encoder
### Decoder
...
```

If this looks good to everyone, I'll go ahead and fill out the rest of the sections!

@stevhliu (Member, Author) commented Feb 8, 2023

Ok I think this is ready for review now, time to call in the experts! The goal of the doc is to provide a high-level overview of the model types in each modality, so users have more context and can start making connections.

@sayakpaul, would you mind reviewing the computer vision and maybe the multimodal sections? @sanchit-gandhi, if you could take a look at the audio section please (I promise it's way shorter this time 😅 )? Thank you both so much! 👏

@stevhliu stevhliu marked this pull request as ready for review February 8, 2023 22:12
@stevhliu stevhliu changed the title [WIP] Refactor model summary Refactor model summary Feb 8, 2023
@sayakpaul (Member) left a comment

Thanks for doing this. Only a few minor nits.

@MKhalusova (Contributor) left a comment

Great structure in this version and a nice flow of information. Excellent summary!

@sayakpaul (Member) left a comment

Thanks a lot!

@sgugger (Collaborator) left a comment

Before merging, please make sure that all the summaries of models you are removing are included in their respective doc pages.

@stevhliu (Member, Author) commented
I kept most of the model summary info that wasn't redundant with what was already on the model doc pages (for example, the original GPT); this has been added under the Tips section. I've also split the attention mechanisms out onto their own page, which I'll expand on later in a separate PR with additional attention types.

@sgugger (Collaborator) left a comment
Thanks!

@stevhliu stevhliu merged commit 7a5533b into huggingface:main Feb 15, 2023
@stevhliu stevhliu deleted the new-model-summary branch February 15, 2023 18:35
ArthurZucker pushed a commit to ArthurZucker/transformers that referenced this pull request Mar 2, 2023
* first draft of model summary

* restructure docs

* finish first draft

* ✨minor reviews and edits

* apply feedbacks

* save important info, create new page for attention

* add attention doc to toctree

* ✨ few more minor fixes
@gamepad-coder (Contributor) commented
Hi @stevhliu !

Last week I was reading a Medium post that linked to an older version of the docs, before this PR.
Today it took me almost an hour to find the following sentence again (removed by this PR):

For a gentle introduction check the annotated transformer.

links to: http://nlp.seas.harvard.edu/2018/04/03/attention.html

Do you remember if you moved this annotated transformer link to another page in the Hugging Face docs?


I'm a computer scientist, and out of the hundreds of articles I've skimmed over the past week, this was the most promising link to begin diving into in earnest. :)

I understand if this page wasn't the best fit for it, but on the off chance it was entirely removed from Hugging Face, I think it's a good resource to add back somewhere.

@LysandreJik (Member) commented
Hey @gamepad-coder, this page isn't from Hugging Face but from Harvard. Clicking the link works fine for me, it doesn't work for you?

https://nlp.seas.harvard.edu/2018/04/03/attention.html

@gamepad-coder (Contributor) commented Mar 11, 2024

Hi @LysandreJik ~

Apologies, my previous screenshot showed the code before and after, but only the before view of the docs. (I should add an after picture of the docs to better illustrate my ask.)

Here's a screenshot of

  • the docs before this PR on the left.
  • the docs after this PR on the right.

[screenshot: docs before this PR (left) and after this PR (right)]

Main ask:

This PR removed the line containing the following from the docs:

For a gentle introduction check the [annotated transformer](http://nlp.seas.harvard.edu/2018/04/03/attention.html).

and I just want to make sure this link exists somewhere else on Hugging Face.

Although I do have the link now, I want to make sure others can find this resource too.


Reason: from what I can tell, Hugging Face is becoming what OpenAI was meant to be from the beginning -- a resource to make AI accessible to humanity and non-gated.

While the original Attention Is All You Need paper is still linked after this PR (see the "original Transformer" link in the after screenshot, and on current main as of 2024-3-11), the Annotated Transformer link is Harvard's annotated walkthrough of that same Attention Is All You Need paper (see the "annotated transformer" link in the before screenshot).

To an absolute beginner, I think the Harvard walkthrough is a much better resource -- and to anyone wanting to practice AI with Python, I think this is a much better primer than the original academic paper alone.

@stevhliu (Member, Author) commented
Hi, the Annotated Transformer blog post is indeed an excellent and iconic resource! Would you like to open a PR to include it in the docs?

@gamepad-coder (Contributor) commented
Hi @stevhliu !
Sorry for the delay, I've had a bad work-life balance this quarter :)

Yes I can do that -- do you think this page would still be a good fit for the link?
Or do you think adding it to another section of the docs would be better?

@gamepad-coder (Contributor) commented
✅ Created a PR here ^

Let me know if you'd like me to update anything.


7 participants