Added docs links to supported tasks #257

josephrocca · 2023-08-22T15:00:04Z

I linked to the feature-extraction example for sentence-similarity - relevant issues:

So, for now at least, can I add an example like this to the docs for feature-extraction?

```js
import { pipeline } from '@xenova/transformers';

let extractor = await pipeline('feature-extraction', 'Xenova/e5-large-v2');
let dotProduct = (vec1, vec2) => vec1.reduce((sum, val, i) => sum + val * vec2[i], 0);

// e5 models expect "passage: " and "query: " prefixes on their inputs.
let passage1 = await extractor('passage: She likes carrots and celery.', { pooling: 'mean', normalize: true });
let passage2 = await extractor('passage: This is a good calculus guide.', { pooling: 'mean', normalize: true });
let query = await extractor('query: Taking care of rabbits', { pooling: 'mean', normalize: true });

// The embeddings are normalized, so the dot product equals the cosine similarity.
let similarity1 = dotProduct(query.data, passage1.data);
let similarity2 = dotProduct(query.data, passage2.data);
console.log(similarity1, similarity2);
```
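Why a plain dot product is enough here: assuming `normalize: true` scales each pooled embedding to unit length (which is what the option name suggests), the dot product of two embeddings is the same number as their cosine similarity. The sketch below checks this on toy vectors standing in for `.data`; `cosineSimilarity` and `normalize` are hypothetical helper names written out by hand, not library functions.

```js
// Dot product of two equal-length numeric arrays.
function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a, b) {
  const normA = Math.sqrt(dotProduct(a, a));
  const normB = Math.sqrt(dotProduct(b, b));
  return dotProduct(a, b) / (normA * normB);
}

// Scale a vector to unit length (assumed to mirror what `normalize: true` does).
function normalize(v) {
  const norm = Math.sqrt(dotProduct(v, v));
  return v.map((x) => x / norm);
}

// Toy, un-normalized vectors standing in for real embeddings.
const a = [3, 4, 0];
const b = [4, 3, 0];

const cosRaw = cosineSimilarity(a, b);                  // 24 / 25 = 0.96
const dotUnit = dotProduct(normalize(a), normalize(b)); // same value, up to rounding
console.log(cosRaw, dotUnit);
```

With un-normalized embeddings the two numbers would differ, which is why the `normalize` flag matters when using a bare dot product.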

xenova · 2023-08-22T15:07:02Z

Amazing! Thanks so much 🤗. Could you also update 5_supported-tasks.snippet?

Afterwards I'll generate a preview for it

josephrocca · 2023-08-22T15:21:17Z

@xenova Done 👍

Also, on top of the above sentence-similarity example, where do you think is the best place to add examples of popular workflows that don't currently fit into a pipeline? I'm mainly thinking about this CLIP (separate image/text) example:

#227 (comment)

(Thought: Maybe there's a way to simply use the text and image parts as separate feature-extraction pipelines?)

xenova · 2023-08-22T15:27:23Z

> So, for now at least, can I add an example like this to the docs for feature-extraction?

Hmm, I think I'd like to keep the code snippets to only use the pipeline function (and avoid pre- and post-processing needed by the user). But as you identified, there is technically no sentence-similarity pipeline (even though the functionality does exist). Perhaps we can just add the sentence-similarity (and even embeddings) pipelines to transformers.js before transformers 😅

> Also, on top of the above sentence-similarity example, where do you think is the best place to add examples of popular workflows that don't currently fit into a pipeline? I'm mainly thinking about this CLIP (separate image/text) example:

I put those examples here and here, but I do agree that since it's quite a popular use-case, it might be worth creating a tutorial/guide for it. Same for other embeddings.

josephrocca · 2023-08-22T15:30:12Z

Can I add a link to the available models too? E.g. something like this:

Where the (models) link is:

https://huggingface.co/models?pipeline_tag=fill-mask&library=transformers.js&sort=trending

(If yes, and you prefer a different place/format/link-text/etc. let me know)

Relevant:

[Question] Can you list all available models using transformers.js? #238

xenova · 2023-08-22T15:32:16Z

> Can I add a link to the available models too? E.g. something like this:

That's a great idea! Yes please! 🤗

> If yes, and you prefer a different place/format/link-text/etc. let me know

I'm not too picky/bothered :) I don't think it's too confusing or anything.

HuggingFaceDocBuilderDev · 2023-08-22T15:51:44Z

The documentation is not available anymore as the PR was closed or merged.

josephrocca · 2023-08-22T16:30:58Z

> Yes please!

Done!

> Hmm, I think I'd like to keep the code snippets to only use the pipeline function (and avoid pre- and post-processing needed by the user)

This seems like a bad idea imo if it's at the cost of the user/dev experience. I know I'd definitely have benefited from a code snippet like this. Is this just a mild preference, or something you're quite sure about? I definitely prefer that docs examples are as useful as possible to newbies. The other end of the spectrum is a very "technical" list of snippets/facts (parameter types, return values, etc.) - things that don't really help the users who are in need of the most help - the newbies who are just trying to get something working as a starting point.

As a user I definitely would have benefited from having an example like the one I gave. I've created gists of minimal examples like that one to refer back to, and I think every user ends up having to repeat that work: Cosine vs dot? Pooling? Normalization? passage1 isn't a vector? Ohh, passage1.data. Etc. This can add up to 30 mins of work or more, which isn't a great experience. If the docs contain simple, working snippets for common tasks then it's such a breath of fresh air - all the technical data on parameter/return value types etc. should be secondary to that (again, in order to prioritise helping newbies get started quickly).
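The pooling and normalization questions listed above can be illustrated without loading a model. This sketch mean-pools a made-up `[tokens][dims]` matrix of token embeddings into one sentence vector and then L2-normalizes it, roughly what `{ pooling: 'mean', normalize: true }` asks the pipeline to do. It is a simplified assumption: it ignores padding and attention masks, and `meanPool` and `l2Normalize` are hypothetical names, not library functions. (In the library the result comes back as a tensor, with the flat numbers on `.data`, as noted above.)

```js
// Average token embeddings (rows) into a single sentence embedding.
// Simplified: assumes every row is a real token (no padding to mask out).
function meanPool(tokenEmbeddings) {
  const dims = tokenEmbeddings[0].length;
  const sums = new Array(dims).fill(0);
  for (const row of tokenEmbeddings) {
    row.forEach((x, j) => { sums[j] += x; });
  }
  return sums.map((s) => s / tokenEmbeddings.length);
}

// Scale a vector so its Euclidean length is 1.
function l2Normalize(v) {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return v.map((x) => x / norm);
}

// Three "tokens", each with a 2-dimensional embedding (made-up numbers).
const tokens = [
  [1, 2],
  [3, 4],
  [5, 0],
];

const pooled = meanPool(tokens);      // [3, 2]
const sentence = l2Normalize(pooled); // [3, 2] divided by sqrt(13)
console.log(pooled, sentence);
```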

> Perhaps we can just add the sentence-similarity (and even embeddings) pipelines to transformers.js before transformers

Even if the sentence-similarity pipeline does this behind the scenes, I think the feature-extraction pipeline should still have an example like this since it's such a common use case. A dot product is no more "post-processing" than an addition or multiplication, i.e. this example isn't especially specialised or unique.

Worth noting also that sometimes the pre-existing pipelines don't quite fit the use case - e.g. I may have some existing vectors, and some text (instead of just text pairs), or I may want to save the vectors as well as the similarity scores, rather than just getting a similarity score. Or I may want to compare features across modalities like with CLIP. IIUC, these are the sorts of things people will use the feature-extraction pipeline for, and so it makes sense to give them examples of basic stuff like checking vector similarity.
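For the "existing vectors plus new text" case described above, a minimal sketch is to keep stored embeddings next to their text and score a query vector against all of them, returning the vectors together with the scores. The vectors below are made-up, already-normalized stand-ins for `.data` from the feature-extraction pipeline, and `rankBySimilarity` is a hypothetical helper, not a library API.

```js
// Dot product; equals cosine similarity when both vectors are unit length.
function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

// Score every stored entry against the query and sort best-first.
// Each result keeps its original vector so it can be saved alongside its score.
function rankBySimilarity(queryVector, stored) {
  return stored
    .map((entry) => ({ ...entry, score: dotProduct(queryVector, entry.vector) }))
    .sort((x, y) => y.score - x.score);
}

// Pretend these were computed earlier with { pooling: 'mean', normalize: true }.
const stored = [
  { text: 'She likes carrots and celery.', vector: [0.6, 0.8] },
  { text: 'This is a good calculus guide.', vector: [1, 0] },
];
const queryVector = [0.8, 0.6];

const ranked = rankBySimilarity(queryVector, stored);
for (const { text, score } of ranked) {
  console.log(score.toFixed(2), text);
}
```

Keeping the vectors in the results means they can be cached and reused later, so only new text needs to go through the model.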

Apologies for the wall of text! 😅

xenova · 2023-08-22T18:39:45Z

> Is this just a mild preference, or something you're quite sure about?

Mild preference :) If something is better for the dev experience, then I'll do that!

> I definitely prefer that docs examples are as useful as possible to newbies. The other end of the spectrum is a very "technical" list of snippets/facts (parameter types, return values, etc.) - things that don't really help the users who are in need of the most help - the newbies who are just trying to get something working as a starting point.

Agreed, though I would say that the /api/pipelines section is meant to have those technical details, while /pipelines shouldn't (it should be high-level).

> As a user I definitely would have benefited from having an example like the one I gave. I've created gists of minimal examples like that that I can refer back to, and I think every user would have to end up repeating that work. Cosine vs dot? pooling? normalization? passage1 isn't a vector? ohh passage1.data. etc. - this can add up to 30 mins of work or more, which isn't a great experience. If the docs contain simple, working snippets for common tasks then it's such a breath of fresh air - all the technical data on parameter/return value types etc. should be secondary to that (again, in order to prioritise helping newbies get started quickly).

Yes, that's definitely something which should be improved. Perhaps adding a table of contents to the top of /api/pipelines, linking each task to its relevant code snippet, would be a simple addition for now (to replace the ugly auto-generated block which is there right now).

For example, it could be similar to the available tasks section, but also linking to (or including) the parameters.

> Or I may want to compare features across modalities like with CLIP. IIUC, these are the sorts of things people will use the feature-extraction pipeline for

Currently, the feature-extraction pipeline is only for text (something I actually found out recently, as I also thought it was for all modalities). The recommended way to get the raw model outputs is by loading models with the from_pretrained method of AutoModel, AutoModelForXXX, or XXXModel, running the Processor and/or tokenizer separately, and passing these inputs to the model. This is obviously quite tedious, and code snippets for this will help greatly.

josephrocca · 2023-08-22T20:50:17Z

> though I would say that the /api/pipelines section is meant to have those technical details, while /pipelines shouldn't (it should be high-level).

Nothing wrong with having technical details there imo (especially now that we have links that go straight to relevant code snippets - much easier for newbies to navigate), but if there are already example code snippets there, why not make them as useful as possible to the dev that's reading them? If 50% of people hitting the page want to do X, then the code snippet should probably show an example of X - especially if it's just a couple more lines of code.

But I agree that stuff that's higher level (than, e.g., a dot product) should probably go on a separate page (same with not-as-common use cases).

xenova · 2023-08-22T21:12:39Z

Yeah that makes sense 👍 The library also has some other (not-as-well documented) methods for dot product and cosine similarity, so we could always just use those.

For now, I'll merge these changes (as I am prepping v2.5.3 now), and we can continue improving the docs in other PRs 😇🤗

Thanks again for these improvements!

59d4383 Added docs links to supported tasks
31becf5 Add docs links to supported tasks

josephrocca added 2 commits August 22, 2023 23:44

1b0766c Add HF models links to supported tasks
652276b Add HF model links to supported tasks

xenova merged commit 9bb6923 into huggingface:main Aug 22, 2023
