
add stream to pipeline parameters #30487

Open · not-lain opened this issue Apr 25, 2024 · 9 comments

Labels: Core: Pipeline, Feature request

Comments

@not-lain
Contributor

Feature request

add option to stream output from pipeline

Motivation

Using tokenizer.apply_chat_template, then some glue code, then model.generate is pretty repetitive, and I think it's time to integrate this with pipelines. It's also time to add a streaming pipeline.
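
For context, here is a rough sketch of the boilerplate I mean (the model and generation arguments are just examples):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-1.1-2b-it"  # any chat model works the same way
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Hello!"}]

# Step 1: apply the chat template to build the prompt tensor.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
# Step 2: generate.
outputs = model.generate(inputs, max_new_tokens=64)
# Step 3: decode, slicing off the prompt tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```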

Your contribution

I can provide this resource as a reference: a PR I made with the requested feature, https://huggingface.co/google/gemma-1.1-2b-it/discussions/14.
Another tip: don't use yield and return in the same function; separate them. (This is a Python quirk: any function containing yield becomes a generator, so a plain return can no longer hand a value back to the caller. See the sketch below.)
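
As a toy illustration of the quirk (function names here are made up):

```python
def generate_text(stream: bool):
    # Because this body contains `yield`, Python makes the WHOLE function
    # a generator: calling it always returns a generator object, and the
    # `return` below only sets StopIteration.value for the caller.
    if stream:
        yield "Hello"
        yield " world"
    else:
        return "Hello world"  # the caller never gets this string directly

print(type(generate_text(False)))  # <class 'generator'>, not str!

# Separate the two code paths instead:
def generate_stream():
    yield "Hello"
    yield " world"

def generate_full():
    return "".join(generate_stream())

print(generate_full())  # "Hello world"
```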
Sadly, I've been a bit busy lately to open a PR, but if I can find some time I'll try to help out.

@amyeroberts
Collaborator

Hi @not-lain, thanks for opening a feature request!

using tokenizer.apply_chat_template then other stuff then model.generate is pretty repetitive

Could you elaborate on this a bit, e.g. with a code snippet? Is it the streaming feature during generation that you wish to be able to use?

amyeroberts added the Core: Pipeline and Feature request labels on Apr 25, 2024
@not-lain
Contributor Author

@amyeroberts
Normally, when someone wants to stream their output (example: https://huggingface.co/spaces/ysharma/Chat_with_Meta_llama3_8b), they need to write all of that code themselves. This has been quite a repetitive process across AI demos, so I thought we could implement it within the transformers library.
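
For reference, the usual boilerplate looks roughly like this (a sketch using TextIteratorStreamer; the model is just an example):

```python
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "google/gemma-1.1-2b-it"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Tell me a joke."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# The streamer yields decoded text chunks as generate() produces tokens.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# generate() blocks, so it runs in a background thread while the
# main thread consumes the streamer.
thread = Thread(
    target=model.generate,
    kwargs={"input_ids": inputs, "streamer": streamer, "max_new_tokens": 128},
)
thread.start()
for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()
```

A pipeline("text-generation", ...) call that hides all of this behind a single stream parameter would be much nicer.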

@not-lain
Contributor Author

I was originally thinking of integrating this only with text-generation models, but I think we can do it for image-to-text models too.

This is a good resource for that: https://huggingface.co/blog/idefics#getting-started-with-idefics

@amyeroberts
Collaborator

Thanks for sharing an example!

I'm not sure this is really something we want to add to pipelines. Pipelines are intended to be simple objects that let users get predictions in one line; they're not intended to support all of transformers' functionality. In this case, I think it makes sense to leave streaming outside of pipelines, since that gives the user full control over the threading and yielding logic.

cc @Rocketknight1 @gante for your thoughts

@Rocketknight1
Member

Yeah, I'm on @amyeroberts's side here - pipelines are (imo) a sort of high-level "on-ramp" API for transformers, which make it easy for users to quickly get outputs from common workflows. We definitely don't want to pack them full of features to handle every use-case - that's what the lower-level API is for! If we make pipelines very feature-heavy, then they become very big and confusing for new users, which defeats their purpose.

Once users are streaming output and working with threads/yielding/async/etc. they're probably advanced enough that they don't need the pipelines anyway.

@fakerybakery

Personally, I would love to have streaming support in pipelines; it's the one missing feature. Currently, streaming is quite difficult to use, and this would make it so much easier.

@gante
Member

gante commented May 13, 2024

FYI: we will be refactoring generate over the next few weeks, including adding better support for yield. It may end up working with pipelines, but that would be a side effect: as @Rocketknight1 wrote, we don't want to pack too many features in there, as it would defeat the point. The pipeline API is not designed to work with async code :)

@not-lain
Contributor Author

It's OK, I understand.
I'll also take a look at the generate issue; maybe I can help out a little.

@gante
Member

gante commented May 29, 2024

generate refactor tracker: #30810
