Description
Streamer
https://huggingface.co/docs/transformers/generation_strategies#streaming
Reason for request
Currently, iterating generation with max_new_tokens: 1 takes much longer than a single generation call. Text generation takes time even for lightweight models, so token streaming is a key feature for user experience. In my case, task-specific text generation could be a key feature of building low-cost AI apps with transformers.js.
Additional context
I'm not sure whether the TextStreamer
class needs to be compatible with the Python transformers one. I wrote a use-case proposal where TextStreamer extends TransformStream
. AsyncIterable, AsyncGenerator, and the Streams API might also be usable.
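A minimal sketch of what a TextStreamer extending TransformStream could look like. Everything here is an assumption for the proposal, not the actual transformers.js API: input chunks are assumed to be decoded token strings, and the pass-through transform stands in for whatever buffering a real implementation would do.

```typescript
// Hypothetical TextStreamer built on the web Streams API (assumption: the
// generation loop writes decoded token strings into streamer.writable).
class TextStreamer extends TransformStream<string, string> {
  constructor() {
    super({
      transform(token, controller) {
        // Pass each token straight through to the readable side; a real
        // implementation might buffer until a word boundary, like Python's
        // TextStreamer does.
        controller.enqueue(token);
      },
    });
  }
}
```

Because it is a TransformStream, the readable side can be handed directly to `new Response(streamer.readable)` or piped onward with `pipeThrough`.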
Suggested streaming code
```ts
let aggregatedResponse = '';
const streamer = new TextStreamer();
const pipe: TextGenerationPipeline = await pipeline(
  'text-generation',
  model,
  { quantized: true }
);
pipe(prompt, { streamer });
const res = new Response(streamer.readable);
```
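Instead of wrapping the readable side in a Response, the server could also drain it to build the aggregated text. This is a sketch under the assumption that the streamer emits plain string chunks; `collect` is a hypothetical helper, not part of transformers.js:

```typescript
// Sketch: drain a ReadableStream of token strings into one aggregated string.
// (Assumes string chunks, as in the TransformStream-based proposal above.)
async function collect(readable: ReadableStream<string>): Promise<string> {
  const reader = readable.getReader();
  let text = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += value; // accumulate tokens as they arrive
  }
  return text;
}
```

This would let `aggregatedResponse = await collect(streamer.readable)` run alongside (or instead of) streaming the tokens to the client.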
This is Vercel's approach:
https://github.com/vercel/ai/blob/main/packages/core/streams/ai-stream.ts
https://github.com/vercel-labs/ai-chatbot/blob/main/app/api/chat/route.ts