Is It Possible to Await Each Streamed Token or Chunk? #467
Replies: 2 comments 2 replies
-
You're right, not all of the tokens the model generates are passed to those callbacks; some are reserved for special purposes, like end-of-response markers or the thought section of thinking models. Why do you need to delay the generation?
-
Oh, I see. So you're saying we don't see all the tokens the model actually generates because some are reserved for special moments, like when the model finishes or when thinking models emit a thought section first. That makes sense.

The reason I want the ability to delay isn't critical, although having it in the future would be really nice. I have a custom wrapper module we've been using for a project. It provides a simple way to handle logic for `onStart`, `onResponse`, `onAbort`, `onEnd`, and `onError`. All of them are async, and each one, including `onResponse`, is expected to properly await any time-consuming logic. But `onResponse` doesn't behave that way, because it is only awaited inside the `onResponseChunk` callback, which itself isn't awaited, so what I did for the other four doesn't work for the fifth. I was just trying to make sure it worked the same way. It wasn't for a specific use case, just for future-proofing. I still think it would be a useful feature.

If I'm not mistaken, when using the Python Ollama module, I was able to delay generation per token: we used the generator method to iterate over each chunk. I wonder whether that was truly delaying the generation, or whether it was just looping over chunks that were already being populated in some cached array, so that even though we delayed the loop, the generation itself might not have been affected.
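For illustration, here is a minimal sketch of the kind of wrapper described above (the names and shape are hypothetical, not the actual project code). It shows the core issue: the wrapper awaits `onResponse` inside the streaming chunk callback, but if the library doesn't await that chunk callback itself, the delay never reaches generation.

```ts
// Hypothetical wrapper shape (not the actual project code): five async
// lifecycle callbacks, with onResponse driven from the streaming chunk callback.
interface WrapperCallbacks {
    onStart?: () => Promise<void>;
    onResponse?: (chunkText: string) => Promise<void>;
    onAbort?: () => Promise<void>; // abort handling omitted in this sketch
    onEnd?: (fullText: string) => Promise<void>;
    onError?: (error: unknown) => Promise<void>;
}

async function promptWithCallbacks(
    session: {prompt(text: string, options?: object): Promise<string>},
    text: string,
    callbacks: WrapperCallbacks
): Promise<void> {
    await callbacks.onStart?.();
    try {
        const fullText = await session.prompt(text, {
            // The wrapper awaits onResponse here, but if the library does not
            // await this chunk callback, the await only delays the wrapper's
            // own handling of the chunk, not the generation of the next token.
            onTextChunk: async (chunkText: string) => {
                await callbacks.onResponse?.(chunkText);
            }
        });
        await callbacks.onEnd?.(fullText);
    } catch (error) {
        await callbacks.onError?.(error);
    }
}
```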
-
It seems that when we supply an `onTextChunk` or `onResponseChunk` callback to the `.prompt` function, and those callbacks are asynchronous functions, their internal `await` behavior doesn't affect the token generation timing as expected. For example, if each callback `await`s a 1-second delay, I would expect the next token not to be processed until that second has passed. However, this doesn't appear to be the case.

Example callback:
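Something along these lines (a minimal sketch of the callback described; the model path, prompt text, and `sleep` helper are placeholders, assuming a node-llama-cpp-style `session.prompt()` call where `onTextChunk` receives the streamed text):

```ts
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "models/my-model.gguf"}); // placeholder path
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

await session.prompt("Tell me a short story.", {
    // Async callback: delay for 1 second whenever the chunk contains a "t",
    // then print it. The expectation is that this await would also hold back
    // generation of the next token, but it doesn't appear to.
    onTextChunk: async (chunk) => {
        if (chunk.includes("t"))
            await sleep(1000);

        process.stdout.write(chunk);
    }
});
```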
In this example, whenever the chunk contains a `'t'`, it should delay for 1 second. But what actually happens is that all the chunks without `'t'` print immediately, and then the delayed ones print afterward, each one a second apart.

This suggests that the callback itself isn't being awaited before continuing to the next token. Is there currently a way to ensure that token generation respects asynchronous behavior within these callbacks?