🚀 The feature, motivation and pitch
In `entrypoints.openai.serving_completions.py`, `OpenAIServingCompletion` has a method `completion_stream_generator` that can return usage info for each chunk when `StreamOptions.continuous_usage_stats` is set (line 297):
```python
if (request.stream_options
        and request.stream_options.include_usage):
    if (request.stream_options.continuous_usage_stats
            or output.finish_reason is not None):
        prompt_tokens = len(res.prompt_token_ids)
        completion_tokens = len(output.token_ids)
        usage = UsageInfo(
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
            total_tokens=prompt_tokens + completion_tokens,
        )
```
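For context, this is how a client can opt into per-chunk usage on the completions endpoint today. A minimal sketch, assuming a vLLM server at `localhost:8000`; since `continuous_usage_stats` is a vLLM extension on top of the OpenAI schema, it is passed through `extra_body` rather than as a typed client argument (the model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.completions.create(
    model="my-model",  # placeholder model name
    prompt="Hello",
    stream=True,
    # extra_body merges into the request JSON, carrying the
    # vLLM-specific continuous_usage_stats flag.
    extra_body={
        "stream_options": {
            "include_usage": True,
            "continuous_usage_stats": True,
        }
    },
)
for chunk in stream:
    # With continuous_usage_stats, usage is populated on every
    # chunk rather than only on the final one.
    print(chunk.usage)
```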
However, this is not the case in `entrypoints.openai.serving_chat.py`. I propose adding the same feature to `OpenAIServingChat`.
What do you think?
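A minimal sketch of what the analogous block inside `OpenAIServingChat.chat_completion_stream_generator` might look like, assuming it has access to the same `request`, `res`, and `output` objects as the completion path and builds a `chunk` response per iteration (the names are carried over from the snippet above, not taken from the actual chat code):

```python
if (request.stream_options
        and request.stream_options.include_usage):
    # Mirror the completion path: emit usage on every chunk when
    # continuous_usage_stats is set, otherwise only on the final chunk.
    if (request.stream_options.continuous_usage_stats
            or output.finish_reason is not None):
        prompt_tokens = len(res.prompt_token_ids)
        completion_tokens = len(output.token_ids)
        chunk.usage = UsageInfo(
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
            total_tokens=prompt_tokens + completion_tokens,
        )
    else:
        chunk.usage = None
```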
Alternatives
No response
Additional context
No response