[gpt-oss][1][bugfix] fix streaming final output #24466
Conversation
Force-pushed from 0fd7f79 to 016bafd.
Force-pushed from 02e414b to 0c31b0b.
@aarnphm @DarkLight1337 @robertgshaw2-redhat @simon-mo this PR is ready for review :)
Also CC @yeqcharlotte
LG! I saw you define StreamingResponsesResponse in a later PR; do we plan to update BaseModel in this diff as well?
I have a follow-up PR here: #24556. I thought it would be easier for review to split them into 2 PRs, but I could combine them too :)
This PR addresses three different changes. I recommend splitting it into multiple separate PRs. This PR should focus only on fixing the issue where the output of the last event is empty.
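For context on that issue, here is a minimal way to observe it from the client side. This is an illustrative sketch, not taken from the PR: the base URL, model name, and prompt are assumptions.

```python
# Minimal client-side check of the streaming final output (illustrative;
# assumes a local vLLM server serving a gpt-oss model on port 8000).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.responses.create(
    model="openai/gpt-oss-20b",  # assumed model name
    input="Say hello.",
    stream=True,
)

for event in stream:
    if event.type == "response.completed":
        # Before this fix, the final event's response.output was an
        # empty list; after the fix it carries the generated items.
        print(event.response.output)
```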
Force-pushed from 82fd949 to c1098b4.
Force-pushed from 7ec9669 to 309699c.
Force-pushed from 309699c to f4e284d.
Force-pushed from f4e284d to 9a71956.
This reverts commit c87ca3325edbd5e80800df6e4151cee6a9c8c923.
Signed-off-by: Andrew Xia <axia@meta.com>
Force-pushed from 9a71956 to 9b19217.
```python
# Check if the current token is part of reasoning content
self._update_num_reasoning_tokens()
self.last_tok = tok
if len(self._messages) - self.num_init_messages < len(
```
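The condition above is truncated in the diff view. As a rough, self-contained sketch of the bookkeeping it appears to perform (class and attribute names are simplified stand-ins, not vLLM's actual API):

```python
# Toy model of the message-sync bookkeeping (illustrative only): mirror
# newly parsed messages into the context so the final response is built
# from a non-empty message list.
class ToyContext:
    def __init__(self, init_messages):
        self._messages = list(init_messages)
        self.num_init_messages = len(init_messages)

    def sync(self, parser_messages):
        # Number of parser messages already mirrored into self._messages.
        mirrored = len(self._messages) - self.num_init_messages
        if mirrored < len(parser_messages):
            self._messages.extend(parser_messages[mirrored:])


ctx = ToyContext(init_messages=["system prompt"])
ctx.sync(["assistant reply"])
ctx.sync(["assistant reply"])  # already mirrored: nothing duplicated
assert ctx._messages == ["system prompt", "assistant reply"]
```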
let's also add a unit test covering this behavior. The test can be constructed similarly to https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/test_context.py#L313
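A rough sketch of what such a test could look like follows. The names here are hypothetical stand-ins, not the actual helpers used in tests/entrypoints/test_context.py:

```python
# Hypothetical unit-test sketch (names are stand-ins, not the actual
# fixtures in tests/entrypoints/test_context.py).
class FakeParser:
    """Stands in for the harmony parser; just exposes parsed messages."""
    def __init__(self):
        self.messages = []


def test_final_messages_not_empty():
    parser = FakeParser()
    messages, num_init = ["system prompt"], 1

    # Simulate the parser finishing a message during streaming.
    parser.messages.append("assistant reply")

    # The logic under test: mirror any unmirrored parser messages.
    mirrored = len(messages) - num_init
    if mirrored < len(parser.messages):
        messages.extend(parser.messages[mirrored:])

    # Regression guard: the final output must not be empty.
    assert messages[num_init:], "final output must not be empty"


test_final_messages_not_empty()
```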
+1
ty for the suggestion, just added
ready for re-review @chaunceyjiang
Thanks~
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@meta.com> Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Andrew Xia <axia@meta.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
Per @chaunceyjiang's comments, I've also split this PR up into a couple of follow-ups:
Test Plan
Test Result
Before:
^ Note that in this final response, `output` is an empty array. This is not what we want.
After:
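The captured responses are not preserved in this excerpt. The dicts below only illustrate the shape difference between the two final events; the field values are made up, not actual test output:

```python
# Illustrative shapes only (not the actual recorded responses).
before_final_event = {
    "type": "response.completed",
    "response": {"status": "completed", "output": []},  # bug: empty
}
after_final_event = {
    "type": "response.completed",
    "response": {
        "status": "completed",
        "output": [{"type": "message", "content": "..."}],  # populated
    },
}
```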
Essential Elements of an Effective PR Description Checklist
(Optional) Documentation update, such as updating `supported_models.md` and `examples` for a new model.