[CI][gpt-oss] Enable python tool tests in CI #24315
Conversation
Code Review
This pull request enables Python tool tests in the CI, which is a great step forward. The changes involve un-skipping existing tests and adding necessary logic to validate the output of the code interpreter tool. My review focuses on improving the robustness of the new test logic. Specifically, I've identified a potential fragility in how the numerical result is extracted from the model's output string. The suggestions aim to make the tests less likely to fail due to minor variations in the model's response format, ensuring a more reliable CI pipeline.
    index = output_string.rfind('9', 0, last_index)
    else:
        index = -1
    assert parse_integer_from_string(
what about assert "977966748" in output_string?
And can you use a harder question that the model can't answer if it doesn't use the python tool properly? E.g., multiply two very large numbers.
> what about `assert "977966748" in output_string`?

It may not work as expected, because the output text probably contains comma separators and/or markdown formatting around the mathematical expression.
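A minimal sketch of one way to make the check tolerant of such formatting, normalizing the output before asserting (the helper name and the exact character set here are my own, not the PR's):

```python
import re

def contains_number(output_string: str, expected: str) -> bool:
    """Check whether an expected integer appears in model output,
    tolerating thousands separators and light markdown around the digits."""
    # Strip characters the model commonly inserts when formatting numbers:
    # commas, whitespace, backticks, asterisks, backslashes and '$' signs.
    normalized = re.sub(r"[,\s`*\\$]", "", output_string)
    return expected in normalized

# Both a plain and a formatted answer pass the same assertion.
assert contains_number("The result is 977,966,748.", "977966748")
assert contains_number("The result is **`977966748`**", "977966748")
```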
> and can you use a harder question that the model can't answer if it doesn't use the python tool properly? e.g., multiply two very large numbers

I will try calculating the product of two large prime numbers.
Add a restriction statement to the prompt to simplify the output text for easier assertion.
| "`var_a=9999999967*9999999769` " | ||
| "`var_b=9999999943*9999999781` " | ||
| "`var_c=9999999929*9999999787`. " | ||
| "Show only the sorted variable names with `<` using ascii."), |
Are you sure the test will fail without "--tool-server", "demo"?
No, you are right; the model may guess the correct answer.
I made a new prompt that asks the model to infer the decimal digits of the cube root of a big number, and I tried it with and without the tool server. As expected, it returns the correct decimal digits with the tool server, while without the tool server it returns incorrect digits and sometimes spends a very long time reasoning.
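A minimal sketch of that kind of check, assuming a hypothetical number and digit count (the PR's actual prompt and values may differ):

```python
from decimal import Decimal, getcontext

# Hypothetical inputs; the PR's prompt uses its own number and digit count.
big_number = 987654321987654321
getcontext().prec = 50  # plenty of precision for the digits being compared

# Cube root via Decimal exponentiation, then take the fractional digits.
cube_root = Decimal(big_number) ** (Decimal(1) / Decimal(3))
fractional_digits = str(cube_root).split(".")[1][:10]
print(cube_root, fractional_digits)
```

The test can then assert that these fractional digits appear in the model's final answer, which is very unlikely to happen by chance without running code.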
Depends on #182

Do you need to install gpt-oss to run this test?

Also CC @yeqcharlotte
I think so; it seems like gpt-oss is not listed in any requirements txt file.
Force-pushed from b2d94cd to 4435a7a
vllm/entrypoints/tool.py (outdated)

    @@ -13,6 +13,28 @@
    logger = init_logger(__name__)


    def patch_call_python_script_with_uv():
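The function body is elided in the diff above. Purely as an illustration of what the name suggests, a monkey-patch that reroutes the tool's script execution through `uv run` might look roughly like the sketch below; the `call_python_script` patch target is hypothetical, and the PR's actual implementation may differ:

```python
import subprocess
from unittest import mock

def _run_python_script_with_uv(script_path: str) -> str:
    """Execute a generated Python script via `uv run`, letting uv resolve
    the script's dependencies instead of requiring them preinstalled."""
    result = subprocess.run(
        ["uv", "run", script_path],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout

def patch_call_python_script_with_uv():
    # Hypothetical patch target: whatever helper the python tool uses to
    # execute its generated script. The real attribute name will differ.
    patcher = mock.patch(
        "vllm.entrypoints.tool.call_python_script",
        new=_run_python_script_with_uv,
    )
    patcher.start()
    return patcher
```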
Can you do this in a unit test?
No, because the python tool is called in a separate process (namely a RemoteOpenAIServer), I don't think we have a way to patch across processes.
Thanks for your contribution!
@wuhang2014 The CI failure is related. Can you take a look?

Sure. I think gpt-oss does not handle empty text returned from the python tool well. Maybe we can add a hint in the text, like "Empty result from python tool", instead; I will try it later.
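A minimal sketch of that idea, assuming a hypothetical post-processing hook on the tool output (the actual change in the PR may look different):

```python
EMPTY_RESULT_HINT = "Empty result from python tool"

def normalize_tool_output(output: str) -> str:
    """Replace empty or whitespace-only tool output with an explicit hint,
    so the model sees a meaningful message instead of a blank string."""
    if not output or not output.strip():
        return EMPTY_RESULT_HINT
    return output

# A script that prints nothing still produces a non-empty tool message.
assert normalize_tool_output("") == EMPTY_RESULT_HINT
assert normalize_tool_output("42\n") == "42\n"
```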
Head branch was pushed to by a user without write access
Signed-off-by: wuhang <wuhang6@huawei.com>
Force-pushed from 7c2b30e to 1ea4ad4
Force-pushed from bf6c410 to 5f91521
Signed-off-by: wuhang <wuhang6@huawei.com>
Force-pushed from 5f91521 to fb93699
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: wuhang <wuhang6@huawei.com>
LGTM! Thank you.
Signed-off-by: wuhang <wuhang6@huawei.com> Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
@wangxiyuan Sorry for that. It is fixed in #26392.

@heheda12345 Thanks for the quick fix. It makes sense.
Signed-off-by: wuhang <wuhang6@huawei.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Purpose
Fix #24199
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.