Add Salesforce xLAM handler and fix minor issues #532
Conversation
Thanks for your contribution @zuxin666! Will review it today.
Hey @zuxin666,
A few things that I think need to be addressed.
- In the `convert_to_xlam_tool` function, these three lines are not necessary. In your model card on HuggingFace, the `convert_to_xlam_tool` function doesn't contain these lines. Further, the example function doc provided on HuggingFace doesn't have the per-parameter `required` field, but rather a general `required` list (the same format as the BFCL dataset). Thus no conversion is needed for the `required` part.
- In the `decode_ast` function, these lines should be removed. They will cause a double-casting issue and introduce false positives in the evaluation score. For a detailed explanation, please check out "[BFCL] Fix Double-Casting Issue in model_handler for Java and JS category" #516. We addressed this issue for all model handlers there.
- The BFCL dataset is not in the OpenAI format. Thus, you might want to add these lines (from the OpenAI handler) to convert each dataset entry to the OpenAI format before calling the `convert_to_xlam_tool` function.
- Language-specific hints at the end of the prompt should be included (this line). Every model should get the same prompt to ensure fairness. Edit: this has been done in the `OSS_Handler` part.
Let me know what you think and thank you again for your contribution.
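To illustrate the first point above, here is a minimal sketch of the kind of conversion being discussed: BFCL-style function specs already carry a single top-level `required` list, so no per-parameter `required` handling is needed. The function and field names below are illustrative assumptions, not the actual handler code from this PR.

```python
# Hypothetical sketch of converting one BFCL-style function spec to an
# xLAM-style tool dict. The top-level "required" list passes through
# unchanged -- no per-parameter conversion is needed.
def convert_to_xlam_tool_sketch(func_spec):
    params = func_spec.get("parameters", {})
    return {
        "name": func_spec["name"],
        "description": func_spec.get("description", ""),
        "parameters": params.get("properties", {}),
        # Same format as the BFCL dataset: a single list of required names.
        "required": params.get("required", []),
    }

# Example spec in the BFCL-style shape assumed above.
spec = {
    "name": "get_weather",
    "description": "Get the weather for a city.",
    "parameters": {
        "type": "dict",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
tool = convert_to_xlam_tool_sketch(spec)
```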
Hi @HuanzhiMao , thanks for the feedback!!
There are a lot of such cases where I cannot tell why the model outputs should be misclassified as false.
I will check the remaining two items later.
#538 has been merged.
Hey @zuxin666, sorry for the delayed response. PR #538 and #545 should have addressed the Java and JavaScript issues you mentioned.

I tested this PR with both the version you currently have and the version where these lines are removed for the double-casting reason. The results show that adding the double-casting only brings a <0.5% increase in overall accuracy for both models, with about a 5% increase for the affected categories.

Regarding my previous fourth point, you are correct. It's already handled in the `OSS_Handler` part.

Looking forward to seeing the Salesforce models on the leaderboard!
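For readers unfamiliar with the double-casting issue referenced above (see #516), here is a minimal, hypothetical sketch of the failure mode; it is not the actual BFCL checker code. A Java or JavaScript `String` argument like `"5"` should stay a string, but an over-eager cast in `decode_ast` can make it compare equal to an integer ground truth, producing a false positive.

```python
# Hypothetical over-eager decode: casts numeric-looking strings to int,
# which is the "double-casting" behavior that inflates scores.
def decode_with_cast(value):
    try:
        return int(value)
    except (TypeError, ValueError):
        return value

# Correct behavior for Java/JS categories: leave the string as-is so a
# type mismatch against an integer ground truth is still detected.
def decode_plain(value):
    return value

java_arg = "5"           # a Java String literal; should remain "5"
ground_truth = 5         # an integer ground-truth value
cast_result = decode_with_cast(java_arg)   # wrongly becomes the int 5
plain_result = decode_plain(java_arg)      # stays the string "5"
```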
LGTM
Hi @HuanzhiMao, thanks for fixing them!! I checked the other two items and found no issues.

We have also removed the lines that might cause double-casting problems in the `decode_ast` function.

We just did another round of experiments with the newest updated data and eval framework and confirmed that the pipeline works well and as expected. The results from our side (with 1 A100 GPU) are as follows for your reference:
Looking forward to seeing the model on the leaderboard! Thank you!!
Sure, no problem. Hope this
Yeah, I agree. The reason the formula sticks to an 8 x V100 machine is that it's the "most advanced" computing resource we have in our lab 😂 We will see if we can get some better GPUs :/
This PR adds the ability to run inference on Salesforce xLAM function-calling models. They are open-sourced on [Huggingface](https://huggingface.co/collections/Salesforce/xlam-models-65f00e2a0a63bbcd1c2dade4). Therefore, no cost per thousand function calls will be recorded.

Apart from the `xlam_handler`, this PR also:

- improves the `oss_handler` and the `base_handler` by passing the `dtype` (for vLLM) as another argument. This is crucial since we have witnessed output issues when our `bfloat16` model is forced to load with `float16`.
- fixes one issue in `berkeley-function-call-leaderboard/apply_function_credential_config.py`. Without the fix, data saved with this script will not load properly.

@HuanzhiMao Would you please take a look? Thanks for the great benchmark!

---------

Co-authored-by: Huanzhi (Hans) Mao <huanzhimao@gmail.com>
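The `dtype` change described above can be sketched as follows. This is a hypothetical, simplified illustration of threading a `dtype` argument from a handler down to the engine kwargs; the class and method names are assumptions, not the real handler code, and the real handlers pass `dtype` to vLLM's engine.

```python
# Hypothetical sketch: a handler that accepts dtype and forwards it to the
# inference engine's kwargs, so a bfloat16 checkpoint is not silently
# loaded as float16 (which the PR reports can corrupt outputs).
class OSSHandlerSketch:
    def __init__(self, model_name, dtype="bfloat16"):
        self.model_name = model_name
        self.dtype = dtype

    def build_engine_kwargs(self):
        # In the real code these kwargs would go to the vLLM engine,
        # which accepts a dtype string such as "bfloat16" or "float16".
        return {"model": self.model_name, "dtype": self.dtype}

handler = OSSHandlerSketch("Salesforce/xLAM-7b-fc-r", dtype="bfloat16")
engine_kwargs = handler.build_engine_kwargs()
```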