Java/Javascript split questions #424
Comments
Hi~, can we show the detailed scores for the Java/JavaScript splits on the leaderboard (instead of only the averaged Simple function score)?

Hi @vandyxiaowei, Regarding your first question: Regarding your second question: Let me know if you have more questions!
…est Category (#538)

In our BFCL official communication channels, including the evaluation manual blog, GitHub issue replies (such as #424), and our Discord channel, we have previously stated the following:

> For Java and JavaScript test category, before querying the model, we do some pre-processing on the prompt and function document. Specifically, at the end of the prompt, we will explicitly state that `the provided function is in Java 8/JavaScript/Python syntax`. And for parameter types that are not native to JSON, we will change their type to `String` (since `String` is JSON compatible) and add in the parameter description that `f" This is Java/JavaScript {value['type']} in string representation."`

> As an example, when expecting type `ArrayList`, the model will get the instruction that this is a `String` type parameter with the parameter description containing `"This is Java ArrayList in string representation."`, and thus the model should output the value in `String` format (e.g., `"new ArrayList<>(Arrays.asList(10, 20, 30))"`), which is JSON compatible.

However, the code for `language_specific_pre_processing` did not implement this correctly. Due to an indentation issue, the parameter description was only modified when the parameter type was `any`, and the cast of the parameter type to `String` was never applied. This went unnoticed until PR #516 was merged, because of the double-casting problem. It *significantly impacts* the evaluation scores for the Java and JavaScript categories. We will update the leaderboard very soon.

This PR:

- Addresses the above issue and ensures that the evaluation logic aligns with the previously described behaviour
- Updates two entries in the JavaScript dataset, because their parameters were missing a `description` field
  - Index: `14, 45`

We sincerely apologize for the oversight.
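To make the described pre-processing concrete, here is a minimal sketch of the intended logic. The function-document layout and the `JSON_NATIVE_TYPES` set are assumptions for illustration; the actual `language_specific_pre_processing` in the BFCL codebase may be structured differently.

```python
# Minimal sketch of the intended pre-processing described above.
# Assumptions: a simplified JSON-schema-like function document, and
# JSON_NATIVE_TYPES approximating the types that need no casting.
# This is not the actual BFCL implementation.

JSON_NATIVE_TYPES = {"string", "integer", "float", "boolean", "array", "dict", "any"}


def language_specific_pre_processing(function_doc: dict, language: str) -> dict:
    """Cast non-JSON-native parameter types to String and record the original
    language-specific type in the parameter description."""
    for value in function_doc["parameters"]["properties"].values():
        if value["type"] not in JSON_NATIVE_TYPES:
            # Note the original type in the description (this must happen for
            # every non-native type, not only when the type is `any`)...
            value["description"] = (
                value.get("description", "")
                + f" This is {language} {value['type']} in string representation."
            )
            # ...then cast the declared type to String, which is JSON compatible.
            value["type"] = "String"
    return function_doc


# Example: an ArrayList parameter becomes a String parameter whose description
# ends with "This is Java ArrayList in string representation.", so the model
# should answer with e.g. "new ArrayList<>(Arrays.asList(10, 20, 30))".
doc = {
    "parameters": {
        "properties": {
            "numbers": {"type": "ArrayList", "description": "Values to sum."}
        }
    }
}
print(language_specific_pre_processing(doc, "Java"))
```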
Reference: `gorilla/berkeley-function-call-leaderboard/eval_checker/java_type_converter.py`, line 37 at commit `ae5f0a2`.
Thanks!