Added functionality for adding metadata using validate api #72
Conversation
- Introduced `process_score_metadata` function to standardize score metadata.
- Updated `validate` method to include optional logging of internally processed metadata.
- Adjusted `_remediate` method to accept the updated metadata structure.
also don't forget to bump the version and update release notes so that the changes can go live!
```python
metadata[metric] = score_data["score"]

# Add is_bad flags with standardized naming
is_bad_key = score_to_is_bad_key.get(metric, f"is_not_{metric}")
```
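The lookup-with-fallback pattern being discussed can be sketched in isolation as follows (a minimal illustration; the `score_to_is_bad_key` entries and metric names here are assumptions, not the project's actual mapping):

```python
# Hypothetical mapping from metric names to their standardized flag keys.
score_to_is_bad_key = {
    "trustworthiness": "is_not_trustworthy",
    "response_helpfulness": "is_not_helpful",
}

def is_bad_key_for(metric: str) -> str:
    """Return the flag key for a metric, falling back to a generated
    name for arbitrary user-defined evals (e.g. 'query_ease_customized')."""
    return score_to_is_bad_key.get(metric, f"is_not_{metric}")
```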
do we need to use `f"is_not_{metric}"` as a fallback, or can we define all expected keys in `score_to_is_bad_key`?
This is intended to support arbitrary evals passed into the Validator. For example, a user-defined eval like `query_ease_customized` would require a corresponding top-level key fallback. We automatically run and threshold that eval, so it needs to be handled explicitly. The fallback approach of using `f"is_not_{metric}"` feels awkward to me, which is probably the main reason I'm hesitant about relying on top-level keys to store the nested `is_bad` flags.
Would it be cleaner to restructure it like this instead?
```python
metadata = {
    "is_bad": {
        "trustworthiness": True,
        "response_helpfulness": False,
        ...
    }
}
```
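For concreteness, one way to produce that nested shape is a small helper like the following (a sketch only; the function name and the shape of the per-metric `score_data` dicts are hypothetical, not code from this PR):

```python
def nest_is_bad_flags(scores: dict) -> dict:
    """Group per-metric boolean flags under a single 'is_bad' key
    instead of scattering is_not_<metric> keys at the top level."""
    return {
        "is_bad": {
            metric: bool(score_data["is_bad"])
            for metric, score_data in scores.items()
        }
    }
```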
I do think that could be cleaner. Is the metadata already expected in a certain format by the frontend? If not, I'd structure it in the more straightforward, easier-to-understand format you're suggesting.
```python
    thresholds_dict[metric] = thresholds.get_threshold(metric)
metadata["thresholds"] = thresholds_dict

return metadata
```
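Pieced together, the diff fragments above suggest a metadata-assembly routine roughly like this (a sketch under assumptions, not the actual implementation: `thresholds` is modeled as a plain dict rather than the project's thresholds object, and the mapping contents are invented):

```python
def process_score_metadata(scores: dict, thresholds: dict) -> dict:
    """Flatten raw eval scores into metadata, add is_bad flags with
    standardized naming, and record the thresholds that were applied."""
    score_to_is_bad_key = {"trustworthiness": "is_not_trustworthy"}  # assumed mapping
    metadata = {}
    thresholds_dict = {}
    for metric, score_data in scores.items():
        metadata[metric] = score_data["score"]
        # Add is_bad flags with standardized naming (fallback for custom evals)
        is_bad_key = score_to_is_bad_key.get(metric, f"is_not_{metric}")
        metadata[is_bad_key] = score_data["score"] < thresholds[metric]
        thresholds_dict[metric] = thresholds[metric]
    metadata["thresholds"] = thresholds_dict
    return metadata
```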
do you plan to add `label` to the metadata that's passed to `project.query`?
Not long term, no. But I've added it in e0141b3 for now.
Key Info

What changed?

Two new arguments added to `validate()` in the Validator API:
- `metadata: dict` for user-provided metadata
- `log_results: bool` for including internally computed metadata, such as scores from TrustworthyRAG
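How the two new arguments might combine can be sketched with a stand-in function (hypothetical, not the real Validator method; the internal score dict is a placeholder for what TrustworthyRAG would compute):

```python
from typing import Optional

def validate(query: str, response: str, metadata: Optional[dict] = None,
             log_results: bool = False) -> dict:
    """Stand-in sketch: merge user-provided metadata with internally
    computed scores, including the latter only when log_results is True."""
    internal_scores = {"trustworthiness": 0.9}  # placeholder for TrustworthyRAG output
    combined = dict(metadata or {})
    if log_results:
        combined.update(internal_scores)
    return combined
```

Usage: calling `validate(query, response, metadata={"source": "ui"})` passes only the user metadata through, while `log_results=True` additionally logs the internally computed scores.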