Added functionality for adding metadata using validate API #72


Merged: 9 commits into main, Apr 17, 2025

Conversation

@sanjanag (Contributor) commented Apr 8, 2025

Key Info

  • Implementation plan: link
  • Priority: urgent

What changed?

Two new arguments were added to validate() in the Validator API:

  • metadata: dict for user-provided metadata
  • log_results: bool for including internally computed metadata, such as scores from TrustworthyRAG
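A minimal sketch of how these two arguments might interact, using a simplified standalone validate() (the real Validator method and its scoring internals are not shown here; the score values are stand-ins):

```python
def validate(query, response, metadata=None, log_results=False):
    """Hypothetical sketch of the extended validate() signature."""
    # Stand-in for internally computed TrustworthyRAG scores.
    scores = {"trustworthiness": 0.9}
    combined = dict(metadata or {})  # copy user-provided metadata
    if log_results:
        combined.update(scores)      # include internal scores on request
    return combined

result = validate("q", "r", metadata={"session": "abc"}, log_results=True)
print(result)  # -> {'session': 'abc', 'trustworthiness': 0.9}
```

With log_results=False (the default), only the user-provided metadata is passed through.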

elisno added 4 commits April 16, 2025 22:43
- Introduced `process_score_metadata` function to standardize score metadata.
- Updated `validate` method to include optional logging of internally processed metadata.
- Adjusted `_remediate` method to accept updated metadata structure.
@elisno elisno marked this pull request as ready for review April 16, 2025 23:59
@elisno elisno requested a review from kelsey-wong April 16, 2025 23:59
@kelsey-wong kelsey-wong left a comment


Also, don't forget to bump the version and update the release notes so that the changes can go live!

metadata[metric] = score_data["score"]

# Add is_bad flags with standardized naming
is_bad_key = score_to_is_bad_key.get(metric, f"is_not_{metric}")
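For context, the fallback logic in this hunk could be reconstructed roughly as follows (the mapping contents and the 0.5 threshold are assumptions, not taken from the diff):

```python
# Curated top-level keys for known metrics (assumed contents).
score_to_is_bad_key = {
    "trustworthiness": "is_untrustworthy",
    "response_helpfulness": "is_unhelpful",
}

def flatten_scores(scores):
    """Flatten eval scores into top-level metadata keys."""
    metadata = {}
    for metric, score_data in scores.items():
        metadata[metric] = score_data["score"]
        # Known metrics get a curated key; arbitrary user-defined evals
        # fall back to the generated f"is_not_{metric}" key.
        is_bad_key = score_to_is_bad_key.get(metric, f"is_not_{metric}")
        metadata[is_bad_key] = score_data["score"] < 0.5  # assumed threshold
    return metadata

print(flatten_scores({"query_ease_customized": {"score": 0.3}}))
# -> {'query_ease_customized': 0.3, 'is_not_query_ease_customized': True}
```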
A Contributor commented:

do we need to use f"is_not_{metric}" as a fallback, or can we define all expected keys in score_to_is_bad_key?

A Member replied:

This is intended to support arbitrary evals passed into the Validator.

For example, a user-defined eval like query_ease_customized would require a corresponding top-level key fallback. We automatically run and threshold that eval, so it needs to be handled explicitly.

The fallback approach of using f"is_not_{metric}" feels awkward to me—probably the main reason I’m hesitant about relying on top-level keys to store the nested is_bad flags.

Would it be cleaner to restructure it like this instead?

metadata = {
    "is_bad": {
        "trustworthiness": True,
        "response_helpfulness": False,
        ...
    }
}
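That restructuring could be sketched as a helper along these lines (the function name and the 0.5 default threshold are hypothetical):

```python
def build_metadata(scores, thresholds):
    """Nest per-metric is_bad flags under one 'is_bad' key."""
    metadata = {metric: s["score"] for metric, s in scores.items()}
    metadata["is_bad"] = {
        # Flag a metric as bad when its score falls below its threshold
        # (0.5 is an assumed default for metrics without one).
        metric: s["score"] < thresholds.get(metric, 0.5)
        for metric, s in scores.items()
    }
    return metadata

print(build_metadata(
    {"trustworthiness": {"score": 0.4}, "response_helpfulness": {"score": 0.9}},
    {"trustworthiness": 0.7},
))
# -> {'trustworthiness': 0.4, 'response_helpfulness': 0.9,
#     'is_bad': {'trustworthiness': True, 'response_helpfulness': False}}
```

This avoids generated top-level keys entirely: arbitrary user-defined evals just become entries in the nested dict.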

A Contributor replied:

I do think that could be cleaner. Is the metadata already expected in a certain format by the frontend? If not, I'd structure it in the more straightforward, easier-to-understand format you're suggesting.

thresholds_dict[metric] = thresholds.get_threshold(metric)
metadata["thresholds"] = thresholds_dict

return metadata
A Contributor commented:

do you plan to add label to the metadata that's passed to project.query?

A Member replied:

Not long term, no. But I've added it in e0141b3 for now.

@elisno elisno merged commit ad5ec94 into main Apr 17, 2025
11 checks passed