Add reward rubric scorer, CI workflow, and documentation #9
JoyboyBrian merged 14 commits into main from
Conversation
Summary of Changes

Hello @JoyboyBrian, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly expands the repository's evaluation capabilities by integrating a new rubric-based reward scoring system. It provides the core scoring logic, flexible configuration options, and example data to facilitate local testing. A dedicated CI workflow ensures that rubric definitions are automatically validated, and the comprehensive documentation updates guide users through these new features and the overall repository structure.
Code Review
This pull request introduces a comprehensive reward rubric scoring system, including the scorer implementation, configuration, example data, a CI workflow, and updated documentation. The overall structure is well-organized. My review focuses on improving the maintainability and clarity of the new Python script reward_rubric/reward_rubric.py by addressing code duplication and enhancing documentation.
💡 Codex Review
Here are some automated review suggestions for this pull request.
/gemini review
Code Review
This pull request introduces a comprehensive new feature for rubric-based reward scoring, including the scorer implementation, configuration, example data, a CI workflow, and excellent documentation updates in the README. The code is well-structured and the feature is well-integrated. My review includes a few suggestions to improve robustness, maintainability, and consistency, such as hardening the configuration loading, simplifying default value handling, improving a shell script path, and addressing dependency management to avoid duplication. Overall, this is a great addition to the repository.
This pull request introduces a rubric-based reward scoring system for support conversations, including local and CI/CD automation for evaluating rubric changes. The main additions are a new `reward_rubric` module with configuration and example files, a GitHub Actions workflow to validate rubric edits, and supporting documentation and scripts. These changes make it easy to score conversations against hosted models both locally and automatically, ensuring rubric edits are validated in CI.

**Rubric-based scoring system:**
- Added the `reward_rubric/reward_rubric.py` module, which defines the `@osmosis_rubric`-decorated `score_support_conversation` function for scoring support conversations against a configurable rubric using hosted models. Includes a CLI for local testing and detailed output.
- Added `reward_rubric/reward_rubric_config.yaml` (rubric text, score range, ground truth, model info) and `reward_rubric/reward_rubric_example.json` (a sample conversation for local evaluation). [1] [2]

**Automation and CI/CD integration:**
- Added `.github/workflows/reward_rubric.yml` to run the rubric scorer automatically on pushes and pull requests that modify the rubric config. The workflow installs dependencies, injects the API key, and runs the scoring script so reviewers see automated results.
- Added `scripts/run_reward_rubric.sh` for convenient local rubric evaluation with environment-based API key support.

**Documentation and dependency updates:**
- Updated `README.md` to document the new rubric scorer, repository layout, usage instructions, and CI/CD setup steps.
- Updated `pyproject.toml` to require Python 3.12 and added dependencies for YAML parsing and rubric evaluation.
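The workflow the description mentions might look roughly like the following sketch. The path filter, secret name, and step details are assumptions, not the contents of the actual `.github/workflows/reward_rubric.yml`:

```yaml
# Hypothetical CI workflow sketch; paths and secret name are assumptions.
name: reward-rubric
on:
  push:
    paths: ["reward_rubric/**"]
  pull_request:
    paths: ["reward_rubric/**"]
jobs:
  score:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e .
      - run: python reward_rubric/reward_rubric.py
        env:
          # Assumed secret name; the PR only says the workflow "injects the API key".
          RUBRIC_API_KEY: ${{ secrets.RUBRIC_API_KEY }}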
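Given the stated Python 3.12 requirement and the new YAML dependency, the relevant `pyproject.toml` section might look like this fragment; the project name and exact dependency list are assumptions:

```toml
# Hypothetical pyproject.toml fragment; names are placeholders.
[project]
name = "reward-rubric-repo"
requires-python = ">=3.12"
dependencies = [
    "pyyaml",  # YAML config parsing
    # ...plus whichever package provides the @osmosis_rubric decorator
]
```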
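To make the configuration described above concrete, here is a minimal sketch of how a scorer like `reward_rubric.py` might merge its YAML config over defaults and keep scores inside the configured range. Only the file names come from this PR; every function name and config key below is an illustrative assumption, not the actual implementation.

```python
# Hypothetical config handling for a rubric scorer; names are assumptions.
DEFAULT_CONFIG = {"score_min": 0.0, "score_max": 1.0}


def load_rubric_config(raw):
    """Merge a parsed config mapping (e.g. from yaml.safe_load) over defaults."""
    config = {**DEFAULT_CONFIG, **(raw or {})}
    if config["score_min"] >= config["score_max"]:
        raise ValueError("score_min must be less than score_max")
    return config


def clamp_score(value, config):
    """Clamp a model-reported score into the configured score range."""
    return max(config["score_min"], min(config["score_max"], value))


config = load_rubric_config(
    {"score_min": 0, "score_max": 10, "rubric": "Be concise and helpful."}
)
print(clamp_score(12.5, config))  # -> 10
```

Clamping at the boundary matters because a hosted model asked for a 0–10 score will occasionally return values outside the range.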