Skip to content

Comments

Add reward rubric scorer, CI workflow, and documentation#9

Merged
JoyboyBrian merged 14 commits intomainfrom
brian/reward_rubric
Oct 14, 2025
Merged

Add reward rubric scorer, CI workflow, and documentation#9
JoyboyBrian merged 14 commits intomainfrom
brian/reward_rubric

Conversation

@JoyboyBrian
Copy link
Contributor

@JoyboyBrian JoyboyBrian commented Oct 13, 2025

This pull request introduces a rubric-based reward scoring system for support conversations, including local and CI/CD automation for evaluating rubric changes. The main additions are a new reward_rubric module with configuration and example files, a GitHub Actions workflow to validate rubric edits, and supporting documentation and scripts. These changes make it easy to locally and automatically score conversations using hosted models, ensuring rubric edits are validated in CI.

Rubric-based scoring system:

  • Added the reward_rubric/reward_rubric.py module, which defines the @osmosis_rubric score_support_conversation function for scoring support conversations against a configurable rubric using hosted models. Includes a CLI for local testing and detailed output.
  • Added reward_rubric/reward_rubric_config.yaml (rubric text, score range, ground truth, model info) and reward_rubric/reward_rubric_example.json (sample conversation for local evaluation). [1] [2]

Automation and CI/CD integration:

  • Introduced .github/workflows/reward_rubric.yml to run the rubric scorer automatically on pushes and pull requests that modify the rubric config. The workflow installs dependencies, injects the API key, and runs the scoring script so reviewers see automated results.
  • Added scripts/run_reward_rubric.sh for convenient local rubric evaluation with environment-based API key support.

Documentation and dependency updates:

  • Overhauled README.md to document the new rubric scorer, repository layout, usage instructions, and CI/CD setup steps.
  • Updated pyproject.toml to require Python 3.12 and added dependencies for YAML and rubric evaluation.

@gemini-code-assist
Copy link

Summary of Changes

Hello @JoyboyBrian, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the repository's evaluation capabilities by integrating a new rubric-based reward scoring system. It provides the core logic for scoring, flexible configuration options, and example data to facilitate local testing. A dedicated CI workflow ensures that rubric definitions are automatically validated, and the comprehensive documentation updates guide users through these new features and the overall repository structure.

Highlights

  • New Reward Rubric Scorer: Introduced a new rubric-based scoring mechanism (reward_rubric.py) that leverages osmosis_ai.evaluate_rubric to assess conversations against a defined rubric, enabling more nuanced evaluation of AI responses.
  • Rubric Configuration and Examples: Added reward_rubric_config.yaml to define the rubric prompt, scoring parameters, and default model information, alongside reward_rubric_example.json providing sample conversation data for testing.
  • CI Workflow for Rubric Validation: Implemented a GitHub Actions workflow (.github/workflows/reward_rubric.yml) to automatically run the rubric scorer whenever its configuration changes, ensuring continuous validation and visibility for reviewers.
  • Updated Documentation and Repository Structure: The README.md has been extensively revised to reflect the new repository layout, requirements, and detailed instructions for setting up and using the FastMCP server, numeric reward functions, and the newly added rubric-based scoring.
  • Dependency Updates: Added starlette and pyyaml to pyproject.toml and requirements.txt to support the new rubric functionality and its configuration loading.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/reward_rubric.yml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive reward rubric scoring system, including the scorer implementation, configuration, example data, a CI workflow, and updated documentation. The overall structure is well-organized. My review focuses on improving the maintainability and clarity of the new Python script reward_rubric/reward_rubric.py by addressing code duplication and enhancing documentation.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

@JoyboyBrian
Copy link
Contributor Author

/gemini review

@JoyboyBrian JoyboyBrian requested a review from BaiqingL October 13, 2025 13:04
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive new feature for rubric-based reward scoring, including the scorer implementation, configuration, example data, a CI workflow, and excellent documentation updates in the README. The code is well-structured and the feature is well-integrated. My review includes a few suggestions to improve robustness, maintainability, and consistency, such as hardening the configuration loading, simplifying default value handling, improving a shell script path, and addressing dependency management to avoid duplication. Overall, this is a great addition to the repository.

@JoyboyBrian JoyboyBrian merged commit 0300042 into main Oct 14, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants