[Refactor] Update the example usage for the @osmosis_rubric #11

Merged
BaiqingL merged 6 commits into main from brian/reward_rubric
Oct 31, 2025
Conversation

@JoyboyBrian
Contributor

This pull request modernizes the rubric scoring workflow and improves usability, maintainability, and dataset support for the support conversation evaluation system. The most important changes include a refactor of the rubric scorer script to use a schema-driven config and dataset loader, updates to the workflow and documentation to support new usage patterns, and the introduction of a sample dataset for batch evaluation.

Rubric evaluation system refactor:

  • Major refactor of reward_rubric.py to use schema-driven config loading, dataset records, and a simplified entrypoint (score_support_conversation). The script now loads YAML config and JSONL data, supports batch evaluation, and handles provider/model selection via config or environment.
  • Updated reward_rubric_config.yaml to use a versioned schema with a rubrics[] array, separating rubric details and supporting multiple rubrics and default values.
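
As a rough illustration of the versioned `rubrics[]` layout with shared defaults, here is a minimal sketch. It is not the code in reward_rubric.py: the dict stands in for the parsed YAML, and every key other than `rubrics`, the `resolve_rubrics` helper, the `RUBRIC_PROVIDER` / `RUBRIC_MODEL` variable names, and the "environment wins over config" precedence are assumptions made for the example.

```python
import os

# Hypothetical parsed form of reward_rubric_config.yaml.
# Only the `rubrics` array is named in this PR; all other keys are assumed.
CONFIG = {
    "version": 1,
    "defaults": {"provider": "example-provider", "model": "example-model"},
    "rubrics": [
        {"id": "support_quality", "rubric": "Reward replies that resolve the issue."},
        {"id": "tone", "rubric": "Reward polite replies.", "model": "other-model"},
    ],
}


def resolve_rubrics(config, env=None):
    """Merge defaults into each rubric, then apply environment overrides.

    The precedence (environment over config) is an assumption.
    """
    env = os.environ if env is None else env
    defaults = config.get("defaults", {})
    resolved = []
    for rubric in config["rubrics"]:
        # Per-rubric values take priority over the shared defaults.
        merged = {**defaults, **rubric}
        # Hypothetical environment variable names.
        merged["provider"] = env.get("RUBRIC_PROVIDER", merged.get("provider"))
        merged["model"] = env.get("RUBRIC_MODEL", merged.get("model"))
        resolved.append(merged)
    return resolved


if __name__ == "__main__":
    for entry in resolve_rubrics(CONFIG, env={}):
        print(entry["id"], entry["provider"], entry["model"])
```

Keeping the defaults separate from the per-rubric entries means a second rubric only has to state what differs from the first.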

Dataset and example improvements:

  • Changed reward_rubric_example.json to use a flat structure (solution_str, original_input, ground_truth) instead of a message array, matching new dataset format.
  • Added sample_data.jsonl as a JSONL dataset for batch rubric evaluation and CLI preview, with multiple conversation records.
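
The flat record format can be read one JSON object per line. The sketch below assumes each record carries the three fields named above as top-level keys; the `load_records` helper, the validation behavior, and the sample values are illustrative and not taken from the repository.

```python
import io
import json

# Field names come from the PR description; requiring all three is an assumption.
REQUIRED_FIELDS = ("solution_str", "original_input", "ground_truth")


def load_records(stream):
    """Parse a JSONL stream into a list of dicts, skipping blank lines."""
    records = []
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_number}: missing fields {missing}")
        records.append(record)
    return records


# Made-up sample rows standing in for sample_data.jsonl.
SAMPLE_ROWS = [
    {
        "solution_str": "I have reset your password; please check your inbox.",
        "original_input": "I cannot log in to my account.",
        "ground_truth": "Agent resets the password and confirms by email.",
    },
    {
        "solution_str": "Your refund was issued today.",
        "original_input": "Where is my refund?",
        "ground_truth": "Agent confirms the refund status.",
    },
]
SAMPLE_JSONL = "\n".join(json.dumps(row) for row in SAMPLE_ROWS) + "\n"

if __name__ == "__main__":
    for record in load_records(io.StringIO(SAMPLE_JSONL)):
        print(record["original_input"])
```

For a file on disk, the same function can be called with an open file handle in place of the `io.StringIO` object.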

Workflow and script updates:

  • Updated GitHub Actions workflow (reward_rubric.yml) to call the new shell script (run_reward_rubric.sh) and trigger on changes to the example and scorer script, not just the config.
  • Simplified run_reward_rubric.sh to invoke the scorer as a Python module and accept CLI arguments for alternate data files.
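
To show how a module entrypoint might accept an alternate data file, here is a minimal `argparse` sketch. The flag names `--data` and `--config` and their defaults are hypothetical; the actual arguments accepted by the scorer are not stated in this description.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments for a rubric scoring run (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Score support conversations against a rubric (sketch)."
    )
    # Hypothetical flags: the real script may use different names or positionals.
    parser.add_argument(
        "--data",
        default="sample_data.jsonl",
        help="Path to a JSONL dataset of conversation records.",
    )
    parser.add_argument(
        "--config",
        default="reward_rubric_config.yaml",
        help="Path to the rubric config file.",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"data={args.data} config={args.config}")
```

A shell wrapper can then forward its own arguments unchanged with `"$@"`, so the wrapper itself stays a one-line call.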

Documentation enhancements:

  • Expanded README.md to explain the new config schema, script usage, dataset format, and CLI options for previewing and evaluating rubrics.

gemini-code-assist[bot]

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

@JoyboyBrian
Contributor Author

The test failure is expected: we refactored the osmosis_ai SDK, and that refactor hasn't been merged and released yet.

@BaiqingL BaiqingL merged commit afde4b1 into main Oct 31, 2025
1 check failed
@Osmosis-AI Osmosis-AI deleted a comment from gemini-code-assist bot Feb 11, 2026
2 participants