Feature Request: String-Based Comparison Reward model for RLOOTrainer #2280

Open
@HiroshigeAoki

Description

Feature request

Add an option to RLOOTrainer that allows string-based reward functions, such as BLEU or Levenshtein distance, to be used for scoring model outputs.

Motivation

Currently, the reward_model in RLOOTrainer accepts only tensor inputs, which prevents string-based metrics from being used as the reward. Supporting string comparison metrics would let users choose from a broader range of string similarity measures.
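
To make the request concrete, here is a minimal sketch of what a string-based reward could look like. The function names `levenshtein` and `levenshtein_reward` are hypothetical and are not part of TRL; the sketch assumes the completions and references have already been decoded to plain strings, and it does not show how such a function would be wired into RLOOTrainer.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def levenshtein_reward(
    completions: Sequence[str], references: Sequence[str]
) -> List[float]:
    """Hypothetical reward: 1.0 for an exact match, approaching 0.0 as strings diverge."""
    if len(completions) != len(references):
        raise ValueError("completions and references must have the same length")
    rewards = []
    for completion, reference in zip(completions, references):
        longest = max(len(completion), len(reference))
        if longest == 0:
            # Two empty strings are treated as identical.
            rewards.append(1.0)
        else:
            rewards.append(1.0 - levenshtein(completion, reference) / longest)
    return rewards


if __name__ == "__main__":
    print(levenshtein_reward(["abcd", "hello"], ["abcf", "hello"]))
```

In a trainer, the generated token IDs would first need to be decoded to text (for example with the tokenizer's `batch_decode`), and the returned list of floats would need to be converted to a tensor of per-sample rewards. Both steps are assumptions about one possible integration, not a description of the current API.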

Your contribution

I am open to collaborating with the community to implement this feature!
