add reward model api #3665


Merged: 5 commits, Jun 30, 2025

CUHKSZzxy
Collaborator

@CUHKSZzxy CUHKSZzxy commented Jun 24, 2025

Usage

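import time
from typing import List, Optional, Union

import requests
from transformers import AutoTokenizer


def request_reward(
    text: Union[str, List[str]],
    host: str,
    rm: str,
    retry_delay: float = 0.2,
    max_retries: int = 3,
) -> Optional[list]:
    """POST text to /pooling; return one reward entry per input, or None on failure."""
    for _ in range(max_retries):
        try:
            res = requests.post(
                f"http://{host}/pooling",
                json={
                    "model": rm,
                    "input": text,
                },
            )
            # Surface HTTP errors explicitly instead of failing later on a missing key.
            res.raise_for_status()
            # Each element of the top-level 'data' list carries its reward in its own 'data' field.
            rewards = [item['data'] for item in res.json()['data']]
            print(f'rewards: {rewards}')
            return rewards
        except Exception as e:
            print(f"Error requesting reward: {e}")
            time.sleep(retry_delay)
    print(f"Failed to request reward after {max_retries} retries")
    # Record the failed input so it can be retried later.
    with open("error.log", "a", encoding="utf8") as f:
        f.write(f"Text: {text}\n")
    return None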


def main():
    model_path = "Qwen/Qwen2.5-Math-RM-72B"
    chat = [
        {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
        {"role": "user", "content": "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?"},
        {"role": "assistant", "content": "To determine how much Janet makes from selling the duck eggs at the farmers' market, we need to follow these steps:\n\n1. Calculate the total number of eggs laid by the ducks each day.\n2. Determine how many eggs Janet eats and bakes for herself each day.\n3. Find out how many eggs are left to be sold.\n4. Calculate the revenue from selling the remaining eggs at $2 per egg.\n\nLet's start with the first step:\n\n1. Janet's ducks lay 16 eggs per day.\n\nNext, we calculate how many eggs Janet eats and bakes for herself each day:\n\n2. Janet eats 3 eggs for breakfast every morning.\n3. Janet bakes 4 eggs for her friends every day.\n\nSo, the total number of eggs Janet eats and bakes for herself each day is:\n\\[ 3 + 4 = 7 \\text{ eggs} \\]\n\nNow, we find out how many eggs are left to be sold:\n\\[ 16 - 7 = 9 \\text{ eggs} \\]\n\nFinally, we calculate the revenue from selling the remaining eggs at $2 per egg:\n\\[ 9 \\times 2 = 18 \\text{ dollars} \\]\n\nTherefore, Janet makes 18 dollars every day at the farmers' market."}
    ]

    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    conversation_str = tokenizer.apply_chat_template(
        chat,
        tokenize=False, 
        add_generation_prompt=False
    )

    request_reward(text=conversation_str, host='0.0.0.0:23333', rm=model_path)


if __name__ == "__main__":
    main()
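
The parsing step in the usage snippet can be checked without a running server. The following is a minimal sketch, assuming the `/pooling` response has the layout the snippet reads: a top-level `data` list whose items each carry their own `data` field. `extract_rewards` is a hypothetical helper name, not part of this PR, and the sample payload values are placeholders rather than real model output.

```python
from typing import Any, Dict, List


def extract_rewards(response_json: Dict[str, Any]) -> List[Any]:
    """Collect the per-input "data" field from a /pooling-style response.

    Assumes the layout read by the usage snippet. Raises ValueError when
    that layout is missing, so a malformed reply is not mistaken for an
    empty one.
    """
    items = response_json.get("data")
    if not isinstance(items, list):
        raise ValueError(f"unexpected response layout: {response_json!r}")

    rewards = []
    for item in items:
        if not isinstance(item, dict) or "data" not in item:
            raise ValueError(f"unexpected response item: {item!r}")
        rewards.append(item["data"])
    return rewards


# Placeholder payload for illustration only; the numbers are made up.
sample = {"data": [{"data": [0.5]}, {"data": [-1.25]}]}
rewards = extract_rewards(sample)
print(rewards)
```

Keeping the parsing separate from the HTTP call means an error body from the server (which may have no `data` list) produces a descriptive `ValueError` that the retry loop can log.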

@lvhan028 lvhan028 requested a review from RunningLeon June 24, 2025 14:13
@lvhan028 lvhan028 added the enhancement New feature or request label Jun 24, 2025
Collaborator

@RunningLeon RunningLeon left a comment

LGTM

@lvhan028 lvhan028 merged commit 51ab631 into InternLM:main Jun 30, 2025
4 of 5 checks passed
irexyc pushed a commit that referenced this pull request Jul 1, 2025
* add reward model api

* fix typo

* align with vllm api

* fix batch

* list -> List
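
The "fix batch" commit suggests the endpoint accepts several inputs in one request. The following is a minimal sketch of splitting many templated conversations into request bodies, assuming `input` may be a list of strings. `build_pooling_payloads` and `batch_size` are hypothetical names chosen for illustration, not part of this PR.

```python
from typing import Any, Dict, Iterator, List


def build_pooling_payloads(
    model: str, texts: List[str], batch_size: int = 8
) -> Iterator[Dict[str, Any]]:
    """Yield /pooling request bodies holding at most batch_size inputs each.

    Assumption: the server accepts a list under "input". Input order is
    preserved across batches. Because this is a generator, the batch_size
    check runs when iteration starts, not when the function is called.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield {"model": model, "input": texts[start:start + batch_size]}


# Three placeholder inputs split into batches of two.
payloads = list(
    build_pooling_payloads("Qwen/Qwen2.5-Math-RM-72B", ["a", "b", "c"], batch_size=2)
)
print(payloads)
```

Each yielded dictionary can be passed as the `json=` argument of `requests.post` to the `/pooling` URL, in the same way as the usage snippet in the PR description.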
Labels
enhancement New feature or request
3 participants