
Add tasks to replicate Math-shepherd #1052

Open · plaguss wants to merge 19 commits into develop from math-shepherd
Conversation

plaguss (Contributor) commented Nov 6, 2024

Description

WORK IN PROGRESS

This PR integrates the tasks needed to replicate:
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

Example pipeline:

from datasets import load_dataset

from distilabel.steps.tasks.math_shepherd.generator import MathShepherdGenerator
from distilabel.steps.tasks.math_shepherd.completer import MathShepherdCompleter
from distilabel.steps.tasks.math_shepherd.utils import FormatPRM
from distilabel.models import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import CombineOutputs, ExpandColumns

ds_name = "openai/gsm8k"

ds = load_dataset(ds_name, "main", split="test").rename_column("question", "instruction").select(range(3))


with Pipeline(name="Math-Shepherd") as pipe:
    model_id_70B = "meta-llama/Meta-Llama-3.1-70B-Instruct"
    model_id_8B = "meta-llama/Meta-Llama-3.1-8B-Instruct"

    llm_70B = InferenceEndpointsLLM(
        model_id=model_id_70B,
        tokenizer_id=model_id_70B,
        generation_kwargs={"max_new_tokens": 1024, "temperature": 0.6},
    )
    llm_8B = InferenceEndpointsLLM(
        model_id=model_id_8B,
        tokenizer_id=model_id_8B,
        generation_kwargs={"max_new_tokens": 2048, "temperature": 0.6},
    )

    generator_golden = MathShepherdGenerator(
        name="golden_generator",
        llm=llm_70B,
    )
    generator = MathShepherdGenerator(
        name="generator",
        llm=llm_8B,
        M=5,  # Generate 5 sample solutions
    )
    completer = MathShepherdCompleter(
        name="completer",
        llm=llm_8B,
        N=4,  # Each solution will be tested with 4 completions during labelling
    )

    combine = CombineOutputs()
    expand = ExpandColumns(
        name="expand_columns",
        columns=["solutions"],
        encoded=True,
    )
    formatter = FormatPRM(name="format_prm")
    [generator_golden, generator] >> combine >> completer >> expand >> formatter


if __name__ == "__main__":
    distiset = pipe.run(use_cache=False, dataset=ds)
    distiset.push_to_hub("plaguss/test_math_shepherd_prm")

A sample dataset can be seen at plaguss/test_math_shepherd_prm
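For context, Math-Shepherd's "hard estimation" labelling (what the `N=4` completions above are used for) marks a solution step as correct if at least one of the N completions sampled from that step onward reaches the known golden answer. A minimal sketch of that scoring logic, assuming per-step lists of final answers are already extracted (the `label_steps` helper below is illustrative, not the distilabel API):

```python
# Hypothetical sketch of Math-Shepherd "hard estimation" labelling:
# a step gets a positive label if at least one of the N completions
# sampled after that step arrives at the golden answer.

def label_steps(completions_per_step, golden_answer):
    """completions_per_step: one entry per solution step, each entry being
    the list of final answers from the N completions for that step."""
    return [
        "+" if any(answer == golden_answer for answer in answers) else "-"
        for answers in completions_per_step
    ]

# Example: 3 steps, N=4 completions each; golden answer is "18".
completions = [
    ["18", "20", "18", "16"],  # step 1: some completions still recover the answer
    ["18", "18", "18", "18"],  # step 2: every completion succeeds
    ["7", "9", "9", "7"],      # step 3: the reasoning has gone off track
]
print(label_steps(completions, "18"))  # ['+', '+', '-']
```

Here the first two steps are labelled positive and the third negative, which is the per-step supervision signal a PRM is trained on.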

plaguss added the enhancement (New feature or request) label on Nov 6, 2024
plaguss added this to the 1.5.0 milestone on Nov 6, 2024
plaguss self-assigned this on Nov 6, 2024

github-actions bot commented Nov 6, 2024

Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-1052/


codspeed-hq bot commented Nov 6, 2024

CodSpeed Performance Report

Merging #1052 will not alter performance

Comparing math-shepherd (3ca7b7d) with develop (e830e25)

Summary

✅ 1 untouched benchmark

plaguss marked this pull request as ready for review on November 12, 2024