
sharvin2187
Contributor

@sharvin2187 sharvin2187 commented May 21, 2025

Description

  • Adds an example that runs a GRPO job on AzureML.
  • Includes sample datasets for the job. Each split has been truncated to 50 samples because the repo limits PR size to under 2 MB; the remaining data can be added back gradually.
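A minimal sketch of how 50-record samples like these could be cut from a full split, assuming the full data is JSON Lines. The function name and any source paths are hypothetical; this is not necessarily how the committed samples were produced.

```python
import json
from pathlib import Path


def truncate_jsonl(src: Path, dst: Path, n: int = 50) -> int:
    """Copy the first `n` JSON Lines records from `src` to `dst`.

    Each line is parsed with json.loads, so a malformed record fails loudly
    instead of being copied. Returns the number of records written.
    """
    written = 0
    with src.open("r", encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            if written >= n:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            fout.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

Run it once per split (train, validation, test) and check `dst.stat().st_size` to confirm the combined files stay under the 2 MB limit.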

Checklist

  • I have read the contribution guidelines.
  • I have coordinated with the docs team (mldocs@microsoft.com) if this PR deletes files or changes any file names or file extensions.
  • Pull request includes test coverage for the included changes.
  • This notebook or file is added to the CODEOWNERS file, pointing to the author or the author's team.

@sharvin2187 sharvin2187 force-pushed the shjondhale/build-grpo branch from 7193c51 to 7e212f5 Compare May 26, 2025 17:56
@yeshsurya yeshsurya requested review from Copilot and yeshsurya May 27, 2025 05:18
yeshsurya previously approved these changes May 27, 2025
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds a GRPO job example with trainer scripts, configs, callbacks, environment setup, and sample datasets for a medical MCQA task in AzureML.

  • Adds reward functions and trainer code for reasoning tasks
  • Introduces YAML/JSON configs and Docker/requirements for environment
  • Includes sample datasets (50 records each) and AML setup script

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `sdk/python/jobs/grpo/src/grpo_trainer_rewards.py` | Implements format and accuracy reward functions |
| `sdk/python/jobs/grpo/src/BldDemo_Reasoning_Train.py` | Main GRPO training script with CLI arg parsing |
| `sdk/python/jobs/grpo/aml_setup.py` | AML workspace setup for dataset, model, compute, env |
| `sdk/python/jobs/grpo/src/grpo_trainer_callbacks.py` | Callback to save HF-transformers models in MLflow |
| `sdk/python/jobs/grpo/src/grpo_trainer_config.yaml` | Trainer configuration for vLLM, reward weighting |
| `sdk/python/jobs/grpo/environment/Dockerfile` | Docker image build and package installs |
| `sdk/python/jobs/grpo/environment/requirements.txt` | Python dependencies |
| `sdk/python/jobs/grpo/datasets/med_mcqa/*.jsonl` | Sample dataset splits for training/eval/testing |

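For readers new to GRPO, a plain-Python sketch of what the reward weighting feeds into. GRPO samples several completions per prompt, scores each one with the reward functions, and uses each completion's reward relative to its group (minus the group mean, divided by the group standard deviation) as its advantage, instead of a learned value function. This sketch assumes the config's reward weighting is a weighted sum of the individual reward functions, which is how TRL's `reward_weights` option behaves; the batch and weights below are made up. Library implementations differ in details such as sample vs. population standard deviation and the epsilon; this one uses the population standard deviation.

```python
from statistics import mean, pstdev


def combine_rewards(per_func_rewards, weights):
    """Weighted sum of several reward functions' scores for each completion.

    per_func_rewards[k][i] is the score reward function k gave completion i.
    """
    if len(per_func_rewards) != len(weights):
        raise ValueError("need exactly one weight per reward function")
    num_completions = len(per_func_rewards[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, per_func_rewards))
        for i in range(num_completions)
    ]


def group_relative_advantages(rewards, group_size, eps=1e-4):
    """Score each completion relative to the other completions for the same prompt.

    Assumes rewards are laid out prompt by prompt, `group_size` completions each.
    """
    if len(rewards) % group_size != 0:
        raise ValueError("rewards must contain a whole number of groups")
    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mu, sigma = mean(group), pstdev(group)
        # eps keeps the division finite when every completion got the same reward
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages


# Hypothetical batch: one prompt, four sampled completions, two reward functions.
format_scores = [1.0, 1.0, 0.0, 1.0]
accuracy_scores = [1.0, 0.0, 0.0, 1.0]
rewards = combine_rewards([format_scores, accuracy_scores], weights=[0.5, 1.0])
advantages = group_relative_advantages(rewards, group_size=4)
```

Completions that beat their siblings get positive advantages and are reinforced; a group where every completion scores the same contributes no learning signal.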
Comments suppressed due to low confidence (4)

sdk/python/jobs/grpo/aml_setup.py:13

  • The Model import at line 13 rebinds the name Model that was already imported from azure.ai.ml.entities, so later references resolve to mlflow.models.Model. Remove or alias one of the imports to prevent confusion and improve code clarity.
from mlflow.models import Model
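A stdlib stand-in for this clash, since azure-ai-ml and mlflow may not be installed: `os.path` and `shlex` both export `join`, just as `azure.ai.ml.entities` and `mlflow.models` both export `Model`. In aml_setup.py the equivalent fix would be something like `from azure.ai.ml.entities import Model as AzureMLModel` and `from mlflow.models import Model as MlflowModel` (the alias names are hypothetical), or dropping whichever import is unused.

```python
# Two modules export the same name: the import that runs last wins.
from os.path import join
from shlex import join  # rebinds `join`; os.path.join is no longer reachable under that name

shadowed = join  # this is shlex.join

# Fix: give each import a distinct alias so both stay usable and the intent is explicit.
from os.path import join as path_join
from shlex import join as shell_join

print(path_join("datasets", "med_mcqa"))  # filesystem path join
print(shell_join(["echo", "hi there"]))   # shell-quoted command line
```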

sdk/python/jobs/grpo/aml_setup.py:100

  • Fix the typo in the comment: change "falsh" to "flash".
# This job requires falsh attention and needs A100 or H100 GPUs.

sdk/python/jobs/grpo/src/grpo_trainer_rewards.py:14

  • There are no tests covering the new reward functions (format_reward and _medmcqa_match_fn). Consider adding unit tests to validate correct behavior and edge cases.
def format_reward(completions, **kwargs):
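To make the reviewed function concrete, a hypothetical reward function with the same `format_reward(completions, **kwargs)` signature. It assumes the model is asked to reason inside `<think>...</think>` and answer inside `<answer>...</answer>`; the tags actually used in grpo_trainer_rewards.py may differ. It accepts both plain-text completions and conversational ones (a list of message dicts), the two shapes TRL's GRPOTrainer passes to reward functions.

```python
import re

# Hypothetical output format: reasoning in <think>...</think>, then the final
# choice in <answer>...</answer>. DOTALL lets the reasoning span several lines.
FORMAT_PATTERN = re.compile(
    r"^\s*<think>.*?</think>\s*<answer>.*?</answer>\s*$", re.DOTALL
)


def _completion_text(completion):
    """Accept a plain string or a conversational completion (list of message dicts)."""
    if isinstance(completion, str):
        return completion
    return completion[0]["content"]


def format_reward(completions, **kwargs):
    """Return 1.0 for each completion that follows the expected format, else 0.0."""
    return [
        1.0 if FORMAT_PATTERN.match(_completion_text(c)) else 0.0
        for c in completions
    ]
```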

sdk/python/jobs/grpo/src/BldDemo_Reasoning_Train.py:186

  • The SaveMLflowModelCallback constructor does not accept a preprocessor argument but is called with it here, leading to a runtime TypeError. Change the argument name or update the callback signature to match.
preprocessor=tokenizer,
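A minimal illustration of the mismatch and one way to fix it. Both classes are hypothetical stand-ins; the real SaveMLflowModelCallback signature is not shown here, and `output_dir` is an invented parameter.

```python
class CallbackWithOldSignature:
    """Hypothetical constructor that names its argument `tokenizer`."""

    def __init__(self, output_dir, tokenizer=None):
        self.output_dir = output_dir
        self.tokenizer = tokenizer


class CallbackWithFixedSignature:
    """Same callback, with the constructor accepting the keyword the call site uses."""

    def __init__(self, output_dir, preprocessor=None):
        self.output_dir = output_dir
        self.preprocessor = preprocessor  # e.g. the tokenizer to save alongside the model


tokenizer = object()  # placeholder for the Hugging Face tokenizer

try:
    CallbackWithOldSignature("outputs", preprocessor=tokenizer)
except TypeError as err:
    # Python rejects the unexpected keyword argument at construction time
    print("old signature:", err)

callback = CallbackWithFixedSignature("outputs", preprocessor=tokenizer)
print("fixed signature ok:", callback.preprocessor is tokenizer)
```

The equivalent fix on the other side is to keep the constructor as-is and pass `tokenizer=tokenizer` at the call site.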

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@sharvin2187 sharvin2187 requested review from Copilot and yeshsurya May 27, 2025 06:35
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds a GRPO example for running a policy optimization job on AzureML, including new reward functions, callbacks, dataset samples, environment specs, and setup scripts.

  • Introduce reward computation and accuracy matching functions.
  • Provide a full training script and AML workspace setup.
  • Add sample datasets, Dockerfile, and requirements for reproducible runs.

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `sdk/python/jobs/grpo/src/grpo_trainer_rewards.py` | Implemented format and accuracy reward functions |
| `sdk/python/jobs/grpo/src/grpo_trainer_config.yaml` | Added YAML config for GRPO training parameters |
| `sdk/python/jobs/grpo/src/grpo_trainer_callbacks.py` | Added MLflow save callback for model artifact management |
| `sdk/python/jobs/grpo/src/BldDemo_Reasoning_Train.py` | Example training script using GRPOTrainer |
| `sdk/python/jobs/grpo/environment/requirements.txt` | Pinned Python package dependencies |
| `sdk/python/jobs/grpo/environment/Dockerfile` | Docker image build instructions |
| `sdk/python/jobs/grpo/datasets/med_mcqa/validation.jsonl` | Sample validation split for medical MCQA |
| `sdk/python/jobs/grpo/datasets/med_mcqa/train.jsonl` | Sample training split for medical MCQA |
| `sdk/python/jobs/grpo/datasets/med_mcqa/test.jsonl` | Sample test split for medical MCQA |
| `sdk/python/jobs/grpo/aml_setup.py` | Script to register dataset, model, compute, environment in AML workspace |

Comments suppressed due to low confidence (2)

sdk/python/jobs/grpo/src/grpo_trainer_rewards.py:14

  • These new reward functions lack associated unit tests; consider adding tests to verify that format_reward, _medmcqa_match_fn, and accuracy produce expected outputs for a variety of inputs.
def format_reward(completions, **kwargs):
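A sketch of the kind of unit tests this comment asks for, written against a hypothetical accuracy reward (`extract_choice` and `accuracy_reward` are illustrative names, not the PR's `_medmcqa_match_fn` or `accuracy`). It assumes the reference answer is an option letter A to D stored in a dataset column named `answer`, which the trainer forwards to reward functions as a keyword argument.

```python
import re
import unittest

# Hypothetical: the final choice is a single option letter inside <answer>...</answer>.
ANSWER_RE = re.compile(r"<answer>\s*([A-D])\s*</answer>", re.IGNORECASE)


def extract_choice(text):
    """Return the option letter inside <answer>...</answer>, or None if absent."""
    match = ANSWER_RE.search(text)
    return match.group(1).upper() if match else None


def accuracy_reward(completions, answer, **kwargs):
    """1.0 when the extracted letter equals the reference letter, else 0.0."""
    return [
        1.0 if extract_choice(c) == ref.strip().upper() else 0.0
        for c, ref in zip(completions, answer)
    ]


class AccuracyRewardTest(unittest.TestCase):
    def test_correct_choice(self):
        self.assertEqual(accuracy_reward(["<answer>B</answer>"], ["B"]), [1.0])

    def test_wrong_choice(self):
        self.assertEqual(accuracy_reward(["<answer>C</answer>"], ["B"]), [0.0])

    def test_missing_tag(self):
        self.assertEqual(accuracy_reward(["The answer is B"], ["B"]), [0.0])

    def test_case_and_whitespace(self):
        self.assertEqual(accuracy_reward(["<answer> b </answer>"], ["B"]), [1.0])

    def test_batch(self):
        out = accuracy_reward(["<answer>A</answer>", "no answer"], ["A", "D"])
        self.assertEqual(out, [1.0, 0.0])
```

Saved as a test module (for example a hypothetical `test_grpo_trainer_rewards.py`), it runs with `python -m unittest`. Edge cases such as a missing tag are exactly where a silent 0.0 versus an exception matters for training.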

sdk/python/jobs/grpo/datasets/med_mcqa/validation.jsonl:1

  • The dataset prompt contains a typo (Murphy&;s); it should be Murphy's sign for clarity and correctness.
"Murphy&;s sign is seen in?"

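One way to clean this artifact across the sample splits, assuming every `&;` in the data is a mangled apostrophe (as in "Murphy&;s sign"). The field name `question` is a placeholder for the dataset's real keys.

```python
import json

# Assumption: every "&;" in the raw data stands for an apostrophe.
ARTIFACT = "&;"


def clean_text(text):
    """Replace the mangled sequence with a plain apostrophe."""
    return text.replace(ARTIFACT, "'")


def clean_jsonl_lines(lines, fields=("question",)):
    """Yield cleaned JSON Lines records, rewriting only the listed string fields."""
    for line in lines:
        if not line.strip():
            continue  # drop blank lines
        record = json.loads(line)
        for field in fields:
            value = record.get(field)
            if isinstance(value, str):
                record[field] = clean_text(value)
        yield json.dumps(record, ensure_ascii=False)
```

Restricting the rewrite to named fields avoids touching answer keys or IDs that might legitimately contain the sequence.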
@yeshsurya
Contributor

It would be good to add this path to the CODEOWNERS file as well.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@sharvin2187
Contributor Author

Adding a top-level README file to help users navigate the repo would be useful. It can also include pointers to the demo video on the Build website.

Done.

babu-namburi previously approved these changes May 28, 2025
Contributor

@babu-namburi babu-namburi left a comment


LGTM

…d.ipynb

Co-authored-by: Gayatri Penumetsa <181455625+gpenumetsa-msft@users.noreply.github.com>
yeshsurya previously approved these changes May 28, 2025
@yeshsurya yeshsurya self-requested a review May 28, 2025 08:42
yeshsurya previously approved these changes May 28, 2025

@anoopkunchukuttan anoopkunchukuttan left a comment


In launch_grpo_command_job-med-mcqa-commented.ipynb, add more details to Section 1, which describes the setup (data/model). Here is a snippet you can use:

The Azure Machine Learning (AML) **setup process is encapsulated** in a script that provisions all required resources in the workspace. \
By the end of the setup, the AML workspace will be fully configured with the following resources:

- **Dataset**: [MedMCQA](https://medmcqa.github.io): A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. We use a modified version of the MedMCQA dataset, restricting our experiments to question/answer pairs with a single correct answer. The modified dataset used in the demo can be found in `datasets/med_mcqa`.
- **Model**: [Qwen2_5-7B-Instruct_base](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
- **Compute Cluster**: STANDARD_ND96ISR_H100_V5 cluster with at least 2 nodes
- **Environment**: Designed for GRPO-specific, large-scale distributed training and inference of reasoning models using Azure Machine Learning, TRL, DeepSpeed, vLLM, and LoRA.

@sharvin2187
Contributor Author


Done.

@yeshsurya yeshsurya requested a review from babu-namburi May 28, 2025 08:54
Contributor

@gpenumetsa-msft gpenumetsa-msft left a comment


Reviewed and suggested changes. LGTM.

@yeshsurya yeshsurya requested a review from novaturient95 May 28, 2025 09:01
@sharvin2187 sharvin2187 merged commit d078b75 into main May 28, 2025
7 checks passed
@sharvin2187 sharvin2187 deleted the shjondhale/build-grpo branch May 28, 2025 09:11

6 participants