GRPO comprehensive review: tracking issue #42

## GRPO Comprehensive Review — Tracking Issue

This issue tracks all findings from the GRPO code review across both `openadapt-ml` and `openadapt-evals`.

### Critical — Blocks production training

- [ ] #35 — **Replace custom training loop with TRL GRPOTrainer** (openadapt-ml)
- [ ] https://github.com/OpenAdaptAI/openadapt-evals/issues/76 — **Support parallel rollout collection via VM pool** (openadapt-evals)
- [ ] https://github.com/OpenAdaptAI/openadapt-evals/issues/77 — **Populate observation with task instruction** (openadapt-evals)

### Important — Required for external integration / RL training use case

- [ ] #36 — **Add pluggable action format (DSL vs JSON)** (openadapt-ml)
- [ ] #37 — **Extend action types (scroll, key, noop)** (openadapt-ml)
- [ ] #38 — **Add pluggable reward functions / verifier registry** (openadapt-ml)
- [ ] #39 — **Add training loop tests with mocked environment** (openadapt-ml)

### Medium — Reliability and performance

- [ ] #40 — **Add checkpoint resume** (openadapt-ml)
- [ ] #41 — **Use PEFT named adapters for reference policy** (openadapt-ml)
- [ ] https://github.com/OpenAdaptAI/openadapt-evals/issues/78 — **Add health check and error recovery for rollout collection** (openadapt-evals)

### Batch 2 — Prompt, data structure, VLM, and correctness fixes

- [ ] #43 — **Align prompt format across SFT, GRPO, and CoT warmup** (openadapt-ml, bug)
- [ ] #44 — **Replace _grpo_raw_text monkey-patch with proper data structure** (openadapt-ml, refactor)
- [ ] #45 — **VLM processor metadata misalignment after token concatenation** (openadapt-ml, bug)
- [ ] #46 — **Inner tokenizer extraction bypasses processor preprocessing** (openadapt-ml, bug)
- [ ] #47 — **Use JSONL append-only format for training log** (openadapt-ml, performance)
- [ ] #48 — **Add gradient accumulation across multiple groups** (openadapt-ml, enhancement)
- [ ] #49 — **Coordinate precision loss in fraction-pixel roundtrip** (openadapt-ml, bug)
- [ ] #50 — **Minor issues: regex whitespace, CoT adapter reuse, shutil import** (openadapt-ml, bug)
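
As a rough illustration of what #47 asks for, the sketch below appends one JSON object per line instead of rewriting a whole log file on every step. The function names (`append_record`, `read_records`), the file name, and the record fields (`step`, `reward_mean`) are hypothetical and not taken from `openadapt-ml`; the actual log schema may differ.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one training-step record as a single JSON line.

    Opening in "a" mode means each call writes only the new line,
    so the cost per step does not grow with the size of the log.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_records(path):
    """Read back every record, one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Hypothetical log location and fields, for illustration only.
    log_path = os.path.join(tempfile.mkdtemp(), "training_log.jsonl")
    append_record(log_path, {"step": 1, "reward_mean": 0.25})
    append_record(log_path, {"step": 2, "reward_mean": 0.5})
    for rec in read_records(log_path):
        print(rec)
```

A side benefit of this layout is that a crash mid-run loses at most the line being written, rather than leaving a truncated JSON document.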
