Releases: OpenAdaptAI/openadapt-ml


v0.14.1 (2026-03-04)

Bug Fixes

  • Lower PyTorch minimum to 2.8.0 for vLLM compatibility (#53, c0bc069)

vLLM 0.11.0 pins torch==2.8.0. The GPU E2E validation (openadapt-evals PR #87) confirmed the full ML stack works with PyTorch 2.8.0+cu128. The previous >=2.9.1 constraint prevented installing openadapt-ml alongside vLLM in the same environment.
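The constraint change amounts to a one-line edit along these lines (illustrative pyproject.toml fragment; the actual dependency list and version markers in the project may differ):

```toml
[project]
dependencies = [
    "torch>=2.8.0",  # was >=2.9.1; vLLM 0.11.0 pins torch==2.8.0
]
```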

Co-authored-by: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.14.0...v0.14.1


v0.14.0 (2026-03-04)

Features

  • Add dual training backend support (standalone + verl-agent) (#51, 4419b21)

Add backend field to GRPOConfig ("standalone" or "verl") to support switching between training backends:

  • standalone: existing trainer.py (single-GPU, episode-level rewards)
  • verl: verl-agent/VAGEN integration (multi-GPU, GiGPO per-step credit)

New verl_backend.py provides build_vagen_config() to map GRPOConfig to VAGEN-compatible config, and train_with_verl() as the integration point (placeholder until full end-to-end is wired up).

No existing function signatures or behavior modified.
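The backend switch might look something like this minimal sketch (the GRPOConfig fields other than backend are hypothetical placeholders, and the dispatch function is illustrative; the real wiring lives in the training package):

```python
from dataclasses import dataclass


@dataclass
class GRPOConfig:
    # Hypothetical subset of fields; the real config has many more.
    backend: str = "standalone"  # "standalone" or "verl"
    learning_rate: float = 1e-5


def train(config: GRPOConfig) -> str:
    """Dispatch to the selected training backend (sketch)."""
    if config.backend == "standalone":
        return "standalone: trainer.py (single-GPU, episode-level rewards)"
    if config.backend == "verl":
        return "verl: verl_backend.py (multi-GPU, GiGPO per-step credit)"
    raise ValueError(f"unknown backend: {config.backend!r}")
```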

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • style: format verl_backend.py with ruff

Co-authored-by: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.13.0...v0.14.0


v0.13.0 (2026-03-03)

Features


Detailed Changes: v0.12.0...v0.13.0


v0.12.0 (2026-03-03)

Features

  • Add GRPO training module with minimal TRL bridge (#34, 339e5d3)
  • docs: add experimental roadmap and evidence context to vision
  • Add 2x2 experimental matrix (retrieval × fine-tuning) to Core Thesis
  • Add evidence context to the benchmark table: note that it is an internal synthetic benchmark (~3 UI elements) that validates the pipeline, not real-world performance. Link to openadapt-evals for ongoing WAA/OSWorld evaluation.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • fix: use 46.7% consistently in 2x2 matrix

It was showing a 33-47% range, which conflated preliminary (n=3) and full (n=45) results. The validated number is 46.7%.

  • feat: add GRPO training module for online RL

Add openadapt_ml/training/grpo/ package with:

  • GRPOConfig for training hyperparameters
  • GRPORolloutCollector connecting to the openadapt-evals RLEnvironment
  • GRPOTrainer implementing a custom GRPO loop for multimodal VLMs
  • Binary reward function and group-relative advantage computation
  • Chain-of-thought warm-up pipeline for SFT pre-training
  • 20 unit tests passing without GPU
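The group-relative advantage computation is simple enough to sketch in full (plain-Python sketch; the real module presumably operates on tensors):

```python
def group_relative_advantages(rewards):
    """Advantage of each rollout relative to its group.

    In GRPO, several rollouts are sampled per task; each rollout's
    advantage is its reward minus the group mean, divided by the group
    standard deviation. With binary rewards this is especially simple.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    if std == 0:
        # All rollouts scored the same: no learning signal for this group.
        return [0.0] * n
    return [(r - mean) / std for r in rewards]
```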

  • fix: address review findings in GRPO module
  • Replace copy.deepcopy(model) with a LoRA state dict snapshot (prevents OOM)
  • Mark _compute_rollout_loss as a scaffold with a dummy forward pass for gradient flow
  • Fix collect_rollout call to match the RLEnvironment API (task_id in signature)
  • Add model.eval()/model.train() toggling around rollout/training phases
  • Remove unused gradient_accumulation_steps config field
  • Use actual screen_size from RLEnvironment instead of hardcoded 1920x1200
  • Clamp CLICK coordinates to [0.0, 1.0] to prevent invalid pixel values
  • Validate task_ids non-empty at the start of train()
  • Export CoT warm-up functions from the package __init__
  • Add a BenchmarkAction fallback when openadapt-evals is not installed
  • Add 9 new tests: action parser (8) + empty task_ids validation (1)
  • All 29 tests passing
  • feat: implement GRPO loss computation and fix cot_warmup dependency

Implement the core _compute_rollout_loss method that was previously a NotImplementedError scaffold. The implementation:

  • Reconstructs VLM prompts from rollout observations
  • Formats actions back to DSL text via the new _format_action_as_text helper
  • Computes log-probabilities of action tokens under the current policy
  • Computes reference-policy log-probs via PEFT disable_adapter(), with a fallback to manual LoRA weight swapping
  • Returns the GRPO loss: -advantage * log_prob + kl_coef * KL penalty

Also adds get_api_adapter() factory function to api_adapter.py, fixing the broken import in cot_warmup.py's generate_cot_annotations().

Additional review fixes from the prior session:

  • Initialize _is_unsloth and _ref_lora_state in __init__
  • Remove the dead else branch for task_id selection
  • Fix total_loss device placement
  • LoRA-only fallback save in checkpointing
  • TYPE regex accepts single quotes
  • Coordinate clamping in _parse_vlm_output_to_action

40 tests passing (10 new: 8 format_action + 1 roundtrip + 1 api_adapter).
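The loss formula above, in scalar form (a sketch under the stated formula; the actual trainer computes per-token log-probs and sums over action tokens):

```python
def grpo_step_loss(log_prob, ref_log_prob, advantage, kl_coef=0.01):
    """One-step GRPO loss: policy-gradient term plus a scaled KL penalty
    toward the frozen reference policy. Uses the simple log-ratio as a
    stand-in KL estimate; real implementations often use fancier estimators."""
    pg = -advantage * log_prob          # push up log-prob when advantage > 0
    kl = log_prob - ref_log_prob        # log-ratio KL estimate
    return pg + kl_coef * kl
```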

  • refactor: deduplicate GRPO prompts via shared _build_agent_messages

Extract prompt construction into _build_agent_messages() which imports SYSTEM_PROMPT from next_action.py (the SFT training prompt). This ensures the GRPO agent uses the same prompt distribution the model was warm-started on, and guarantees _make_agent_fn and _compute_rollout_loss use identical prompts (critical for correct log-prob computation).

  • fix(grpo): address critical review findings in GRPO loss computation
  • C-01: Store raw model output on action._grpo_raw_text for accurate loss
  • C-02: Tokenize prompt and action separately, then concatenate, to fix BPE boundary alignment
  • I-01: Prefer LoRA weight swapping over disable_adapter() for the reference policy (captures the initial LoRA state after SFT warm-start)
  • I-03: Per-step gradient accumulation via immediate backward() to prevent OOM from building a computation graph over all rollout steps
  • I-04: Fix unescape order in the TYPE parser (backslash before quotes)
  • M-03: Pass model_name through get_api_adapter to ApiVLMAdapter
  • M-07: Case-insensitive CLICK/TYPE regex in _parse_vlm_output_to_action
  • L-01: Extract a DEFAULT_SCREEN_SIZE constant and replace all hardcoded values
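The BPE boundary issue behind C-02 can be seen with a toy tokenizer (purely illustrative; real BPE merges are learned, not hardcoded). Tokenizing prompt and action as one string can fuse tokens across the seam, so the action's token span is only well-defined when the two are tokenized separately and concatenated:

```python
def toy_tokenize(text):
    # Toy stand-in for a BPE tokenizer: it greedily merges the pair "e "
    # into one token, mimicking how learned merges can fuse characters
    # across the prompt/action seam.
    tokens, i = [], 0
    while i < len(text):
        if text[i : i + 2] == "e ":
            tokens.append("e_")
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens


prompt, action = "type", " ok"
joint = toy_tokenize(prompt + action)                   # seam fused: no clean action span
separate = toy_tokenize(prompt) + toy_tokenize(action)  # action span == toy_tokenize(action)
```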
  • fix(grpo): fix instruction propagation, screen size, weight swap safety
  • CR-01: The task instruction was never populated during GRPO rollouts. WAALiveAdapter._get_observation() does not populate raw_observation, so the agent prompt said "Goal: " with nothing after it. Fix: store the instruction on the Rollout dataclass (populated from env._current_task in the collector) and use it in both agent_fn and _compute_rollout_loss.
  • IM-01: Change DEFAULT_SCREEN_SIZE from 1920x1200 to 1920x1080 for consistency with the baselines module and standard VM configurations. Add a screen_size field to GRPOConfig so it is configurable.
  • IM-02: Add try/finally around the LoRA weight swap in _compute_ref_log_probs. Without it, an exception during the reference forward pass permanently corrupts the model state.
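The IM-02 pattern, sketched generically (a dict stands in for a model state dict; the real code swaps LoRA tensors inside _compute_ref_log_probs):

```python
def with_swapped_weights(live_state, ref_state, forward):
    """Run forward() under reference weights, restoring the live weights
    even if the forward pass raises (the try/finally pattern from IM-02)."""
    saved = dict(live_state)
    live_state.update(ref_state)
    try:
        return forward(live_state)
    finally:
        live_state.clear()
        live_state.update(saved)
```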
  • fix(grpo): remove unused torch import in _setup_model

The import torch at line 121 was flagged by ruff (F401) as unused. The surrounding code only calls .detach().clone() on tensor objects, which does not require the torch module directly.

  • style(grpo): apply ruff formatting to GRPO module files

Run ruff format on cot_warmup.py, rollout_collector.py, and trainer.py to satisfy the CI ruff formatter check.

  • refactor(grpo): replace custom trainer with minimal TRL bridge

Replace the 809-line custom GRPO trainer with ~280 lines that:

  • Use standard HuggingFace AutoModelForVision2Seq + AutoProcessor + PEFT LoraConfig instead of Unsloth monkey-patching
  • Implement a standalone GRPO loss in ~15 lines of PyTorch (clipped surrogate) instead of a custom policy gradient + KL penalty
  • Use beta=0.0 (no KL penalty, no reference model) per the DAPO/Open-Reasoner-Zero literature, eliminating weight-swap complexity
  • Keep per-step backward to avoid OOM on long trajectories
  • Use standard model.save_pretrained() for checkpointing
  • Document WHY standalone GRPO math vs TRL GRPOTrainer (VLM multi-turn image pixel_values are not stored in token IDs) and WHEN to switch

Preserves all public API: GRPOTrainer, _parse_vlm_output_to_action, _format_action_as_text, _build_agent_messages, DEFAULT_SCREEN_SIZE. All 50 tests pass (44 existing + 6 new for grpo_loss and trainer internals).

  • feat(grpo): add E2E tests with artifact generation and architecture docs
  • tests/test_grpo_e2e.py: 5 E2E tests (training loop, rollout collection, loss convergence, weight diff, mathematical properties) using a tiny mock VLM. Produces 65+ artifacts (JSON traces, PNGs, checkpoints, summaries).
  • scripts/grpo_e2e_report.py: CLI report generator for test artifacts (text + optional HTML output)
  • docs/grpo_e2e_test_design.md: design rationale for the E2E test approach
  • docs/grpo_architecture_analysis.md: analysis of custom vs TRL-based GRPO
  • docs/grpo_trl_rewrite_draft.py: TRL v0.29.0 integration research
  • docs/strategic_analysis_evals_ml_synergy.md: business/economics analysis
  • fix(grpo): address self-review findings (BUG-01, CLEAN-01 through -05)
  • Rename grpo_loss to policy_gradient_loss with an honest docstring: single-epoch on-policy training means ratio=1.0, so clipping never fires; this is REINFORCE with group-relative advantages. Keep grpo_loss as a backwards-compatible alias.
  • Add public aliases parse_vlm_output_to_action and format_action_as_text (dropping the underscore prefix for the public API)
  • Export policy_gradient_loss and the public functions from __init__.py
  • Remove unused config fields: kl_coef (was 0.01 but never used with beta=0) and max_seq_length (never referenced)
  • Fix model_name default: Qwen/Qwen2.5-VL-7B-Instruct (not the unsloth variant)
  • Fix a trivial test assertion: grad_norm > 0 (was >= 0, which is always true)
  • Update loss tests to verify gradient direction, not just loss sign
  • Add test_public_api_exports for the new public names

56 tests pass (51 unit + 5 E2E).
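The clipped-surrogate math behind these commits can be sketched in plain Python (scalar, single-action form; the real implementation operates on token-level tensors). When log_prob equals old_log_prob, as in single-epoch on-policy training, the ratio is exactly 1.0, clipping never fires, and the loss reduces to -advantage, i.e. REINFORCE with group-relative advantages:

```python
import math


def clipped_surrogate_loss(log_prob, old_log_prob, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate for a single action (illustrative sketch)."""
    ratio = math.exp(log_prob - old_log_prob)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    # Negate the pessimistic (min) surrogate to get a loss to minimize.
    return -min(unclipped, clipped)
```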


Co-authored-by: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.11.2...v0.12.0


v0.11.2 (2026-02-25)

Bug Fixes

  • docs: Require conventional commit format for PR titles (#32, 303f54f)

PR titles become squash merge commit messages. Without the fix:/feat: prefix, python-semantic-release skips the release. Document this requirement prominently in CLAUDE.md.
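As an illustration of the convention (assuming python-semantic-release's default Angular-style parser, where feat triggers a minor release and fix/perf a patch), a title can be checked like this; the regex is a sketch, not the library's actual implementation:

```python
import re

# Angular-style prefixes that trigger a release under the default parser
# (assumed defaults; check the project's semantic-release config).
RELEASE_PREFIXES = re.compile(r"^(feat|fix|perf)(\([^)]*\))?!?: ")


def triggers_release(pr_title: str) -> bool:
    return bool(RELEASE_PREFIXES.match(pr_title))
```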

Co-authored-by: Claude Opus 4.6 noreply@anthropic.com

Documentation

  • docs: add mandatory branch/PR rule to CLAUDE.md

Adds explicit instruction that all changes must go through feature branches and pull requests. enforce_admins has been enabled on GitHub to prevent admin bypass of branch protection.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • fix(modal): remove unused os import

Fixes ruff F401 lint error on modal_cloud.py.


Co-authored-by: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.11.1...v0.11.2


v0.11.1 (2026-02-24)

Bug Fixes

  • modal: Fix inference container image and multi-modal message handling (88e4c09)

  • Pin transformers==4.57.3 (matches local, has Qwen3-VL support)

  • Add torchvision dependency (required by AutoVideoProcessor)

  • Add fallback: AutoModelForVision2Seq -> Qwen2_5_VLForConditionalGeneration

  • Add fallback: AutoProcessor -> Qwen2_5_VLProcessor

  • Reconstruct multi-modal messages with {"type": "image"} placeholders
    for proper vision token generation in apply_chat_template

  • Rename container_idle_timeout -> scaledown_window (Modal API update)
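The placeholder reconstruction might look like this hypothetical helper (to_chat_messages and the turn dict shape are assumptions; the real logic lives in modal_cloud.py):

```python
def to_chat_messages(turns):
    """Rebuild chat messages with explicit {"type": "image"} placeholders
    so apply_chat_template emits vision tokens for each attached image."""
    messages = []
    for turn in turns:
        content = []
        if turn.get("image") is not None:
            content.append({"type": "image"})
        content.append({"type": "text", "text": turn["text"]})
        messages.append({"role": turn["role"], "content": content})
    return messages
```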

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.11.0...v0.11.1


v0.11.0 (2026-02-24)

Features

  • modal: Add inference serving with call_inference API (57e5c5f)

  • Add _build_inference_app() for Modal GPU inference with PEFT adapter

  • Add upload_adapter_to_volume() for uploading adapters to Modal volume

  • Add call_inference() as the primary API for remote inference

  • Add 'serve' CLI command for interactive model serving

  • Container caches model in memory across calls (container_idle_timeout=600)

  • Support --no-adapter for zero-shot base model serving

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.10.1...v0.11.0


v0.10.1 (2026-02-24)

Bug Fixes

  • modal: Apply fixes from first successful Modal training run (120c903)

  • Add serialized=True to @app.function for non-global-scope support

  • Auto-create volume before upload, add --force for overwrites

  • Fix variable scoping (vol = training_volume) inside remote function

  • Add openadapt-ml[training] to container image dependencies

  • Use --jsonl flag in train subprocess for correct data path

  • Add modal to project dependencies

  • Update test to verify create+put two-call pattern

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.10.0...v0.10.1


v0.10.0 (2026-02-24)

Features

  • cloud: Add Vast.ai and Modal GPU providers (5812f89)

Vast.ai (~$0.17/hr A10): SSH+rsync marketplace model with full CLI (list, launch, terminate, train) matching lambda_labs.py pattern. Includes GPU search, --gpu-wait retry, auto-convert --demo-dir flow.

Modal ($30/mo free, $1.10/hr A10G): Python-native cloud with zero-ops training via decorated functions and Modal Volumes for data transfer. CLI: train, status, download, list-volumes.

Both support the same --demo-dir end-to-end pipeline as Lambda Labs.

53 new tests (34 Vast.ai + 19 Modal), all passing.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.9.0...v0.10.0


v0.9.0 (2026-02-24)

Features

  • train: Add end-to-end pipeline automation with --demo-dir flag (b874018)

Add prepare_bundle() and generate_screenshot_mapping() to convert_demos.py for single-call demo conversion. Extend both train.py and lambda_labs.py train commands with --demo-dir, --captures-dir, --mapping flags so the full pipeline (mapping → conversion → bundle → upload → train) runs as one command. Add --gpu-wait for Lambda GPU availability retry loop.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.8.0...v0.9.0