ONERB-5: prep first SFT Modal run #5
PR Summary by Qodo
ONERB-5: Add Modal SFT run-1 recipe, entrypoint, and smoke unit test
Pull request overview
Prepares the codebase for the first SFT run on Modal by adding a Modal training entrypoint, a concrete run plan, and a small smoke test to validate NerDataset label alignment and batching behavior.
Changes:
- Add scripts/modal_train.py, a Modal entrypoint supporting a one-step smoke run and a full training run with dataset caching to a Modal Volume.
- Add docs/plan/sft-run-1.md, documenting commands, expected logs, dataset scale, and checkpoint layout for “SFT Run 1”.
- Add tests/test_train_smoke.py, a smoke test covering dataset bucket indexing, label alignment, and ner_collate_fn batch shapes.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/test_train_smoke.py | Adds a small fixture-backed smoke test for NerDataset/ner_collate_fn label alignment and batch shapes. |
| scripts/modal_train.py | Introduces Modal job entrypoint for smoke/full training, HF dataset materialization into a Modal Volume, and checkpoint/metadata writing. |
| docs/plan/sft-run-1.md | Documents the operational recipe for the first SFT run (commands, expected evidence, cost/throughput, checkpoints). |
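As an illustration of what a label-alignment and batch-shape smoke check can look like, here is a minimal sketch in plain Python. It is not the PR's `NerDataset` or `ner_collate_fn`: the `collate` function, its dict layout, and the `pad_token_id` default are assumptions made for this example. The only outside fact relied on is that `-100` is the default `ignore_index` of `torch.nn.CrossEntropyLoss`.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def collate(examples, pad_token_id=0):
    """Hypothetical collate: pad every example to the longest one in the batch."""
    max_len = max(len(ex["input_ids"]) for ex in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in examples:
        # Label alignment: exactly one label per token.
        if len(ex["input_ids"]) != len(ex["labels"]):
            raise ValueError("labels must align one-to-one with input_ids")
        pad = max_len - len(ex["input_ids"])
        batch["input_ids"].append(ex["input_ids"] + [pad_token_id] * pad)
        batch["attention_mask"].append([1] * len(ex["input_ids"]) + [0] * pad)
        # Padded positions carry IGNORE_INDEX so they would not contribute to the loss.
        batch["labels"].append(ex["labels"] + [IGNORE_INDEX] * pad)
    return batch


def check_batch_shapes(batch):
    """Return (batch_size, seq_len), raising if any field disagrees."""
    sizes = {(len(rows), len(rows[0])) for rows in batch.values()}
    if len(sizes) != 1 or any(len(r) != len(rows[0]) for rows in batch.values() for r in rows):
        raise ValueError("batch fields have inconsistent shapes")
    return sizes.pop()
```

A smoke test along these lines would build two or three tiny examples, collate them, and assert on the returned shape and on the padded label positions.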
    @app.function(
        image=image,
        gpu="H100",
        secrets=[modal.Secret.from_name(HF_SECRET_NAME), modal.Secret.from_name(WANDB_SECRET_NAME)],
        volumes={str(DATA_DIR): data_volume},
        timeout=2 * HOURS,
    )

    @app.function(
        image=image,
        gpu="A100-80GB",
        secrets=[modal.Secret.from_name(HF_SECRET_NAME), modal.Secret.from_name(WANDB_SECRET_NAME)],
        volumes={str(DATA_DIR): data_volume},
        timeout=2 * HOURS,
    )

    if SMOKE_TRAIN_PATH.exists() and SMOKE_VAL_PATH.exists():
        train_rows = sum(1 for _ in SMOKE_TRAIN_PATH.open("rb"))
        val_rows = sum(1 for _ in SMOKE_VAL_PATH.open("rb"))
        return SMOKE_TRAIN_PATH, SMOKE_VAL_PATH, {
            "source": "modal_volume_cache",
            "train_rows": train_rows,
            "val_rows": val_rows,
        }

    from datasets import load_dataset

    rows_needed = max(config.smoke_records, config.batch_size, 8)

    if TRAIN_PATH.exists() and VAL_PATH.exists():
        return TRAIN_PATH, VAL_PATH, {
            "source": "modal_volume_cache",
            "train_rows": sum(1 for _ in TRAIN_PATH.open("rb")),
            "val_rows": sum(1 for _ in VAL_PATH.open("rb")),
        }

    data_volume.commit()
    return TRAIN_PATH, VAL_PATH, {
        "source": config.dataset_id,
        "train_rows": sum(1 for _ in TRAIN_PATH.open("rb")),
        "val_rows": sum(1 for _ in VAL_PATH.open("rb")),
    }

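The snippets above count rows by iterating over the file object returned by `Path.open("rb")`, which is never closed explicitly. One possible way to keep the same counts while closing each file deterministically is sketched below. The helper names `count_rows` and `cache_metadata` are hypothetical and not part of the PR, and the sketch assumes the cached files hold one record per line.

```python
from pathlib import Path


def count_rows(path: Path) -> int:
    """Count newline-delimited records, closing the file when done."""
    with path.open("rb") as handle:
        return sum(1 for _ in handle)


def cache_metadata(train_path: Path, val_path: Path, source: str) -> dict:
    """Build the metadata dict returned alongside the dataset paths.

    `source` would be "modal_volume_cache" on a cache hit, or the
    dataset id after a fresh download.
    """
    return {
        "source": source,
        "train_rows": count_rows(train_path),
        "val_rows": count_rows(val_path),
    }
```

With a helper like this, the cache-hit and fresh-download branches could share one code path, for example `return TRAIN_PATH, VAL_PATH, cache_metadata(TRAIN_PATH, VAL_PATH, "modal_volume_cache")`.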
    "run_name": run_name,
    "step": 1,
    "checkpoint_path": str(step_dir),
    "wandb_project": f"{config.wandb_entity}/{config.wandb_project}",

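The fragment above appears to be part of a run metadata record. A minimal sketch of writing such a record next to a checkpoint follows. The function name `write_run_metadata`, the `metadata.json` file name, and the JSON format are assumptions for illustration; the PR's actual layout is documented in docs/plan/sft-run-1.md and may differ.

```python
import json
from pathlib import Path


def write_run_metadata(step_dir: Path, run_name: str, step: int,
                       wandb_entity: str, wandb_project: str) -> Path:
    """Write a small JSON record describing one checkpoint (hypothetical layout)."""
    step_dir.mkdir(parents=True, exist_ok=True)
    metadata = {
        "run_name": run_name,
        "step": step,
        "checkpoint_path": str(step_dir),
        "wandb_project": f"{wandb_entity}/{wandb_project}",
    }
    out_path = step_dir / "metadata.json"  # assumed file name
    out_path.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return out_path
```

If the directory lives on a Modal Volume, the write would still need to be followed by a volume commit, as the `data_volume.commit()` call in the PR does for the dataset cache.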
ONERB-5 Modal smoke re-run after secrets became available did not reach the smoke step.
Secrets visible by name under profile
Exact command attempted:
Run URL: https://modal.com/apps/oneiron-dev/main/ap-xQHJkO2LwetCfzdfrvj8oA
Result: No
Flag updated:
ONERB-5 Modal smoke evidence (post-merge follow-up): the import blocker was fixed on main in 46d3c89 (
Run URL: https://modal.com/apps/oneiron-dev/main/ap-h07kyJ1GAAZ3dAa7efND5r
W&B run