Open
Labels: bug (Something isn't working)
Bug Description
Convergence is disrupted at chunk boundaries when using warm clones, producing inflection points and reversals in the convergence curves. The issue occurs consistently across different model types and quantization settings, but only affects warm-started configurations.
To Reproduce
Steps to reproduce the behavior:
1. Configure a DPO notebook experiment with chunks=4
2. Set up warm clone configurations (as shown in runs 3, 5, 6, and 7 in the attached screenshot)
3. Execute the training run
4. Monitor the convergence plots during training
5. Observe inflection points and convergence reversals at each chunk boundary
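The last step — spotting reversals at chunk boundaries — can be sketched as a simple check on logged loss values. This is a minimal illustration, not RapidFire AI code: the function name, the toy loss curves, and the assumption that one logged value per training step is available are all hypothetical.

```python
def boundary_reversals(losses, chunk_len):
    """Return chunk-boundary indices where the loss was falling just
    before the boundary but rises immediately after it (a reversal)."""
    reversals = []
    for b in range(chunk_len, len(losses), chunk_len):
        was_falling = losses[b - 1] < losses[b - 2]
        now_rising = losses[b] > losses[b - 1]
        if was_falling and now_rising:
            reversals.append(b)
    return reversals

# Toy curves (illustrative values only):
# smooth decay, as seen in initial / clone-modify runs
smooth = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
# warm-clone-like curve: loss jumps back up after the 4-step chunk
disrupted = [1.0, 0.9, 0.8, 0.7, 0.9, 0.8, 0.7, 0.6]

print(boundary_reversals(smooth, 4))     # -> []
print(boundary_reversals(disrupted, 4))  # -> [4]
```

Applied to the real logged metrics, a non-empty result at every multiple of the chunk size would match the behavior reported here for the warm-clone runs.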
Expected Behavior
Convergence should remain smooth and monotonic across chunk boundaries, similar to the behavior observed in initial configs and clone modify configs (non-warm-started configurations). The convergence graph should not show inflection points or reversals at chunk transitions.
Screenshots
Screenshots show:
- Configuration definitions for various runs (first screenshot)
- Comparison between affected warm clone runs (3, 5, 6, 7) and unaffected configurations for various metrics
Environment
- OS: Ubuntu
- Python version: 3.12
- RapidFire AI version: 0.12.6
- Browser (if applicable): Chrome
Additional Context
- Issue occurs with both quantized and non-quantized models
- Affects both RapidFire model and Mistral base model
- Problem is isolated to warm clone configurations only
- Initial configurations and clone modify configurations (non-warm-started) converge smoothly without this issue
- Chunk size used: 4
Error Logs
No error logs were produced.