
@farukalpay (Owner)

Summary

  • add examples/ddp_train.py, an example demonstrating deterministic DDP training on CIFAR-10
  • implement SHA-256-based subseed generation feeding Philox generators, giving each rank/epoch/worker a unique RNG stream
  • print each rank's first-batch labels and initial conv weights, plus the final model_init subseed and digest
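The subseed scheme hashes a master seed together with the context it is used in, so every (rank, epoch, worker) combination gets an independent but reproducible seed. The sketch below illustrates that idea under stated assumptions. `derive_subseed`, its field order, the `purpose` labels, and the truncation to 64 bits are all hypothetical and not taken from examples/ddp_train.py.

```python
import hashlib


def derive_subseed(master_seed: int, run_id: str, purpose: str, *,
                   rank: int = 0, epoch: int = 0, worker: int = 0) -> int:
    """Derive a reproducible 64-bit subseed from a master seed and a context.

    Hypothetical scheme: the context fields and their order are assumptions
    made for illustration only.
    """
    # repr() of a tuple quotes and escapes the strings, so distinct contexts
    # cannot serialize to the same bytes (e.g. "a|b" vs "a", "b").
    material = repr((master_seed, run_id, purpose, rank, epoch, worker))
    digest = hashlib.sha256(material.encode("utf-8")).digest()
    # Keep the first 8 bytes as an unsigned integer in [0, 2**64).
    return int.from_bytes(digest[:8], "big")


if __name__ == "__main__":
    # Same master seed and run id, different ranks: different data subseeds.
    s0 = derive_subseed(314159, "R42", "data", rank=0)
    s1 = derive_subseed(314159, "R42", "data", rank=1)
    print(s0, s1, s0 != s1)
```

The resulting integer can seed a counter-based generator, for example NumPy's `np.random.Generator(np.random.Philox(subseed))`. Hashing the full context, rather than adding offsets such as `master_seed + rank`, avoids accidental overlaps like rank 1 at epoch 0 colliding with rank 0 at epoch 1.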

Testing

  • pre-commit run --files examples/ddp_train.py (hooks skipped: no files to check)
  • torchrun --standalone --nnodes=1 --nproc-per-node=2 examples/ddp_train.py --master-seed 314159 --run-id R42 --epochs 1 --batch-size 16
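The run above prints a model_init subseed and a digest. One way to use that digest, which the PR does not report doing, is to repeat the same torchrun command and confirm that the printed values match. The sketch below shows one possible digest over named parameters. `params_digest` is hypothetical, and plain float lists stand in for tensors. With PyTorch, each tensor's bytes would typically come from `t.detach().cpu().contiguous().numpy().tobytes()`.

```python
import hashlib
import struct


def params_digest(params: dict[str, list[float]]) -> str:
    """SHA-256 over named parameters, independent of dict insertion order.

    Hypothetical helper: parameters are visited in sorted-name order and
    packed as little-endian float32, roughly mirroring a float32 state_dict.
    """
    h = hashlib.sha256()
    for name in sorted(params):
        name_bytes = name.encode("utf-8")
        values = params[name]
        # Length prefixes keep name/tensor boundaries unambiguous.
        h.update(struct.pack("<Q", len(name_bytes)))
        h.update(name_bytes)
        h.update(struct.pack("<Q", len(values)))
        h.update(struct.pack(f"<{len(values)}f", *values))
    return h.hexdigest()


if __name__ == "__main__":
    weights = {"conv1.weight": [0.1, -0.2, 0.3], "conv1.bias": [0.0]}
    print(params_digest(weights))
```

Sorting by name makes the digest depend only on parameter names and values, not on iteration order. Packing as float32 means that any bit-level change in the initial weights changes the digest.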

https://chatgpt.com/codex/tasks/task_e_68ac7ce9b2b88323b2a9f5ba870b9768

@farukalpay farukalpay merged commit 9841f5b into main Aug 25, 2025
5 checks passed
