mnist: avoid concurrent MNIST downloads under mp.spawn #1411
lordaarush wants to merge 1 commit into pytorch:main
This example downloads MNIST inside mp.spawn worker processes.
On a fresh run, multiple processes may attempt to download and
verify the dataset concurrently, which can lead to nondeterministic
checksum failures (e.g. "File not found or corrupted").
This PR downloads MNIST once in the main process before spawning
workers, and disables downloading inside workers. This avoids
the race condition while keeping behavior unchanged for users
who already have the dataset.
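
For reviewers, the pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: the function names `prefetch_mnist`, `worker`, and `main`, the `../data` root, and the worker body are assumptions made for the example. It relies only on `torchvision.datasets.MNIST(..., download=...)` and `torch.multiprocessing.spawn`.

```python
# Sketch only: names and paths are hypothetical, not taken from the PR.
# torch/torchvision are imported inside the functions so the module can be
# imported without triggering any download or process spawn.


def prefetch_mnist(root="../data"):
    """Download both MNIST splits once, in the parent process."""
    from torchvision import datasets

    datasets.MNIST(root, train=True, download=True)
    datasets.MNIST(root, train=False, download=True)


def worker(rank, world_size, root):
    """Per-process entry point; mp.spawn passes the process index as `rank`."""
    from torchvision import datasets, transforms

    # download=False: the files are already on disk, so workers only read them
    # and never race each other on download or checksum verification.
    train_set = datasets.MNIST(
        root, train=True, download=False, transform=transforms.ToTensor()
    )
    print(f"rank {rank}/{world_size}: {len(train_set)} training samples")


def main(world_size=2, root="../data"):
    import torch.multiprocessing as mp

    prefetch_mnist(root)  # single download before any worker exists
    mp.spawn(worker, args=(world_size, root), nprocs=world_size, join=True)
```

Call `main()` from under an `if __name__ == "__main__":` guard. If the dataset is already present, `download=True` in the parent finds the existing files and does not fetch them again, so existing setups behave as before.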
Fixes #1403