Race condition when preparing the pretrained model in distributed training #44
Comments
My current workaround is to set the `PYTORCH_PRETRAINED_BERT_CACHE` env var.
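A minimal sketch of how that workaround might look, assuming each process is pointed at its own cache directory; the `/tmp` paths and the way the rank is read below are illustrative, not taken from the thread:

```python
import os

# Illustrative sketch: point each distributed worker at its own cache directory
# before importing pytorch_pretrained_bert, since file_utils reads this env var
# when the module is first imported. The paths and the rank lookup are
# placeholders for whatever your launcher provides.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
os.environ["PYTORCH_PRETRAINED_BERT_CACHE"] = f"/tmp/bert_cache_rank{local_rank}"

from pytorch_pretrained_bert import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
```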
I see, thanks for the feedback. I will find a way to make that better in the next release. Not sure we need to store the model gzipped anyway, since it mostly contains a torch dump which is already compressed.
Ok, I've added a `cache_dir` option to `from_pretrained`.
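A sketch of how that option could be used per rank, assuming the `cache_dir` keyword that `from_pretrained` exposes; the rank lookup and directory names are illustrative:

```python
import os

from pytorch_pretrained_bert import BertModel, BertTokenizer

# Illustrative: hand every rank its own cache directory through cache_dir,
# so no two processes extract the same archive into the same path at once.
rank = int(os.environ.get("RANK", "0"))  # however your launcher exposes the rank
cache_dir = f"/tmp/pytorch_pretrained_bert_rank{rank}"

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", cache_dir=cache_dir)
model = BertModel.from_pretrained("bert-base-uncased", cache_dir=cache_dir)
```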
Thanks for fixing this. Since the way I use this repo is to add ./pytorch_pretrained_bert to PYTHONPATH, I think it also makes sense to add the import directly, which is included in my PR: #58
Hi,
I launched two processes per node to run distributed run_classifier.py. However, I occasionally get the error below:

It looks like a race condition where two processes simultaneously write the model file to /root/.pytorch_pretrained_bert/. Please advise on any workaround. Thanks!
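One general way to avoid this kind of cache race, not taken from this thread but a common pattern with torch.distributed, is to let a single process per node warm the cache while the other ranks wait at a barrier, so only one process ever writes to a given cache directory:

```python
import torch.distributed as dist
from pytorch_pretrained_bert import BertModel

# General pattern, not from this thread: assumes the default process group has
# already been initialized (e.g. via dist.init_process_group(...)).
def load_pretrained(local_rank, name="bert-base-uncased"):
    if local_rank != 0:
        dist.barrier()  # non-zero ranks wait until rank 0 has filled the cache
    model = BertModel.from_pretrained(name)
    if local_rank == 0:
        dist.barrier()  # rank 0 releases the waiting ranks once the cache is warm
    return model
```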