Orca: can't save tensorflow model on yarn cluster #7411
Comments
This seems to need fixing. Have we tested this before? @sgwhat
Of course, it has been tested before; it needs to be reproduced.
Thanks. Yes, the problem has been solved by using
To do (for myself): add this issue to the Orca known issues.
But since users running on a cluster will most likely save to remote storage, this is not high priority.
#7623
Problem
When I try to save a TensorFlow model (which is a directory) to a remote HDFS directory on a YARN cluster (cluster_mode="yarn-client") using estimator.save(remote_model_path), an error occurs showing mkdir: permission denied. So I use TemporaryDirectory to save it to a local temporary directory and then put it to the remote directory. But I still can't find the model saved in the temporary directory. However, I can save it successfully locally on my laptop using estimator.save(os.path.join(model_dir, model_name)) with cluster_mode="local".
est.save(remote_model_dir) directly
code:
error message:
TemporaryDirectory and then put it to remote_model_dir
code:
error message:
The error occurs in the line put_local_dir_tree_to_remote(local_dir, remote_dir) because it can't find the model directory saved after estimator.save(local_dir).
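To narrow down where the saved directory goes missing, it may help to check that the local directory exists before the upload step runs. The following is a minimal sketch, not the code from this issue: save_then_upload, save_fn and upload_fn are hypothetical names, with save_fn standing in for estimator.save and upload_fn for put_local_dir_tree_to_remote. If the check fails, one possible explanation (an assumption, not confirmed in this thread) is that the save ran in a different process or on a different node than the one doing the upload.

```python
import os
import tempfile
from typing import Callable


def save_then_upload(save_fn: Callable[[str], None],
                     upload_fn: Callable[[str, str], None],
                     remote_dir: str,
                     model_name: str = "model") -> str:
    """Save into a local temporary directory, verify the output, then upload.

    save_fn and upload_fn are hypothetical stand-ins (for example
    estimator.save and put_local_dir_tree_to_remote); they are passed in
    so this sketch does not depend on any particular library.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_dir = os.path.join(tmp_dir, model_name)
        save_fn(local_dir)

        # Fail early with a clear message if nothing was written on this
        # machine, instead of letting the upload step fail later.
        if not os.path.isdir(local_dir) or not os.listdir(local_dir):
            raise FileNotFoundError(
                f"No model files found in {local_dir} after saving; "
                "the save may have been written somewhere else."
            )

        # Upload while the temporary directory still exists.
        upload_fn(local_dir, remote_dir)
    return remote_dir
```

Called as save_then_upload(estimator.save, put_local_dir_tree_to_remote, remote_model_dir), this would raise FileNotFoundError at the save step rather than at the upload step when the model directory is not present locally.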