Tf2estimator on Pyspark support tensorboard #3959
if not fs.exists(remote_dir):
    fs.mkdir(remote_dir)
cmd = 'hdfs dfs -put -f {}/* {}/'.format(local_dir, remote_dir)
process = subprocess.Popen(cmd, shell=True)
Is it possible to use only pyarrow or only the hadoop command, instead of interleaving the two?
Using pyarrow to copy an HDFS directory tree is a little complex. I will change this to use the hadoop command throughout.
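For illustration, a command-only upload could look roughly like the sketch below. This is not the code from the PR: `build_put_commands` and the injectable `run` parameter are hypothetical names introduced here, and the sketch assumes the `hdfs` CLI is on the PATH. It relies on `hdfs dfs -mkdir -p` to create the target directory, so no pyarrow existence check is needed.

```python
import os
import subprocess


def build_put_commands(local_dir, remote_dir):
    """Return the hdfs CLI invocations that upload the contents of local_dir."""
    # Expand the directory contents in Python so no shell glob is needed.
    # Note: unlike a shell `*`, os.listdir also returns dotfiles.
    entries = sorted(os.path.join(local_dir, name)
                     for name in os.listdir(local_dir))
    # `-mkdir -p` succeeds even if the directory already exists.
    commands = [["hdfs", "dfs", "-mkdir", "-p", remote_dir]]
    if entries:
        commands.append(["hdfs", "dfs", "-put", "-f", *entries, remote_dir])
    return commands


def put_local_dir_to_remote(local_dir, remote_dir, run=subprocess.run):
    """Upload using the hadoop CLI only (hypothetical variant).

    With check=True, subprocess.run raises CalledProcessError on a
    non-zero exit code, so a failed upload is not silently ignored.
    """
    for cmd in build_put_commands(local_dir, remote_dir):
        run(cmd, check=True)
```

Passing the arguments as a list avoids `shell=True`, and waiting for each command to finish means the caller knows whether the upload succeeded before cleaning up anything locally.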
    put_local_dir_to_remote(os.path.dirname(replaced_checkpoint_path),
                            original_checkpoint_dir)
finally:
    shutil.rmtree(os.path.dirname(replaced_checkpoint_path))
If put_local_dir_to_remote fails, maybe we should not delete replaced_checkpoint_path; otherwise the user's training result will be lost.
How about printing a warning that states there was an error and that the checkpoint is located at xxx, so that users have a chance to retrieve it manually?
ok
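One possible shape for the suggestion in this thread is sketched below. The helper name `upload_then_cleanup` is hypothetical, and the sketch assumes the upload callable (for example `put_local_dir_to_remote`) raises an exception on failure, which the `Popen`-based version in the diff may not do by itself.

```python
import logging
import shutil

logger = logging.getLogger(__name__)


def upload_then_cleanup(local_dir, remote_dir, upload):
    """Upload local_dir and delete it only if the upload succeeded.

    `upload` is any callable taking (local_dir, remote_dir) that raises
    on failure. Returns True when the upload succeeded and the local
    copy was removed, False when the local copy was kept.
    """
    try:
        upload(local_dir, remote_dir)
    except Exception:
        # Keep the local files so the training result is not lost,
        # and tell the user where to find them.
        logger.warning(
            "Failed to upload checkpoint to %s; the local copy is kept at %s",
            remote_dir, local_dir, exc_info=True)
        return False
    shutil.rmtree(local_dir)
    return True
```

Compared with the `finally:` block in the diff, the `rmtree` call here runs only on the success path, so a failed upload leaves the checkpoint on local disk.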
Add tensorboard support for fit and evaluate in Tf2estimator on Pyspark.