Verbose Checkpoint ignored warning #3720

Open
yegortokmakov opened this issue Jun 9, 2020 · 4 comments

Comments

@yegortokmakov

Environment information

Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version a511de7ece215f0cfb622f2672563beee93515a9

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=7, micro=7, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Darwin', nodename='YYY.XXX.com', release='18.7.0', version='Darwin Kernel Version 18.7.0: Mon Feb 10 21:08:45 PST 2020; root:xnu-4903.278.28~1/RELEASE_X86_64', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: '/Users/yegor/.local/share/virtualenvs/test-XHT2gPb6'

--- check: installed_packages
INFO: installed: tensorboard==2.2.2
INFO: installed: tensorflow==2.2.0
INFO: installed: tensorflow-estimator==2.2.0

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.2.2'

--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.2.0'
INFO: tensorflow.__git_version__: 'v2.2.0-rc4-8-g2b96f3662b'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/Users/yegor/.local/share/virtualenvs/test-XHT2gPb6/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 1024>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 1024>
Loopback infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0)), (<AddressFamily.AF_INET6: 30>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET6: 30>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'edge-bw-101.e-lax3.amazon.com'

--- check: stat_tensorboardinfo
INFO: directory: /var/folders/z4/sq18l1dx31s3msst1f8t37jnwl6w52/T/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=28189431, st_dev=16777220, st_nlink=2, st_uid=2033414306, st_gid=1896053708, st_size=64, st_atime=1591690635, st_mtime=1591696619, st_ctime=1591696619)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/Users/yegor/.local/share/virtualenvs/test-XHT2gPb6/lib/python3.7/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.9.0
astunparse==1.6.3
cachetools==4.1.0
certifi==2020.4.5.2
chardet==3.0.4
gast==0.3.3
google-auth==1.16.1
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
grpcio==1.29.0
gviz-api==1.9.0
h5py==2.10.0
idna==2.9
importlib-metadata==1.6.1
Keras-Preprocessing==1.1.2
Markdown==3.2.2
numpy==1.18.5
oauthlib==3.1.0
opt-einsum==3.2.1
pip==20.1
protobuf==3.12.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
requests==2.23.0
requests-oauthlib==1.3.0
rsa==4.0
scipy==1.4.1
setuptools==46.1.3
six==1.15.0
tensorboard==2.2.2
tensorboard-plugin-profile==2.2.0
tensorboard-plugin-wit==1.6.0.post3
tensorflow==2.2.0
tensorflow-estimator==2.2.0
termcolor==1.1.0
urllib3==1.25.9
Werkzeug==1.0.1
wheel==0.34.2
wrapt==1.12.1
zipp==3.1.0

Issue description

The latest TB version generates a lot of log entries like this:

WARNING:tensorflow:FailedPreconditionError: AWS Credentials have not been set properly. Unable to access the specified S3 location
W0609 11:56:56.412047 123145556799488 checkpoint_management.py:295] FailedPreconditionError: AWS Credentials have not been set properly. Unable to access the specified S3 location
WARNING:tensorflow:s3://sagemaker-eu-west-1-XXX/logs/job_name/validation/../checkpoint: Checkpoint ignored

I do not write any checkpoints in my training script, and the file above doesn't exist. Everything I expect to see in TB is there, so I'm pretty sure the AWS credentials are set properly.

  1. Am I missing any TB feature by not writing checkpoints? I don't intend to use the Projector plugin.
  2. How is TB coming up with this checkpoint path? I haven't specified it anywhere in the configuration. Or is it referenced somewhere in the logs?
  3. Is there a recommended way to structure model artifacts, logs and checkpoints that TB relies on?
  4. Is it possible to make these error messages more descriptive?
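One plausible reading of the `validation/../checkpoint` path in question 2, offered only as a sketch: it assumes (this is not confirmed anywhere in this thread) that some component probes for a checkpoint index file named `checkpoint` first in the run directory and then in its parent, joining `..` without normalizing the path. The function name `candidate_checkpoint_index_files` is hypothetical and is not TensorBoard or TensorFlow code.

```python
import posixpath


def candidate_checkpoint_index_files(run_dir):
    """Hypothetical: 'checkpoint' index paths probed for run_dir, in an assumed order.

    posixpath is used so S3-style URIs keep forward slashes on every OS.
    The parent is reached by appending '..' literally (no normalization),
    which is what would make '../' show up in the warning text.
    """
    search_dirs = [run_dir, posixpath.join(run_dir, "..")]
    return [posixpath.join(d, "checkpoint") for d in search_dirs]


for path in candidate_checkpoint_index_files(
    "s3://sagemaker-eu-west-1-XXX/logs/job_name/validation"
):
    print(path)
```

Under that assumption, the second path printed matches the one in the `Checkpoint ignored` warning above, which would explain why the message mentions a file that was never written.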
@yegortokmakov yegortokmakov changed the title from "Verbose" to "Verbose Checkpoint ignored warning" Jun 9, 2020
@bileschi
Collaborator

bileschi commented Jun 9, 2020

Hi @yegortokmakov, this error seems to be coming from within TensorFlow. Can you share the command you used to run TensorBoard? Thanks :)

@yegortokmakov
Author

Hi @bileschi, thanks for the reply!

This is the command: AWS_REGION={aws_region} tensorboard --logdir s3://{tensorflow_logs_path}

The logs do come from TensorFlow, but the only references to "checkpoints" I could find are in the Projector plugin.

P.S. All code is available here: aws/amazon-sagemaker-examples#1267

@bileschi
Collaborator

bileschi commented Jun 9, 2020

Thanks, will investigate.
