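W&B (wandb) used to work fine, but now `wandb.init()` fails during initialization. It keeps printing "Waiting for wandb.init()..." for about a minute and then raises `wandb.errors.UsageError: Error communicating with wandb process` (full log below).
Workaround for now: disable wandb.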
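One way to apply this workaround without touching the Megatron code is a minimal sketch like the following. It assumes it runs in the launcher before any worker calls `wandb.init()`, so the workers inherit the variable through their environment. `WANDB_MODE=disabled` is a standard W&B setting that turns wandb calls into no-ops. The helper name `disable_wandb` is hypothetical.

```python
import os


def disable_wandb() -> None:
    """Hypothetical helper: turn W&B into a no-op for this process and its children.

    Call it before wandb.init() runs anywhere (e.g. at the top of the launcher),
    so that worker processes started afterwards inherit the setting.
    """
    # WANDB_MODE=disabled makes wandb.init() return a dummy run object,
    # so later wandb calls (log, finish, ...) do nothing.
    os.environ["WANDB_MODE"] = "disabled"


disable_wandb()
```

Equivalently, `WANDB_MODE=disabled` can be exported in the container or job environment, or `mode="disabled"` can be passed to `wandb.init()` directly. Workers launched independently (not as children of this process) need the variable set in their own environment.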
2023-02-17 22:39:17,767 (worker_0) : training ...
2023-02-17 22:39:24,279 (worker_7) : wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
2023-02-17 22:39:25,325 (worker_7) : wandb: - Waiting for wandb.init()...
2023-02-17 22:39:26,326 (worker_7) : wandb: \ Waiting for wandb.init()...
[more of the same log]
2023-02-17 22:40:24,365 (worker_7) : init_wandb
2023-02-17 22:40:24,365 (worker_7) : wandb: ERROR Error communicating with wandb process
wandb: ERROR For more info see: https://docs.wandb.ai/library/init#init-start-error
2023-02-17 22:40:24,367 (worker_7) : Traceback (most recent call last):
2023-02-17 22:40:24,367 (worker_7) :   File "/app/toolkit_infiniband_example/run.py", line 34, in run
2023-02-17 22:40:24,367 (worker_7) :     runnable.run()
2023-02-17 22:40:24,367 (worker_7) :   File "/app/toolkit_infiniband_example/worker.py", line 89, in run
2023-02-17 22:40:24,367 (worker_7) :     self._model.train()
2023-02-17 22:40:24,367 (worker_7) :   File "/app/toolkit_infiniband_example/models/megatron_gpt.py", line 65, in train
2023-02-17 22:40:24,368 (worker_7) :     pretrain(train_valid_test_datasets_provider, model_provider,
2023-02-17 22:40:24,368 (worker_7) :   File "/app/megatron/training.py", line 155, in pretrain
2023-02-17 22:40:24,368 (worker_7) :     iteration = train(forward_step_func,
2023-02-17 22:40:24,368 (worker_7) :   File "/app/megatron/training.py", line 685, in train
2023-02-17 22:40:24,368 (worker_7) :     init_wandb()
2023-02-17 22:40:24,368 (worker_7) :   File "/app/megatron/initialize.py", line 244, in init_wandb
2023-02-17 22:40:24,368 (worker_7) :     wandb.init(
2023-02-17 22:40:24,368 (worker_7) :   File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1078, in init
2023-02-17 22:40:24,368 (worker_7) :     run = wi.init()
2023-02-17 22:40:24,368 (worker_7) :   File "/opt/conda/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 719, in init
2023-02-17 22:40:24,368 (worker_7) :     raise UsageError(error_message)
2023-02-17 22:40:24,368 (worker_7) : wandb.errors.UsageError: Error communicating with wandb process
2023-02-17 22:40:24,368 (worker_7) : For more info see: https://docs.wandb.ai/library/init#init-start-error
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
2023-02-17 22:42:33,930 (worker_7) : Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd1940af970>: Failed to establish a new connection: [Errno 110] Connection timed out')': /api/5288891/store/
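The last log line shows an outbound HTTPS connection from worker_7 timing out (`[Errno 110] Connection timed out`), which hints that the worker may lack network egress rather than wandb itself being broken. A quick way to check that from the worker is a plain TCP probe, sketched below. It assumes the default W&B cloud API host `api.wandb.ai` on port 443 (a self-hosted W&B server would use a different host); the helper name `can_reach` is hypothetical.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the host and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts
        # (socket.timeout and socket.gaierror are both OSError subclasses).
        return False


if __name__ == "__main__":
    # Assumption: default W&B cloud API host; replace it for a self-hosted server.
    print("api.wandb.ai reachable:", can_reach("api.wandb.ai"))
```

If this prints `False` on the workers but `True` on a machine where wandb works, the problem is most likely the cluster's network or proxy configuration, and disabling wandb (or running it in offline mode) is a reasonable stopgap.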