Batchspawner spawning / keep-alive is instable #174

Closed
Hoeze opened this issue Mar 29, 2020 · 6 comments

Hoeze (Contributor) commented Mar 29, 2020

Hi, I often end up in the following situation:

  • JupyterHub spawns a new worker
  • The worker starts:
+ which jupyterhub-singleuser
/opt/anaconda/3-2019.10/bin/jupyterhub-singleuser
+ batchspawner-singleuser jupyterhub-singleuser --ip=0.0.0.0 --NotebookApp.default_url=/lab
[I 2020-03-29 18:18:43.199 SingleUserNotebookApp manager:48] [nb_conda_kernels] enabled, 84 kernels found
[I 2020-03-29 18:18:44.331 SingleUserNotebookApp extension:157] JupyterLab extension loaded from /opt/anaconda/3-2019.10/lib/python3.7/site-packages/jupyterlab
[I 2020-03-29 18:18:44.331 SingleUserNotebookApp extension:158] JupyterLab application directory is /opt/anaconda/3-2019.10/share/jupyter/lab
[I 2020-03-29 18:18:44.784 SingleUserNotebookApp __init__:31] [Jupytext Server Extension] Deriving a JupytextContentsManager from LargeFileManager
[I 2020-03-29 18:18:44.788 SingleUserNotebookApp singleuser:561] Starting jupyterhub-singleuser server version 1.1.0
[I 2020-03-29 18:18:44.800 SingleUserNotebookApp notebookapp:1924] Serving notebooks from local directory: /data/nasif12/home_if12/the_user
[I 2020-03-29 18:18:44.800 SingleUserNotebookApp notebookapp:1924] The Jupyter Notebook is running at:
[I 2020-03-29 18:18:44.800 SingleUserNotebookApp notebookapp:1924] http://lab-desk12:38375/jupyter/user/the_user/
[I 2020-03-29 18:18:44.800 SingleUserNotebookApp notebookapp:1925] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 2020-03-29 18:18:44.820 SingleUserNotebookApp singleuser:542] Updating Hub with activity every 300 seconds

Also, after about 24 hours the jupyterhub-singleuser worker loses its connection to JupyterHub.
However, depending on the configuration, I would like the Jupyter instances to keep running for more than one day.

Does anybody know why the connection between JupyterHub and the worker is so unstable?
Could jupyterhub/jupyterhub#2727 be the solution?

rkdarst (Contributor) commented Mar 29, 2020 via email

Hoeze (Contributor, Author) commented Mar 30, 2020

Thank you for your answer, @rkdarst.
At some point, the single-user notebook server fails to notify JupyterHub of its activity:

[E 2020-03-30 06:33:48.787 SingleUserNotebookApp singleuser:523] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/opt/anaconda/3-2019.10/lib/python3.7/site-packages/jupyterhub/singleuser.py", line 521, in notify
        await client.fetch(req)
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
[E 2020-03-30 06:33:49.025 SingleUserNotebookApp singleuser:523] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/opt/anaconda/3-2019.10/lib/python3.7/site-packages/jupyterhub/singleuser.py", line 521, in notify
        await client.fetch(req)
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
[E 2020-03-30 06:33:49.242 SingleUserNotebookApp singleuser:523] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/opt/anaconda/3-2019.10/lib/python3.7/site-packages/jupyterhub/singleuser.py", line 521, in notify
        await client.fetch(req)
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
[E 2020-03-30 06:46:51.369 SingleUserNotebookApp singleuser:523] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/opt/anaconda/3-2019.10/lib/python3.7/site-packages/jupyterhub/singleuser.py", line 521, in notify
        await client.fetch(req)
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
[E 2020-03-30 06:46:59.569 SingleUserNotebookApp singleuser:523] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/opt/anaconda/3-2019.10/lib/python3.7/site-packages/jupyterhub/singleuser.py", line 521, in notify
        await client.fetch(req)
    tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
[E 2020-03-30 06:46:59.569 SingleUserNotebookApp singleuser:548] Error notifying Hub of activity
    Traceback (most recent call last):
      File "/opt/anaconda/3-2019.10/lib/python3.7/site-packages/jupyterhub/singleuser.py", line 546, in keep_activity_updated
        await self.notify_activity()
      File "/opt/anaconda/3-2019.10/lib/python3.7/site-packages/jupyterhub/singleuser.py", line 533, in notify_activity
        timeout=60,
      File "/opt/anaconda/3-2019.10/lib/python3.7/site-packages/jupyterhub/utils.py", line 177, in exponential_backoff
        raise TimeoutError(fail_message)
    TimeoutError: Failed to notify Hub of activity
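The traceback shows the activity report being retried (`exponential_backoff` with `timeout=60` in `notify_activity`) until the timeout expires, after which `TimeoutError` is raised; each rejected attempt produces one "Error notifying Hub of activity" line. The following is a simplified, hypothetical sketch of that retry pattern, not JupyterHub's actual implementation: `retry_with_backoff` and its parameters are illustrative names, and the stand-in `notify` always fails, as every 403 did here.

```python
import asyncio
import time


async def retry_with_backoff(pass_func, fail_message, start_wait=0.2,
                             scale_factor=2.0, max_wait=5.0, timeout=60.0):
    """Await pass_func() until it returns a truthy value or timeout expires.

    Hypothetical simplification of the retry pattern in the traceback.
    """
    deadline = time.monotonic() + timeout
    wait = start_wait
    while True:
        result = await pass_func()
        if result:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(fail_message)
        # Sleep for the current wait (capped by the time left), then grow it.
        await asyncio.sleep(min(wait, remaining))
        wait = min(wait * scale_factor, max_wait)


async def main():
    attempts = 0

    async def notify():
        # Stand-in for the activity POST: always rejected, like the 403s above.
        nonlocal attempts
        attempts += 1
        print(f"attempt {attempts}: HTTP 403: Forbidden")
        return False

    try:
        # Tiny waits and timeout so the demo finishes quickly.
        await retry_with_backoff(notify, "Failed to notify Hub of activity",
                                 start_wait=0.01, max_wait=0.05, timeout=0.2)
    except TimeoutError as exc:
        print("gave up:", exc)
    return attempts


if __name__ == "__main__":
    asyncio.run(main())
```

Retrying helps with short network hiccups, but a persistent 403 cannot succeed on retry, so the loop only delays the final `TimeoutError`.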

JupyterHub logs around the time of the failure:

Mar 30 06:32:34 cn02 run_jupyterhub[34349]: 06:32:34.331 [ConfigProxy] info: 200 GET /api/routes
Mar 30 06:32:34 cn02 run_jupyterhub[34349]: [I 2020-03-30 06:32:34.333 JupyterHub proxy:320] Checking routes
Mar 30 06:33:04 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:04.421 JupyterHub log:174] 403 POST /jupyter/hub/api/users/kp/activity (@192.168.16.12) 5.19ms
Mar 30 06:33:05 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:05.392 JupyterHub log:174] 403 POST /jupyter/hub/api/users/kp/activity (@192.168.16.12) 3.79ms
Mar 30 06:33:06 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:06.063 JupyterHub log:174] 403 POST /jupyter/hub/api/users/kp/activity (@192.168.16.12) 3.67ms
Mar 30 06:33:07 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:07.550 JupyterHub log:174] 403 POST /jupyter/hub/api/users/kp/activity (@192.168.16.12) 3.65ms
Mar 30 06:33:13 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:13.540 JupyterHub log:174] 403 POST /jupyter/hub/api/users/kp/activity (@192.168.16.12) 4.07ms
Mar 30 06:33:18 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:18.016 JupyterHub log:174] 403 POST /jupyter/hub/api/users/kp/activity (@192.168.16.12) 3.60ms
Mar 30 06:33:19 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:19.193 JupyterHub log:174] 403 POST /jupyter/hub/api/users/kp/activity (@192.168.16.12) 3.63ms
Mar 30 06:33:21 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:21.782 JupyterHub log:174] 403 POST /jupyter/hub/api/users/kp/activity (@192.168.16.12) 3.69ms
Mar 30 06:33:36 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:36.793 JupyterHub log:174] 403 POST /jupyter/hub/api/users/kp/activity (@192.168.16.12) 3.60ms
Mar 30 06:33:48 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:48.786 JupyterHub log:174] 403 POST /jupyter/hub/api/users/hi/activity (@192.168.20.12) 3.82ms
Mar 30 06:33:49 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:49.024 JupyterHub log:174] 403 POST /jupyter/hub/api/users/hi/activity (@192.168.20.12) 3.44ms
Mar 30 06:33:49 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:49.241 JupyterHub log:174] 403 POST /jupyter/hub/api/users/hi/activity (@192.168.20.12) 3.48ms
Mar 30 06:33:49 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:49.319 JupyterHub log:174] 403 POST /jupyter/hub/api/users/hi/activity (@192.168.20.12) 3.50ms
Mar 30 06:33:50 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:50.814 JupyterHub log:174] 403 POST /jupyter/hub/api/users/hi/activity (@192.168.20.12) 3.35ms
Mar 30 06:33:51 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:51.806 JupyterHub log:174] 403 POST /jupyter/hub/api/users/kp/activity (@192.168.16.12) 3.66ms
Mar 30 06:33:52 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:52.730 JupyterHub log:174] 403 POST /jupyter/hub/api/users/hi/activity (@192.168.20.12) 3.55ms
Mar 30 06:33:59 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:59.179 JupyterHub log:174] 403 POST /jupyter/hub/api/users/kp/activity (@192.168.16.12) 3.92ms
Mar 30 06:33:59 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:33:59.649 JupyterHub log:174] 403 POST /jupyter/hub/api/users/hi/activity (@192.168.20.12) 3.60ms
Mar 30 06:34:14 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:34:14.658 JupyterHub log:174] 403 POST /jupyter/hub/api/users/hi/activity (@192.168.20.12) 3.45ms
Mar 30 06:34:29 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:34:29.667 JupyterHub log:174] 403 POST /jupyter/hub/api/users/hi/activity (@192.168.20.12) 3.97ms
Mar 30 06:34:43 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:34:43.944 JupyterHub log:174] 403 POST /jupyter/hub/api/users/hi/activity (@192.168.20.12) 3.84ms
Mar 30 06:37:19 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:37:19.858 JupyterHub log:174] 403 POST /jupyter/hub/api/users/n/activity (@192.168.16.12) 4.11ms
Mar 30 06:37:20 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:37:20.317 JupyterHub log:174] 403 POST /jupyter/hub/api/users/n/activity (@192.168.16.12) 4.29ms
Mar 30 06:37:20 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:37:20.748 JupyterHub log:174] 403 POST /jupyter/hub/api/users/n/activity (@192.168.16.12) 4.01ms
Mar 30 06:37:21 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:37:21.339 JupyterHub log:174] 403 POST /jupyter/hub/api/users/n/activity (@192.168.16.12) 3.84ms
Mar 30 06:37:27 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:37:27.973 JupyterHub log:174] 403 POST /jupyter/hub/api/users/n/activity (@192.168.16.12) 3.61ms
Mar 30 06:37:28 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:37:28.225 JupyterHub log:174] 403 POST /jupyter/hub/api/users/n/activity (@192.168.16.12) 3.68ms
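The rejected activity POSTs in this log come from several users' servers (kp, hi, n) on more than one node, which points to a hub-side cause rather than one misbehaving worker. A small script like the following can tally the 403 activity requests per user and client IP; the regular expression is an assumption derived only from the log format shown above.

```python
import re
from collections import Counter

# Matches lines such as:
# [W 2020-03-30 06:33:04.421 JupyterHub log:174] 403 POST /jupyter/hub/api/users/kp/activity (@192.168.16.12) 5.19ms
ACTIVITY_403 = re.compile(
    r"403 POST \S*/hub/api/users/(?P<user>[^/\s]+)/activity "
    r"\(@(?P<ip>[\d.]+)\)"
)


def count_rejected_activity(lines):
    """Return a Counter mapping (user, client IP) to the number of 403 activity POSTs."""
    counts = Counter()
    for line in lines:
        match = ACTIVITY_403.search(line)
        if match:
            counts[(match.group("user"), match.group("ip"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_403.py < jupyterhub.log
    for (user, ip), n in sorted(count_rejected_activity(sys.stdin).items()):
        print(f"{user} @ {ip}: {n}")
```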

Hoeze (Contributor, Author) commented Mar 30, 2020

Ah, I think this might be caused by #171:

Mar 30 06:28:37 cn02 run_jupyterhub[34349]: [E 2020-03-30 06:28:37.036 JupyterHub batchspawner:225] Subprocess returned exitcode 1
Mar 30 06:28:37 cn02 run_jupyterhub[34349]: [E 2020-03-30 06:28:37.036 JupyterHub batchspawner:226] slurm_load_jobs error: Unable to contact slurm controller (connect failure)
Mar 30 06:28:37 cn02 run_jupyterhub[34349]: [E 2020-03-30 06:28:37.036 JupyterHub batchspawner:279] Error querying job 301047
Mar 30 06:28:37 cn02 run_jupyterhub[34349]: [W 2020-03-30 06:28:37.037 JupyterHub base:1012] User <me> server stopped, with exit code: 1
Mar 30 06:28:37 cn02 run_jupyterhub[34349]: [I 2020-03-30 06:28:37.037 JupyterHub proxy:282] Removing user <me> from proxy (/jupyter/user/<me>/)
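In this excerpt the Slurm status query failed only because the controller was unreachable, yet JupyterHub logged the server as stopped with exit code 1 and removed its route from the proxy; the 403 activity errors that followed are consistent with the hub no longer treating the server as running. The sketch below contrasts two hypothetical poll strategies under JupyterHub's `Spawner.poll` convention (return `None` while running, an exit status once stopped): one that equates "query failed" with "job gone", and one that reports an unknown state and keeps the server. `JobState`, `classify_query`, `naive_poll` and `tolerant_poll` are illustrative names, not batchspawner's API, and this is not necessarily how #179 or #187 address the problem.

```python
import enum


class JobState(enum.Enum):
    RUNNING = "running"
    FINISHED = "finished"
    UNKNOWN = "unknown"  # the scheduler could not be queried


def classify_query(returncode, stdout):
    """Map the result of a status command (e.g. squeue) to a JobState.

    Simplifying assumption: a non-zero exit code means the query itself
    failed, and an empty listing means the job is no longer in the queue.
    """
    if returncode != 0:
        # e.g. "slurm_load_jobs error: Unable to contact slurm controller"
        return JobState.UNKNOWN
    return JobState.RUNNING if stdout.strip() else JobState.FINISHED


def naive_poll(state):
    """Treat every non-running state, including UNKNOWN, as stopped (exit 1)."""
    return None if state is JobState.RUNNING else 1


def tolerant_poll(state):
    """Report a stop only when the scheduler positively says the job is gone."""
    return 1 if state is JobState.FINISHED else None


if __name__ == "__main__":
    # The situation from the log: the query exits with 1 and lists nothing.
    state = classify_query(1, "")
    print("naive:", naive_poll(state))        # 1 -> hub stops the server
    print("tolerant:", tolerant_poll(state))  # None -> server is kept
```

The tolerant variant trades one failure mode for another: during a long scheduler outage it keeps reporting a job that may really have ended, so a real implementation would probably want to bound how long an unknown state is tolerated.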

rkdarst (Contributor) commented Jul 23, 2020

This will be fixed by #179.

linhbngo commented Oct 9, 2020

@Hoeze We ran into the same issue on our cluster (OpenPBS). Since PR #187 has not been merged yet, we worked around it by setting the JupyterHub Spawner's poll_interval to 1800 seconds (the default is 30 seconds), and that seemed to help.
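One plausible reading of why a longer poll_interval helps: fewer status queries means fewer chances for a query to coincide with a transient scheduler outage. The sketch below quantifies that under stated assumptions (polls strictly periodic with a random phase, any poll during an outage fails, and a 60-second outage chosen purely for illustration); the config line in the comment is the workaround from this thread.

```python
def polls_per_day(interval_s):
    """Number of status polls per day at a fixed interval (whole polls)."""
    return 24 * 60 * 60 // interval_s


def chance_poll_hits_outage(outage_s, interval_s):
    """Probability that at least one poll falls inside a single outage window.

    Assumes polls every interval_s seconds with a uniformly random phase
    relative to the outage, and that every poll during the outage fails.
    """
    return min(1.0, outage_s / interval_s)


if __name__ == "__main__":
    # Workaround from this thread, in jupyterhub_config.py:
    #     c.Spawner.poll_interval = 1800
    for interval in (30, 1800):
        print(f"interval={interval}s polls/day={polls_per_day(interval)} "
              f"P(hit 60s outage)={chance_poll_hits_outage(60, interval):.3f}")
```

The trade-off is that the hub may also take up to 30 minutes to notice a job that has genuinely ended, so this is a mitigation rather than a fix.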

consideRatio (Member)

Closed by #187

4 participants