Description
Bug description
batchspawner-singleuser starts correctly on the compute node and communicates back to the hub to indicate this. JupyterHub shows:
Server ready at [/user/username/]
and redirects to /user/username/lab?
but this then shows a
503 : Service Unavailable
Your server appears to be down. Try restarting it [from the hub]
to the user.
There are some complexities with the networking (described below) that are likely related to the problem I'm seeing and it could also be a configuration issue. However given I'm seeing an exception I thought it best to report it as a bug.
Expected behaviour
The user is presented the JupyterHub session from the compute node.
Actual behaviour
The user is presented with a 503 error. The singleuser process on the compute node generates an uncaught exception traceback (see logs) but continues to run until the job is cancelled.
How to reproduce
- Login to JupyterHub.
- Select a job profile and click start.
- Job waits in queue, JupyterHub shows job status to user.
- Job starts, JupyterHub shows "waiting to connect"
- JupyterHub shows "Server ready...."
- Error shown
Your personal set up
-
JupyterHub has been manually installed on the server using miniconda with packages installed from condaforge and pypi.
-
Authentication to JupyterHub uses SAML Authenticator from https://github.com/ImperialCollegeLondon/jupyter_saml2authenticator.
-
Cluster is running PBSPro.
-
Networking is dual stack (ipv4/ipv6) with some complexities:
- User access to the Hub is over ipv4 and ipv6.
- Hub connectivity to PBSPro is over ipv4.
- Communication between the hub and the compute nodes needs to be over ipv6 as only this is routable between the JupyterHub server and compute nodes. However DNS has both ipv4 and ipv6 entries for the compute nodes and when JupyterHub was resolving the IP address of the compute nodes from exechosts regular expression match, it was resolving the ipv4 address. To work around this I wrote a wrapper script around
qstat
to replace the hostname (in the output from qstat) with the ipv6 address; there is probably a better way of doing this but it seems to work up until I reach the problem being reported.
-
OS: JupyterHub is running on CentOS 7.9, the compute nodes are running RHEL 8.5
-
Versions:
- Python: python 3.9.7
- JupyterHub: jupyterhub 2.2.2
- BatchSpawner: batchspawner 1.1.0
- wrapspawner 1.0.1
- jupyter-saml2authenticator 0.1.0.dev0
Full environment
# packages in environment at /opt/jupyterhub/miniconda/2022-04-10:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
alembic 1.7.7 pyhd8ed1ab_0 conda-forge
async_generator 1.10 py_0 conda-forge
attrs 21.4.0 pyhd8ed1ab_0 conda-forge
batchspawner 1.1.0 pypi_0 pypi
blinker 1.4 py_1 conda-forge
brotlipy 0.7.0 py39h27cfd23_1003
c-ares 1.18.1 h7f8727e_0
ca-certificates 2021.10.8 ha878542_0 conda-forge
certifi 2021.10.8 py39hf3d152e_2 conda-forge
certipy 0.1.3 py_0 conda-forge
cffi 1.15.0 py39hd667e15_1
charset-normalizer 2.0.4 pyhd3eb1b0_0
conda 4.12.0 py39hf3d152e_0 conda-forge
conda-content-trust 0.1.1 pyhd3eb1b0_0
conda-package-handling 1.7.3 py39h27cfd23_1
configurable-http-proxy 4.0.1 node14_0
cryptography 36.0.0 py39h9ce1e76_0
defusedxml 0.7.1 pypi_0 pypi
elementpath 2.5.0 pypi_0 pypi
entrypoints 0.4 pyhd8ed1ab_0 conda-forge
icu 68.1 h2531618_0
idna 3.3 pyhd3eb1b0_0
importlib-metadata 4.11.3 py39hf3d152e_1 conda-forge
importlib_resources 5.6.0 pyhd8ed1ab_0 conda-forge
jinja2 3.1.1 pyhd8ed1ab_0 conda-forge
jsonschema 4.4.0 pyhd8ed1ab_0 conda-forge
jupyter-saml2authenticator 0.1.0.dev0 pypi_0 pypi
jupyter_telemetry 0.1.0 pyhd8ed1ab_1 conda-forge
jupyterhub 2.2.2 pyhd8ed1ab_1 conda-forge
jupyterhub-base 2.2.2 pyhd8ed1ab_1 conda-forge
krb5 1.19.2 hac12032_0
ld_impl_linux-64 2.35.1 h7274673_9
libcurl 7.80.0 h0b77cf5_0
libedit 3.1.20210910 h7f8727e_0
libev 4.33 h516909a_1 conda-forge
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgomp 9.3.0 h5101ec6_17
libnghttp2 1.46.0 hce63b2e_0
libssh2 1.9.0 h1ba5d50_1
libstdcxx-ng 9.3.0 hd4cf53a_17
libuv 1.40.0 h7b6447c_0
mako 1.2.0 pyhd8ed1ab_1 conda-forge
markupsafe 2.0.1 py39h3811e60_0 conda-forge
ncurses 6.3 h7f8727e_2
nodejs 14.8.0 hda19d22_0
oauthlib 3.2.0 pyhd8ed1ab_0 conda-forge
openssl 1.1.1n h7f8727e_0
packaging 21.3 pyhd8ed1ab_0 conda-forge
pamela 1.0.0 py_0 conda-forge
pip 21.2.4 py39h06a4308_0
prometheus_client 0.14.1 pyhd8ed1ab_0 conda-forge
psutil 5.8.0 py39h27cfd23_1
pycosat 0.6.3 py39h27cfd23_0
pycparser 2.21 pyhd3eb1b0_0
pycurl 7.44.1 py39h8f2d780_1
pyjwt 2.3.0 pyhd8ed1ab_1 conda-forge
pyopenssl 21.0.0 pyhd3eb1b0_1
pyparsing 3.0.7 pyhd8ed1ab_0 conda-forge
pyrsistent 0.18.0 py39heee7806_0
pysaml2 7.1.2 pypi_0 pypi
pysocks 1.7.1 py39h06a4308_0
python 3.9.7 h12debd9_1
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-json-logger 2.0.1 pyh9f0ad1d_0 conda-forge
python_abi 3.9 2_cp39 conda-forge
pytz 2022.1 pypi_0 pypi
readline 8.1.2 h7f8727e_1
requests 2.27.1 pyhd3eb1b0_0
ruamel.yaml 0.16.12 py39h27cfd23_1
ruamel.yaml.clib 0.2.6 py39h7f8727e_0
ruamel_yaml 0.15.100 py39h27cfd23_0
setuptools 58.0.4 py39h06a4308_0
six 1.16.0 pyhd3eb1b0_0
sqlalchemy 1.3.23 py39h3811e60_0 conda-forge
sqlite 3.37.0 hc218d9a_0
tk 8.6.11 h1ccaba5_0
tornado 6.1 py39h3811e60_1 conda-forge
tqdm 4.62.3 pyhd3eb1b0_1
traitlets 5.1.1 pyhd8ed1ab_0 conda-forge
tzdata 2021e hda174b7_0
urllib3 1.26.7 pyhd3eb1b0_0
wheel 0.37.1 pyhd3eb1b0_0
wrapspawner 1.0.1 pypi_0 pypi
xmlschema 1.10.0 pypi_0 pypi
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
zipp 3.8.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.11 h7f8727e_4
Configuration
c.Application.log_level = 'DEBUG'
c.JupyterHub.cookie_secret_file = '/var/jupyterhub/jupyterhub_cookie_secret'
c.JupyterHub.data_files_path = '/opt/jupyterhub/miniconda/2022-04-10/share/jupyterhub'
c.JupyterHub.db_url = 'sqlite:////var/jupyterhub/jupyterhub.sqlite'
c.ConfigurableHTTPProxy.debug = True
c.JupyterHub.hub_connect_ip = 'hub:ipv6:address'
c.JupyterHub.hub_ip = 'hub:ipv6:address'
c.JupyterHub.ip = ''
c.JupyterHub.log_level = 'DEBUG'
c.ConfigurableHTTPProxy.pid_file = '/var/jupyterhub/jupyterhub-proxy.pid'
c.Spawner.debug = True
c = get_config()
c.JupyterHub.spawner_class = "wrapspawner.ProfilesSpawner"
c.Spawner.http_timeout = 6000
c.Spawner.start_timeout = 6000
import batchspawner
import batchspawner.api
c.PBSSpawner.batch_script = ''' #!/bin/sh
#PBS -l walltime={runtime}
#PBS -l select=1:ncpus={nprocs}:mem={memory}:ngpus={ngpus}:mpiprocs=1:ompthreads={nprocs}
#PBS -N jupyterhub
#PBS -v {keepvars}
#PBS -q {queue}
export RELEASE=2022-04-10
export PATH=/apps/jupyterhub/$RELEASE/bin:/bin:/usr/bin:/sbin:/usr/sbin:/opt/pbs/default/bin:/usr/local/bin
printenv
set -x
echo "{cmd}" > $HOME/.jupyterhub-lab.stdout 2>&1
cd $HOME
{cmd} --ip="::"
echo $?
'''
c.BatchSpawnerBase.req_nprocs = '4'
c.BatchSpawnerBase.req_runtime = '25:0:0'
c.BatchSpawnerBase.req_memory = '4gb'
c.BatchSpawnerBase.req_ngpus = '0'
c.BatchSpawnerBase.req_queue = 'interactive'
c.BatchSpawnerBase.batch_submit_cmd = '/opt/jupyterhub/pbs/bin/__qsub'
c.BatchSpawnerBase.batch_cancel_cmd = '/opt/jupyterhub/pbs/bin/__qdel {job_id}'
c.BatchSpawnerBase.batch_query_cmd = '/opt/jupyterhub/pbs/bin/__qstat {job_id}'
c.BatchSpawnerBase.state_exechost_re = 'exec_host = (.+)'
c.BatchSpawnerBase.state_pending_re = 'job_state = [QH]'
c.BatchSpawnerBase.state_running_re = 'job_state = R'
c.ProfilesSpawner.profiles = [
( "1 cores 8GB 8 hours", 'c1-1c-8g-8h', 'batchspawner.PBSSpawner',
{ "req_nprocs" : "1", "req_queue":"interactive", "req_runtime":"08:00:00", "req_memory":"8gb", "req_ngpus":"0" } ),
( "4 cores, 32GB, 8 hours", "c1-4c-16g-8h", "batchspawner.PBSSpawner",
{ "req_nprocs" : "4", "req_queue":"interactive", "req_runtime":"08:00:00", "req_memory":"32gb", "req_ngpus":"0" } ),
( "8 cores, 64GB, 8 hours", "c1-8c-64g-8h", "batchspawner.PBSSpawner",
{ "req_nprocs" : "8", "req_queue":"interactive", "req_runtime":"08:00:00", "req_memory":"64gb", "req_ngpus":"0" } ),
( "2 cores, 16GB, 8 hours, 1 GPU", "c1-2c-16g-8h-1gpu", "batchspawner.PBSSpawner",
{ "req_nprocs" : "2", "req_queue":"gpu", "req_runtime":"08:00:00", "req_memory":"16gb", "req_ngpus":"1" } ),
]
from jupyter_saml2authenticator import Saml2Authenticator
from traitlets import Bool
class CheckingSaml2Authenticator(Saml2Authenticator):
def validate_username(self, username):
if not super().validate_username(username):
return False
"""Check if the user exists on the system"""
import pwd
try:
pwd.getpwnam(username)
except KeyError:
return False
else:
return True
# For some reason this is not configurable in the parent class
delete_invalid_users = Bool(
default_value = True,
config = True,
help="Whether to delete users (from JupyterHub DB) who no longer validate",
)
c.JupyterHub.authenticator_class = CheckingSaml2Authenticator
c.CheckingSaml2Authenticator.saml2_metadata_url = 'REDACTED'
c.CheckingSaml2Authenticator.saml2_entity_id = 'REDACTED'
c.CheckingSaml2Authenticator.saml2_attribute_username = 'name'
c.CheckingSaml2Authenticator.delete_invalid_users = True
c.CheckingSaml2Authenticator.login_service = 'ID'
Logs
From the hub logs (I've redacted some items - hopefully not so much it's impossible to work out what is going on)
[I 2022-04-10 14:01:45.365 JupyterHub log:189] 302 GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-username&redirect_uri=%2Fuser%2Fusername%2Foauth_callback&response_type=code&state=[secret] -> /user/username/oauth_callback?code=[secret]&state=[secret] (username@::ffff:user.ipv4.address) 54.29ms
14:01:45.938 [ConfigProxy] debug: PROXY WEB /user/username/oauth_callback?code=redacted to http://[compute:ipv6:address]:40713
[D 2022-04-10 14:01:45.961 JupyterHub scopes:301] Authenticated with token <APIToken('REDACTED...', user='username', client_id='jupyterhub')>
[D 2022-04-10 14:01:45.962 oauthlib.oauth2.rfc6749.endpoints.token token:112] Dispatching grant_type authorization_code request to <oauthlib.oauth2.rfc6749.grant_types.authorization_code.AuthorizationCodeGrant object at REDACTED>.
[D 2022-04-10 14:01:45.962 JupyterHub provider:53] authenticate_client <oauthlib.Request SANITIZED>
[D 2022-04-10 14:01:45.974 oauthlib.oauth2.rfc6749.grant_types.authorization_code authorization_code:534] Using provided redirect_uri /user/username/oauth_callback
[D 2022-04-10 14:01:45.975 JupyterHub provider:112] confirm_redirect_uri: client_id=jupyterhub-user-username, redirect_uri=/user/username/oauth_callback
[D 2022-04-10 14:01:45.975 oauthlib.oauth2.rfc6749.grant_types.authorization_code authorization_code:302] Token request validation ok for <oauthlib.Request SANITIZED>.
[D 2022-04-10 14:01:45.975 JupyterHub provider:335] Saving bearer token {'access_token': 'REDACTED', 'expires_in': 1209600, 'token_type': 'Bearer', 'scope': '', 'refresh_token': 'REDACTED'}
[D 2022-04-10 14:01:46.000 JupyterHub provider:195] Deleting oauth code 4Ef... for jupyterhub-user-username
[I 2022-04-10 14:01:46.020 JupyterHub log:189] 200 POST /hub/api/oauth2/token (username@compute:ipv6:address) 64.17ms
[D 2022-04-10 14:01:46.036 JupyterHub base:281] Recording first activity for <APIToken('OSFd...', user='username', client_id='jupyterhub-user-username')>
[D 2022-04-10 14:01:46.055 JupyterHub scopes:301] Authenticated with token <APIToken('OSFd...', user='username', client_id='jupyterhub-user-username')>
[I 2022-04-10 14:01:46.066 JupyterHub log:189] 200 GET /hub/api/user (username@compute:ipv6:address) 32.94ms
14:01:46.637 [ConfigProxy] debug: PROXY WEB /user/username/lab? to http://[compute:ipv6:address]:40713
14:01:46.850 [ConfigProxy] error: 503 GET /user/username/lab? socket hang up
[D 2022-04-10 14:01:46.857 JupyterHub pages:586] No template for 503
[I 2022-04-10 14:01:46.872 JupyterHub log:189] 200 GET /hub/error/503?url=%2Fuser%2Fusername%2Flab%3F (@hub:ipv6:address) 16.33ms
and from the compute node:
[D 2022-04-10 14:01:46.062 SingleUserLabApp auth:395] Received request from Hub user {'admin': True, 'groups': [], 'name': 'username', 'kind': 'user', 'session_id': 'redacted', 'scopes': ['access:servers!server=username/']}
[I 2022-04-10 14:01:46.062 SingleUserLabApp auth:1132] Logged-in user {'admin': True, 'groups': [], 'name': 'username', 'kind': 'user', 'session_id': 'redacted', 'scopes': ['access:servers!server=username/']}
[D 2022-04-10 14:01:46.062 SingleUserLabApp auth:841] Setting oauth cookie for ::ffff:user.ipv4.address: jupyterhub-user-username, {'path': '/user/username/', 'httponly': True, 'secure': True}
[I 2022-04-10 14:01:46.063 SingleUserLabApp log:189] 302 GET /user/username/oauth_callback?code=[secret]&state=[secret] -> /user/username/lab? (@::ffff:user.ipv4.address) 119.27ms
[E 2022-04-10 14:01:46.643 SingleUserLabApp http1connection:67] Uncaught exception
Traceback (most recent call last):
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/http1connection.py", line 273, in _read_message
delegate.finish()
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/httpserver.py", line 387, in finish
self.delegate.finish()
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/routing.py", line 268, in finish
self.delegate.finish()
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 2290, in finish
self.execute()
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 2309, in execute
self.handler = self.handler_class(
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 227, in __init__
self.clear()
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 328, in clear
self.set_default_headers()
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyter_server/base/handlers.py", line 314, in set_default_headers
elif self.token_authenticated and "Access-Control-Allow-Origin" not in self.settings.get(
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyter_server/base/handlers.py", line 159, in token_authenticated
return self.login_handler.is_token_authenticated(self)
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/singleuser/mixins.py", line 105, in is_token_authenticated
handler.get_current_user()
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 1051, in get_current_user
self._hub_auth_user_cache = self.check_hub_user(user_model)
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 978, in check_hub_user
if self.allow_all:
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 924, in allow_all
self.hub_scopes is None
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 916, in hub_scopes
return self.hub_auth.oauth_scopes or None
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 577, in __get__
return self.get(obj, cls)
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 540, in get
default = obj.trait_defaults(self.name)
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 1580, in trait_defaults
return self._get_trait_default_generator(names[0])(self)
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 977, in __call__
return self.func(*args, **kwargs)
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 357, in _default_scopes
return set(json.loads(env_scopes))
File "/apps/jupyterhub/2022-04-10/lib/python3.9/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/apps/jupyterhub/2022-04-10/lib/python3.9/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/apps/jupyterhub/2022-04-10/lib/python3.9/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)