Description
Motivation
It would improve productivity if Jupyter (Lab/Notebook) allowed clicking on a file path in tracebacks (and elsewhere) to open the file (jupyterlab/jupyterlab#13277). The logic would be as follows:
- if the path points to a file within `root_dir`, the file should be opened on the frontend for editing
- if the path points to a file beyond `root_dir`, we should either:
  - a) do nothing in security-sensitive setups
  - b) ask the kernel to provide the source of such a file and display it as read-only - this is already implemented in ipykernel using the Debug Adapter Protocol `source` request (this would be necessary for remote kernels)
  - c) have a custom server extension implementing the ContentsManager API that exposes specific files outside of `root_dir` based on a block/allow list (see "Additional scope for broader filesystem access" below; this would not work for remote kernels)
Problem
It is currently impossible to distinguish between these two cases (whether we are within `root_dir` or outside of it).
For a server started with `root_dir = "~/server_root"`, we can expect the following traceback from ipykernel:
```
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 from a_file import test

File ~/server_root/a_file.py:1
----> 1 test

NameError: name 'test' is not defined
```
The problem is that the frontend cannot tell whether `~/server_root/a_file.py` is within `root_dir` or not.
This is the case even if the frontend knows what the `root_dir` is. For example, if `root_dir` is `/home/my-username/server_root`, the frontend does not know what the expansion of `~` is in the kernel space (it may well be `/home/another-username/`).
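To make the ambiguity concrete, here is a minimal illustration (assuming a local server; the paths are purely illustrative):

```python
import os.path

# On the server, "~" expands to the *server* user's home directory...
os.path.expanduser("~/server_root/a_file.py")
# -> '/home/my-username/server_root/a_file.py'

# ...but the kernel may run as another user (or on another machine entirely),
# so the very same string in a traceback could mean
# '/home/another-username/server_root/a_file.py' there.
```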
"Guessing" by trying both is not an option because we want to avoid false positives (turning file-like strings into paths that are broken URLs - mostly because everything can look like a path) and there are performance implications if we were guessing that way.
Proposed Solution
Create a new API endpoint which would tell the frontend whether a given file path is within the scope of the server, the kernel, or neither. If the file is within the scope of the server, it would return the normalised path relative to `root_dir`.
This could account for kernels which are spawned on a filesystem different from the one where `root_dir` resides - as far as I understand there are no restrictions on kernel location (see snippet below). A path could be within the scope of both the kernel and the server (when the kernel is started within `root_dir`), only one of them, or neither.
Examples
For simplicity, let's call the proposed endpoint `/api/resolve` (although maybe it should be integrated with the existing file ID manager, in which case it could be `/api/fileid/resolve`). In pseudocode it would be described as:
```python
from typing import Protocol


class PathResolver(Protocol):
    def resolve_path(self, path: str) -> str: ...


class ContentsManager(..., PathResolver): ...


class KernelManager(..., PathResolver): ...


def handle_resolve(self, path: str, kernel_uuid: str):
    scopes = [
        self.contents_manager,
        self.multi_kernel_manager.get_kernel(kernel_uuid),
        *self.get_additional_scopes(kernel_uuid)
    ]
    return [
        scope.resolve_path(path)
        for scope in scopes
        if hasattr(scope, 'resolve_path')
    ]
```
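For the server and kernel scopes, `resolve_path` could boil down to path normalisation plus a containment check against the scope's base directory (`root_dir` for the server, the kernel's working directory for the kernel). A minimal sketch, assuming local filesystem paths and using `None` to signal "out of scope" (the helper name is hypothetical):

```python
import os
from typing import Optional


def resolve_against(base_dir: str, path: str) -> Optional[str]:
    """Return ``path`` relative to ``base_dir`` if it lies within it, else None."""
    base = os.path.realpath(os.path.expanduser(base_dir))
    # NOTE: for the kernel scope, "~" would have to be expanded in kernel space
    # (or the kernel would need to report an absolute working directory).
    target = os.path.realpath(os.path.expanduser(path))
    try:
        if os.path.commonpath([base, target]) != base:
            return None
    except ValueError:
        # e.g. paths on different Windows drives
        return None
    return os.path.relpath(target, base)
```

With `base_dir="~/server_root"`, this returns `'.'` for `~/server_root`, `'test.py'` for `~/server_root/test.py`, and `None` for `~/`, matching the examples below.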
For a server spawned at `~/server_root` with a kernel spawned in the same location:
```python
# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}, {'scope': 'kernel', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}, {'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/&kernel={uuid}
[]
```
For a server spawned at `~/server_root` with a kernel spawned in `~/server_root/kernel`:
```python
# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}]
# /api/resolve?path=~/server_root/kernel/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'kernel/test.py'}, {'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/&kernel={uuid}
[]
```
For a server spawned at `~/server_root` with a kernel spawned in `/tmp/kernel`:
```python
# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}]
# /api/resolve?path=/tmp/kernel/test.py&kernel={uuid}
[{'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/&kernel={uuid}
[]
```
I am not opinionated on any particular JSON format, but I think it would be useful to return all matching resolutions and allow the frontend client to decide which one to use.
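For illustration, a rough sketch of how such an endpoint could be wired up in jupyter_server; `APIHandler`, `contents_manager`, and `kernel_manager` exist today, while `resolve_path` is the proposed (hypothetical) addition:

```python
import json

from jupyter_server.base.handlers import APIHandler
from tornado import web


class ResolvePathHandler(APIHandler):
    """Sketch of a handler backing the proposed /api/resolve endpoint."""

    @web.authenticated
    def get(self):
        path = self.get_query_argument("path")
        kernel_uuid = self.get_query_argument("kernel", default=None)

        scopes = [("server", self.contents_manager)]
        if kernel_uuid is not None:
            scopes.append(("kernel", self.kernel_manager.get_kernel(kernel_uuid)))

        matches = []
        for name, scope in scopes:
            # resolve_path is the proposed method; None would mean "not in scope"
            resolver = getattr(scope, "resolve_path", None)
            resolved = resolver(path) if resolver is not None else None
            if resolved is not None:
                matches.append({"scope": name, "relative": resolved})

        self.finish(json.dumps(matches))


# The handler would be registered under r"/api/resolve" (or /api/fileid/resolve
# if integrated with the file ID manager) in the server's web application.
```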
Additional context
Additional scope for exposing `source` access
As noted in (2b), we could expose the source of files known to the kernel (even beyond its spawn `cwd`) by reusing the existing DAP `source` request. The `/api/resolve` response could advertise that a path is known to the kernel's `source` handler (a sketch of the underlying request follows the example). Augmenting the first example:
```python
# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}, {'scope': 'kernel', 'relative': '.'}, {'scope': 'source', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}, {'scope': 'kernel', 'relative': 'test.py'}, {'scope': 'source', 'relative': 'test.py'}]
# /api/resolve?path=~/test.py&kernel={uuid}
[{'scope': 'source', 'relative': '/home/user/test.py'}]
# /api/resolve?path=/lib/python/library/test.py&kernel={uuid}
[{'scope': 'source', 'relative': '/lib/python/library/test.py'}]
```
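For completeness, a hedged sketch of the request the frontend could send to the kernel when a path resolves only to the `source` scope; the DAP `source` request is carried as the content of a Jupyter `debug_request` message on the control channel (field values are illustrative):

```python
# Debug Adapter Protocol `source` request, to be sent as the content of a
# Jupyter `debug_request` message; ipykernel's `debug_reply` would carry the
# file's text in the response body's `content` field.
dap_source_request = {
    "type": "request",
    "seq": 7,  # illustrative sequence number
    "command": "source",
    "arguments": {
        "source": {"path": "/lib/python/library/test.py"},
        "sourceReference": 0,  # 0: identify the source by path rather than by reference
    },
}
```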
Additional scope for broader filesystem access
Per (2c), it would be desirable to enable the implementation of a custom scope provider that would allow tightly controlled access to the filesystem beyond `root_dir`. This would benefit other use cases where access to files on the filesystem is desirable (jupyter-lsp/jupyterlab-lsp#850).
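A possible shape for such a provider, sketched with traitlets configuration (the class, trait, and attribute names are hypothetical, not an existing jupyter-server API):

```python
import os
from typing import Optional

from traitlets import List, Unicode
from traitlets.config import LoggingConfigurable


class AllowListScopeProvider(LoggingConfigurable):
    """Hypothetical 'filesystem' scope provider exposing only paths that
    fall under explicitly allow-listed directories."""

    scope = "filesystem"

    allowed_dirs = List(
        Unicode(),
        default_value=[],
        help="Directories outside root_dir that may be exposed.",
    ).tag(config=True)

    def resolve_path(self, path: str) -> Optional[str]:
        target = os.path.realpath(os.path.expanduser(path))
        for allowed in self.allowed_dirs:
            base = os.path.realpath(os.path.expanduser(allowed))
            if os.path.commonpath([base, target]) == base:
                # the filesystem scope reports paths relative to the
                # filesystem root, i.e. de facto absolute paths
                return target
        return None
```

Configured with `allowed_dirs = ["~/shared"]`, this would resolve `~/shared/test.py` but return nothing for `~/not-allowed/test.py`, as in the example below.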
A scope provider configured to expose files under `~/shared`, with the server spawned at `~/server_root` and the kernel spawned in the same location (as in the first example), would resolve the following:
```python
# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}, {'scope': 'kernel', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}, {'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/shared/test.py&kernel={uuid}
[{'scope': 'filesystem', 'relative': '~/shared/test.py'}]  # filesystem is relative to the filesystem root (de facto absolute)
# /api/resolve?path=~/not-allowed/test.py&kernel={uuid}
[]
```
The difference between the `filesystem` and `source` scopes is subtle but becomes noticeable when:
- the kernel is running on a separate filesystem from the server (see the illustration below)
- there are multiple contents managers
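For instance, if the kernel runs in a container whose filesystem is not mounted on the server, a library path may resolve only through the kernel-side `source` scope, while a server-side `filesystem` provider cannot match it (illustrative, following the response format above):

```python
# /api/resolve?path=/opt/conda/lib/python3.11/site-packages/pkg/mod.py&kernel={uuid}
[{'scope': 'source', 'relative': '/opt/conda/lib/python3.11/site-packages/pkg/mod.py'}]
```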
Impact on multiplexed content managers
A number of ways to provide multiple content managers have been proposed over the years:
- the Jupyter(Lab) `IDrive` frontend API, which may be connected to an alternative `/api/contents` endpoint
- jpmorganchase/jupyter-fs, using a `MetaManager` where drives are managed on the server side rather than the frontend
- viaduct-ai/hybridcontents - status not clear
- jupyter/jupyter-drive (`MixedContentsManager`) - deprecated
With the proposed solution:
- the `IDrive` API would need to be amended to allow providing a URL for an alternative `/api/resolve`
- drive-aware meta-managers like `jupyter-fs` should be able to handle `/api/resolve` by overriding the implementation of `ContentsManager.resolve_path` to account for drive prefixes (see the sketch below)
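As a rough illustration (the attribute names and prefix syntax are hypothetical, not the actual jupyter-fs API), a drive-aware override could delegate to the per-drive managers and re-attach the drive prefix:

```python
from typing import Dict, Optional, Protocol


class PathResolver(Protocol):
    def resolve_path(self, path: str) -> Optional[str]: ...


class DriveAwareResolveMixin:
    """Hypothetical mixin for a meta contents manager that multiplexes
    several drives, each backed by its own contents manager."""

    #: mapping of drive prefix -> contents manager for that drive
    drives: Dict[str, PathResolver]

    def resolve_path(self, path: str) -> Optional[str]:
        for prefix, manager in self.drives.items():
            resolver = getattr(manager, "resolve_path", None)
            resolved = resolver(path) if resolver is not None else None
            if resolved is not None:
                # re-attach the drive prefix so the frontend can route the
                # resulting path back to the right drive
                return f"{prefix}/{resolved}" if prefix else resolved
        return None
```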
Impact on security by obscurity
The proposed solution would make it easier to find out `root_dir` from the frontend, because a user could probe numerous paths and deduce the `root_dir` path from the server responses by brute force. This is not a concern for the majority of administrators, as kernels are typically run locally, so users not only know the full runtime path but already have access to it.