Description
Problem
We host a platform that provides access to sensitive research data, and Jupyter is one of the environments offered to researchers. For audit and security purposes, we need to record which notebook cells users execute, without logging their output, to minimize exposure of sensitive results.
Jupyter Server currently does not provide a built-in way to:
- Capture executed code cells
- Reliably associate those executions with authenticated users across common deployment setups like JupyterHub or oauth2_proxy
This gap makes auditing or tracing activity infeasible in secure or compliance-sensitive environments.
Proposed Solution
Introduce a new configuration option, `ServerApp.log_cell_execution`, which is off by default and enabled with `ServerApp.log_cell_execution = True`.
When enabled:
- Log each incoming `execute_request` message from WebSocket clients.
- Extract the cell source code only; do not capture outputs.
- Log each execution as structured JSON with the fields:
  - `who`: user identity
  - `what`: executed code
  - `kernel_id`
  - `timestamp` (UTC, ISO 8601)
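A minimal sketch of how one such JSON record could be built from a parsed `execute_request` message, assuming the standard Jupyter messaging layout (`header.msg_type` and `content.code`). The helper name `build_audit_record` is hypothetical and not part of Jupyter Server.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional


def build_audit_record(msg: dict[str, Any], who: str, kernel_id: str) -> Optional[str]:
    """Return one JSON audit line for an execute_request, or None otherwise.

    Hypothetical helper: only the cell source (content.code) is read.
    An execute_request carries no outputs, so none can leak into the record.
    """
    if msg.get("header", {}).get("msg_type") != "execute_request":
        return None
    record = {
        "who": who,
        "what": msg.get("content", {}).get("code", ""),
        "kernel_id": kernel_id,
        # ISO 8601 string with an explicit +00:00 (UTC) offset
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record)
```

Emitting one JSON object per line keeps each execution a self-contained record for log shippers.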
Identity resolution strategy:
- Use `current_user.username` if it is available and meaningful.
- If the value appears to be an opaque internal ID (e.g. a UUID-style string) and the connection comes from a trusted proxy (`127.0.0.1`), attempt to extract the identity from the headers `X-Auth-Request-User` or `X-Auth-Request-Email`.
- If the headers are missing or empty, fall back to the original username.
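The resolution order above could look roughly like the following sketch. The function names, the UUID-parsing heuristic for "opaque", the single-entry trusted-proxy set, and checking `X-Auth-Request-User` before `X-Auth-Request-Email` are all assumptions for illustration, not details of the actual patch.

```python
import uuid
from typing import Mapping

# Assumption: the authenticating proxy runs on the same host.
TRUSTED_PROXY_IPS = {"127.0.0.1"}


def looks_opaque(username: str) -> bool:
    """Heuristic: treat strings that parse as a UUID as opaque internal IDs."""
    try:
        uuid.UUID(username)
    except ValueError:
        return False
    return True


def resolve_identity(username: str, remote_ip: str, headers: Mapping[str, str]) -> str:
    """Hypothetical helper returning the identity to record in the audit log.

    In a real server, headers would be a case-insensitive mapping such as
    tornado's HTTPHeaders; a plain dict is enough for this sketch.
    """
    if username and not looks_opaque(username):
        return username
    if remote_ip in TRUSTED_PROXY_IPS:
        # Assumed preference: user name header first, then e-mail header.
        for name in ("X-Auth-Request-User", "X-Auth-Request-Email"):
            value = (headers.get(name) or "").strip()
            if value:
                return value
    # Headers missing, empty, or connection not from a trusted proxy.
    return username
```

Restricting header lookup to trusted source addresses matters because any client able to reach the server directly could otherwise forge these headers.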
The implementation hooks into the `ZMQChannelsWebsocketConnection.handle_incoming_message()` method, where `execute_request` messages are already parsed and routed. This lets us log cell input at its exact point of entry, without affecting downstream kernel logic.
Additional context
This feature is scoped, backwards-compatible, and entirely opt-in.
It enables a range of downstream uses:
- Compliance with institutional audit requirements
- Improved incident response and forensics in multi-user Jupyter deployments
- Integration with log collection tools (e.g. ELK, CloudWatch) through the structured JSON output
We have a patch ready to submit as a pull request and welcome feedback on the approach.