
Add opt-in structured logging of executed notebook cells with user identity #1533

@irh-hdh


Problem

We host a platform that provides access to sensitive research data, and Jupyter is one of the environments offered to researchers. For audit and security purposes, we need to record which notebook cells users execute, without logging their outputs, so that sensitive results are not exposed in the logs.

Jupyter Server currently does not provide a built-in way to:

  • Capture executed code cells
  • Reliably associate those executions with authenticated users across common deployment setups like JupyterHub or oauth2_proxy

This gap makes auditing or tracing user activity infeasible in security- or compliance-sensitive environments.


Proposed Solution

Introduce a new configuration flag: ServerApp.log_cell_execution = True

When enabled:

  • Log each incoming execute_request message from WebSocket clients.
  • Extract only the cell source code; do not capture outputs.
  • Log as structured JSON:
    • who: user identity
    • what: executed code
    • kernel_id
    • timestamp (UTC, ISO 8601)
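A minimal sketch of how one audit record could be serialized, assuming the four fields listed above. The helper name build_cell_execution_record is hypothetical and not part of Jupyter Server. Emitting each record as a single-line JSON object keeps it easy for log shippers to ingest line by line:

```python
import json
from datetime import datetime, timezone


def build_cell_execution_record(who: str, code: str, kernel_id: str) -> str:
    """Serialize one executed cell as a single-line JSON audit record.

    Hypothetical helper. Only the submitted source is included; kernel
    outputs are never passed to this function, so they cannot be logged.
    """
    record = {
        "who": who,              # resolved user identity
        "what": code,            # cell source exactly as submitted
        "kernel_id": kernel_id,
        # UTC time in ISO 8601, e.g. "2025-01-01T12:00:00.123456+00:00"
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # json.dumps escapes newlines inside the source, so the record stays on one line.
    return json.dumps(record)


print(build_cell_execution_record("alice", "df = load()\ndf.head()", "a1b2c3"))
```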

Identity resolution strategy:

  1. Use current_user.username if it is set and looks like a human-readable name.
  2. If the username instead appears to be an opaque internal ID (e.g. a UUID), and the request comes from a trusted proxy (remote address 127.0.0.1), attempt to extract the identity from these headers:
    • X-Auth-Request-User
    • X-Auth-Request-Email
  3. If headers are missing or empty, fall back to the original username.
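The resolution steps above could look roughly like the following. This is a sketch only: resolve_identity, looks_opaque, TRUSTED_PROXY_IPS and IDENTITY_HEADERS are hypothetical names, "opaque" is approximated as "parses as a UUID", and headers are taken as a plain mapping (Tornado's HTTPHeaders also offers a case-insensitive get, which a real patch would receive from the request).

```python
import uuid
from typing import Mapping

# Hypothetical settings mirroring the proposal: trust only the local proxy.
TRUSTED_PROXY_IPS = {"127.0.0.1"}
IDENTITY_HEADERS = ("X-Auth-Request-User", "X-Auth-Request-Email")


def looks_opaque(username: str) -> bool:
    """Heuristic: treat any UUID-formatted username as an opaque internal ID."""
    try:
        uuid.UUID(username)
    except ValueError:
        return False
    return True


def resolve_identity(username: str, remote_ip: str, headers: Mapping[str, str]) -> str:
    # Step 1: a non-empty, non-UUID username is used as-is.
    if username and not looks_opaque(username):
        return username
    # Step 2: read identity headers only if the request came from the trusted proxy.
    if remote_ip in TRUSTED_PROXY_IPS:
        for name in IDENTITY_HEADERS:
            value = (headers.get(name) or "").strip()
            if value:
                return value
    # Step 3: headers missing or empty, so fall back to the original value.
    return username
```

Checking the remote address before reading the headers is the important part of this design: without it, any client able to reach the server directly could set X-Auth-Request-User itself and log activity under someone else's name.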

The implementation hooks into the ZMQChannelsWebsocketConnection.handle_incoming_message() method, where incoming WebSocket messages, including execute_request, are already parsed and routed to the kernel. This lets us log cell input at its exact point of entry without affecting downstream kernel logic.
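A hedged sketch of the check that could run inside that hook, assuming the legacy JSON WebSocket protocol, in which each message carries a channel alongside its header and content. audit_execute_request and the logger name are hypothetical; the caller supplies the channel, the parsed message, the kernel ID, and the already-resolved identity. Messages sent with the v1.kernel.websocket.jupyter.org subprotocol use a binary framing and would need to be decoded before a check like this applies.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Hypothetical logger name; a real patch would likely reuse the server's logger.
audit_log = logging.getLogger("cell_audit")


def audit_execute_request(
    channel: Optional[str], msg: Dict[str, Any], kernel_id: str, who: str
) -> Optional[Dict[str, Any]]:
    """Log the source of an execute_request on the shell channel.

    Returns the logged record, or None if the message is not an execute request.
    The message is only read, never modified, so routing continues unchanged.
    """
    if channel != "shell":
        return None
    header = msg.get("header") or {}
    if header.get("msg_type") != "execute_request":
        return None
    content = msg.get("content") or {}
    record = {
        "who": who,
        "what": content.get("code", ""),  # source only; outputs never pass through here
        "kernel_id": kernel_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    audit_log.info(json.dumps(record))
    return record
```

Because the function only inspects the message and returns, it can be called before the message is forwarded without changing what the kernel receives.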


Additional context

This feature is scoped, backwards-compatible, and entirely opt-in.

It enables a range of downstream uses:

  • Compliance with institutional audit requirements
  • Improved incident response and forensics in multi-user Jupyter deployments
  • Integration with log collection tools (e.g. ELK, CloudWatch) via JSON structure

We have a patch ready to submit as a pull request and welcome feedback on the approach.
