317 changes: 99 additions & 218 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,299 +1,180 @@
# Deepnote Toolkit

[![CI](https://github.com/deepnote/deepnote-toolkit/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/deepnote/deepnote-toolkit/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/deepnote/deepnote-toolkit/graph/badge.svg?token=JCRUJP2BB9)](https://codecov.io/gh/deepnote/deepnote-toolkit)

> [!WARNING]
> This code is distributed to user context, so we are treating it as a public repository - ensure no secrets are included in the codebase.
The Deepnote Toolkit is our homegrown Python package, managed with Poetry. It must be installed in the user's environment, which is operated by Deepnote, and it encapsulates all the code that needs to run in the user's space.

### Key features

- Deepnote component library
- Python kernel with scientific computing libraries
- SQL support with query caching
- Data visualization (Altair, Plotly)
- Streamlit apps support with auto-reload
- Language Server Protocol integration
- Git integration with SSH/HTTPS authentication
- Prometheus metrics collection
- Integration environment variables management

### Bundle types
<div align="center">

The toolkit consists of two main bundle types:
![Deepnote Toolkit cover image](/assets/deepnote-toolkit-cover-image.png)

1. **Kernel Bundle**: Libraries available to user code execution (pandas, numpy, etc.)
2. **Server Bundle**: Dependencies for running infrastructure services (Jupyter, Streamlit, LSP)

### How to set up?
[![CI](https://github.com/deepnote/deepnote-toolkit/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/deepnote/deepnote-toolkit/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/deepnote/deepnote-toolkit/graph/badge.svg?token=JCRUJP2BB9)](https://codecov.io/gh/deepnote/deepnote-toolkit)

#### Option 1: Using mise (Recommended)

[mise](https://mise.jdx.dev/) automatically manages Python, Java, and other tool versions:
[Website](https://deepnote.com/?utm_source=github&utm_medium=github&utm_campaign=github&utm_content=readme_main) · [Docs](https://deepnote.com/docs?utm_source=github&utm_medium=github&utm_campaign=github&utm_content=readme_main) · [Blog](https://deepnote.com/blog?utm_source=github&utm_medium=github&utm_campaign=github&utm_content=readme_main) · [X](https://x.com/DeepnoteHQ) · [Examples](https://deepnote.com/explore?utm_source=github&utm_medium=github&utm_campaign=github&utm_content=readme_main) · [Community](https://github.com/deepnote/deepnote/discussions)

1. Install mise: [Getting started](https://mise.jdx.dev/getting-started.html)
2. Run setup:
</div>

```bash
mise install # Installs Python 3.12 and Java 11
mise run setup # Installs dependencies and pre-commit hooks
```
# Deepnote Toolkit

#### Option 2: Manual setup
The Deepnote Toolkit is a Python package that powers the [Deepnote notebook environment](https://github.com/deepnote/deepnote/). It provides the essential functionality that runs in user workspaces, enabling interactive data science workflows with SQL, visualizations, and integrations.

⚠️ Potential issue | 🟡 Minor

Remove trailing whitespace.

Line 16 has trailing whitespace.

-The Deepnote Toolkit is a Python package that powers the [Deepnote notebook environment](https://github.com/deepnote/deepnote/). It provides the essential functionality that runs in user workspaces, enabling interactive data science workflows with SQL, visualizations, and integrations. 
+The Deepnote Toolkit is a Python package that powers the [Deepnote notebook environment](https://github.com/deepnote/deepnote/). It provides the essential functionality that runs in user workspaces, enabling interactive data science workflows with SQL, visualizations, and integrations.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

16-16: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)

🤖 Prompt for AI Agents
In README.md around line 16, remove the trailing whitespace at the end of that
line; open the file, delete any extra spaces or tabs after the sentence on line
16 (and optionally run a trim-whitespace linter or editor command to remove
trailing whitespace across the file), then save the file.
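
The cleanup described in this comment can be sketched in a few lines of Python. This is only an illustration: the function name `strip_trailing_whitespace` is hypothetical, and unlike MD009's default setting it also removes the two-space hard line break, so it may be too aggressive for files that rely on that convention.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Return text with trailing spaces and tabs removed from every line."""
    cleaned = []
    # keepends=True lets us preserve each line's original newline style
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")      # the line without its terminator
        newline = line[len(body):]      # "\n", "\r\n", or "" for the last line
        cleaned.append(body.rstrip(" \t") + newline)
    return "".join(cleaned)
```

Applied to a file, the result of reading `README.md`, passing its text through the function, and writing it back would clear every MD009 hit at once rather than only line 16.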


1. Install poetry: [Installation](https://python-poetry.org/docs/#installation)
2. Install Java 11 (required for PySpark tests):
   - macOS: `brew install openjdk@11`
   - Ubuntu/Debian: `sudo apt-get install openjdk-11-jdk`
   - RHEL/Fedora: `sudo dnf install java-11-openjdk-devel`
3. Set up venv for development package:

```bash
# If Python 3.10 is installed, Poetry should use it for the virtual environment
$ poetry env use 3.10
```
## Installation

4. Verify the virtual environment location:
The toolkit is automatically installed in Deepnote workspaces. For local development or testing:

```bash
$ poetry env info
```
```bash
pip install deepnote-toolkit
```

5. Install dependencies:
For server components (Jupyter, Streamlit, LSP):

```bash
$ poetry install
```
```bash
pip install "deepnote-toolkit[server]"
```

6. Install Poe Poetry addon:
## Features

```bash
$ poetry self add 'poethepoet[poetry_plugin]'
```
### Core capabilities

⚠️ Potential issue | 🟡 Minor

Add blank lines before subheadings.

Subheadings require blank lines above them per markdown style (MD022).

 ## Features
 
+
 ### Core capabilities
 - **SQL execution engine**: Multi-database SQL support with connection management, query templating via Jinja2, intelligent caching, and query chaining with CTE generation
 - **Interactive visualizations**: Vega-Lite charts with VegaFusion optimization, multi-layer support, and interactive selections
 - **Data processing**: Enhanced DataFrame utilities, data sanitization, and DuckDB in-memory analytics
 - **Jupyter integration**: Custom IPython kernel with scientific computing libraries (pandas, numpy, etc.)
 
+
 ### Developer tools
 - **CLI interface**: Command-line tools for server management and configuration
 - **Streamlit support**: Auto-reload development workflow for Streamlit applications
 - **Language server protocol**: Code intelligence and autocompletion support
 - **Runtime initialization**: Session persistence, environment variable management, and post-start hooks
 
+
 ### Infrastructure
 - **Git integration**: SSH/HTTPS authentication for repository access
 - **SSH tunneling**: Secure database connections through SSH tunnels
 - **Metrics collection**: Prometheus metrics for monitoring and observability
 - **Feature flags**: Dynamic feature toggling support

Also applies to: 41-41, 47-47

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

35-35: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🤖 Prompt for AI Agents
In README.md around lines 35, 41, and 47, the subheadings lack the required
blank line above them (MD022); fix by inserting a single blank line immediately
before each subheading line so each "###" header is preceded by an empty line,
then run a markdown linter to confirm MD022 passes.
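
A rough Python sketch of the MD022 fix follows. The lint output above reports the missing blank line *below* the heading, while the suggestion adds one above, so this sketch handles both sides. The function name is hypothetical, and the fence tracking is deliberately simple (it toggles on any line starting with a code fence), so treat it as a starting point rather than a complete Markdown parser.

```python
import re

HEADING = re.compile(r"^#{1,6}\s")        # ATX headings such as "### Title"
FENCE = re.compile(r"^\s*(```|~~~)")      # start or end of a fenced code block


def add_blank_lines_around_headings(text: str) -> str:
    """Ensure every heading outside a code fence has a blank line on both sides."""
    lines = text.split("\n")
    out = []
    in_fence = False
    for i, line in enumerate(lines):
        if FENCE.match(line):
            in_fence = not in_fence
        is_heading = not in_fence and HEADING.match(line)
        # Blank line above, unless we are at the top or one is already there
        if is_heading and out and out[-1].strip():
            out.append("")
        out.append(line)
        # Blank line below, unless the next line is already blank or missing
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if is_heading and next_line.strip():
            out.append("")
    return "\n".join(out)
```

The function is idempotent, so running it twice leaves the text unchanged, and shell comments inside fenced blocks are not mistaken for headings.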

- **SQL execution engine**: Multi-database SQL support with connection management, query templating via Jinja2, intelligent caching, and query chaining with CTE generation
- **Interactive visualizations**: Vega-Lite charts with VegaFusion optimization, multi-layer support, and interactive selections
- **Data processing**: Enhanced DataFrame utilities, data sanitization, and DuckDB in-memory analytics
- **Jupyter integration**: Custom IPython kernel with scientific computing libraries (pandas, numpy, etc.)

7. Install pre-commit hooks:
### Developer tools
- **CLI interface**: Command-line tools for server management and configuration
- **Streamlit support**: Auto-reload development workflow for Streamlit applications
- **Language server protocol**: Code intelligence and autocompletion support
- **Runtime initialization**: Session persistence, environment variable management, and post-start hooks

```bash
$ poetry poe setup-hooks
```
### Infrastructure
- **Git integration**: SSH/HTTPS authentication for repository access
- **SSH tunneling**: Secure database connections through SSH tunnels
- **Metrics collection**: Prometheus metrics for monitoring and observability
- **Feature flags**: Dynamic feature toggling support

8. Verify installation:
## Architecture

```bash
$ poetry poe lint
$ poetry poe format
```
The toolkit is organized into two deployment bundles:

### Setup troubleshooting
1. **Kernel bundle**: Core libraries available to user code (pandas, numpy, SQL drivers, visualization libraries)
2. **Server bundle**: Infrastructure services (Jupyter Server, Streamlit, Python LSP Server)

1. If `poetry install` fails with error `library 'ssl' not found`:
### Main modules

```bash
env LDFLAGS="-I/opt/homebrew/opt/openssl/include -L/opt/homebrew/opt/openssl/lib" poetry install
```
- **`deepnote_toolkit.sql`**: SQL execution, templating, caching, and query chaining
- **`deepnote_toolkit.chart`**: Vega-Lite chart rendering with VegaFusion optimization
- **`deepnote_toolkit.cli`**: Command-line interface for toolkit management
- **`deepnote_toolkit.ocelots`**: Deepnote component library for interactive UI elements
- **`deepnote_toolkit.runtime`**: Runtime initialization and session management
- **`deepnote_core`**: Core utilities shared across the toolkit
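
As a quick sanity check after installation, the module names listed above can be probed with the standard library. This is a minimal sketch: the helper names `module_available` and `report` are hypothetical, and the only thing taken from the README is the list of module names.

```python
from importlib.util import find_spec

# Module names exactly as listed in the README
README_MODULES = [
    "deepnote_toolkit.sql",
    "deepnote_toolkit.chart",
    "deepnote_toolkit.cli",
    "deepnote_toolkit.ocelots",
    "deepnote_toolkit.runtime",
    "deepnote_core",
]


def module_available(name: str) -> bool:
    """Return True if the module can be located in the current environment."""
    try:
        # Note: for dotted names, find_spec imports the parent package first
        return find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no usable spec
        return False


def report(names=README_MODULES) -> dict:
    """Map each module name to whether it is importable."""
    return {name: module_available(name) for name in names}
```

Calling `report()` in an environment without the toolkit returns `False` for every entry instead of raising, which makes it convenient for a setup script.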

2. If `poetry install` fails installing `pymssql`, install `freetds` via homebrew.
## Usage

## CLI Quick Start
### CLI commands

The toolkit includes a pip-native CLI:
The toolkit provides a command-line interface for managing servers and configuration:

```bash
# Install the package with server components
poetry install --with server
# Run the CLI to see available commands
poetry run deepnote-toolkit --help
# Start Jupyter server on default port (8888)
poetry run deepnote-toolkit server
deepnote-toolkit server

# Start servers with custom configuration
poetry run deepnote-toolkit server --jupyter-port 9000
deepnote-toolkit server --jupyter-port 9000

# View/modify configuration
poetry run deepnote-toolkit config show
poetry run deepnote-toolkit config set server.jupyter_port 9000
deepnote-toolkit config show
deepnote-toolkit config set server.jupyter_port 9000
```

**Security Note**: The CLI will warn if Jupyter runs without authentication. For local development only. Set `DEEPNOTE_JUPYTER_TOKEN` for shared environments.

## Testing
**Security note**: The CLI warns if Jupyter runs without authentication. Running without a token is intended for local development only; set `DEEPNOTE_JUPYTER_TOKEN` for shared environments.
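
One way to follow this advice is to generate a random token and pass it to the server process through its environment. The sketch below is an assumption-laden illustration: `build_server_env` is a hypothetical helper, and it assumes the CLI reads `DEEPNOTE_JUPYTER_TOKEN` from the process environment at startup, which the README implies but does not spell out.

```python
import os
import secrets


def build_server_env(token=None, base_env=None):
    """Return a copy of the environment with DEEPNOTE_JUPYTER_TOKEN set."""
    # Copy so the caller's (or the process's) environment is not mutated
    env = dict(os.environ if base_env is None else base_env)
    # Generate a random URL-safe token when the caller does not supply one
    env["DEEPNOTE_JUPYTER_TOKEN"] = token or secrets.token_urlsafe(32)
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run(["deepnote-toolkit", "server"], env=env)`, so the token never has to be exported into the interactive shell.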

Tests run against all supported Python versions using nox in Docker for reproducible environments.

### Local Testing
## Development

#### Using mise (Recommended)
### Testing

```bash
# Run unit tests (no coverage by default)
mise run test
# Run unit tests with coverage
mise run test:coverage
The project uses nox for testing across multiple Python versions (3.9-3.12) in Docker containers.

# Run tests quickly without nox/coverage overhead
mise run test:quick tests/unit/test_file.py
mise run test:quick tests/unit/test_file.py::TestClass::test_method -v
**Quick testing with mise:**

# Pass custom arguments (including --coverage)
mise run test -- --coverage tests/unit/test_file.py
```bash
mise run test # Run unit tests
mise run test:coverage # Run with coverage
mise run test:quick tests/unit/ # Fast testing without nox overhead
```

#### Using nox directly
**Using nox directly:**

```bash
# Run unit tests without coverage
poetry run nox -s unit
# Run unit tests with coverage
poetry run nox -s unit -- --coverage
# Run specific test file
poetry run nox -s unit -- tests/unit/test_file.py
poetry run nox -s unit # Run unit tests
poetry run nox -s unit -- --coverage # With coverage
poetry run nox -s unit -- tests/unit/test_file.py # Specific file
```

#### Using Docker
```bash
# Run unit tests
TEST_TYPE="unit" TOOLKIT_VERSION="local-build" ./bin/test
# Run integration tests
TEST_TYPE="integration" TOOLKIT_VERSION="local-build" TOOLKIT_INDEX_URL="http://localhost:8000" ./bin/test
# Or use the test-local script for both unit tests and integration tests
./bin/test-local
**Using Docker:**

# Run a specific file with test-local
./bin/test-local tests/unit/test_file.py
# ... or specific test
./bin/test-local tests/unit/test_file.py::TestClass::test_method
```bash
./bin/test-local # Run all tests
./bin/test-local tests/unit/test_file.py # Specific file
```

### Test Coverage

- Unit tests for core functionality
- Integration tests for bundle installation
- Server startup tests
- Environment variable handling
### Test coverage

## Development Workflow
- Unit tests for SQL execution, charting, and utilities
- Integration tests for bundle installation and server startup
- Python 3.9-3.12 compatibility testing
- Coverage threshold: 55%

### Using in Deepnote Projects
### Local development with Docker

When you push a commit, a new version of `deepnote/jupyter-for-local` is built and tagged with your shortened commit hash. Use it in projects by updating `common.yml`:

```yaml
jupyter:
  image: "deepnote/jupyter-for-local:SHORTENED_COMMIT_SHA"
```

Alternatively, to develop against a local copy of the toolkit, first build the image:
For local development with hot-reload:

```bash
# Build the development image
docker build \
--build-arg "FROM_PYTHON_TAG=3.11" \
-t deepnote/deepnote-toolkit-local-hotreload \
-f ./dockerfiles/jupyter-for-local-hotreload/Dockerfile .
```

Then start the container:

```bash
# To include server logs in the output add this argument
# -e WITH_SERVER_LOGS=1 \
# Some toolkit features (e.g. feature flags support) require
# DEEPNOTE_PROJECT_ID to be set to work correctly. Add this
# argument with your project id
# -e DEEPNOTE_PROJECT_ID=981af2c1-fe8b-41b7-94bf-006b74cf0641 \

# Start the container
docker run \
-v "$(pwd)":/deepnote-toolkit \
-v /tmp/deepnote-mounts:/deepnote-mounts:shared \
-p 8888:8888 \
-p 2087:2087 \
-p 8051:8051 \
-p 8888:8888 -p 2087:2087 -p 8051:8051 \
-w /deepnote-toolkit \
--add-host=localstack.dev.deepnote.org:host-gateway \
--rm \
--name deepnote-toolkit-local-hotreload-container \
deepnote/deepnote-toolkit-local-hotreload
```

This starts a container with the Deepnote toolkit mounted inside and exposes all required ports. If you change code that is executed in the kernel (e.g. you updated the DataFrame formatter), you only need to restart the kernel from Deepnote's UI. If you change code that starts Jupyter itself, you need to restart the container. If you add or modify dependencies, you need to rebuild the image.

Next, modify `common.yml` in the Deepnote app. First, replace the `jupyter` service with a noop image:
```yml
jupyter:
  image: 'screwdrivercd/noop-container'
```
Then change the executor's `JUPYTER_HOST` variable to point to the host machine:
```yml
executor:
  environment:
    JUPYTER_HOST: host.docker.internal
deepnote/deepnote-toolkit-local-hotreload
```

### Review Apps
**Hot-reload behavior:**
- Kernel code changes: Restart kernel from Jupyter UI
- Server code changes: Restart container
- Dependency changes: Rebuild image

Each PR creates a review app for testing. Access it via GitHub checks. Monitor logs in Grafana:
### Docker images

```
{pod="p-PROJECT_ID", container="notebook"}
```
### Adding Dependencies
- Kernel dependencies: Add to `[tool.poetry.dependencies]` in pyproject.toml
```bash
# Add a package to kernel bundle (available in notebooks)
$ poetry add pandas
# Add a package with specific version
$ poetry add "pandas>=2.0.0"
```
```bash
# Add a development dependency
$ poetry add --group dev pytest
```
After adding dependencies, run tests to verify compatibility:
```bash
$ ./bin/test-local
```
The repository includes three main Dockerfiles:

### Docker Usage
- **`builder.Dockerfile`**: Builds Python packages for versions 3.9-3.12, generates kernel and server bundles
- **`test.Dockerfile`**: Provides reproducible test environment for nox across Python versions
- **`jupyter-for-local.Dockerfile`**: Development environment with Jupyter integration for local testing

We use Docker to ensure reproducible environments due to Jupyter libraries' binary dependencies:
## Contributing

- `builder.Dockerfile`: Creates Python package bundles for different versions (3.9-3.12), generates kernel and server bundles, and packages the toolkit for distribution using Poetry.
See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, coding standards, and contribution guidelines.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify referenced documentation files exist.

The README links to CONTRIBUTING.md, LICENSE, and SECURITY.md. Confirm these files are present in the repository.

Also applies to: 167-167, 173-173


🏁 Script executed:

#!/bin/bash
# Verify supporting documentation files
for file in CONTRIBUTING.md LICENSE SECURITY.md; do
  if [ -f "$file" ]; then
    echo "$file exists"
  else
    echo "$file missing"
  fi
done

Length of output: 63


Create missing CONTRIBUTING.md file.

README.md references CONTRIBUTING.md (line 163), but the file does not exist. Create this file with development setup, coding standards, and contribution guidelines.

🤖 Prompt for AI Agents
In README.md around line 163, the document links to CONTRIBUTING.md which is
missing; create a CONTRIBUTING.md file at the repository root containing the
project's development setup (prerequisites, installation, running locally,
testing), coding standards (style guide, linting, commit message format), and
contribution guidelines (branching model, PR template checklist, review process,
issue reporting and template, and code of conduct), and add a brief note in
README.md if desired pointing to the new file; ensure the CONTRIBUTING.md is
clear, concise, and uses Markdown headings for each section.
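
The same existence check can be expressed in Python, which may be easier to reuse in a test suite than the shell loop above. This is a sketch under stated assumptions: the helper names `check_files` and `missing_files` are hypothetical, and the file list is the one named in this comment.

```python
from pathlib import Path

# Files the README links to, as named in the review comment
REFERENCED_DOCS = ["CONTRIBUTING.md", "LICENSE", "SECURITY.md"]


def check_files(names, root="."):
    """Map each file name to True if it exists as a file under root."""
    base = Path(root)
    return {name: (base / name).is_file() for name in names}


def missing_files(names, root="."):
    """Return the names that do not exist under root, in input order."""
    return [name for name, present in check_files(names, root).items() if not present]
```

Running `missing_files(REFERENCED_DOCS)` from the repository root returns the list of files that still need to be created, and an empty list means every link in the README resolves.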


- `test.Dockerfile`: Provides a consistent test environment for running unit and integration tests across Python versions using nox. Used both locally and in the CI/CD pipeline.
## License

- `jupyter-for-local.Dockerfile`: Creates a development environment with Jupyter integration, used for local development via the docker-compose setup in the main monorepo.
Apache License 2.0 - See [LICENSE](LICENSE) for details.

### Production Releases
## Support

To release a new version to production:
- **Documentation**: [docs.deepnote.com](https://docs.deepnote.com)
- **Issues**: [GitHub Issues](https://github.com/deepnote/deepnote-toolkit/issues)
- **Security**: See [SECURITY.md](SECURITY.md) for reporting vulnerabilities

1. Merge your changes to main. This will automatically trigger a GitHub Actions workflow that runs the test suite and a staging deployment.
2. Trigger a new [GitHub Release](https://github.com/deepnote/deepnote-toolkit/releases) in the GitHub UI.
3. Monitor [the GitHub Actions workflows](https://github.com/deepnote/deepnote-toolkit/actions) and ensure a successful production deployment.

Note: The production release pipeline automatically creates two PRs in the ops and app-config repositories:
<div align="center">

- A staging PR that updates staging values and is auto-merged
- A production PR that updates production values and requires manual approval and merge
Built with 💙 by the data-driven team

Important: Always test the changes in the staging environment before approving and merging the production PR to ensure everything works as expected.
</div>
Binary file added assets/deepnote-toolkit-cover-image.png