2 changes: 1 addition & 1 deletion Makefile
@@ -72,7 +72,7 @@ lint-snippets:
uv pip install docstring_parser_fork --reinstall
uv run mypy --config-file mypy.ini docs/website docs/tools --exclude docs/tools/lint_setup --exclude docs/website/docs_processed --exclude docs/website/versioned_docs/
uv run ruff check
uv run flake8 --max-line-length=200 docs/website docs/tools --exclude docs/website/.dlt-repo
uv run flake8 --max-line-length=200 docs/website docs/tools --exclude docs/website/.dlt-repo,docs/website/node_modules

lint-and-test-snippets: lint-snippets
cd docs/website/docs && uv run pytest --ignore=node_modules --ignore hub/features/transformations/transformation-snippets.py
15 changes: 14 additions & 1 deletion docs/tools/check_embedded_snippets.py
@@ -21,7 +21,20 @@


SNIPPET_MARKER = "```"
ALLOWED_LANGUAGES = ["py", "toml", "json", "yaml", "text", "sh", "bat", "sql", "hcl", "dbml", "dot"]
ALLOWED_LANGUAGES = [
"py",
"toml",
"json",
"yaml",
"text",
"sh",
"bat",
"sql",
"hcl",
"dbml",
"dot",
"mermaid",
]

LINT_TEMPLATE = "./lint_setup/template.py"
LINT_FILE = "./lint_setup/lint_me.py"
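For illustration, a minimal sketch (a hypothetical helper, not the actual checker logic) of how fence languages can be validated against this allow list; with "mermaid" added, mermaid diagrams in the docs no longer trip the check:

```py
import re

ALLOWED_LANGUAGES = [
    "py", "toml", "json", "yaml", "text", "sh", "bat",
    "sql", "hcl", "dbml", "dot", "mermaid",
]

def unknown_fence_languages(markdown: str) -> list[str]:
    # Grab the language tag of every opening code fence.
    fences = re.findall(r"^```(\w+)", markdown, flags=re.MULTILINE)
    # Anything not on the allow list would be reported as an error.
    return [lang for lang in fences if lang not in ALLOWED_LANGUAGES]

doc = "```mermaid\nflowchart LR\n  A --> B\n```"
assert unknown_fence_languages(doc) == []  # "mermaid" now passes
```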
4 changes: 2 additions & 2 deletions docs/website/docs/hub/core-concepts/profiles-dlthub.md
@@ -31,7 +31,7 @@ They are hidden behind a feature flag, which means you need to manually enable t
To activate these features, create the `.dlt/.workspace` file in your project directory; this tells `dlt` to switch from the classic project mode to the new Workspace mode.
:::
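A minimal sketch of creating the marker file from Python (equivalent to `touch .dlt/.workspace` in a shell), assuming an empty file suffices:

```py
from pathlib import Path

# Create the marker that switches dlt from classic project mode to Workspace mode.
Path(".dlt").mkdir(exist_ok=True)
Path(".dlt/.workspace").touch()
```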

Profiles are part of the [dltHub Workspace](../workspace/intro) feature.
Profiles are part of the [dltHub Workspace](../workspace/overview.md) feature.
To use them, first install `dlt` with Workspace support:

```sh
@@ -234,6 +234,6 @@ You’ll see your pipeline connected to the remote MotherDuck dataset and ready

## Next steps

* [Configure the workspace.](../workspace/intro)
* [Configure the workspace.](../workspace/overview.md)
* [Deploy your pipeline.](../../walkthroughs/deploy-a-pipeline)
* [Monitor and debug pipelines.](../../general-usage/pipeline#monitor-the-loading-progress)
24 changes: 2 additions & 22 deletions docs/website/docs/hub/features/quality/data-quality.md
@@ -8,33 +8,13 @@ keywords: ["dlthub", "data quality", "contracts"]
🚧 This feature is under development. Interested in becoming an early tester? [Join dltHub early access](https://info.dlthub.com/waiting-list).
:::

dltHub will allow you to define data validation rules at the YAML level or using Pydantic models. This ensures your data meets expected quality standards at the ingestion step.

## Example: Defining a quality contract in YAML

You can specify quality contracts to enforce constraints on your data, such as expected value ranges and nullability.

```yaml
engine_version: 10
name: scd_type_3
tables:
customers:
columns:
category:
data_type: bigint
nullable: false
quality_contracts:
expect_column_max_to_be_between:
min_value: 1
max_value: 100
```
dltHub will allow you to define data validation rules in Python. This ensures your data meets expected quality standards at the ingestion step.
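The dltHub validation API is not yet published, but as a hypothetical illustration of what row-level checks can look like in Python today, open-source `dlt` already accepts a Pydantic model as a resource's `columns` schema and validates rows against it. The model below mirrors the 1-100 range from the YAML contract removed above; the resource name and sample data are made up:

```py
import dlt
from pydantic import BaseModel, Field

class Customer(BaseModel):
    id: int
    # Row-level check: category must fall in the expected 1-100 range.
    category: int = Field(ge=1, le=100)

# Passing the Pydantic model as `columns` makes dlt validate each row
# against it during extraction (hypothetical resource and data).
@dlt.resource(name="customers", columns=Customer)
def customers():
    yield {"id": 1, "category": 42}
```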

## Key features
With dltHub, you will be able to:

* Define data tests and quality contracts using YAML configuration or Pydantic models.
* Define data tests and quality contracts in Python.
* Apply both row-level and batch-level validation.
* Enforce constraints on distributions, boundaries, and expected values.

Stay tuned for updates as we expand these capabilities! 🚀

132 changes: 106 additions & 26 deletions docs/website/docs/hub/intro.md
@@ -3,41 +3,121 @@ title: Introduction
description: Introduction to dltHub
---

# What is dltHub?
## What is dltHub?

![dltHub](/img/slot-machine-gif.gif)
dltHub is an LLM-native data engineering platform that lets any Python developer build, run, and operate production-grade data pipelines, and deliver end-user-ready insights without managing infrastructure.

dltHub is a commercial extension to the open-source data load tool (dlt). It augments it with features like transformations, data validations, and Iceberg with full catalog support, and provides a YAML interface to define data platforms. dltHub features include:
dltHub is built around the open-source library [dlt](../intro.md). It uses the same core concepts (sources, destinations, pipelines) and extends the extract-and-load focus of `dlt` with:
> **Contributor comment:** What about the diagram you've built for the LLM-native data platform book? It was very useful.
- [@dlt.hub.transformation](features/transformations/index.md): a powerful Python decorator to build transformation pipelines and notebooks
- [dbt transformations](features/transformations/dbt-transformations.md): a staging layer for data transformations, combining a local cache with schema enforcement, debugging tools, and integration with existing data workflows.
- [Iceberg support](ecosystem/iceberg.md)
- [Secure data access and sharing](features/data-access.md)
- [AI workflows](features/ai.md): agents to augment your data engineering team.
* Enhanced developer experience
* Transformations
> **Collaborator comment:** Should mention workspace? I.e., "Extended developer experience" or something like that.
* Data quality
* AI-assisted (“agentic”) workflows
* Managed runtime

To get started with dltHub, install the library using pip (Python 3.9-3.12):
dltHub supports both local and managed cloud development. A single developer can deploy and operate pipelines, transformations, and notebooks directly from a dltHub Workspace, using a single command.
The dltHub Runtime, customizable pipeline dashboard, and validation tools make it straightforward to monitor, troubleshoot, and keep data reliable throughout the end-to-end data workflow:

```sh
pip install dlthub
```mermaid
flowchart LR
A[Create a pipeline] --> B[Ensure data quality]
B --> C[Create reports & transformations]
C --> D[Deploy Workspace]
D --> E[Maintain data quality]
E --> F[Share]
```

You can try out any feature by self-issuing a trial license. You can use such a license for evaluation, development, and testing.
Trial licenses are issued offline using the `dlt license` command:
In practice, this means any Python developer can:

1. Display a list of available features
```sh
dlt license scopes
```
* Build and customize data pipelines quickly (with LLM help when desired).
* Derisk data insights by keeping data quality high with checks, tests, and alerts.
* Ship fresh dashboards, reports, and data apps.
* Scale data workflows without babysitting infrastructure or fighting schema drift and silent failures.

2. Issue a license for the feature you want to test.

```sh
dlt license issue dlthub.transformation
```

The command above will enable access to the new `@dlt.hub.transformation` decorator. Note that you may
self-issue licenses several times; the command above will carry over features from previously issued licenses.
:::tip
Want to see it end-to-end? Watch the dltHub [Workspace demo](https://youtu.be/rmpiFSCV8aA).
:::

To get started quickly, follow the [installation instructions](getting-started/installation.md).

## Overview

### Key capabilities

1. **[LLM-native workflow](../dlt-ecosystem/llm-tooling/llm-native-workflow)**: accelerate pipeline authoring and maintenance with guided prompts and copilot experiences.

2. **[Transformations](features/transformations/index.md)**: write Python or SQL transformations with `@dlt.hub.transformation`, orchestrated within your pipeline (see the sketch after this list).

3. **[Data quality](features/quality/data-quality.md)**: define correctness rules, run checks, and fail fast with actionable messages.

4. **[Data apps & sharing](../general-usage/dataset-access/marimo)**: build lightweight, shareable data apps and notebooks for consumers.

5. **[AI agentic support](features/mcp-server.md)**: use MCP servers to analyze pipelines and datasets.
6. **Managed runtime**: deploy and run with a single command; no infra to provision or patch.
7. **[Storage choice](ecosystem/iceberg.md)**: pick managed Iceberg-based lakehouse, DuckLake, or bring your own storage.
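To give a flavor of the transformation capability referenced above: the `@dlt.hub.transformation` decorator name comes from these docs, while the function signature, `dataset.table()` access, and table names below are assumptions for illustration only:

```py
import dlt

# A sketch only: the decorator is documented, the body is assumed.
@dlt.hub.transformation
def enriched_orders(dataset):
    orders = dataset.table("orders")
    customers = dataset.table("customers")
    # Join raw tables into a reporting-ready relation.
    return orders.join(customers, orders.customer_id == customers.id)
```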

### How dltHub fits with dlt (OSS)

dltHub embraces the dlt library rather than replacing it:
* dlt (OSS): Python library focused on extract & load with strong typing and schema handling.
* dltHub: Adds transformations, quality, agentic tooling, managed runtime, and storage choices, so you can move from local dev to production seamlessly.

If you like the dlt developer experience, dltHub gives you everything around it to run in production with less toil.

## dltHub products
dltHub consists of three main products. You can use them together or compose them based on your needs.

### Workspace

**[Workspace](workspace/overview.md) [Public preview]** - the unified environment for building, running, and maintaining data workflows end-to-end.

* Scaffolding and LLM helpers for faster pipeline creation.
* Integrated transformations (the `@dlt.hub.transformation` decorator).
* Data quality rules, test runs, and result surfacing.
* Notebook and data apps (e.g., Marimo) for sharing insights.
* Visual dashboards for pipeline health and run history.

### Runtime [Private preview]

**Runtime** - a managed cloud runtime operated by dltHub:

* Scalable execution for pipelines and transformations.
* APIs, web interfaces, and auxiliary services.
* Secure, multi-tenant infrastructure with upgrades and patching handled for you.

:::tip
Prefer full control? See [Enterprise](#tiers--licensing) below for self-managed options.
:::

### Storage

**[Storage](ecosystem/iceberg.md) [In development]** - choose where your data lives:

* Managed lakehouse: Iceberg open table format (or DuckLake) managed by dltHub.
* Bring your own storage: connect to your own lake/warehouse when needed.

## Tiers & licensing

Some of the features described in this documentation are free to use; others require a paid plan. The latest pricing and full feature matrix can be found on our website.
Most features support a self-guided trial right after install; check the [installation instructions](getting-started/installation.md) for more information.

| Tier | Best for | Runtime | Typical use case | Notes | Availability |
| --------------------- | ------------------------------------------------------------------------------------------ | ------------------------------ | ---------------------------------------------------------------------------- | ---------------------------------------------- |-----------------|
| **dltHub Basic** | Solo developers or small teams owning a **single pipeline + dataset + reports** end-to-end | Managed dltHub Runtime | Set up a pipeline quickly, add tests and transformations, share a simple app | Optimized for velocity with minimal setup | Private preview |
| **dltHub Scale** | Data teams building **composable data platforms** with governance and collaboration | Managed dltHub Runtime | Multiple pipelines, shared assets, team workflows, observability | Team features and extended governance | Alpha |
| **dltHub Enterprise** | Organizations needing **enterprise controls** or **self-managed runtime** | Managed or self-hosted Runtime | On-prem/VPC deployments, custom licensing, advanced security | Enterprise features and deployment flexibility | In development |


### Who is dltHub for?

* Python developers who want production outcomes without becoming infra experts.
* Lean data teams standardizing on dlt and wanting integrated quality, transforms, and sharing.
* Organizations that prefer managed operations but need open formats and portability.

3. Do not forget to read our [EULA](EULA.md) and [Special Terms](EULA.md#specific-terms-for-the-self-issued-trial-license-self-issued-trial-terms)
for self-issued licenses.
:::note
* You can start on Basic and upgrade to Scale or Enterprise later with no code rewrites.
> **Collaborator comment:** Not sure about that, tbh. I think Scale will be more opinionated about how to write code; it will be more declarative. But what we can say for sure: anything that works in OSS will work in dltHub without code changes.

* We favor open formats and portable storage (e.g., Iceberg), whether you choose our managed lakehouse or bring your own.
* For exact features and pricing, check the site; this section is meant to help you choose a sensible starting point.
:::
2 changes: 1 addition & 1 deletion docs/website/docs/hub/workspace/init.md
@@ -32,7 +32,7 @@ This adds support for AI-assisted workflows and the `dlt ai` command.

**dlt Workspace** is a unified environment for developing, running, and maintaining data pipelines — from local development to production.

[More about dlt Workspace ->](../workspace/intro)
[More about dlt Workspace ->](../workspace/overview.md)


## Step 1: Initialize a custom pipeline