# Add dlthub intro docs #3241
---
title: Introduction
description: Introduction to dltHub
---
## What is dltHub?


dltHub is an LLM-native data engineering platform that lets any Python developer build, run, and operate production-grade data pipelines and deliver end-user-ready insights without managing infrastructure.

dltHub is built around the open-source library [dlt](../intro.md). It uses the same core concepts (sources, destinations, pipelines) and extends the extract-and-load focus of `dlt` with:
* Enhanced developer experience
* Transformations
* Data quality
* AI-assisted (“agentic”) workflows
* Managed runtime

> **Review comment:** Should mention workspace? i.e. "Extended developer experience" or something like that.
dltHub supports both local and managed cloud development. A single developer can deploy and operate pipelines, transformations, and notebooks directly from a dltHub Workspace with a single command.
The dltHub Runtime, customizable pipeline dashboard, and validation tools make it straightforward to monitor, troubleshoot, and keep data reliable throughout the end-to-end data workflow:

```mermaid
flowchart LR
    A[Create a pipeline] --> B[Ensure data quality]
    B --> C[Create reports & transformations]
    C --> D[Deploy Workspace]
    D --> E[Maintain data quality]
    E --> F[Share]
```
In practice, this means any Python developer can:

* Build and customize data pipelines quickly (with LLM help when desired).
* Derisk data insights by keeping data quality high with checks, tests, and alerts.
* Ship fresh dashboards, reports, and data apps.
* Scale data workflows without babysitting infrastructure, schema drift, or silent failures.
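To make the "checks, tests, and alerts" point concrete, here is a hypothetical fail-fast validation helper in plain Python. The function name and rule are illustrative only and are not the dltHub API:

```python
# Hypothetical sketch of a fail-fast data quality check;
# not the dltHub API, just the underlying idea.
def check_not_null(rows, column):
    """Raise with an actionable message if `column` is NULL in any row."""
    bad = [i for i, row in enumerate(rows) if row.get(column) is None]
    if bad:
        raise ValueError(
            f"quality check failed: column '{column}' is NULL "
            f"in rows {bad[:5]} ({len(bad)} total)"
        )
    return rows

rows = [{"id": 1, "email": "a@x.io"}, {"id": 2, "email": None}]
try:
    check_not_null(rows, "email")
except ValueError as err:
    print(err)  # points at the offending rows instead of failing silently
```

The point is the failure mode: a check that stops the pipeline with a message naming the column and rows, rather than loading bad data downstream.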
:::tip
Want to see it end-to-end? Watch the dltHub [Workspace demo](https://youtu.be/rmpiFSCV8aA).
:::

To get started quickly, follow the [installation instructions](getting-started/installation.md).
## Overview

### Key capabilities

1. **[LLM-native workflow](../dlt-ecosystem/llm-tooling/llm-native-workflow)**: accelerate pipeline authoring and maintenance with guided prompts and copilot experiences.
2. **[Transformations](features/transformations/index.md)**: write Python or SQL transformations with `@dlt.hub.transformation`, orchestrated within your pipeline.
3. **[Data quality](features/quality/data-quality.md)**: define correctness rules, run checks, and fail fast with actionable messages.
4. **[Data apps & sharing](../general-usage/dataset-access/marimo)**: build lightweight, shareable data apps and notebooks for consumers.
5. **[AI agentic support](features/mcp-server.md)**: use MCP servers to analyze pipelines and datasets.
6. **Managed runtime**: deploy and run with a single command, with no infrastructure to provision or patch.
7. **[Storage choice](ecosystem/iceberg.md)**: pick a managed Iceberg-based lakehouse, DuckLake, or bring your own storage.
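The transformations capability can be illustrated with a toy decorator that registers transformation steps and runs them in order. This is a conceptual sketch of declaring transformations in Python, not the actual `@dlt.hub.transformation` API (see the transformations docs for the real interface):

```python
# Conceptual sketch only: a toy registry-based decorator, NOT the
# dltHub `@dlt.hub.transformation` API.
TRANSFORMATIONS = []

def transformation(func):
    """Register a transformation step to run as part of the pipeline."""
    TRANSFORMATIONS.append(func)
    return func

@transformation
def clean_names(rows):
    return [{**r, "name": r["name"].strip().lower()} for r in rows]

@transformation
def drop_inactive(rows):
    return [r for r in rows if r.get("active", True)]

rows = [{"name": "  Ada ", "active": True}, {"name": "Bob", "active": False}]
for step in TRANSFORMATIONS:
    rows = step(rows)
print(rows)  # [{'name': 'ada', 'active': True}]
```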
### How dltHub fits with dlt (OSS)

dltHub embraces the dlt library rather than replacing it:

* dlt (OSS): a Python library focused on extract and load, with strong typing and schema handling.
* dltHub: adds transformations, data quality, agentic tooling, a managed runtime, and storage choices, so you can move from local development to production seamlessly.

If you like the dlt developer experience, dltHub gives you everything around it to run in production with less toil.
## dltHub products

dltHub consists of three main products. You can use them together or compose them based on your needs.

### Workspace

**[Workspace](workspace/overview.md) [Public preview]** - the unified environment for building, running, and maintaining data workflows end-to-end.

* Scaffolding and LLM helpers for faster pipeline creation.
* Integrated transformations (the `@dlt.hub.transformation` decorator).
* Data quality rules, test runs, and result surfacing.
* Notebooks and data apps (e.g., Marimo) for sharing insights.
* Visual dashboards for pipeline health and run history.
### Runtime [Private preview]

**Runtime** - a managed cloud runtime operated by dltHub:

* Scalable execution for pipelines and transformations.
* APIs, web interfaces, and auxiliary services.
* Secure, multi-tenant infrastructure with upgrades and patching handled for you.

:::tip
Prefer full control? See [Enterprise](#tiers--licensing) below for self-managed options.
:::
### Storage [In development]

**[Storage](ecosystem/iceberg.md)** - choose where your data lives:

* Managed lakehouse: the Iceberg open table format (or DuckLake) managed by dltHub.
* Bring your own storage: connect to your own lake or warehouse when needed.
## Tiers & licensing

Some of the features described in this documentation are free to use; others require a paid plan. The latest pricing and the full feature matrix are available on our website.
Most features support a self-guided trial right after install; check the [installation instructions](getting-started/installation.md) for more information.

| Tier | Best for | Runtime | Typical use case | Notes | Availability |
| --- | --- | --- | --- | --- | --- |
| **dltHub Basic** | Solo developers or small teams owning a **single pipeline + dataset + reports** end-to-end | Managed dltHub Runtime | Set up a pipeline quickly, add tests and transformations, share a simple app | Optimized for velocity with minimal setup | Private preview |
| **dltHub Scale** | Data teams building **composable data platforms** with governance and collaboration | Managed dltHub Runtime | Multiple pipelines, shared assets, team workflows, observability | Team features and extended governance | Alpha |
| **dltHub Enterprise** | Organizations needing **enterprise controls** or a **self-managed runtime** | Managed or self-hosted Runtime | On-prem/VPC deployments, custom licensing, advanced security | Enterprise features and deployment flexibility | In development |
### Who is dltHub for?

* Python developers who want production outcomes without becoming infrastructure experts.
* Lean data teams standardizing on dlt and wanting integrated quality, transformations, and sharing.
* Organizations that prefer managed operations but need open formats and portability.
:::note
* You can start on Basic and upgrade to Scale or Enterprise later, with no code rewrites.
* We favor open formats and portable storage (e.g., Iceberg), whether you choose our managed lakehouse or bring your own.
* For exact features and pricing, check the site; this section is meant to help you choose a sensible starting point.
:::

> **Review comment** (on "no code rewrites"): Not sure about that, tbh. I think Scale will be more opinionated on how to write code; it will be more declarative. But what we can say for sure: anything that works in OSS will work in dltHub without code changes.
> **Review comment:** What about the diagram you built for the LLM-native data platform book? It was very useful.