76 changes: 76 additions & 0 deletions docs/website/docs/release-notes/1.16.md
@@ -0,0 +1,76 @@
---
title: "Release highlights: 1.16"
description: Release highlights provide a concise overview of the most important new features, improvements, and fixes in a software update, helping users quickly understand what's changed and how it impacts their workflow.
keywords: [dlt, data-pipelines, etl, release-notes, data-engineering]
---

# Release highlights: 1.16

## Smarter timestamp handling

This release brings a major cleanup and unification of how dlt handles timestamps and timezones. Both **tz-aware** and **naive** timestamps are now processed consistently across normalizers and destinations. dlt now also preserves **the exact timestamp type** when using incremental cursors, preventing subtle mismatches during reloads.

It also fixes several tricky edge cases — such as **nanosecond precision** in MSSQL — and ensures that all `time` types behave as documented (always naive, in UTC). Destinations now explicitly declare which timestamp formats they support, so dlt adjusts automatically.

Example:

```py
@dlt.resource(
    name="my_table",
    columns={
        "my_column": {
            "data_type": "timestamp",
            "timezone": True,
        }
    },
)
def my_resource():
    ...

# Output:
# naive timestamp → UTC tz-aware
# "2024-05-01 12:00:00" → "2024-05-01 12:00:00+00:00"

# tz-aware timestamp with timezone=False → UTC converted, then naive
# "2024-05-01 12:00:00+02:00" → "2024-05-01 10:00:00"
```

The result: predictable, precise timestamp behavior across all sources, transformations, and destinations.

[Read more →](../general-usage/schema#handling-of-timestamp-and-time-zones)

---

## dlt Education now in docs

Our free **education courses** are now part of the official documentation and can be launched directly in **Google Colab**. Try them [here](../tutorial/education).

![Screenshot 2025-10-14 at 11.03.38.png](https://storage.googleapis.com/dlt-blog-images/release-highlights/Screenshot%202025-10-14%20at%2011.03.38.png)

---

## Visualization updates

- **New beta dashboard**

  The Streamlit app has been replaced by the **dlt Dashboard**. Run `dlt dashboard` to explore your pipelines with an improved interface and real-time insights. Learn more [here](../general-usage/dashboard).

![dashboard-overview.png](https://storage.googleapis.com/dlt-blog-images/release-highlights/dashboard-overview.png)

- **Export schema graphs** with `dlt.Schema.to_dot()` (see the sketch below)

![477615341-202f1937-8697-4bc9-acf1-e7b6ac7e9970.png](https://storage.googleapis.com/dlt-blog-images/release-highlights/477615341-202f1937-8697-4bc9-acf1-e7b6ac7e9970.png)
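
A minimal sketch of exporting a pipeline's schema graph, assuming `to_dot()` returns Graphviz DOT text:

```py
import dlt

pipeline = dlt.pipeline("get_pokemons", destination="duckdb")

# Assumption: to_dot() returns the schema graph as a DOT-format string
dot_graph = pipeline.default_schema.to_dot()

with open("schema.dot", "w", encoding="utf-8") as f:
    f.write(dot_graph)  # render with Graphviz: dot -Tpng schema.dot -o schema.png
```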

---

## Shout-out to new contributors

Big thanks to our newest contributors:

* [@Jinso-o](https://github.com/Jinso-o) — [#2986](https://github.com/dlt-hub/dlt/pull/2986)
* [@tpulmano](https://github.com/tpulmano) — [#2978](https://github.com/dlt-hub/dlt/pull/2978)

---

**Full release notes**

[View the complete list of changes →](https://github.com/dlt-hub/dlt/releases)

173 changes: 173 additions & 0 deletions docs/website/docs/release-notes/1.17.md
@@ -0,0 +1,173 @@
---
title: "Release highlights: 1.17"
description: Release highlights provide a concise overview of the most important new features, improvements, and fixes in a software update, helping users quickly understand what's changed and how it impacts their workflow.
keywords: [dlt, data-pipelines, etl, release-notes, data-engineering]
---

# Release highlights: 1.17

## New: DuckLake destination

You can now use the **DuckLake** destination — supporting all bucket and catalog combinations. It’s a great fit for lightweight data lakes and local development setups.
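
A minimal sketch of pointing a pipeline at DuckLake, assuming the destination is addressed by the name `ducklake` and that the bucket and catalog settings are supplied via `config.toml`/`secrets.toml`:

```py
import dlt

# Bucket and catalog credentials for the "ducklake" destination are expected
# in .dlt/config.toml and .dlt/secrets.toml
pipeline = dlt.pipeline(
    pipeline_name="lake_demo",
    destination="ducklake",
    dataset_name="raw_data",
)

info = pipeline.run(
    [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}],
    table_name="items",
)
print(info)
```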

[Read the docs →](../dlt-ecosystem/destinations/ducklake)

---

## Custom metrics in pipelines

You can now **collect custom metrics** directly inside your resources and transform steps. This makes it easy to track things like page counts, skipped rows, or API calls — right where the data is extracted.

Use `dlt.current.resource_metrics()` to store custom values while your resource runs. These metrics are automatically merged into the pipeline trace and visible in the run summary.

Example:

```py
import dlt
from dlt.sources.helpers.rest_client import RESTClient
from dlt.sources.helpers.rest_client.paginators import JSONLinkPaginator

client = RESTClient(
    base_url="https://pokeapi.co/api/v2",
    paginator=JSONLinkPaginator(next_url_path="next"),
    data_selector="results",
)

@dlt.resource
def get_pokemons():
    metrics = dlt.current.resource_metrics()
    metrics["page_count"] = 0
    for page in client.paginate("/pokemon", params={"limit": 100}):
        metrics["page_count"] += 1
        yield page

pipeline = dlt.pipeline("get_pokemons", destination="duckdb")
load_info = pipeline.run(get_pokemons)
print("Custom metrics:", pipeline.last_trace.last_extract_info.metrics)
```

Custom metrics are grouped together with performance and transform stats under `resource_metrics`, so you can view them easily in traces or dashboards.

[Read more →](../general-usage/resource#collect-custom-metrics)

---

## Limit your data loads for testing

When working with large datasets, you can now **limit how much data a resource loads** using the new `add_limit` method. This is perfect for sampling a few records to preview your data or test transformations faster.

Example:

```py
import itertools
import dlt

# Load only the first 10 items from an infinite stream
r = dlt.resource(itertools.count(), name="infinity").add_limit(10)
```

You can also:

- Count **rows** instead of yields:

```py
my_resource().add_limit(10, count_rows=True)
```

- Or stop extraction after a **set time**:

```py
my_resource().add_limit(max_time=10)
```

It’s a simple but powerful way to test pipelines quickly without pulling millions of rows.

---

## Incremental loading for filesystem

Incremental loading with the `filesystem` source is now even easier, making it ideal for tracking updated or newly added files in S3 buckets or local folders.
dlt detects file changes (using fields like `modification_date`) and loads only what's new.

Example:

```py
import dlt
from dlt.sources.filesystem import filesystem, read_parquet

filesystem_resource = filesystem(
    bucket_url="s3://my-bucket/files",
    file_glob="**/*.parquet",
    incremental=dlt.sources.incremental("modification_date"),
)
pipeline = dlt.pipeline("my_pipeline", destination="duckdb")
pipeline.run((filesystem_resource | read_parquet()).with_name("table_name"))
```

You can also **split large incremental loads** into smaller chunks:

- **Partition loading** – divide your files into ranges and load each independently (even in parallel).
- **Split loading** – process files sequentially in small batches using `row_order`, `files_per_page`, or `add_limit()`.

This makes it easy to backfill large file collections efficiently and resume incremental updates without reloading everything.
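
For split loading, a minimal sketch combining `files_per_page`, `row_order`, and `add_limit()` could look like this (the page size and limit are illustrative):

```py
import dlt
from dlt.sources.filesystem import filesystem, read_parquet

# Walk files in modification order, 50 per page, and stop each run after 2 pages
files = filesystem(
    bucket_url="s3://my-bucket/files",
    file_glob="**/*.parquet",
    files_per_page=50,
    incremental=dlt.sources.incremental("modification_date", row_order="asc"),
).add_limit(2)

pipeline = dlt.pipeline("my_pipeline", destination="duckdb")
pipeline.run((files | read_parquet()).with_name("table_name"))
```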

[Learn more →](../dlt-ecosystem/verified-sources/filesystem/basic#5-incremental-loading)

---

## Split and partition large SQL loads

When working with huge tables, you can now **split incremental loads into smaller chunks** or **partition backfills** into defined ranges.
This makes data appear faster and allows you to retry only failed chunks instead of reloading everything.

**Split loading**

If your source returns data in a deterministic order (for example, ordered by `created_at`), you can combine `incremental` with `add_limit()` to process batches sequentially:

```py
import dlt
from dlt.sources.sql_database import sql_table

pipeline = dlt.pipeline("split_load", destination="duckdb")

messages = sql_table(
    table="chat_message",
    incremental=dlt.sources.incremental(
        "created_at",
        row_order="asc",  # required for split loading
        range_start="open",  # disables deduplication
    ),
)

# Load one-minute chunks until done
while not pipeline.run(messages.add_limit(max_time=60)).is_empty:
    pass

```

**Partitioned backfills**

You can also load large datasets in **parallel partitions** using `initial_value` and `end_value`. Each range runs independently, helping you rebuild large tables safely and efficiently.
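
A minimal sketch of a partitioned backfill over `created_at` (the date ranges are illustrative, and each partition could also run as its own job):

```py
from datetime import datetime

import dlt
from dlt.sources.sql_database import sql_table

# Illustrative partition boundaries; each closed range loads independently
partitions = [
    (datetime(2024, 1, 1), datetime(2024, 7, 1)),
    (datetime(2024, 7, 1), datetime(2025, 1, 1)),
]

for start, end in partitions:
    chunk = sql_table(
        table="chat_message",
        incremental=dlt.sources.incremental(
            "created_at",
            initial_value=start,
            end_value=end,  # closed range: no incremental state is kept
        ),
    )
    # Run sequentially here; in practice each partition can run in parallel
    dlt.pipeline("backfill_chat_message", destination="duckdb").run(chunk)
```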

Together, these methods make incremental loading more flexible and robust for both testing and production-scale pipelines.

[Read more →](../dlt-ecosystem/verified-sources/sql_database/advanced#split-or-partition-long-incremental-loads)

---

## Shout-out to new contributors

Big thanks to our newest contributors:

* [@rik-adegeest](https://github.com/rik-adegeest) — [#3070](https://github.com/dlt-hub/dlt/pull/3070)
* [@AndreiBondarenko](https://github.com/AndreiBondarenko) — [#3086](https://github.com/dlt-hub/dlt/pull/3086)
* [@alkaline-0](https://github.com/alkaline-0) — [#3096](https://github.com/dlt-hub/dlt/pull/3096)
* [@ianedmundson1](https://github.com/ianedmundson1) — [#3043](https://github.com/dlt-hub/dlt/pull/3043)
* [@chulkilee](https://github.com/chulkilee) — [#3120](https://github.com/dlt-hub/dlt/pull/3120)

---

**Full release notes**

[View the complete list of changes →](https://github.com/dlt-hub/dlt/releases)

4 changes: 3 additions & 1 deletion docs/website/sidebars.js
@@ -68,7 +68,9 @@ const sidebars = {
      items: [
        'release-notes/1.12.1',
        'release-notes/1.13-1.14',
-       'release-notes/1.15'
+       'release-notes/1.15',
+       'release-notes/1.16',
+       'release-notes/1.17'
      ]
    },
{