14 changes: 14 additions & 0 deletions website/docs/features/data-acceleration/data-refresh.md
@@ -58,6 +58,20 @@ datasets:

If late arriving data or clock-skew needs to be accounted for, an optional overlap can also be specified. See [`acceleration.refresh_append_overlap`](/docs/reference/spicepod/datasets#accelerationrefresh_append_overlap).
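
A minimal sketch of how an overlap might be configured for an append-mode dataset. The dataset source and name are borrowed from the example in this change, and the `5m` duration is an illustrative assumption rather than a recommended value:

```yaml
datasets:
  - from: databricks:my_dataset
    name: accelerated_dataset
    time_column: created_at
    acceleration:
      enabled: true
      refresh_mode: append
      # Illustrative value: re-check the last 5 minutes on each refresh
      # to pick up late-arriving rows or tolerate clock skew.
      refresh_append_overlap: 5m
```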

Datasets that are physically partitioned by a coarser-grained time column (e.g. day, month, or year) can set the `time_partition_column` parameter in addition to `time_column`. Spice then uses the partition column to prune partitions efficiently during refreshes.

Example:

```yaml
datasets:
  - from: databricks:my_dataset
    name: accelerated_dataset
    time_column: created_at
    time_format: iso8601
    time_partition_column: created_at_day
    time_partition_format: date
```

### Changes (CDC)

Datasets configured with acceleration `refresh_mode: changes` require a data connector that supports [Change Data Capture (CDC)](/docs/features/cdc/index.md). Initial CDC support in Spice is provided by the [Debezium data connector](/docs/components/data-connectors/debezium.md).
9 changes: 9 additions & 0 deletions website/docs/reference/spicepod/datasets.md
@@ -150,6 +150,7 @@ Optional. The format of the `time_column`. The following values are supported:
- `unix_seconds` - Unix timestamp in seconds. E.g. `1718756687`.
- `unix_millis` - Unix timestamp in milliseconds. E.g. `1718756687000`.
- `ISO8601` - [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format.
- `date` - Date in YYYY-MM-DD format. E.g. `2024-01-01`.

Spice emits a warning if the `time_column` from the data source is incompatible with the `time_format` config.

@@ -159,6 +160,14 @@ Spice emits a warning if the `time_column` from the data source is incompatible

:::

## `time_partition_column`

Optional. The column that represents the physical partitioning of the dataset when using append-based acceleration. When `time_column` is a fine-grained timestamp and the dataset is physically partitioned at a coarser granularity (for example, by date), setting `time_partition_column` to the partition column (e.g. `date_col`) lets Spice prune partitions and skip irrelevant ones during refreshes, which reduces the amount of data scanned.

## `time_partition_format`

Optional. The format of the `time_partition_column`. For instance, if the physical partitions follow a date format (YYYY-MM-DD), set this value to `date`. `time_partition_format` supports the same values as `time_format`.
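
As a hedged sketch of how the two partition parameters might sit alongside an append-mode acceleration, assuming a source table with a millisecond `event_time` column and a daily `event_date` partition column (both column names and the dataset name `events` are hypothetical):

```yaml
datasets:
  - from: databricks:my_dataset
    name: events                       # hypothetical dataset name
    time_column: event_time            # fine-grained timestamp (assumed column)
    time_format: unix_millis
    time_partition_column: event_date  # coarser, physical partition column (assumed)
    time_partition_format: date        # partitions laid out as YYYY-MM-DD
    acceleration:
      enabled: true
      refresh_mode: append
```

Here `time_column` still drives which rows are appended, while `time_partition_column` only narrows which partitions are scanned.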

## `unsupported_type_action`

Optional. Specifies the action to take when a data type that is not supported by the data connector is encountered.