Merged: `spiceaidocs/content/en/reference/Spicepod/datasets.md` (64 changes: 30 additions, 34 deletions)
`spicepod.yaml`
```yaml
datasets:
  - from: databricks:spiceai.datasets.specific_table
    name: uniswap_eth_usd
    params:
      environment: prod
    # …
      retention: 30m
```

Relative path example:

`spicepod.yaml`
```yaml
datasets:
  - from: datasets/eth_recent_transactions
```

`datasets/eth_recent_transactions/dataset.yaml`
```yaml
from: spiceai:spice.ai/eth.recent_transactions
name: eth_recent_transactions
acceleration:
  enabled: true
  refresh: 1h
```
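
The relative-path example implies a layout in which each `from` entry in `spicepod.yaml` names a directory that holds a `dataset.yaml`. Below is a minimal sketch of that lookup, assuming paths are resolved relative to the pod directory; `dataset_manifest_path` is a hypothetical helper inferred from the example above, not a Spice.ai API.

```python
from pathlib import PurePosixPath


def dataset_manifest_path(pod_dir: str, from_value: str) -> PurePosixPath:
    """Map a relative `from` entry in spicepod.yaml to its dataset.yaml file.

    Hypothetical helper: the <from>/dataset.yaml layout is inferred from the
    relative path example, not taken from a specification.
    """
    rel = PurePosixPath(from_value)
    if rel.is_absolute() or ":" in from_value:
        # Absolute paths and `<source>:<link>` URIs are not relative dataset paths.
        raise ValueError(f"not a relative dataset path: {from_value!r}")
    # Join pod directory, dataset directory, and the fixed manifest file name.
    return PurePosixPath(pod_dir) / rel / "dataset.yaml"


if __name__ == "__main__":
    print(dataset_manifest_path("pod", "datasets/eth_recent_transactions"))
```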

## `from`

The `from` field is a string containing the Uniform Resource Identifier (URI) of the dataset. The URI has two parts: a prefix that identifies the source of the dataset, and a link to the dataset within that source.

The syntax for the `from` field is as follows:

```yaml
from: <source>:<link>
```

Where:

- `<source>`: The source of the dataset. If no source is specified, it defaults to `spiceai`. Currently supported sources:
  - `spiceai`
  - `dremio`
  - `databricks`
  - `s3`
  - `postgres`
- `<link>`: The link to the dataset within the source.
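
A minimal sketch of splitting a `from` value into its `<source>` and `<link>` parts, assuming the first colon is the separator. `parse_from` and `DatasetRef` are hypothetical names, and the sketch does not handle relative-path values such as `datasets/eth_recent_transactions`.

```python
from typing import NamedTuple

# Sources listed on this page; the default follows the rule stated above.
SUPPORTED_SOURCES = {"spiceai", "dremio", "databricks", "s3", "postgres"}
DEFAULT_SOURCE = "spiceai"


class DatasetRef(NamedTuple):
    source: str
    link: str


def parse_from(value: str) -> DatasetRef:
    """Split a `from` value of the form `<source>:<link>` (hypothetical helper)."""
    if ":" not in value:
        # No prefix: the source defaults to `spiceai`.
        return DatasetRef(DEFAULT_SOURCE, value)
    # Split on the first colon only, so colons inside the link survive.
    source, link = value.split(":", 1)
    if source not in SUPPORTED_SOURCES:
        raise ValueError(f"unsupported source {source!r} in from: {value!r}")
    if not link:
        raise ValueError(f"empty link in from: {value!r}")
    return DatasetRef(source, link)


if __name__ == "__main__":
    print(parse_from("databricks:spiceai.datasets.specific_table"))
    print(parse_from("spiceai:spice.ai/eth.recent_transactions"))
    print(parse_from("spice.ai/eth.recent_transactions"))
```

Rejecting unknown prefixes keeps a typo such as `databrick:` from silently becoming part of a `spiceai` link.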

## `name`

The name of the dataset. This is used to reference the dataset in the pod manifest, as well as in external data sources.

## `type`

The type of dataset. The following types are supported:

- `overwrite` - Overwrites the dataset with the contents of the dataset source.
- `append` - Appends new data from the dataset source to the dataset.
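
The difference between the two types can be sketched in memory, assuming each refresh fetches a batch of rows from the source. `apply_refresh`, `DatasetType`, and the row shape are hypothetical illustrations, not how the Spice.ai runtime stores data.

```python
from enum import Enum
from typing import Any, Dict, List

Row = Dict[str, Any]


class DatasetType(Enum):
    OVERWRITE = "overwrite"
    APPEND = "append"


def apply_refresh(current: List[Row], incoming: List[Row], dataset_type: DatasetType) -> List[Row]:
    """Return the dataset contents after one refresh (illustrative only)."""
    if dataset_type is DatasetType.OVERWRITE:
        # Replace everything with the source's current contents.
        return list(incoming)
    if dataset_type is DatasetType.APPEND:
        # Keep existing rows and add the newly fetched rows after them.
        return current + incoming
    raise ValueError(f"unknown dataset type: {dataset_type}")


if __name__ == "__main__":
    local = [{"block": 1}, {"block": 2}]
    fetched = [{"block": 3}]
    print(apply_refresh(local, fetched, DatasetType.OVERWRITE))
    print(apply_refresh(local, fetched, DatasetType("append")))
```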

## `acceleration`
