32 changes: 0 additions & 32 deletions spiceaidocs/content/en/concepts/_index.md
@@ -25,35 +25,3 @@ A `Pod` is a package of configuration and data used to train and deploy Spice.ai
A `Pod manifest` is a YAML file that describes how to connect data with a learning environment.

A Pod is constructed from the following components:

### Dataspace

A [dataspace]({{<ref "concepts/dataspaces">}}) is a specification of how the Spice.ai runtime and AI engine load, process, and interact with data from a single source. A dataspace may contain a single data connector and data processor. A pod may define multiple dataspaces. The fields specified across the union of dataspaces are used as inputs to the neural networks that Spice.ai trains.

If a dataspace does not contain a data connector/processor, the observation data for that dataspace must be provided by calling [POST /pods/{pod}/observations]({{<ref api>}}).
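A sketch of how a dataspace might look in a pod manifest (the field names below are illustrative assumptions, not the exact schema; see the dataspace reference for the authoritative shape):

```yaml
# Hypothetical pod manifest fragment: one dataspace pairing a
# data connector with a data processor.
dataspaces:
  - from: coinbase          # source of the data
    name: btcusd
    data:
      connector:
        name: coinbase      # fetches raw data from the external source
        params:
          product_ids: BTC-USD
      processor:
        name: json          # turns raw connector output into observations
    fields:
      - name: price         # becomes an input to the trained neural networks
```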

### Data Connector

A [data connector]({{<ref "reference/pod#data-connector">}}) is a reusable component that contains the logic to fetch or ingest data from an external source. Spice.ai provides a general interface that anyone can implement to create a data connector; see the [data-components-contrib](https://github.com/spiceai/data-components-contrib/tree/trunk/dataconnectors) repo for more information.

### Data Processor

A [data processor]({{<ref "reference/pod#data-processor">}}) is a reusable component, composable with a data connector, that contains the logic to process raw connector data into [observations]({{<ref "api#observations">}}) and state that Spice.ai can use.

Spice.ai provides a general interface that anyone can implement to create a data processor; see the [data-components-contrib](https://github.com/spiceai/data-components-contrib/tree/trunk/dataprocessors) repo for more information.
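The connector/processor split can be illustrated with a small Python sketch (real components are written against the interfaces in data-components-contrib; the function names and record shape below are hypothetical):

```python
import json

def connect(raw_source: str) -> bytes:
    """Hypothetical connector: fetch raw bytes from an external source."""
    return raw_source.encode("utf-8")

def process(raw: bytes) -> list:
    """Hypothetical processor: turn raw connector output into observations."""
    records = json.loads(raw.decode("utf-8"))
    return [{"time": r["time"], "data": {"price": r["price"]}} for r in records]

# The connector fetches raw data; the processor shapes it into observations.
raw = connect('[{"time": 1609459200, "price": 29000.0}]')
observations = process(raw)
```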

### Actions

[Actions]({{<ref "reference/pod#actions">}}) are the set of actions the Spice.ai runtime can recommend for a pod.

### Recommendations

To intelligently adapt its behavior, an application should query the Spice.ai runtime for the [action]({{<ref "reference/pod#actions">}}) it recommends at a specified time. The result of this query is a [recommendation]({{<ref "concepts/recommendations">}}).

If a time is not specified, the recommendation query defaults to the time of the most recently ingested observation.

### Training Rewards

[Training Rewards]({{<ref "reference/pod#rewards">}}) are code definitions in Python that tell the Spice.ai AI Engine how to train the neural networks to achieve the desired goal. A reward is defined for each action specified in the pod.

In the future, we will expand the set of languages supported for writing reward functions. [Let us know](mailto:hey@spiceai.io) which language you'd like to write your reward functions in!
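A minimal sketch of what a reward definition might express (the function name, arguments, and scaling below are hypothetical, not the actual Spice.ai reward API):

```python
def buy_reward(prev_price: float, new_price: float) -> float:
    # Hypothetical reward for a "buy" action: positive when the price
    # rises after buying, negative when it falls.
    change = (new_price - prev_price) / prev_price
    return change * 100  # scale the fractional change to percent

reward = buy_reward(100.0, 105.0)  # price rose after buying
```

The AI engine maximizes cumulative reward during training, so the sign and magnitude of each action's reward steer what the trained networks recommend.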
72 changes: 0 additions & 72 deletions spiceaidocs/content/en/concepts/rewards/_index.md

This file was deleted.

71 changes: 0 additions & 71 deletions spiceaidocs/content/en/concepts/rewards/external.md

This file was deleted.

2 changes: 0 additions & 2 deletions spiceaidocs/content/en/concepts/time/_index.md
@@ -39,8 +39,6 @@ params:

If not provided in the manifest, Spicepods will default to a period of **3 days**, intervals of **1 min**, and granularity of **10 seconds**. The period epoch will default to a dynamic epoch of the current time minus the period. In this mode, the period becomes a sliding window over time.
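The defaults described above could be written explicitly in a manifest along these lines (a sketch; the field names are assumptions, so consult the Spicepod params reference for the exact schema):

```yaml
# Hypothetical params fragment making the time defaults explicit.
params:
  epoch_time: 1605312000   # fixed epoch (Unix seconds); omit for a sliding window
  period: 72h              # default: 3 days
  interval: 1m             # default: 1 min
  granularity: 10s         # default: 10 seconds
```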

See reference documentation for [Spicepod params]({{<ref "reference/pod#params">}}).

### Period

The `period` defines the entire timespan the Spicepod will use for learning and decision-making.
14 changes: 13 additions & 1 deletion spiceaidocs/content/en/reference/Spicepod/_index.md
@@ -41,7 +41,7 @@ metadata:

## `datasets`

A Spicepod can contain one or more [datasets](https://docs.spice.ai/reference/specifications/dataset-and-view-yaml-specification) referenced by relative path.
A Spicepod can contain one or more [datasets]({{<ref "reference/Spicepod/datasets">}}) referenced by relative path.

**Example**

@@ -60,6 +60,18 @@ datasets:
```yaml
dependsOn: datasets/uniswap_eth_usdc
```

A dataset defined inline:

```yaml
datasets:
- name: spiceai.uniswap_v2_eth_usdc
type: overwrite
source: spice.ai
acceleration:
enabled: true
refresh: 1h
```

## `functions`

A Spicepod can contain one or more [functions](https://docs.spice.ai/reference/specifications/spice-functions-yaml-specification) referenced by relative path.
125 changes: 125 additions & 0 deletions spiceaidocs/content/en/reference/Spicepod/datasets.md
@@ -0,0 +1,125 @@
---
type: docs
title: "Datasets"
linkTitle: "Datasets"
description: 'Datasets YAML reference'
weight: 80
---

A Spicepod can contain one or more datasets referenced by relative path, or defined inline.

# `datasets`

Inline example:

`spicepod.yaml`
```yaml
datasets:
- from: spice.ai/eth/beacon/eigenlayer
name: strategy_manager_deposits
params:
app: goerli-app
acceleration:
enabled: true
mode: inmemory # / file
engine: arrow # / duckdb
refresh_interval: 1h
refresh_mode: full / append # update / incremental
retention: 30m
```

`spicepod.yaml`
```yaml
datasets:
- from: databricks.com/spiceai/datasets
name: uniswap_eth_usd
params:
environment: prod
acceleration:
enabled: true
mode: inmemory # / file
engine: arrow # / duckdb
refresh_interval: 1h
refresh_mode: full / append # update / incremental
retention: 30m
```

`spicepod.yaml`
```yaml
datasets:
- from: local/Users/phillip/data/test.parquet
name: test
acceleration:
enabled: true
mode: inmemory # / file
engine: arrow # / duckdb
refresh_interval: 1h
refresh_mode: full / append # update / incremental
retention: 30m
```

Relative path example:

`spicepod.yaml`
```yaml
datasets:
- from: datasets/uniswap_v2_eth_usdc
```

`datasets/uniswap_v2_eth_usdc/dataset.yaml`
```yaml
name: spiceai.uniswap_v2_eth_usdc
type: overwrite
source: spice.ai
auth: spice.ai
acceleration:
enabled: true
refresh: 1h
```

## `name`

The name of the dataset. This is used to reference the dataset in the pod manifest, as well as in external data sources.

## `type`

The type of dataset. The following types are supported:

- `overwrite` - Overwrites the dataset with the contents of the dataset source.
- `append` - Appends new data from the dataset source to the dataset.

## `source`

The source of the dataset. The following sources are supported:

- `spice.ai`
- `dremio` (coming soon)
- `databricks` (coming soon)

## `auth`

Optional. The authentication profile to use to connect to the dataset source. Use `spice login` to create a new authentication profile.

If not specified, the default profile for the data source is used.

## `acceleration`

Optional. Accelerate queries to the dataset by caching data locally.
> **Contributor comment:** We probably shouldn't use the word caching here (or even locally).


## `acceleration.enabled`

Optional. Enable or disable acceleration.

## `acceleration.refresh`

Optional. The interval at which to refresh the dataset's data when the dataset type is `overwrite`. Specified as a [duration literal]({{<ref "reference/duration">}}), e.g. `1h` for 1 hour, `1m` for 1 minute, `1s` for 1 second.

For `append` datasets, the refresh interval is not used.

## `acceleration.retention`

Optional. Only supported for `append` datasets. Specifies how long to retain data updates from the data source before they are deleted. Specified as a [duration literal]({{<ref "reference/duration">}}).

If not specified, all data is retained by default.
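For example, an `append` dataset that keeps only the last 30 minutes of updates might look like this (a sketch reusing field names from the examples above):

```yaml
# Hypothetical append dataset retaining 30 minutes of updates.
datasets:
  - from: spice.ai/eth/beacon/eigenlayer
    name: strategy_manager_deposits
    acceleration:
      enabled: true
      refresh_mode: append
      retention: 30m
```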