ML docs (#159)

Jeadie · phillipleblanc · web-flow · commit c7b474aa199e · 2024-03-28T07:55:54.000+09:00
* draft of ML docs

* improvements and POST /predict

* code formatting for bash

* and JSON

* Apply suggestions from code review

* Update spiceaidocs/docs/machine-learning/index.md

---------

Co-authored-by: Phillip LeBlanc <phillip@spiceai.io>
diff --git a/spiceaidocs/docs/machine-learning/index.md b/spiceaidocs/docs/machine-learning/index.md
@@ -10,3 +10,28 @@ sidebar_position: 8
 The Spice ML runtime is in its early preview phase and is subject to modifications.
 
 :::
+
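+Machine learning models are added to a Spicepod in the same way as datasets. List them under the `models` key, and the Spice runtime loads them just as it loads datasets. Each model names the datasets it uses for inference under its own `datasets` key.
+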
+```yaml
+name: my_spicepod
+version: v1beta1
+kind: Spicepod
+
+models:
+  - from: file:/model_path.onnx
+    name: my_model_name
+    datasets:
+      - my_inference_view
+
+datasets:
+  - from: localhost
+    name: my_inference_view
+    sql_ref: inference.sql
+
+    # All your other datasets
+  - from: spice.ai/eth.recent_blocks
+    name: eth_recent_blocks
+    acceleration:
+        enabled: true
+        refresh_mode: append
+```
diff --git a/spiceaidocs/docs/machine-learning/inference/index.md b/spiceaidocs/docs/machine-learning/inference/index.md
@@ -4,3 +4,101 @@ sidebar_label: 'Machine Learning Inference'
 description: ''
 sidebar_position: 2
 ---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+The Spice ML runtime currently supports prediction through an HTTP API served by the Spice runtime.
+
+### GET `/v1/models/:name/predict`
+```shell
+curl "http://localhost:3000/v1/models/my_model_name/predict"
+```
+Where:
+ - `name`: The `name` of a model defined in `spicepod.yaml`.
+
+
+#### Response
+<Tabs>
+  <TabItem value="Success" label="Success" default>
+    ```json
+    {
+        "status": "Success",
+        "model_name": "my_model_name",
+        "model_version": "1.0",
+        "lookback": 30,
+        "prediction": [0.45, 0.50, 0.55],
+        "duration_ms": 123
+    }
+    ```
+  </TabItem>
+  <TabItem value="Bad Request" label="Bad Request">
+    ```json
+    {
+        "status": "BadRequest",
+        "error_message": "You gave me a bad request :(",
+        "model_name": "my_model_name",
+        "lookback": 30,
+        "duration_ms": 12
+    }
+    ```
+  </TabItem>
+  <TabItem value="Internal Error" label="Internal Error">
+    ```json
+    {
+        "status": "InternalError",
+        "error_message": "Oops, the server couldn't predict",
+        "model_name": "my_model_name",
+        "lookback": 30,
+        "duration_ms": 12
+    }
+    ```
+  </TabItem>
+</Tabs>
+
+### POST `/v1/predict`
+Multiple models can also be run in parallel in a single request, which is useful for ensembling or A/B testing.
+```shell
+curl --request POST \
+  --url http://localhost:3000/v1/predict \
+  --data '{
+    "predictions": [
+        {
+            "model_name": "drive_stats_a"
+        },
+        {
+            "model_name": "drive_stats_b"
+        }
+    ]
+}'
+```
+Where:
+  - Each `model_name` provided references a model `name` in the Spicepod.
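When the set of models varies, the request body can be generated instead of written by hand. The following Bash function is a minimal sketch: `predict_body` is a hypothetical helper, not part of Spice, and it assumes model names contain only letters, digits, `_`, and `-`, so it performs no JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spice): print the JSON body for
# POST /v1/predict, with one {"model_name": ...} entry per argument.
# Assumes names need no JSON escaping (letters, digits, '_' and '-').
predict_body() {
  local entries=() name
  for name in "$@"; do
    entries+=("{\"model_name\": \"${name}\"}")
  done
  # "${entries[*]}" joins the array elements using the first character of IFS.
  local IFS=','
  printf '{"predictions": [%s]}\n' "${entries[*]}"
}

# Example usage against a locally running runtime:
# curl --request POST \
#   --url http://localhost:3000/v1/predict \
#   --data "$(predict_body drive_stats_a drive_stats_b)"
```

Building the body in one place keeps the curl invocation unchanged when models are added to or removed from an ensemble.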
+
+#### Response
+```json
+{
+    "duration_ms": 81,
+    "predictions": [{
+        "status": "Success",
+        "model_name": "drive_stats_a",
+        "model_version": "1.0",
+        "lookback": 30,
+        "prediction": [0.45, 0.5, 0.55],
+        "duration_ms": 42
+    }, {
+        "status": "Success",
+        "model_name": "drive_stats_b",
+        "model_version": "1.0",
+        "lookback": 30,
+        "prediction": [0.43, 0.51, 0.53],
+        "duration_ms": 42
+    }]
+}
+```
+
+## Limitations
+- Univariate predictions only; multiple covariates are supported.
+- Covariates and the output variate must have a fixed time frequency.
+- No support for discrete or exogenous variables.
diff --git a/spiceaidocs/docs/machine-learning/model-deployment/huggingface.md b/spiceaidocs/docs/machine-learning/model-deployment/huggingface.md
@@ -2,4 +2,37 @@
 title: "Huggingface"
 sidebar_label: "Huggingface"
 sidebar_position: 1
----
+---
+
+To define a model component from HuggingFace, specify it in the `from` key.
+
+### Example
+```yaml
+models:
+  - from: huggingface:huggingface.co/spiceai/darts:latest
+    name: hf_model
+    files:
+      - model.onnx
+    datasets:
+      - taxi_trips
+```
+
+### `from` Format
+The `from` value must match the following regular expression:
+```regex
+\A(huggingface:)(huggingface\.co\/)?(?<org>[\w\-]+)\/(?<model>[\w\-]+)(:(?<revision>[\w\d\-\.]+))?\z
+```
+#### Examples
+- `huggingface:username/modelname`: Implies the latest version of `modelname` hosted by `username`.
+- `huggingface:huggingface.co/username/modelname:revision`: Specifies a particular `revision` of `modelname` by `username`, including the optional domain.
+
+#### Specification
+1. **Prefix:** The value must start with `huggingface:`.
+2. **Domain (Optional):** Optionally includes `huggingface.co/` immediately after the prefix. No other HuggingFace-compatible services are currently supported.
+3. **Organization/User:** The HuggingFace organization or user (`org`).
+4. **Model Name:** After a `/`, the model name (`model`).
+5. **Revision (Optional):** A colon (`:`) followed by the git-like revision identifier (`revision`).
+
+
+### Limitations
+- Only models in ONNX format are supported.
diff --git a/spiceaidocs/docs/machine-learning/model-deployment/index.md b/spiceaidocs/docs/machine-learning/model-deployment/index.md
@@ -4,3 +4,19 @@ sidebar_label: 'ML Model Deployment'
 description: ''
 sidebar_position: 1
 ---
+
+Models can be loaded from the following sources:
+- Local filesystem: Local ONNX files.
+- HuggingFace: Models hosted on HuggingFace.
+- SpiceAI: Models trained on the Spice.AI Cloud Platform.
+
+Within a Spicepod, a model component has the following fields:
+
+
+| Field                      | Description                                                 |
+| -------------------------- | ----------------------------------------------------------- |
+| `name`                     | Unique, readable name for the model within the Spicepod.    |
+| `from`                     | Source-specific address that uniquely identifies the model. |
+| `datasets`                 | Datasets that the model depends on for inference.           |
+| `files` (HuggingFace only) | Individual files within the HuggingFace repository to use.  |
+ 
diff --git a/spiceaidocs/docs/machine-learning/model-deployment/local.md b/spiceaidocs/docs/machine-learning/model-deployment/local.md
@@ -0,0 +1,16 @@
+---
+title: "Local"
+sidebar_label: "Local"
+sidebar_position: 3
+---
+
+To use a local model, specify the path to its ONNX file, prefixed with `file:`, in the `from` key.
+
+### Example
+```yaml
+models:
+  - from: file:/absolute/path/to/my/model.onnx
+    name: local_model
+    datasets:
+      - taxi_trips
+```
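The example uses an absolute path. If a model file is referenced relative to the current directory, a small helper can resolve it before the entry is written. This is a sketch, assuming the `file:` prefix followed by an absolute path as in the example; `local_model_entry` is hypothetical and not a Spice CLI command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Spice CLI command): print a `models` entry
# for a local ONNX file, resolving its directory to an absolute path.
# Usage: local_model_entry <path/to/model.onnx> <model name> <dataset>
local_model_entry() {
  local file=$1 name=$2 dataset=$3 dir
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  # cd + pwd -P yields the physical (symlink-free) absolute directory path.
  dir=$(cd "$(dirname "$file")" && pwd -P) || return 1
  # ${dir%/} avoids a double slash when the file sits directly under /.
  cat <<EOF
models:
  - from: file:${dir%/}/$(basename "$file")
    name: ${name}
    datasets:
      - ${dataset}
EOF
}

# Example: local_model_entry ./models/model.onnx local_model taxi_trips
```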
diff --git a/spiceaidocs/docs/machine-learning/model-deployment/spiceai.md b/spiceaidocs/docs/machine-learning/model-deployment/spiceai.md
@@ -2,4 +2,45 @@
 title: "SpiceAI"
 sidebar_label: "SpiceAI"
 sidebar_position: 2
----
+---
+
+### Example
+To run a model trained on the Spice.AI platform, specify it in the `from` key.
+```yaml
+models:
+  - from: spice.ai/taxi_tech_co/taxi_drives/models/drive_stats
+    name: drive_stats
+    datasets:
+      - drive_stats_inferencing
+```
+
+A specific version or training run ID of a model hosted on Spice.AI can be selected by appending it after a colon (`:`):
+```yaml
+models:
+  - from: spice.ai/taxi_tech_co/taxi_drives/models/drive_stats:latest # Git-like tagging
+    name: drive_stats_a
+    datasets:
+      - drive_stats_inferencing
+
+  - from: spice.ai/taxi_tech_co/taxi_drives/models/drive_stats:60cb80a2-d59b-45c4-9b68-0946303bdcaf # Specific training run ID
+    name: drive_stats_b
+    datasets:
+      - drive_stats_inferencing
+```
+
+### `from` Format
+The `from` value must match the following regular expression:
+```regex
+\A(?:spice\.ai\/)?(?<org>[\w\-]+)\/(?<app>[\w\-]+)(?:\/models)?\/(?<model>[\w\-]+):(?<version>[\w\d\-\.]+)\z
+```
+
+#### Examples
+- `spice.ai/lukekim/smart/models/drive_stats:latest`: Refers to the latest version of the `drive_stats` model in the `smart` application owned by the user or organization `lukekim`.
+- `spice.ai/lukekim/smart/drive_stats:60cb80a2-d59b-45c4-9b68-0946303bdcaf`: Refers to the same model at a specific training run ID, omitting the optional `models/` path segment.
+
+#### Specification
+1. **Prefix (Optional):** The value may start with `spice.ai/`.
+2. **Organization/User:** The name of the organization or user (`org`) hosting the model.
+3. **Application Name:** The name of the application (`app`) to which the model belongs.
+4. **Model Name:** The name of the model (`model`), optionally preceded by a `models/` path segment.
+5. **Version (Optional):** A colon (`:`) followed by the version identifier (`version`), which can be a semantic version, `latest` for the most recent version, or a specific training run ID.
diff --git a/spiceaidocs/docusaurus.config.ts b/spiceaidocs/docusaurus.config.ts
@@ -141,6 +141,7 @@ const config: Config = {
     prism: {
       theme: prismThemes.github,
       darkTheme: prismThemes.dracula,
+      additionalLanguages: ['bash', 'json'],
     },
     algolia: {
       appId: '0SP8I8JTL8',