7 changes: 7 additions & 0 deletions website/docs/components/embeddings/huggingface.md

To use an embedding model from HuggingFace with Spice, specify the `huggingface` path in the `from` field of your configuration. The model and its related files will be automatically downloaded, loaded, and served locally by Spice.

The following parameters are specific to HuggingFace models:

| Parameter  | Description | Default |
| ---------- | ----------- | ------- |
| `hf_token` | The HuggingFace access token. | - |
| `pooling`  | The [pooling method](https://huggingface.co/docs/text-embeddings-inference/en/cli_arguments) for embedding models. Supported values: `cls`, `mean`, `splade`, `last_token`. | - |

Here is an example configuration in `spicepod.yaml`:

```yaml
# a minimal sketch; the model and parameter values shown are illustrative
embeddings:
  - from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2
    name: hf_embeddings
    params:
      pooling: mean
```
119 changes: 72 additions & 47 deletions website/docs/components/models/huggingface.md

To use a model hosted on HuggingFace, specify the `huggingface.co` path in the `from` field and, when needed, the files to include.

## Configuration

### `from`

The `from` key takes the form `huggingface:model_path`. Two common forms of the `from` key are shown below.

- `huggingface:username/modelname`: Implies the latest version of `modelname` hosted by `username`.
- `huggingface:huggingface.co/username/modelname:revision`: Specifies a particular `revision` of `modelname` by `username`, including the optional domain.

The `from` key must match the following regex:

```regex
\A(huggingface:)(huggingface\.co\/)?(?<org>[\w\-]+)\/(?<model>[\w\-]+)(:(?<revision>[\w\d\-\.]+))?\z
```
For more details on authentication, see [below](#access-tokens).

The `from` key consists of five components:

1. **Prefix:** The value must start with `huggingface:`.
2. **Domain (Optional):** May include `huggingface.co/` immediately after the prefix. Currently, no other HuggingFace-compatible services are supported.
3. **Organization/User:** The HuggingFace organization (`org`).
4. **Model Name:** After a `/`, the model name (`model`).
5. **Revision (Optional):** A colon (`:`) followed by the git-like revision identifier (`revision`).
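
Putting these together, a fully specified `from` value might look like the following sketch (the `main` revision and the `name` are illustrative):

```yaml
models:
  - from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B:main
    name: llama_pinned
```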

### `name`

The model name. This is used as the model ID within Spice and in Spice's endpoints (e.g. `http://localhost:8090/v1/models`). It can be set to the same value as the model ID in the `from` field.
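
For instance, in the following sketch (reusing the Phi example from [Examples](#examples) below), the model is served as `phi`:

```yaml
models:
  - from: huggingface:huggingface.co/microsoft/Phi-3.5-mini-instruct
    name: phi
```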

### `params`

| Param           | Description | Default |
| --------------- | ----------- | ------- |
| `hf_token`      | The HuggingFace access token. | - |
| `model_type`    | The architecture to load the model as. Supported values: `mistral`, `gemma`, `mixtral`, `llama`, `phi2`, `phi3`, `qwen2`, `gemma2`, `starcoder2`, `phi3.5moe`, `deepseekv2`, `deepseekv3`. | - |
| `tools`         | Which tools should be made available to the model. Set to `auto` to use all available tools. | - |
| `system_prompt` | An additional system prompt used for all chat completions to this model. | - |
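
A sketch combining these parameters (the model choice, prompt, and `name` are illustrative):

```yaml
models:
  - from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B
    name: llama
    params:
      hf_token: ${ secrets:HF_TOKEN }
      model_type: llama
      tools: auto
      system_prompt: You are a helpful assistant.
```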

### `files`

The specific file path(s) for the HuggingFace model. For example, GGUF model formats require an explicit file path; other formats (e.g. `.safetensors`) are inferred.

#### Example: Load a GGUF model
```yaml
models:
  - from: huggingface:huggingface.co/lmstudio-community/Qwen2.5-Coder-3B-Instruct-GGUF
    name: qwen-gguf # example name
    files:
      - path: Qwen2.5-Coder-3B-Instruct-Q3_K_L.gguf
```

## Access Tokens

Access tokens can be provided for HuggingFace models in two ways:

1. In the HuggingFace token cache (i.e. `~/.cache/huggingface/token`). This is the default.
1. Via [model params](#params).

For example, to provide the token via `params`:
```yaml
models:
  - name: llama_3.2_1B
    from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B
    params:
      hf_token: ${ secrets:HF_TOKEN }
```

## Examples

### Load an ML model to predict taxi trip outcomes

```yaml
models:
  - from: huggingface:huggingface.co/spiceai/darts:latest
    name: hf_model
    files:
      - path: model.onnx
    datasets:
      - taxi_trips
```

### Load an LLM to generate text

```yaml
models:
  - from: huggingface:huggingface.co/microsoft/Phi-3.5-mini-instruct
    name: phi
```

### Load a private model

```yaml
models:
  - name: llama_3.2_1B
    from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B
    params:
      hf_token: ${ secrets:HF_TOKEN }
```

For more details on authentication, see [access tokens](#access-tokens).

:::warning[Limitations]

- The throughput, concurrency & latency of a locally hosted model will vary based on the underlying hardware and model size. Spice supports [Apple metal](../../installation.md#metal-support) and [CUDA](../../installation.md#cuda-support) for accelerated inference.
- ML models currently support only the ONNX file format.
:::

## Cookbook

- Run the Llama family of models locally from HuggingFace with Spice: [Running Llama3 Locally](https://github.com/spiceai/cookbook/blob/trunk/llama/README.md)