Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
248 changes: 248 additions & 0 deletions Agent-Skill/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,248 @@
---
name: feast-user-guide
description: Guide for working with Feast (Feature Store) — defining features, configuring feature_store.yaml, retrieving features online/offline, using the CLI, and building RAG retrieval pipelines. Use when the user asks about creating entities, feature views, on-demand feature views, stream feature views, feature services, data sources, feature_store.yaml configuration, feast apply/materialize commands, online or historical feature retrieval, or vector-based document retrieval with Feast.
---

# Feast User Guide

## Quick Start

A Feast project requires:
1. A `feature_store.yaml` config file
2. Python files defining entities, data sources, feature views, and feature services
3. Running `feast apply` to register definitions

```bash
feast init my_project
cd my_project
feast apply
```

## Core Concepts

### Entity
An entity is a collection of semantically related features (e.g., a customer, a driver). Entities have join keys used to look up features.

```python
from feast import Entity
from feast.value_type import ValueType

driver = Entity(
name="driver_id",
description="Driver identifier",
value_type=ValueType.INT64,
)
```

### Data Sources
Data sources describe where raw feature data lives.

```python
from feast import FileSource, BigQuerySource, KafkaSource, PushSource, RequestSource
from feast.data_format import ParquetFormat

# Batch source (file)
driver_stats_source = FileSource(
name="driver_stats_source",
path="data/driver_stats.parquet",
timestamp_field="event_timestamp",
created_timestamp_column="created",
)

# Request source (for on-demand features)
input_request = RequestSource(
name="vals_to_add",
schema=[Field(name="val_to_add", dtype=Float64)],
)
```

### FeatureView
Maps features from a data source to entities with a schema, TTL, and online/offline settings.

```python
from feast import FeatureView, Field
from feast.types import Float32, Int64, String
from datetime import timedelta

driver_hourly_stats = FeatureView(
name="driver_hourly_stats",
entities=[driver],
ttl=timedelta(days=365),
schema=[
Field(name="conv_rate", dtype=Float32),
Field(name="acc_rate", dtype=Float32),
Field(name="avg_daily_trips", dtype=Int64),
],
online=True,
source=driver_stats_source,
)
```

### OnDemandFeatureView
Computes features at request time from other feature views and/or request data.

```python
from feast import on_demand_feature_view
import pandas as pd

@on_demand_feature_view(
sources=[driver_hourly_stats, input_request],
schema=[Field(name="conv_rate_plus_val", dtype=Float64)],
mode="pandas",
)
def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
df = pd.DataFrame()
df["conv_rate_plus_val"] = inputs["conv_rate"] + inputs["val_to_add"]
return df
```

### FeatureService
Groups features from multiple views for retrieval.

```python
from feast import FeatureService

driver_fs = FeatureService(
name="driver_ranking",
features=[driver_hourly_stats, transformed_conv_rate],
)
```

## Feature Retrieval

### Online (low-latency)
```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

features = store.get_online_features(
features=[
"driver_hourly_stats:conv_rate",
"driver_hourly_stats:acc_rate",
],
entity_rows=[{"driver_id": 1001}, {"driver_id": 1002}],
).to_dict()
```

### Historical (training data with point-in-time joins)
```python
entity_df = pd.DataFrame({
"driver_id": [1001, 1002],
"event_timestamp": [datetime(2023, 1, 1), datetime(2023, 1, 2)],
})

training_df = store.get_historical_features(
entity_df=entity_df,
features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
).to_df()
```

Or use a FeatureService:
```python
training_df = store.get_historical_features(
entity_df=entity_df,
features=driver_fs,
).to_df()
```

## Materialization

Load features from offline store into online store:

```bash
# Full materialization over a time range
feast materialize 2023-01-01T00:00:00 2023-12-31T23:59:59

# Incremental (from last materialized timestamp)
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
```

Python API:
```python
from datetime import datetime
store.materialize(start_date=datetime(2023, 1, 1), end_date=datetime(2023, 12, 31))
store.materialize_incremental(end_date=datetime.utcnow())
```

## CLI Commands

| Command | Purpose |
|---------|---------|
| `feast init [DIR]` | Create new feature repository |
| `feast apply` | Register/update feature definitions |
| `feast plan` | Preview changes without applying |
| `feast materialize START END` | Materialize features to online store |
| `feast materialize-incremental END` | Incremental materialization |
| `feast entities list` | List registered entities |
| `feast feature-views list` | List feature views |
| `feast feature-services list` | List feature services |
| `feast on-demand-feature-views list` | List on-demand feature views |
| `feast teardown` | Remove infrastructure resources |
| `feast version` | Show SDK version |

Options: `--chdir` / `-c` (run in different directory), `--feature-store-yaml` / `-f` (override config path).

## Vector Search / RAG

Define a feature view with vector fields for similarity search:

```python
from feast.types import Array, Float32

wiki_passages = FeatureView(
name="wiki_passages",
entities=[passage_entity],
schema=[
Field(name="passage_text", dtype=String),
Field(
name="embedding",
dtype=Array(Float32),
vector_index=True,
vector_length=384,
vector_search_metric="COSINE",
),
],
source=passages_source,
online=True,
)
```

Retrieve similar documents:
```python
results = store.retrieve_online_documents(
feature="wiki_passages:embedding",
query=query_embedding,
top_k=5,
)
```

## feature_store.yaml Minimal Config

```yaml
project: my_project
registry: data/registry.db
provider: local
online_store:
type: sqlite
path: data/online_store.db
```

## Common Imports

```python
from feast import (
Entity, FeatureView, OnDemandFeatureView, FeatureService,
Field, FileSource, RequestSource, FeatureStore,
)
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float32, Float64, Int64, String, Bool, Array
from feast.value_type import ValueType
from datetime import timedelta
```

## Detailed References

- **Feature definitions** (all types, parameters, patterns): See [references/feature-definitions.md](references/feature-definitions.md)
- **Configuration** (feature_store.yaml, all store types, auth): See [references/configuration.md](references/configuration.md)
- **Retrieval & RAG** (online/offline retrieval, vector search, RAG retriever): See [references/retrieval-and-rag.md](references/retrieval-and-rag.md)
Loading