226 changes: 226 additions & 0 deletions docs/reference/offline-stores/hybrid_offline.md
@@ -0,0 +1,226 @@
# Hybrid Offline Store

## Overview

The Hybrid Offline Store is a specialized store that routes operations to different underlying offline stores based on the data source type. This enables you to use multiple types of data sources (e.g., BigQuery, Redshift, Snowflake, File) within the same Feast deployment.

## When to Use

Consider using the Hybrid Offline Store when:

- You have data spread across multiple data platforms
- You want to gradually migrate from one data source to another
- Different teams in your organization use different data storage technologies
- You need to optimize cost by using specialized stores for specific workloads

## Configuration

### Setting Up the Hybrid Offline Store

To use the Hybrid Offline Store, you need to configure it in your `feature_store.yaml` file:

```yaml
project: my_project
registry: registry.db
provider: local
offline_store:
  type: feast.infra.offline_stores.hybrid_offline_store.HybridOfflineStore
  offline_stores:
    - type: bigquery
      dataset: feast_dataset
      project_id: gcp_project_id
    - type: redshift
      cluster_id: my_redshift_cluster
      region: us-west-2
      user: admin
      database: feast
      s3_staging_location: s3://feast-bucket/staging
    - type: file
      path: /data/feast
```

### Supported Offline Stores

The Hybrid Offline Store supports all of Feast's offline stores:

- BigQuery
- Redshift
- Snowflake
- File (Parquet, CSV)
- Postgres
- Spark
- Trino
- Custom offline stores

## Usage

### Defining Feature Views with Different Source Types

When using the Hybrid Offline Store, you can define feature views with different source types:

```python
from datetime import timedelta

from feast import BigQuerySource, FeatureView, Field, FileSource
from feast.types import Float32, Int32

# `driver` is the Entity defined elsewhere in your feature repository.

# BigQuery source
bq_source = BigQuerySource(
    table="my_table",
    event_timestamp_column="timestamp",
    created_timestamp_column="created_ts",
)

# File source
file_source = FileSource(
    path="/data/transactions.parquet",
    event_timestamp_column="timestamp",
    created_timestamp_column="created_ts",
)

# Define feature views with different sources
driver_stats_bq = FeatureView(
    name="driver_stats_bq",
    entities=[driver],
    ttl=timedelta(days=1),
    source=bq_source,
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int32),
    ],
)

driver_stats_file = FeatureView(
    name="driver_stats_file",
    entities=[driver],
    ttl=timedelta(days=1),
    source=file_source,
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int32),
    ],
)
```

### How Routing Works

The Hybrid Offline Store routes operations to the appropriate underlying store based on the source type, as sketched below:

1. The source type is determined by examining the class name of the data source (e.g., `BigQuerySource`, `FileSource`).
2. The source store type is extracted from the class name (e.g., `bigquery`, `file`).
3. The operation is delegated to the matching offline store configuration.
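
A minimal sketch of this lookup (steps 1–3 above) is shown below. It is illustrative only: the function names are not part of the Feast API, and it assumes the `offline_stores` list from the configuration above is available as a list of dictionaries.

```python
from typing import Any, Dict, List


def infer_source_type(data_source: Any) -> str:
    # e.g. BigQuerySource -> "bigquery", FileSource -> "file"
    class_name = type(data_source).__name__
    if class_name.endswith("Source"):
        class_name = class_name[: -len("Source")]
    return class_name.lower()


def select_store_config(
    data_source: Any, offline_stores: List[Dict[str, Any]]
) -> Dict[str, Any]:
    # Pick the configured offline store whose `type` matches the derived source type.
    source_type = infer_source_type(data_source)
    for store_config in offline_stores:
        if store_config["type"] == source_type:
            return store_config
    raise ValueError(
        f"No offline store configuration found for source type '{source_type}'"
    )
```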

## Limitations

- All feature views used in a single `get_historical_features` call must have the same source type.
- Custom data sources must follow the naming convention `{SourceType}Source` (e.g., `BigQuerySource`, `FileSource`).
- Each offline store configuration must have a unique `type` value.

## Implementation Details

The Hybrid Offline Store acts as a router, delegating operations to the appropriate underlying store (see the sketch after this list). Key operations that are routed include:

- `get_historical_features`: Retrieves historical feature values for training or batch scoring
- `pull_latest_from_table_or_query`: Pulls the latest feature values from a table or query
- `pull_all_from_table_or_query`: Pulls all feature values from a table or query for a specified time range
- `offline_write_batch`: Writes a batch of feature values to the offline store
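
Conceptually, each of these methods performs the same lookup and then forwards the call unchanged to the selected store and its configuration. The sketch below shows only that delegation pattern; the class name, constructor, and method signature are simplified assumptions, not the real Feast interfaces.

```python
class HybridRouterSketch:
    """Illustrative delegation pattern, not the actual HybridOfflineStore."""

    def __init__(self, stores_by_type):
        # Maps a source type (e.g. "bigquery") to a (store, store_config) pair.
        self._stores_by_type = stores_by_type

    def get_historical_features(self, source_type, *args, **kwargs):
        store, store_config = self._select(source_type)
        # pull_latest_from_table_or_query, pull_all_from_table_or_query, and
        # offline_write_batch are dispatched the same way.
        return store.get_historical_features(store_config, *args, **kwargs)

    def _select(self, source_type):
        try:
            return self._stores_by_type[source_type]
        except KeyError:
            raise ValueError(
                f"No offline store configuration found for source type '{source_type}'"
            )
```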

## Troubleshooting

### Common Issues

#### Feature Views with Different Source Types

If you encounter an error like:
```
ValueError: All feature views must have the same source type
```

This means you're trying to retrieve historical features for feature views with different source types in a single call. Split the call into multiple calls, one per source type.
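
For example, split a mixed request into one call per source type. The feature view and feature names below follow the examples in this document, and `store` is assumed to be your `FeatureStore` instance with `entity_df` defined as in the complete example.

```python
# One call per source type; join the resulting dataframes on the entity keys if needed.
bq_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_stats_bq:conv_rate", "driver_stats_bq:acc_rate"],
).to_df()

file_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_activity_file:active_hours", "driver_activity_file:driving_days"],
).to_df()
```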

#### Offline Store Configuration Not Found

If you encounter an error like:
```
ValueError: No offline store configuration found for source type 'X'
```

Make sure you've included an offline store configuration for each source type you're using.

## Examples

### Complete Feature Repository Example

```python
from datetime import datetime, timedelta

import pandas as pd

from feast import BigQuerySource, Entity, FeatureStore, FeatureView, Field, FileSource
from feast.types import Float32, Int32

# Define an entity
driver = Entity(
    name="driver_id",
    join_keys=["driver_id"],
    description="Driver ID",
)

# BigQuery source
bq_source = BigQuerySource(
    table="my_project.my_dataset.driver_stats",
    event_timestamp_column="event_timestamp",
)

# File source
file_source = FileSource(
    path="/data/driver_activity.parquet",
    event_timestamp_column="event_timestamp",
)

# Feature view with BigQuery source
driver_stats_bq = FeatureView(
    name="driver_stats_bq",
    entities=[driver],
    ttl=timedelta(days=1),
    source=bq_source,
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int32),
    ],
)

# Feature view with File source
driver_activity_file = FeatureView(
    name="driver_activity_file",
    entities=[driver],
    ttl=timedelta(days=1),
    source=file_source,
    schema=[
        Field(name="active_hours", dtype=Float32),
        Field(name="driving_days", dtype=Int32),
    ],
)

# Apply feature views
fs = FeatureStore(repo_path=".")  # path to your feature repository
fs.apply([driver, driver_stats_bq, driver_activity_file])

# Entity dataframe with the entity keys and event timestamps to look up
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [datetime(2024, 1, 1), datetime(2024, 1, 2)],
    }
)

# Get historical features from the BigQuery source
training_df_bq = fs.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_stats_bq:conv_rate",
        "driver_stats_bq:acc_rate",
        "driver_stats_bq:avg_daily_trips",
    ],
).to_df()

# Get historical features from the File source
training_df_file = fs.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_activity_file:active_hours",
        "driver_activity_file:driving_days",
    ],
).to_df()
```

## Conclusion

The Hybrid Offline Store provides flexibility in working with multiple data sources while maintaining the simplicity of the Feast API. By routing operations to the appropriate underlying store, it allows you to leverage the strengths of different data platforms within a single Feast deployment.
111 changes: 111 additions & 0 deletions docs/reference/online-stores/hybrid.md
@@ -0,0 +1,111 @@
# Hybrid online store

## Description

The HybridOnlineStore allows routing online feature operations to different online store backends based on a configurable tag (such as `tribe`, `team`, or `project`) on the FeatureView. This enables a single Feast deployment to support multiple online store backends, each configured independently and selected dynamically at runtime.

## Getting started

To use the HybridOnlineStore, install Feast with the dependencies for each online store backend you plan to use (e.g., Bigtable, Cassandra). For example:

```bash
pip install 'feast[gcp,cassandra]'
```

## Example

{% code title="feature_store.yaml" %}
```yaml
project: my_feature_repo
registry: data/registry.db
provider: local
online_store:
  type: hybrid_online_store.HybridOnlineStore
  routing_tag: team  # or any tag name you want to use in FeatureViews for routing
  online_stores:
    - type: bigtable
      conf:
        project_id: my_gcp_project
        instance: my_bigtable_instance
    - type: cassandra
      conf:
        hosts:
          - cassandra1.example.com
          - cassandra2.example.com
        keyspace: feast_keyspace
        username: feast_user
        password: feast_password
```
{% endcode %}

### Setting the Routing Tag in FeatureView

To enable routing, add a tag to your FeatureView that matches the `routing_tag` specified in your `feature_store.yaml`. For example, if your `routing_tag` is `team`, add a `team` tag to your FeatureView:

```yaml
tags:
  team: bigtable  # This tag determines which online store is used
```

The value of this tag (e.g., `bigtable`) should match the type or identifier of the online store you want to use for this FeatureView. The HybridOnlineStore will route all online operations for this FeatureView to the corresponding backend.
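
If you define feature views in Python rather than YAML, set the same tag in the `FeatureView`'s `tags` dict. The snippet below is a minimal sketch; the entity, source path, and field names are illustrative.

```python
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Int64, String

user = Entity(name="user_id", join_keys=["user_id"])

user_source = FileSource(
    path="data/user_features.parquet",
    timestamp_field="event_timestamp",
)

user_features = FeatureView(
    name="user_features",
    entities=[user],
    ttl=None,
    schema=[
        Field(name="age", dtype=Int64),
        Field(name="country", dtype=String),
    ],
    source=user_source,
    tags={"team": "bigtable"},  # routed to the Bigtable online store
)
```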

### Example FeatureView

{% code title="feature_view" %}
```yaml
name: user_features
entities:
  - name: user_id
    join_keys: ["user_id"]
ttl: null
schema:
  - name: age
    dtype: int64
  - name: country
    dtype: string
online: true
source:
  path: data/user_features.parquet
  event_timestamp_column: event_timestamp
  created_timestamp_column: created_timestamp
tags:
  team: bigtable  # This tag determines which online store is used
```
{% endcode %}

The `team` tag in the FeatureView's `tags` field determines which online store backend serves this FeatureView: the HybridOnlineStore reads the tag named by `routing_tag` in your `feature_store.yaml` and routes all online operations to the matching backend. In this example, every online operation for `user_features` is routed to the Bigtable online store.

The full set of configuration options for each online store backend you configure in the `online_stores` list is available in its respective documentation:
- [BigtableOnlineStoreConfig](https://rtd.feast.dev/en/latest/#feast.infra.online_stores.bigtable.BigtableOnlineStoreConfig)
- [CassandraOnlineStoreConfig](https://rtd.feast.dev/en/master/#feast.infra.online_stores.cassandra_online_store.cassandra_online_store.CassandraOnlineStoreConfig)

Storage specifications can be found at [docs/specs/online_store_format.md](../../specs/online_store_format.md).

## Functionality Matrix

The set of functionality supported by online stores is described in detail [here](overview.md#functionality). Below is a matrix indicating which functionality is supported by the HybridOnlineStore.

| | HybridOnlineStore |
|-----------------------------------------------------------|-------------------|
| write feature values to the online store | yes |
| read feature values from the online store | yes |
| update infrastructure (e.g. tables) in the online store | yes |
| teardown infrastructure (e.g. tables) in the online store | yes |
| generate a plan of infrastructure changes | no |
| support for on-demand transforms | yes |
| readable by Python SDK | yes |
| readable by Java | no |
| readable by Go | no |
| support for entityless feature views | yes |
| support for concurrent writing to the same key | yes |
| support for ttl (time to live) at retrieval | no |
| support for deleting expired data | no |
| collocated by feature view | yes |
| collocated by feature service | no |
| collocated by entity key | yes |

To compare this set of functionality against other online stores, please see the full [functionality matrix](overview.md#functionality-matrix).
@@ -0,0 +1,6 @@
from feast.infra.offline_stores.hybrid_offline_store.hybrid_offline_store import (
HybridOfflineStore,
HybridOfflineStoreConfig,
)

__all__ = ["HybridOfflineStore", "HybridOfflineStoreConfig"]