2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -14,6 +14,7 @@ repos:
stages: [commit]
language: system
types: [python]
exclude: '_pb2\.py$'
entry: bash -c 'uv run ruff check --fix "$@" && uv run ruff format "$@"' --
pass_filenames: true

@@ -24,6 +25,7 @@ repos:
stages: [commit]
language: system
types: [python]
exclude: '_pb2\.py$'
entry: bash -c 'uv run ruff check "$@" && uv run ruff format --check "$@"' --
pass_filenames: true

38 changes: 36 additions & 2 deletions docs/getting-started/concepts/feast-types.md
@@ -5,10 +5,44 @@ To make this possible, Feast itself has a type system for all the types it is ab

Feast's type system is built on top of [protobuf](https://github.com/protocolbuffers/protobuf). The messages that make up the type system can be found [here](https://github.com/feast-dev/feast/blob/master/protos/feast/types/Value.proto), and the corresponding python classes that wrap them can be found [here](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/types.py).

Feast supports primitive data types (numerical values, strings, bytes, booleans and timestamps). The only complex data type Feast supports is Arrays, and arrays cannot contain other arrays.
Feast supports the following categories of data types:

- **Primitive types**: numerical values (`Int32`, `Int64`, `Float32`, `Float64`), `String`, `Bytes`, `Bool`, and `UnixTimestamp`.
- **Array types**: ordered lists of any primitive type, e.g. `Array(Int64)`, `Array(String)`.
- **Set types**: unordered collections of unique values for any primitive type, e.g. `Set(String)`, `Set(Int64)`.
- **Map types**: dictionary-like structures with string keys and values that can be any supported Feast type (including nested maps), e.g. `Map`, `Array(Map)`.
- **JSON type**: opaque JSON data stored as a string at the proto level but semantically distinct from `String` — backends use native JSON types (`jsonb`, `VARIANT`, etc.), e.g. `Json`, `Array(Json)`.
- **Struct type**: schema-aware structured type with named, typed fields. Unlike `Map` (which is schema-free), a `Struct` declares its field names and their types, enabling schema validation, e.g. `Struct({"name": String, "age": Int32})`.

For a complete reference with examples, see [Type System](../../reference/type-system.md).

Each feature or schema field in Feast is associated with a data type, which is stored in Feast's [registry](registry.md). These types are also used to ensure that Feast operates on values correctly (e.g. making sure that timestamp columns used for [point-in-time correct joins](point-in-time-joins.md) actually have the timestamp type).

As a result, each system that feast interacts with needs a way to translate data types from the native platform, into a feast type. E.g., Snowflake SQL types are converted to Feast types [here](https://rtd.feast.dev/en/master/feast.html#feast.type_map.snowflake_python_type_to_feast_value_type). The onus is therefore on authors of offline or online store connectors to make sure that this type mapping happens correctly.
As a result, each system that Feast interacts with needs a way to translate data types from the native platform into a Feast type. E.g., Snowflake SQL types are converted to Feast types [here](https://rtd.feast.dev/en/master/feast.html#feast.type_map.snowflake_python_type_to_feast_value_type). The onus is therefore on authors of offline or online store connectors to make sure that this type mapping happens correctly.
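In practice, a connector's type map often reduces to a lookup table from native type names to Feast types. The sketch below is purely illustrative (the function name and mapping entries are assumptions, not Feast's actual Snowflake map):

```python
# Hypothetical sketch of a connector-side type map; the entries below are
# illustrative and do not reproduce Feast's real Snowflake mapping.
NATIVE_TO_FEAST = {
    "NUMBER": "Int64",
    "FLOAT": "Float64",
    "VARCHAR": "String",
    "BOOLEAN": "Bool",
    "TIMESTAMP_NTZ": "UnixTimestamp",
    "VARIANT": "Map",
}


def native_type_to_feast(native_type: str) -> str:
    """Translate a native column type name into a Feast type name."""
    try:
        return NATIVE_TO_FEAST[native_type.upper()]
    except KeyError:
        raise ValueError(f"Unsupported native type: {native_type!r}")
```

A connector author extends this table for every native type the platform can emit, and raises early for anything unmapped so type errors surface at apply time rather than at serving time.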

### Backend Type Mapping for Complex Types

Map, JSON, and Struct types are supported across all major Feast backends:

| Backend | Native Type | Feast Type |
|---------|-------------|------------|
| PostgreSQL | `jsonb` | `Map`, `Json`, `Struct` |
| PostgreSQL | `jsonb[]` | `Array(Map)` |
| Snowflake | `VARIANT`, `OBJECT` | `Map` |
| Snowflake | `JSON` | `Json` |
| Redshift | `SUPER` | `Map` |
| Redshift | `json` | `Json` |
| BigQuery | `JSON` | `Json` |
| BigQuery | `STRUCT`, `RECORD` | `Struct` |
| Spark | `map<string,string>` | `Map` |
| Spark | `array<map<string,string>>` | `Array(Map)` |
| Spark | `struct<...>` | `Struct` |
| Spark | `array<struct<...>>` | `Array(Struct(...))` |
| MSSQL | `nvarchar(max)` | `Map`, `Json`, `Struct` |
| DynamoDB | Proto bytes | `Map`, `Json`, `Struct` |
| Redis | Proto bytes | `Map`, `Json`, `Struct` |
| Milvus | `VARCHAR` (serialized) | `Map`, `Json`, `Struct` |

**Note**: When the backend native type is ambiguous (e.g., `jsonb` could be `Map`, `Json`, or `Struct`), the **schema-declared Feast type takes precedence**. The backend-to-Feast type mappings above are only used for schema inference when no explicit type is provided.

**Note**: Feast currently does *not* support a null type in its type system.
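Because there is no null type, missing values generally need to be handled before ingestion. One possible workaround (an assumption, not an official Feast recommendation) is to fill or drop nulls up front:

```python
import math


def fill_nulls(values, default=0.0):
    # Replace None and NaN with a sentinel default before ingestion,
    # since Feast's type system has no null type to represent them.
    return [
        default if v is None or (isinstance(v, float) and math.isnan(v)) else v
        for v in values
    ]
```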
38 changes: 38 additions & 0 deletions docs/getting-started/concepts/feature-view.md
@@ -24,6 +24,7 @@ Feature views consist of:
* (optional, but recommended) a schema specifying one or more [features](feature-view.md#field) (without this, Feast will infer the schema by reading from the data source)
* (optional, but recommended) metadata (for example, description, or other free-form metadata via `tags`)
* (optional) a TTL, which limits how far back Feast will look when generating historical datasets
* (optional) `enable_validation=True`, which enables schema validation during materialization (see [Schema Validation](#schema-validation) below)

Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment. Feature views generally contain features that are properties of a specific object, in which case that object is defined as an entity and included in the feature view.

@@ -159,6 +160,43 @@ Feature names must be unique within a [feature view](feature-view.md#feature-vie

Each field can have additional metadata associated with it, specified as key-value [tags](https://rtd.feast.dev/en/master/feast.html#feast.field.Field).

## Schema Validation

Feature views support an optional `enable_validation` parameter that enables schema validation during materialization and historical feature retrieval. When enabled, Feast verifies that:

- All declared feature columns are present in the input data.
- Column data types match the expected Feast types (mismatches are logged as warnings).

This is useful for catching data quality issues early in the pipeline. To enable it:

```python
from feast import FeatureView, Field
from feast.types import Int32, Int64, Float32, Json, Map, String, Struct

validated_fv = FeatureView(
name="validated_features",
entities=[driver],
schema=[
Field(name="trips_today", dtype=Int64),
Field(name="rating", dtype=Float32),
Field(name="preferences", dtype=Map),
Field(name="config", dtype=Json), # opaque JSON data
Field(name="address", dtype=Struct({"street": String, "city": String, "zip": Int32})), # typed struct
],
source=my_source,
enable_validation=True, # enables schema checks
)
```

**JSON vs Map vs Struct**: These three complex types serve different purposes:
- **`Map`**: Schema-free dictionary with string keys. Use when the keys and values are dynamic.
- **`Json`**: Opaque JSON data stored as a string. Backends use native JSON types (`jsonb`, `VARIANT`). Use for configuration blobs or API responses where you don't need field-level typing.
- **`Struct`**: Schema-aware structured type with named, typed fields. Persisted through the registry via Field tags. Use when you know the exact structure and want type safety.

Validation is supported in all compute engines (Local, Spark, and Ray). When a required column is missing, a `ValueError` is raised. Type mismatches are logged as warnings but do not block execution, allowing for safe gradual adoption.
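The semantics above can be sketched as follows; this is a simplified stand-in for the engine-side check, not Feast's actual implementation:

```python
import logging

logger = logging.getLogger("schema_validation")


def validate_schema(actual: dict, declared: dict) -> None:
    """Simplified sketch: missing columns are fatal, dtype mismatches only warn.

    actual:   column name -> observed dtype string
    declared: feature name -> expected dtype string
    """
    for name, expected in declared.items():
        if name not in actual:
            # Fatal: a declared feature column is absent from the input data.
            raise ValueError(f"Declared feature {name!r} is missing from the input data")
        if actual[name] != expected:
            # Non-fatal: log and continue, allowing gradual adoption.
            logger.warning(
                "Type mismatch for %r: expected %s, got %s",
                name, expected, actual[name],
            )
```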

The `enable_validation` parameter is also available on `BatchFeatureView` and `StreamFeatureView`, as well as their respective decorators (`@batch_feature_view` and `@stream_feature_view`).

## \[Alpha] On demand feature views

On demand feature views allow data scientists to use existing features and request-time data (features only available at request time) to transform and create new features. Users define Python transformation logic which is executed in both the historical retrieval and online retrieval paths.
6 changes: 6 additions & 0 deletions docs/how-to-guides/dbt-integration.md
@@ -289,6 +289,12 @@ Feast automatically maps dbt/warehouse column types to Feast types:
| `TIMESTAMP`, `DATETIME` | `UnixTimestamp` |
| `BYTES`, `BINARY` | `Bytes` |
| `ARRAY<type>` | `Array(type)` |
| `JSON`, `JSONB` | `Map` (or `Json` if declared in schema) |
| `VARIANT`, `OBJECT` | `Map` |
| `SUPER` | `Map` |
| `MAP<string,string>` | `Map` |
| `STRUCT`, `RECORD` | `Struct` (BigQuery) |
| `struct<...>` | `Struct` (Spark) |

Snowflake `NUMBER(precision, scale)` types are handled specially:
- Scale > 0: `Float64`
32 changes: 32 additions & 0 deletions docs/specs/offline_store_format.md
@@ -49,6 +49,12 @@ Here's how Feast types map to Pandas types for Feast APIs that take in or return
| DOUBLE\_LIST | `list[float]`|
| FLOAT\_LIST | `list[float]`|
| BOOL\_LIST | `list[bool]`|
| MAP | `dict` (`Dict[str, Any]`)|
| MAP\_LIST | `list[dict]` (`List[Dict[str, Any]]`)|
| JSON | `object` (parsed Python dict/list/str)|
| JSON\_LIST | `list[object]`|
| STRUCT | `dict` (`Dict[str, Any]`)|
| STRUCT\_LIST | `list[dict]` (`List[Dict[str, Any]]`)|

Note that this mapping is non-injective; that is, more than one Pandas type may correspond to a single Feast type (but not vice versa). In these cases, when converting Feast values to Pandas, the **first** Pandas type in the table above is used.
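The first-match rule can be sketched as a lookup where each Feast type lists its Pandas representations in table order (a hypothetical fragment, not the full mapping):

```python
# Hypothetical fragment of the Feast -> Pandas mapping; each Feast type lists
# its candidate Pandas types in the order they appear in the table above.
FEAST_TO_PANDAS = {
    "INT32": ["int32", "int64"],
    "MAP": ["dict"],
    "STRUCT": ["dict"],
}


def pandas_type_for(feast_type: str) -> str:
    # Non-injective mapping: when several Pandas types correspond to one
    # Feast type, the first entry wins.
    return FEAST_TO_PANDAS[feast_type][0]
```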

@@ -78,6 +84,12 @@ Here's how Feast types map to BigQuery types when using BigQuery for offline sto
| DOUBLE\_LIST | `ARRAY<FLOAT64>`|
| FLOAT\_LIST | `ARRAY<FLOAT64>`|
| BOOL\_LIST | `ARRAY<BOOL>`|
| MAP | `JSON` / `STRUCT` |
| MAP\_LIST | `ARRAY<JSON>` / `ARRAY<STRUCT>` |
| JSON | `JSON` |
| JSON\_LIST | `ARRAY<JSON>` |
| STRUCT | `STRUCT` / `RECORD` |
| STRUCT\_LIST | `ARRAY<STRUCT>` |

Values that are not specified by the table above will cause an error on conversion.

@@ -94,3 +106,23 @@ https://docs.snowflake.com/en/user-guide/python-connector-pandas.html#snowflake-
| INT32 | `INT8 / UINT8 / INT16 / UINT16 / INT32 / UINT32` |
| INT64 | `INT64 / UINT64` |
| DOUBLE | `FLOAT64` |
| MAP | `VARIANT` / `OBJECT` |
| JSON | `JSON` / `VARIANT` |

#### Redshift Types
Here's how Feast types map to Redshift types when using Redshift for offline storage:

| Feast Type | Redshift Type |
|-------------|--|
| Event Timestamp | `TIMESTAMP` / `TIMESTAMPTZ` |
| BYTES | `VARBYTE` |
| STRING | `VARCHAR` |
| INT32 | `INT4` / `SMALLINT` |
| INT64 | `INT8` / `BIGINT` |
| DOUBLE | `FLOAT8` / `DOUBLE PRECISION` |
| FLOAT | `FLOAT4` / `REAL` |
| BOOL | `BOOL` |
| MAP | `SUPER` |
| JSON | `json` / `SUPER` |

Note: Redshift's `SUPER` type stores semi-structured JSON data. During materialization, Feast automatically handles `SUPER` columns that are exported as JSON strings by parsing them back into Python dictionaries before converting to `MAP` proto values.
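The parsing step mentioned in the note can be sketched like this (a simplified stand-in for the materialization-time handling, assuming `SUPER` values arrive as JSON strings):

```python
import json


def parse_super_values(values):
    # Redshift exports SUPER columns as JSON strings; parse each one back
    # into a Python dict so it can be converted to a MAP proto value.
    # Values that are already dicts are passed through unchanged.
    return [json.loads(v) if isinstance(v, str) else v for v in values]
```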
5 changes: 4 additions & 1 deletion protos/feast/core/FeatureView.proto
@@ -36,7 +36,7 @@ message FeatureView {
FeatureViewMeta meta = 2;
}

// Next available id: 17
// Next available id: 18
// TODO(adchia): refactor common fields from this and ODFV into separate metadata proto
message FeatureViewSpec {
// Name of the feature view. Must be unique. Not updated.
@@ -89,6 +89,9 @@ message FeatureViewSpec {

// The transformation mode (e.g., "python", "pandas", "spark", "sql", "ray")
string mode = 16;

// Whether schema validation is enabled during materialization
bool enable_validation = 17;
}

message FeatureViewMeta {
5 changes: 4 additions & 1 deletion protos/feast/core/StreamFeatureView.proto
@@ -37,7 +37,7 @@ message StreamFeatureView {
FeatureViewMeta meta = 2;
}

// Next available id: 20
// Next available id: 21
message StreamFeatureViewSpec {
// Name of the feature view. Must be unique. Not updated.
string name = 1;
@@ -99,5 +99,8 @@ message StreamFeatureViewSpec {
// Hop size for tiling (e.g., 5 minutes). Determines the granularity of pre-aggregated tiles.
// If not specified, defaults to 5 minutes. Only used when enable_tiling is true.
google.protobuf.Duration tiling_hop_size = 19;

// Whether schema validation is enabled during materialization
bool enable_validation = 20;
}

8 changes: 8 additions & 0 deletions protos/feast/types/Value.proto
@@ -53,6 +53,10 @@ message ValueType {
FLOAT_SET = 27;
BOOL_SET = 28;
UNIX_TIMESTAMP_SET = 29;
JSON = 32;
JSON_LIST = 33;
STRUCT = 34;
STRUCT_LIST = 35;
}
}

@@ -88,6 +92,10 @@ message Value {
FloatSet float_set_val = 27;
BoolSet bool_set_val = 28;
Int64Set unix_timestamp_set_val = 29;
string json_val = 32;
StringList json_list_val = 33;
Map struct_val = 34;
MapList struct_list_val = 35;
}
}

4 changes: 4 additions & 0 deletions sdk/python/feast/batch_feature_view.py
@@ -97,6 +97,7 @@ def __init__(
feature_transformation: Optional[Transformation] = None,
batch_engine: Optional[Dict[str, Any]] = None,
aggregations: Optional[List[Aggregation]] = None,
enable_validation: bool = False,
):
if not flags_helper.is_test():
warnings.warn(
@@ -136,6 +137,7 @@ def __init__(
source=source, # type: ignore[arg-type]
sink_source=sink_source,
mode=mode,
enable_validation=enable_validation,
)

def get_feature_transformation(self) -> Optional[Transformation]:
@@ -169,6 +171,7 @@ def batch_feature_view(
description: str = "",
owner: str = "",
schema: Optional[List[Field]] = None,
enable_validation: bool = False,
):
"""
Creates a BatchFeatureView object with the given user-defined function (UDF) as the transformation.
@@ -199,6 +202,7 @@ def decorator(user_function):
schema=schema,
udf=user_function,
udf_string=udf_string,
enable_validation=enable_validation,
)
functools.update_wrapper(wrapper=batch_feature_view_obj, wrapped=user_function)
return batch_feature_view_obj
30 changes: 29 additions & 1 deletion sdk/python/feast/driver_test_data.py
@@ -136,10 +136,38 @@ def create_driver_hourly_stats_df(drivers, start_date, end_date) -> pd.DataFrame
df_all_drivers["conv_rate"] = np.random.random(size=rows).astype(np.float32)
df_all_drivers["acc_rate"] = np.random.random(size=rows).astype(np.float32)
df_all_drivers["avg_daily_trips"] = np.random.randint(0, 1000, size=rows).astype(
np.int32
np.int64
)
df_all_drivers["created"] = pd.to_datetime(pd.Timestamp.now(tz=None).round("ms"))

# Complex type columns for Map, Json, and Struct examples
import json as _json

df_all_drivers["driver_metadata"] = [
{
"vehicle_type": np.random.choice(["sedan", "suv", "truck"]),
"rating": str(round(np.random.uniform(3.0, 5.0), 1)),
}
for _ in range(len(df_all_drivers))
]
df_all_drivers["driver_config"] = [
_json.dumps(
{
"max_distance_km": int(np.random.randint(10, 200)),
"preferred_zones": list(
np.random.choice(
["north", "south", "east", "west"], size=2, replace=False
)
),
}
)
for _ in range(len(df_all_drivers))
]
df_all_drivers["driver_profile"] = [
{"name": f"driver_{driver_id}", "age": str(int(np.random.randint(25, 60)))}
for driver_id in df_all_drivers["driver_id"]
]

# Create duplicate rows that should be filtered by created timestamp
# TODO: These duplicate rows are indirectly being filtered out by the point in time join already. We need to
# inject a bad row at a timestamp where we know it will get joined to the entity dataframe, and then test that
11 changes: 11 additions & 0 deletions sdk/python/feast/feature_view.py
@@ -107,6 +107,7 @@ class FeatureView(BaseFeatureView):
owner: str
materialization_intervals: List[Tuple[datetime, datetime]]
mode: Optional[Union["TransformationMode", str]]
enable_validation: bool

def __init__(
self,
@@ -123,6 +124,7 @@ def __init__(
tags: Optional[Dict[str, str]] = None,
owner: str = "",
mode: Optional[Union["TransformationMode", str]] = None,
enable_validation: bool = False,
):
"""
Creates a FeatureView object.
@@ -148,11 +150,14 @@ def __init__(
primary maintainer.
mode (optional): The transformation mode for feature transformations. Only meaningful
when transformations are applied. Choose from TransformationMode enum values.
enable_validation (optional): If True, enables schema validation during materialization
to check that data conforms to the declared feature types. Default is False.

Raises:
ValueError: A field mapping conflicts with an Entity or a Feature.
"""
self.name = name
self.enable_validation = enable_validation
self.entities = [e.name for e in entities] if entities else [DUMMY_ENTITY_NAME]
self.ttl = ttl
schema = schema or []
@@ -279,6 +284,7 @@ def __copy__(self):
online=self.online,
offline=self.offline,
sink_source=self.batch_source if self.source_views else None,
enable_validation=self.enable_validation,
)

# This is deliberately set outside of the FV initialization as we do not have the Entity objects.
@@ -307,6 +313,7 @@ def __eq__(self, other):
or sorted(self.entity_columns) != sorted(other.entity_columns)
or self.source_views != other.source_views
or self.materialization_intervals != other.materialization_intervals
or self.enable_validation != other.enable_validation
):
return False

@@ -473,6 +480,7 @@ def to_proto_spec(
source_views=source_view_protos,
feature_transformation=feature_transformation_proto,
mode=mode_str,
enable_validation=self.enable_validation,
)

def to_proto_meta(self):
@@ -642,6 +650,9 @@ def _from_proto_internal(
f"Entities: {feature_view.entities} vs Entity Columns: {feature_view.entity_columns}"
)

# Restore enable_validation from proto field.
feature_view.enable_validation = feature_view_proto.spec.enable_validation

# FeatureViewProjections are not saved in the FeatureView proto.
# Create the default projection.
feature_view.projection = FeatureViewProjection.from_feature_view_definition(