2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -14,6 +14,7 @@ repos:
stages: [commit]
language: system
types: [python]
exclude: '_pb2\.py$'
entry: bash -c 'uv run ruff check --fix "$@" && uv run ruff format "$@"' --
pass_filenames: true

@@ -24,6 +25,7 @@ repos:
stages: [commit]
language: system
types: [python]
exclude: '_pb2\.py$'
entry: bash -c 'uv run ruff check "$@" && uv run ruff format --check "$@"' --
pass_filenames: true

38 changes: 36 additions & 2 deletions docs/getting-started/concepts/feast-types.md
@@ -5,10 +5,44 @@ To make this possible, Feast itself has a type system for all the types it is ab

Feast's type system is built on top of [protobuf](https://github.com/protocolbuffers/protobuf). The messages that make up the type system can be found [here](https://github.com/feast-dev/feast/blob/master/protos/feast/types/Value.proto), and the corresponding python classes that wrap them can be found [here](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/types.py).

Feast supports primitive data types (numerical values, strings, bytes, booleans and timestamps). The only complex data type Feast supports is Arrays, and arrays cannot contain other arrays.
Feast supports the following categories of data types:

- **Primitive types**: numerical values (`Int32`, `Int64`, `Float32`, `Float64`), `String`, `Bytes`, `Bool`, and `UnixTimestamp`.
- **Array types**: ordered lists of any primitive type, e.g. `Array(Int64)`, `Array(String)`.
- **Set types**: unordered collections of unique values for any primitive type, e.g. `Set(String)`, `Set(Int64)`.
- **Map types**: dictionary-like structures with string keys and values that can be any supported Feast type (including nested maps), e.g. `Map`, `Array(Map)`.
- **JSON type**: opaque JSON data stored as a string at the proto level but semantically distinct from `String` — backends use native JSON types (`jsonb`, `VARIANT`, etc.), e.g. `Json`, `Array(Json)`.
- **Struct type**: schema-aware structured type with named, typed fields. Unlike `Map` (which is schema-free), a `Struct` declares its field names and their types, enabling schema validation, e.g. `Struct({"name": String, "age": Int32})`.

For a complete reference with examples, see [Type System](../../reference/type-system.md).

Each feature or schema field in Feast is associated with a data type, which is stored in Feast's [registry](registry.md). These types are also used to ensure that Feast operates on values correctly (e.g. making sure that timestamp columns used for [point-in-time correct joins](point-in-time-joins.md) actually have the timestamp type).

As a result, each system that feast interacts with needs a way to translate data types from the native platform, into a feast type. E.g., Snowflake SQL types are converted to Feast types [here](https://rtd.feast.dev/en/master/feast.html#feast.type_map.snowflake_python_type_to_feast_value_type). The onus is therefore on authors of offline or online store connectors to make sure that this type mapping happens correctly.
As a result, each system that Feast interacts with needs a way to translate data types from the native platform into a Feast type. E.g., Snowflake SQL types are converted to Feast types [here](https://rtd.feast.dev/en/master/feast.html#feast.type_map.snowflake_python_type_to_feast_value_type). The onus is therefore on authors of offline or online store connectors to make sure that this type mapping happens correctly.
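In practice, a connector's type map often reduces to a lookup table from native type names to Feast types. The sketch below is purely illustrative (the function name and mapping entries are assumptions, not Feast's actual Snowflake map):

```python
# Hypothetical sketch of a connector-side type map; the entries below are
# illustrative and do not reproduce Feast's real Snowflake mapping.
NATIVE_TO_FEAST = {
    "NUMBER": "Int64",
    "FLOAT": "Float64",
    "VARCHAR": "String",
    "BOOLEAN": "Bool",
    "TIMESTAMP_NTZ": "UnixTimestamp",
    "VARIANT": "Map",
}


def native_type_to_feast(native_type: str) -> str:
    """Translate a native column type name into a Feast type name."""
    try:
        return NATIVE_TO_FEAST[native_type.upper()]
    except KeyError:
        raise ValueError(f"Unsupported native type: {native_type!r}")
```

A connector author extends this table for every native type the platform can emit, and raises early for anything unmapped so type errors surface at apply time rather than at serving time.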

### Backend Type Mapping for Complex Types

Map, JSON, and Struct types are supported across all major Feast backends:

| Backend | Native Type | Feast Type |
|---------|-------------|------------|
| PostgreSQL | `jsonb` | `Map`, `Json`, `Struct` |
| PostgreSQL | `jsonb[]` | `Array(Map)` |
| Snowflake | `VARIANT`, `OBJECT` | `Map` |
| Snowflake | `JSON` | `Json` |
| Redshift | `SUPER` | `Map` |
| Redshift | `json` | `Json` |
| BigQuery | `JSON` | `Json` |
| BigQuery | `STRUCT`, `RECORD` | `Struct` |
| Spark | `map<string,string>` | `Map` |
| Spark | `array<map<string,string>>` | `Array(Map)` |
| Spark | `struct<...>` | `Struct` |
| Spark | `array<struct<...>>` | `Array(Struct(...))` |
| MSSQL | `nvarchar(max)` | `Map`, `Json`, `Struct` |
| DynamoDB | Proto bytes | `Map`, `Json`, `Struct` |
| Redis | Proto bytes | `Map`, `Json`, `Struct` |
| Milvus | `VARCHAR` (serialized) | `Map`, `Json`, `Struct` |

**Note**: When the backend native type is ambiguous (e.g., `jsonb` could be `Map`, `Json`, or `Struct`), the **schema-declared Feast type takes precedence**. The backend-to-Feast type mappings above are only used for schema inference when no explicit type is provided.

**Note**: Feast currently does *not* support a null type in its type system.
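Because there is no null type, missing values generally need to be handled before ingestion. One possible workaround (an assumption, not an official Feast recommendation) is to fill or drop nulls up front:

```python
import math


def fill_nulls(values, default=0.0):
    # Replace None and NaN with a sentinel default before ingestion,
    # since Feast's type system has no null type to represent them.
    return [
        default if v is None or (isinstance(v, float) and math.isnan(v)) else v
        for v in values
    ]
```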
38 changes: 38 additions & 0 deletions docs/getting-started/concepts/feature-view.md
@@ -24,6 +24,7 @@ Feature views consist of:
* (optional, but recommended) a schema specifying one or more [features](feature-view.md#field) (without this, Feast will infer the schema by reading from the data source)
* (optional, but recommended) metadata (for example, description, or other free-form metadata via `tags`)
* (optional) a TTL, which limits how far back Feast will look when generating historical datasets
* (optional) `enable_validation=True`, which enables schema validation during materialization (see [Schema Validation](#schema-validation) below)

Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment. Feature views generally contain features that are properties of a specific object, in which case that object is defined as an entity and included in the feature view.

@@ -159,6 +160,43 @@ Feature names must be unique within a [feature view](feature-view.md#feature-vie

Each field can have additional metadata associated with it, specified as key-value [tags](https://rtd.feast.dev/en/master/feast.html#feast.field.Field).

## Schema Validation

Feature views support an optional `enable_validation` parameter that enables schema validation during materialization and historical feature retrieval. When enabled, Feast verifies that:

- All declared feature columns are present in the input data.
- Column data types match the expected Feast types (mismatches are logged as warnings).

This is useful for catching data quality issues early in the pipeline. To enable it:

```python
from feast import FeatureView, Field
from feast.types import Int32, Int64, Float32, Json, Map, String, Struct

validated_fv = FeatureView(
name="validated_features",
entities=[driver],
schema=[
Field(name="trips_today", dtype=Int64),
Field(name="rating", dtype=Float32),
Field(name="preferences", dtype=Map),
Field(name="config", dtype=Json), # opaque JSON data
Field(name="address", dtype=Struct({"street": String, "city": String, "zip": Int32})), # typed struct
],
source=my_source,
enable_validation=True, # enables schema checks
)
```

**JSON vs Map vs Struct**: These three complex types serve different purposes:
- **`Map`**: Schema-free dictionary with string keys. Use when the keys and values are dynamic.
- **`Json`**: Opaque JSON data stored as a string. Backends use native JSON types (`jsonb`, `VARIANT`). Use for configuration blobs or API responses where you don't need field-level typing.
- **`Struct`**: Schema-aware structured type with named, typed fields. Persisted through the registry via Field tags. Use when you know the exact structure and want type safety.

Validation is supported in all compute engines (Local, Spark, and Ray). When a required column is missing, a `ValueError` is raised. Type mismatches are logged as warnings but do not block execution, allowing for safe gradual adoption.
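The semantics above can be sketched as follows; this is a simplified stand-in for the engine-side check, not Feast's actual implementation:

```python
import logging

logger = logging.getLogger("schema_validation")


def validate_schema(actual: dict, declared: dict) -> None:
    """Simplified sketch: missing columns are fatal, dtype mismatches only warn.

    actual:   column name -> observed dtype string
    declared: feature name -> expected dtype string
    """
    for name, expected in declared.items():
        if name not in actual:
            # Fatal: a declared feature column is absent from the input data.
            raise ValueError(f"Declared feature {name!r} is missing from the input data")
        if actual[name] != expected:
            # Non-fatal: log and continue, allowing gradual adoption.
            logger.warning(
                "Type mismatch for %r: expected %s, got %s",
                name, expected, actual[name],
            )
```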

The `enable_validation` parameter is also available on `BatchFeatureView` and `StreamFeatureView`, as well as their respective decorators (`@batch_feature_view` and `@stream_feature_view`).

## \[Alpha] On demand feature views

On demand feature views allow data scientists to use existing features and request-time data (features only available at request time) to transform and create new features. Users define Python transformation logic which is executed in both the historical retrieval and online retrieval paths.
6 changes: 6 additions & 0 deletions docs/how-to-guides/dbt-integration.md
@@ -289,6 +289,12 @@ Feast automatically maps dbt/warehouse column types to Feast types:
| `TIMESTAMP`, `DATETIME` | `UnixTimestamp` |
| `BYTES`, `BINARY` | `Bytes` |
| `ARRAY<type>` | `Array(type)` |
| `JSON`, `JSONB` | `Map` (or `Json` if declared in schema) |
| `VARIANT`, `OBJECT` | `Map` |
| `SUPER` | `Map` |
| `MAP<string,string>` | `Map` |
| `STRUCT`, `RECORD` | `Struct` (BigQuery) |
| `struct<...>` | `Struct` (Spark) |

Snowflake `NUMBER(precision, scale)` types are handled specially:
- Scale > 0: `Float64`
32 changes: 32 additions & 0 deletions docs/specs/offline_store_format.md
@@ -49,6 +49,12 @@ Here's how Feast types map to Pandas types for Feast APIs that take in or return
| DOUBLE\_LIST | `list[float]`|
| FLOAT\_LIST | `list[float]`|
| BOOL\_LIST | `list[bool]`|
| MAP | `dict` (`Dict[str, Any]`)|
| MAP\_LIST | `list[dict]` (`List[Dict[str, Any]]`)|
| JSON | `object` (parsed Python dict/list/str)|
| JSON\_LIST | `list[object]`|
| STRUCT | `dict` (`Dict[str, Any]`)|
| STRUCT\_LIST | `list[dict]` (`List[Dict[str, Any]]`)|

Note that this mapping is non-injective; that is, more than one Pandas type may correspond to a single Feast type (but not vice versa). In these cases, when converting Feast values to Pandas, the **first** Pandas type in the table above is used.
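The first-match rule can be sketched as a lookup where each Feast type lists its Pandas representations in table order (a hypothetical fragment, not the full mapping):

```python
# Hypothetical fragment of the Feast -> Pandas mapping; each Feast type lists
# its candidate Pandas types in the order they appear in the table above.
FEAST_TO_PANDAS = {
    "INT32": ["int32", "int64"],
    "MAP": ["dict"],
    "STRUCT": ["dict"],
}


def pandas_type_for(feast_type: str) -> str:
    # Non-injective mapping: when several Pandas types correspond to one
    # Feast type, the first entry wins.
    return FEAST_TO_PANDAS[feast_type][0]
```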

@@ -78,6 +84,12 @@ Here's how Feast types map to BigQuery types when using BigQuery for offline sto
| DOUBLE\_LIST | `ARRAY<FLOAT64>`|
| FLOAT\_LIST | `ARRAY<FLOAT64>`|
| BOOL\_LIST | `ARRAY<BOOL>`|
| MAP | `JSON` / `STRUCT` |
| MAP\_LIST | `ARRAY<JSON>` / `ARRAY<STRUCT>` |
| JSON | `JSON` |
| JSON\_LIST | `ARRAY<JSON>` |
| STRUCT | `STRUCT` / `RECORD` |
| STRUCT\_LIST | `ARRAY<STRUCT>` |

Values that are not specified by the table above will cause an error on conversion.

@@ -94,3 +106,23 @@ https://docs.snowflake.com/en/user-guide/python-connector-pandas.html#snowflake-
| INT32 | `INT8 / UINT8 / INT16 / UINT16 / INT32 / UINT32` |
| INT64 | `INT64 / UINT64` |
| DOUBLE | `FLOAT64` |
| MAP | `VARIANT` / `OBJECT` |
| JSON | `JSON` / `VARIANT` |

#### Redshift Types
Here's how Feast types map to Redshift types when using Redshift for offline storage:

| Feast Type | Redshift Type |
|-------------|--|
| Event Timestamp | `TIMESTAMP` / `TIMESTAMPTZ` |
| BYTES | `VARBYTE` |
| STRING | `VARCHAR` |
| INT32 | `INT4` / `SMALLINT` |
| INT64 | `INT8` / `BIGINT` |
| DOUBLE | `FLOAT8` / `DOUBLE PRECISION` |
| FLOAT | `FLOAT4` / `REAL` |
| BOOL | `BOOL` |
| MAP | `SUPER` |
| JSON | `json` / `SUPER` |

Note: Redshift's `SUPER` type stores semi-structured JSON data. During materialization, Feast automatically handles `SUPER` columns that are exported as JSON strings by parsing them back into Python dictionaries before converting to `MAP` proto values.
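The parsing step mentioned in the note can be sketched like this (a simplified stand-in for the materialization-time handling, assuming `SUPER` values arrive as JSON strings):

```python
import json


def parse_super_values(values):
    # Redshift exports SUPER columns as JSON strings; parse each one back
    # into a Python dict so it can be converted to a MAP proto value.
    # Values that are already dicts are passed through unchanged.
    return [json.loads(v) if isinstance(v, str) else v for v in values]
```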
5 changes: 4 additions & 1 deletion protos/feast/core/FeatureView.proto
@@ -36,7 +36,7 @@ message FeatureView {
FeatureViewMeta meta = 2;
}

// Next available id: 17
// Next available id: 18
// TODO(adchia): refactor common fields from this and ODFV into separate metadata proto
message FeatureViewSpec {
// Name of the feature view. Must be unique. Not updated.
@@ -89,6 +89,9 @@ message FeatureViewSpec {

// The transformation mode (e.g., "python", "pandas", "spark", "sql", "ray")
string mode = 16;

// Whether schema validation is enabled during materialization
bool enable_validation = 17;
}

message FeatureViewMeta {
5 changes: 4 additions & 1 deletion protos/feast/core/StreamFeatureView.proto
@@ -37,7 +37,7 @@ message StreamFeatureView {
FeatureViewMeta meta = 2;
}

// Next available id: 20
// Next available id: 21
message StreamFeatureViewSpec {
// Name of the feature view. Must be unique. Not updated.
string name = 1;
@@ -99,5 +99,8 @@ message StreamFeatureViewSpec {
// Hop size for tiling (e.g., 5 minutes). Determines the granularity of pre-aggregated tiles.
// If not specified, defaults to 5 minutes. Only used when enable_tiling is true.
google.protobuf.Duration tiling_hop_size = 19;

// Whether schema validation is enabled during materialization
bool enable_validation = 20;
}

8 changes: 8 additions & 0 deletions protos/feast/types/Value.proto
@@ -53,6 +53,10 @@ message ValueType {
FLOAT_SET = 27;
BOOL_SET = 28;
UNIX_TIMESTAMP_SET = 29;
JSON = 32;
JSON_LIST = 33;
STRUCT = 34;
STRUCT_LIST = 35;
}
}

@@ -88,6 +92,10 @@ message Value {
FloatSet float_set_val = 27;
BoolSet bool_set_val = 28;
Int64Set unix_timestamp_set_val = 29;
string json_val = 32;
StringList json_list_val = 33;
Map struct_val = 34;
MapList struct_list_val = 35;
}
}

4 changes: 4 additions & 0 deletions sdk/python/feast/batch_feature_view.py
@@ -97,6 +97,7 @@ def __init__(
feature_transformation: Optional[Transformation] = None,
batch_engine: Optional[Dict[str, Any]] = None,
aggregations: Optional[List[Aggregation]] = None,
enable_validation: bool = False,
):
if not flags_helper.is_test():
warnings.warn(
@@ -136,6 +137,7 @@ def __init__(
source=source, # type: ignore[arg-type]
sink_source=sink_source,
mode=mode,
enable_validation=enable_validation,
)

def get_feature_transformation(self) -> Optional[Transformation]:
@@ -169,6 +171,7 @@ def batch_feature_view(
description: str = "",
owner: str = "",
schema: Optional[List[Field]] = None,
enable_validation: bool = False,
):
"""
Creates a BatchFeatureView object with the given user-defined function (UDF) as the transformation.
@@ -199,6 +202,7 @@ def decorator(user_function):
schema=schema,
udf=user_function,
udf_string=udf_string,
enable_validation=enable_validation,
)
functools.update_wrapper(wrapper=batch_feature_view_obj, wrapped=user_function)
return batch_feature_view_obj
30 changes: 29 additions & 1 deletion sdk/python/feast/driver_test_data.py
@@ -136,10 +136,38 @@ def create_driver_hourly_stats_df(drivers, start_date, end_date) -> pd.DataFrame
df_all_drivers["conv_rate"] = np.random.random(size=rows).astype(np.float32)
df_all_drivers["acc_rate"] = np.random.random(size=rows).astype(np.float32)
df_all_drivers["avg_daily_trips"] = np.random.randint(0, 1000, size=rows).astype(
np.int32
np.int64
)
df_all_drivers["created"] = pd.to_datetime(pd.Timestamp.now(tz=None).round("ms"))

# Complex type columns for Map, Json, and Struct examples
import json as _json

df_all_drivers["driver_metadata"] = [
{
"vehicle_type": np.random.choice(["sedan", "suv", "truck"]),
"rating": str(round(np.random.uniform(3.0, 5.0), 1)),
}
for _ in range(len(df_all_drivers))
]
df_all_drivers["driver_config"] = [
_json.dumps(
{
"max_distance_km": int(np.random.randint(10, 200)),
"preferred_zones": list(
np.random.choice(
["north", "south", "east", "west"], size=2, replace=False
)
),
}
)
for _ in range(len(df_all_drivers))
]
df_all_drivers["driver_profile"] = [
{"name": f"driver_{driver_id}", "age": str(int(np.random.randint(25, 60)))}
for driver_id in df_all_drivers["driver_id"]
]

# Create duplicate rows that should be filtered by created timestamp
# TODO: These duplicate rows are indirectly being filtered out by the point in time join already. We need to
# inject a bad row at a timestamp where we know it will get joined to the entity dataframe, and then test that
11 changes: 11 additions & 0 deletions sdk/python/feast/feature_view.py
@@ -107,6 +107,7 @@ class FeatureView(BaseFeatureView):
owner: str
materialization_intervals: List[Tuple[datetime, datetime]]
mode: Optional[Union["TransformationMode", str]]
enable_validation: bool

def __init__(
self,
@@ -123,6 +124,7 @@ def __init__(
tags: Optional[Dict[str, str]] = None,
owner: str = "",
mode: Optional[Union["TransformationMode", str]] = None,
enable_validation: bool = False,
):
"""
Creates a FeatureView object.
@@ -148,11 +150,14 @@ def __init__(
primary maintainer.
mode (optional): The transformation mode for feature transformations. Only meaningful
when transformations are applied. Choose from TransformationMode enum values.
enable_validation (optional): If True, enables schema validation during materialization
to check that data conforms to the declared feature types. Default is False.

Raises:
ValueError: A field mapping conflicts with an Entity or a Feature.
"""
self.name = name
self.enable_validation = enable_validation
self.entities = [e.name for e in entities] if entities else [DUMMY_ENTITY_NAME]
self.ttl = ttl
schema = schema or []
@@ -279,6 +284,7 @@ def __copy__(self):
online=self.online,
offline=self.offline,
sink_source=self.batch_source if self.source_views else None,
enable_validation=self.enable_validation,
)

# This is deliberately set outside of the FV initialization as we do not have the Entity objects.
@@ -307,6 +313,7 @@ def __eq__(self, other):
or sorted(self.entity_columns) != sorted(other.entity_columns)
or self.source_views != other.source_views
or self.materialization_intervals != other.materialization_intervals
or self.enable_validation != other.enable_validation
):
return False

@@ -473,6 +480,7 @@ def to_proto_spec(
source_views=source_view_protos,
feature_transformation=feature_transformation_proto,
mode=mode_str,
enable_validation=self.enable_validation,
)

def to_proto_meta(self):
@@ -642,6 +650,9 @@ def _from_proto_internal(
f"Entities: {feature_view.entities} vs Entity Columns: {feature_view.entity_columns}"
)

# Restore enable_validation from proto field.
feature_view.enable_validation = feature_view_proto.spec.enable_validation

# FeatureViewProjections are not saved in the FeatureView proto.
# Create the default projection.
feature_view.projection = FeatureViewProjection.from_feature_view_definition(