Skip to content

Commit

Permalink
merge master
Browse files Browse the repository at this point in the history
Signed-off-by: Christopher Camenares <ccamenares@gmail.com>
  • Loading branch information
camenares committed Jul 16, 2024
2 parents ea3360e + 38cae16 commit 5f34a59
Show file tree
Hide file tree
Showing 17 changed files with 55 additions and 87 deletions.
4 changes: 2 additions & 2 deletions docs/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@
* [Azure Synapse + Azure SQL (contrib)](reference/data-sources/mssql.md)
* [Offline stores](reference/offline-stores/README.md)
* [Overview](reference/offline-stores/overview.md)
* [File](reference/offline-stores/file.md)
* [Dask](reference/offline-stores/dask.md)
* [Snowflake](reference/offline-stores/snowflake.md)
* [BigQuery](reference/offline-stores/bigquery.md)
* [Redshift](reference/offline-stores/redshift.md)
Expand Down Expand Up @@ -119,7 +119,7 @@
* [Feature servers](reference/feature-servers/README.md)
* [Python feature server](reference/feature-servers/python-feature-server.md)
* [\[Alpha\] Go feature server](reference/feature-servers/go-feature-server.md)
* [Offline Feature Server](reference/feature-servers/offline-feature-server)
* [Offline Feature Server](reference/feature-servers/offline-feature-server.md)
* [\[Beta\] Web UI](reference/alpha-web-ui.md)
* [\[Beta\] On demand feature view](reference/beta-on-demand-feature-view.md)
* [\[Alpha\] Vector Database](reference/alpha-vector-database.md)
Expand Down
1 change: 0 additions & 1 deletion docs/reference/feature-servers/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@ Feast users can choose to retrieve features from a feature server, as opposed to

{% content-ref url="go-feature-server.md" %}
[go-feature-server.md](go-feature-server.md)
=======
{% endcontent-ref %}

{% content-ref url="offline-feature-server.md" %}
Expand Down
Original file line number Diff line number Diff line change
@@ -1,9 +1,8 @@
# File offline store
# Dask offline store

## Description

The file offline store provides support for reading [FileSources](../data-sources/file.md).
It uses Dask as the compute engine.
The Dask offline store provides support for reading [FileSources](../data-sources/file.md).

{% hint style="warning" %}
All data is downloaded and joined using Python and therefore may not scale to production workloads.
Expand All @@ -17,28 +16,28 @@ project: my_feature_repo
registry: data/registry.db
provider: local
offline_store:
type: file
type: dask
```
{% endcode %}
The full set of configuration options is available in [FileOfflineStoreConfig](https://rtd.feast.dev/en/latest/#feast.infra.offline_stores.file.FileOfflineStoreConfig).
The full set of configuration options is available in [DaskOfflineStoreConfig](https://rtd.feast.dev/en/latest/#feast.infra.offline_stores.dask.DaskOfflineStoreConfig).
## Functionality Matrix
The set of functionality supported by offline stores is described in detail [here](overview.md#functionality).
Below is a matrix indicating which functionality is supported by the file offline store.
Below is a matrix indicating which functionality is supported by the dask offline store.
| | File |
| | Dask |
| :-------------------------------- | :-- |
| `get_historical_features` (point-in-time correct join) | yes |
| `pull_latest_from_table_or_query` (retrieve latest feature values) | yes |
| `pull_all_from_table_or_query` (retrieve a saved dataset) | yes |
| `offline_write_batch` (persist dataframes to offline store) | yes |
| `write_logged_features` (persist logged features to offline store) | yes |

Below is a matrix indicating which functionality is supported by `FileRetrievalJob`.
Below is a matrix indicating which functionality is supported by `DaskRetrievalJob`.

| | File |
| | Dask |
| --------------------------------- | --- |
| export to dataframe | yes |
| export to arrow table | yes |
Expand Down
6 changes: 3 additions & 3 deletions docs/reference/offline-stores/overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,13 @@ The first three of these methods all return a `RetrievalJob` specific to an offl

## Functionality Matrix

There are currently four core offline store implementations: `FileOfflineStore`, `BigQueryOfflineStore`, `SnowflakeOfflineStore`, and `RedshiftOfflineStore`.
There are currently four core offline store implementations: `DaskOfflineStore`, `BigQueryOfflineStore`, `SnowflakeOfflineStore`, and `RedshiftOfflineStore`.
There are several additional implementations contributed by the Feast community (`PostgreSQLOfflineStore`, `SparkOfflineStore`, and `TrinoOfflineStore`), which are not guaranteed to be stable or to match the functionality of the core implementations.
Details for each specific offline store, such as how to configure it in a `feature_store.yaml`, can be found [here](README.md).

Below is a matrix indicating which offline stores support which methods.

| | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
| :-------------------------------- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
| `get_historical_features` | yes | yes | yes | yes | yes | yes | yes |
| `pull_latest_from_table_or_query` | yes | yes | yes | yes | yes | yes | yes |
Expand All @@ -42,7 +42,7 @@ Below is a matrix indicating which offline stores support which methods.

Below is a matrix indicating which `RetrievalJob`s support what functionality.

| | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | DuckDB |
| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | DuckDB |
| --------------------------------- | --- | --- | --- | --- | --- | --- | --- | --- |
| export to dataframe | yes | yes | yes | yes | yes | yes | yes | yes |
| export to arrow table | yes | yes | yes | yes | yes | yes | yes | yes |
Expand Down
8 changes: 0 additions & 8 deletions sdk/python/docs/source/feast.infra.contrib.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,14 +4,6 @@ feast.infra.contrib package
Submodules
----------

feast.infra.contrib.azure\_provider module
------------------------------------------

.. automodule:: feast.infra.contrib.azure_provider
:members:
:undoc-members:
:show-inheritance:

feast.infra.contrib.grpc\_server module
---------------------------------------

Expand Down
2 changes: 0 additions & 2 deletions sdk/python/docs/source/feast.infra.feature_servers.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,6 @@ Subpackages
.. toctree::
:maxdepth: 4

feast.infra.feature_servers.aws_lambda
feast.infra.feature_servers.gcp_cloudrun
feast.infra.feature_servers.local_process
feast.infra.feature_servers.multicloud

Expand Down
12 changes: 6 additions & 6 deletions sdk/python/docs/source/feast.infra.offline_stores.rst
Original file line number Diff line number Diff line change
Expand Up @@ -28,18 +28,18 @@ feast.infra.offline\_stores.bigquery\_source module
:undoc-members:
:show-inheritance:

feast.infra.offline\_stores.duckdb module
-----------------------------------------
feast.infra.offline\_stores.dask module
---------------------------------------

.. automodule:: feast.infra.offline_stores.duckdb
.. automodule:: feast.infra.offline_stores.dask
:members:
:undoc-members:
:show-inheritance:

feast.infra.offline\_stores.file module
---------------------------------------
feast.infra.offline\_stores.duckdb module
-----------------------------------------

.. automodule:: feast.infra.offline_stores.file
.. automodule:: feast.infra.offline_stores.duckdb
:members:
:undoc-members:
:show-inheritance:
Expand Down
1 change: 0 additions & 1 deletion sdk/python/docs/source/feast.infra.registry.contrib.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@ Subpackages
:maxdepth: 4

feast.infra.registry.contrib.azure
feast.infra.registry.contrib.postgres

Module contents
---------------
Expand Down
24 changes: 0 additions & 24 deletions sdk/python/docs/source/feast.infra.rst
Original file line number Diff line number Diff line change
Expand Up @@ -19,22 +19,6 @@ Subpackages
Submodules
----------

feast.infra.aws module
----------------------

.. automodule:: feast.infra.aws
:members:
:undoc-members:
:show-inheritance:

feast.infra.gcp module
----------------------

.. automodule:: feast.infra.gcp
:members:
:undoc-members:
:show-inheritance:

feast.infra.infra\_object module
--------------------------------

Expand All @@ -51,14 +35,6 @@ feast.infra.key\_encoding\_utils module
:undoc-members:
:show-inheritance:

feast.infra.local module
------------------------

.. automodule:: feast.infra.local
:members:
:undoc-members:
:show-inheritance:

feast.infra.passthrough\_provider module
----------------------------------------

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -365,7 +365,9 @@ def build_point_in_time_query(
full_feature_names: bool = False,
) -> str:
"""Build point-in-time query between each feature view table and the entity dataframe for PostgreSQL"""
template = Environment(loader=BaseLoader()).from_string(source=query_template)
template = Environment(autoescape=True, loader=BaseLoader()).from_string(
source=query_template
)

final_output_feature_names = list(entity_df_columns)
final_output_feature_names.extend(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -39,20 +39,20 @@
from feast.saved_dataset import SavedDatasetStorage
from feast.utils import _get_requested_feature_views_to_features_dict

# FileRetrievalJob will cast string objects to string[pyarrow] from dask version 2023.7.1
# DaskRetrievalJob will cast string objects to string[pyarrow] from dask version 2023.7.1
# This is not the desired behavior for our use case, so we set the convert-string option to False
# See (https://github.com/dask/dask/issues/10881#issuecomment-1923327936)
dask.config.set({"dataframe.convert-string": False})


class FileOfflineStoreConfig(FeastConfigBaseModel):
"""Offline store config for local (file-based) store"""
class DaskOfflineStoreConfig(FeastConfigBaseModel):
"""Offline store config for dask store"""

type: Literal["file"] = "file"
type: Union[Literal["dask"], Literal["file"]] = "dask"
""" Offline store type selector"""


class FileRetrievalJob(RetrievalJob):
class DaskRetrievalJob(RetrievalJob):
def __init__(
self,
evaluation_function: Callable,
Expand Down Expand Up @@ -122,7 +122,7 @@ def supports_remote_storage_export(self) -> bool:
return False


class FileOfflineStore(OfflineStore):
class DaskOfflineStore(OfflineStore):
@staticmethod
def get_historical_features(
config: RepoConfig,
Expand All @@ -133,7 +133,7 @@ def get_historical_features(
project: str,
full_feature_names: bool = False,
) -> RetrievalJob:
assert isinstance(config.offline_store, FileOfflineStoreConfig)
assert isinstance(config.offline_store, DaskOfflineStoreConfig)
for fv in feature_views:
assert isinstance(fv.batch_source, FileSource)

Expand Down Expand Up @@ -283,7 +283,7 @@ def evaluate_historical_retrieval():

return entity_df_with_features.persist()

job = FileRetrievalJob(
job = DaskRetrievalJob(
evaluation_function=evaluate_historical_retrieval,
full_feature_names=full_feature_names,
on_demand_feature_views=OnDemandFeatureView.get_requested_odfvs(
Expand All @@ -309,7 +309,7 @@ def pull_latest_from_table_or_query(
start_date: datetime,
end_date: datetime,
) -> RetrievalJob:
assert isinstance(config.offline_store, FileOfflineStoreConfig)
assert isinstance(config.offline_store, DaskOfflineStoreConfig)
assert isinstance(data_source, FileSource)

# Create lazy function that is only called from the RetrievalJob object
Expand Down Expand Up @@ -372,7 +372,7 @@ def evaluate_offline_job():
return source_df[list(columns_to_extract)].persist()

# When materializing a single feature view, we don't need full feature names. On demand transforms aren't materialized
return FileRetrievalJob(
return DaskRetrievalJob(
evaluation_function=evaluate_offline_job,
full_feature_names=False,
)
Expand All @@ -387,10 +387,10 @@ def pull_all_from_table_or_query(
start_date: datetime,
end_date: datetime,
) -> RetrievalJob:
assert isinstance(config.offline_store, FileOfflineStoreConfig)
assert isinstance(config.offline_store, DaskOfflineStoreConfig)
assert isinstance(data_source, FileSource)

return FileOfflineStore.pull_latest_from_table_or_query(
return DaskOfflineStore.pull_latest_from_table_or_query(
config=config,
data_source=data_source,
join_key_columns=join_key_columns
Expand All @@ -410,7 +410,7 @@ def write_logged_features(
logging_config: LoggingConfig,
registry: BaseRegistry,
):
assert isinstance(config.offline_store, FileOfflineStoreConfig)
assert isinstance(config.offline_store, DaskOfflineStoreConfig)
destination = logging_config.destination
assert isinstance(destination, FileLoggingDestination)

Expand Down Expand Up @@ -441,7 +441,7 @@ def offline_write_batch(
table: pyarrow.Table,
progress: Optional[Callable[[int], Any]],
):
assert isinstance(config.offline_store, FileOfflineStoreConfig)
assert isinstance(config.offline_store, DaskOfflineStoreConfig)
assert isinstance(feature_view.batch_source, FileSource)

pa_schema, column_names = get_pyarrow_schema_from_batch_source(
Expand Down
4 changes: 3 additions & 1 deletion sdk/python/feast/infra/offline_stores/offline_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -186,7 +186,9 @@ def build_point_in_time_query(
full_feature_names: bool = False,
) -> str:
"""Build point-in-time query between each feature view table and the entity dataframe for Bigquery and Redshift"""
template = Environment(loader=BaseLoader()).from_string(source=query_template)
template = Environment(autoescape=True, loader=BaseLoader()).from_string(
source=query_template
)

final_output_feature_names = list(entity_df_columns)
final_output_feature_names.extend(
Expand Down
7 changes: 4 additions & 3 deletions sdk/python/feast/repo_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,8 @@
}

OFFLINE_STORE_CLASS_FOR_TYPE = {
"file": "feast.infra.offline_stores.file.FileOfflineStore",
"file": "feast.infra.offline_stores.dask.DaskOfflineStore",
"dask": "feast.infra.offline_stores.dask.DaskOfflineStore",
"bigquery": "feast.infra.offline_stores.bigquery.BigQueryOfflineStore",
"redshift": "feast.infra.offline_stores.redshift.RedshiftOfflineStore",
"snowflake.offline": "feast.infra.offline_stores.snowflake.SnowflakeOfflineStore",
Expand Down Expand Up @@ -205,7 +206,7 @@ def __init__(self, **data: Any):
self.registry_config = data["registry"]

self._offline_store = None
self.offline_config = data.get("offline_store", "file")
self.offline_config = data.get("offline_store", "dask")

self._online_store = None
self.online_config = data.get("online_store", "sqlite")
Expand Down Expand Up @@ -348,7 +349,7 @@ def _validate_offline_store_config(cls, values: Any) -> Any:

# Set the default type
if "type" not in values["offline_store"]:
values["offline_store"]["type"] = "file"
values["offline_store"]["type"] = "dask"

offline_store_type = values["offline_store"]["type"]

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,8 @@
from feast.data_format import DeltaFormat, ParquetFormat
from feast.data_source import DataSource
from feast.feature_logging import LoggingDestination
from feast.infra.offline_stores.dask import DaskOfflineStoreConfig
from feast.infra.offline_stores.duckdb import DuckDBOfflineStoreConfig
from feast.infra.offline_stores.file import FileOfflineStoreConfig
from feast.infra.offline_stores.file_source import (
FileLoggingDestination,
SavedDatasetFileStorage,
Expand Down Expand Up @@ -84,7 +84,7 @@ def get_prefixed_table_name(self, suffix: str) -> str:
return f"{self.project_name}.{suffix}"

def create_offline_store_config(self) -> FeastConfigBaseModel:
return FileOfflineStoreConfig()
return DaskOfflineStoreConfig()

def create_logged_features_destination(self) -> LoggingDestination:
d = tempfile.mkdtemp(prefix=self.project_name)
Expand Down Expand Up @@ -334,7 +334,7 @@ def get_prefixed_table_name(self, suffix: str) -> str:
return f"{suffix}"

def create_offline_store_config(self) -> FeastConfigBaseModel:
return FileOfflineStoreConfig()
return DaskOfflineStoreConfig()

def teardown(self):
self.minio.stop()
Expand Down
Loading

0 comments on commit 5f34a59

Please sign in to comment.