
Newest updates break DaskOfflineStore with S3 parquets #4753

Open
@bjmccotter7192

Description

Expected Behavior

In version 0.40.1, the Dask offline store could read data_source.path directly from the FileSource and retrieve the data from S3 using a path such as s3://<your-bucket>/<file-name>.

Current Behavior

Pulling data fails because the store now prepends repo_path to the S3 URL.

Example:
/tmp/feast:s3//<your-bucket>/<file-name>

I believe this is caused by a recent change, #4624, which no longer treats the S3 URL as an absolute path.
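One plausible mechanism, assuming the change in #4624 joins data_source.path onto repo_path with pathlib (this is not confirmed by the report): pathlib does not consider a URI such as s3://... to be absolute, because it has no leading "/", so the join keeps the repo prefix and also collapses the "//" after the scheme. The bucket and file names below are hypothetical, and the exact string in the report differs slightly, so this only illustrates the general failure mode.

```python
from pathlib import PurePosixPath

repo_path = PurePosixPath("/tmp/feast")
s3_uri = "s3://my-bucket/data.parquet"  # hypothetical bucket and file name

# pathlib sees no root ("/") in the URI, so it treats it as a relative path...
print(PurePosixPath(s3_uri).is_absolute())  # False

# ...and joining it onto repo_path keeps the prefix and collapses the "//".
joined = repo_path / s3_uri
print(joined)  # /tmp/feast/s3:/my-bucket/data.parquet
```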

Steps to reproduce

  • Rebuilt my environment with the latest tagged version, 0.41.3
  • Reran get_historical_features; the call hung for a while, then failed with an error saying the file path does not exist

Specifications

  • Version: 0.41.3
  • Platform: Linux
  • Subsystem: Debian

Possible Solution

  • Revert that change, or add a flag to bypass the breaking behavior
  • If storage_options is not None, read the parquet file directly from the given path
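A minimal sketch of how the path could be resolved, combining the storage_options check with a more general rule: leave any path that carries a URI scheme untouched and only prepend repo_path to relative local paths. resolve_source_path is a hypothetical helper for illustration, not part of Feast's API, and treating single-letter schemes as Windows drive letters is an assumption of this sketch.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def resolve_source_path(
    path: str,
    repo_path: Optional[str],
    storage_options: Optional[dict] = None,
) -> str:
    """Hypothetical helper: return the path the offline store should read."""
    # Remote storage options imply a remote URI, so use the path as given.
    if storage_options is not None:
        return path

    # A scheme longer than one character (s3, gs, https, ...) marks a URI.
    # Single letters are treated as Windows drive letters, not schemes.
    if len(urlparse(path).scheme) > 1:
        return path

    # Absolute local paths, or no repo to anchor to: leave unchanged.
    if repo_path is None or Path(path).is_absolute():
        return path

    # Only relative local paths are resolved against the repo directory.
    return str(Path(repo_path) / path)
```

With this rule, s3://<your-bucket>/<file-name> is returned unchanged, while a relative path like data/x.parquet still resolves under the repo directory, which keeps the intent of the local-path fix.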
