-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add column reordering to write_to_offline_store
#2876
feat: Add column reordering to write_to_offline_store
#2876
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2876 +/- ##
==========================================
- Coverage 80.68% 80.59% -0.09%
==========================================
Files 176 176
Lines 15670 15663 -7
==========================================
- Hits 12643 12624 -19
- Misses 3027 3039 +12
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
|
||
class SparkProcessorConfig(ProcessorConfig): | ||
spark_session: SparkSession | ||
processing_time: str | ||
query_timeout: int | ||
processing_time: str = "30 seconds" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't set a default here since we have no clue what the correct window should be. Should force the user to set the processing window.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
processing_time: str | ||
query_timeout: int | ||
processing_time: str = "30 seconds" | ||
query_timeout: int = 15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
from feast.infra.contrib.stream_processor import ( | ||
ProcessorConfig, | ||
StreamProcessor, | ||
StreamTable, | ||
) | ||
from feast.stream_feature_view import StreamFeatureView | ||
|
||
if TYPE_CHECKING: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per offline conversation, this is dangerous. If we ever want to move the functionality into a supported passthrough function in feature store, this is a circular dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized it actually isn't circular lol, updating
sdk/python/feast/feature_store.py
Outdated
) | ||
source_columns = [column for column, _ in column_names_and_types] | ||
source_columns = [ | ||
column for column in source_columns if not re.match("__|__$", column) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are columns w/. underscores what is the behavior here? Does it just auto fail? I'm confused about why we need to do this check, are we not writing to these internal columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this isn't necessary; good catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Signed-off-by: Felix Wang <wangfelix98@gmail.com>
Signed-off-by: Felix Wang <wangfelix98@gmail.com>
Signed-off-by: Felix Wang <wangfelix98@gmail.com>
Signed-off-by: Felix Wang <wangfelix98@gmail.com>
Signed-off-by: Felix Wang <wangfelix98@gmail.com>
Signed-off-by: Felix Wang <wangfelix98@gmail.com>
Signed-off-by: Felix Wang <wangfelix98@gmail.com>
Signed-off-by: Felix Wang <wangfelix98@gmail.com>
Signed-off-by: Felix Wang <wangfelix98@gmail.com>
da60e79
to
b75883a
Compare
…ther tests Signed-off-by: Felix Wang <wangfelix98@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: felixwang9817, kevjumba The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Add feature extraction logic to batch writer Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Enable StreamProcessor to write to both online and offline stores Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Fix incorrect columns error message Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Reorder columns in _write_to_offline_store Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Make _write_to_offline_store a public method Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Import FeatureStore correctly Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Remove defaults for `processing_time` and `query_timeout` Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Clean up `test_offline_write.py` Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Do not do any custom logic for double underscore columns Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Lint Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Switch entity values for all tests using push sources to not affect other tests Signed-off-by: Felix Wang <wangfelix98@gmail.com>
# [0.23.0](v0.22.0...v0.23.0) (2022-08-02) ### Bug Fixes * Add dummy alias to pull_all_from_table_or_query ([#2956](#2956)) ([5e45228](5e45228)) * Bump version of Guava to mitigate cve ([#2896](#2896)) ([51df8be](51df8be)) * Change numpy version on setup.py and upgrade it to resolve dependabot warning ([#2887](#2887)) ([80ea7a9](80ea7a9)) * Change the feature store plan method to public modifier ([#2904](#2904)) ([0ec7d1a](0ec7d1a)) * Deprecate 3.7 wheels and fix verification workflow ([#2934](#2934)) ([040c910](040c910)) * Do not allow same column to be reused in data sources ([#2965](#2965)) ([661c053](661c053)) * Fix build wheels workflow to install apache-arrow correctly ([#2932](#2932)) ([bdeb4ae](bdeb4ae)) * Fix file offline store logic for feature views without ttl ([#2971](#2971)) ([26f6b69](26f6b69)) * Fix grpc and update protobuf ([#2894](#2894)) ([86e9efd](86e9efd)) * Fix night ci syntax error and update readme ([#2935](#2935)) ([b917540](b917540)) * Fix nightly ci again ([#2939](#2939)) ([1603c9e](1603c9e)) * Fix the go build and use CgoArrowAllocator to prevent incorrect garbage collection ([#2919](#2919)) ([130746e](130746e)) * Fix typo in CONTRIBUTING.md ([#2955](#2955)) ([8534f69](8534f69)) * Fixing broken links to feast documentation on java readme and contribution ([#2892](#2892)) ([d044588](d044588)) * Fixing Spark min / max entity df event timestamps range return order ([#2735](#2735)) ([ac55ce2](ac55ce2)) * Move gcp back to 1.47.0 since grpcio-tools 1.48.0 got yanked from pypi ([#2990](#2990)) ([fc447eb](fc447eb)) * Refactor testing and sort out unit and integration tests ([#2975](#2975)) ([2680f7b](2680f7b)) * Remove hard-coded integration test setup for AWS & GCP ([#2970](#2970)) ([e4507ac](e4507ac)) * Resolve small typo in README file ([#2930](#2930)) ([16ae902](16ae902)) * Revert "feat: Add snowflake online store ([#2902](#2902))" ([#2909](#2909)) ([38fd001](38fd001)) * Snowflake_online_read fix ([#2988](#2988)) ([651ce34](651ce34)) * Spark source support table with pattern "db.table" ([#2606](#2606)) ([3ce5139](3ce5139)), closes [#2605](#2605) * Switch mysql log string to use regex ([#2976](#2976)) ([5edf4b0](5edf4b0)) * Update gopy to point to fork to resolve github annotation errors. ([#2940](#2940)) ([ba2dcf1](ba2dcf1)) * Version entity serialization mechanism and fix issue with int64 vals ([#2944](#2944)) ([d0d27a3](d0d27a3)) ### Features * Add an experimental lambda-based materialization engine ([#2923](#2923)) ([6f79069](6f79069)) * Add column reordering to `write_to_offline_store` ([#2876](#2876)) ([8abc2ef](8abc2ef)) * Add custom JSON table tab w/ formatting ([#2851](#2851)) ([0159f38](0159f38)) * Add CustomSourceOptions to SavedDatasetStorage ([#2958](#2958)) ([23c09c8](23c09c8)) * Add Go option to `feast serve` command ([#2966](#2966)) ([a36a695](a36a695)) * Add interfaces for batch materialization engine ([#2901](#2901)) ([38b28ca](38b28ca)) * Add pages for individual Features to the Feast UI ([#2850](#2850)) ([9b97fca](9b97fca)) * Add snowflake online store ([#2902](#2902)) ([f758f9e](f758f9e)), closes [#2903](#2903) * Add Snowflake online store (again) ([#2922](#2922)) ([2ef71fc](2ef71fc)), closes [#2903](#2903) * Add to_remote_storage method to RetrievalJob ([#2916](#2916)) ([109ee9c](109ee9c)) * Support retrieval from multiple feature views with different join keys ([#2835](#2835)) ([056cfa1](056cfa1))
What this PR does / why we need it: In addition to adding column reordering logic, this PR adds logic for extracting the latest feature values into the
SparkKafkaProcessor
.Which issue(s) this PR fixes:
Fixes #