Don't load entire bigquery query results in memory #1638

tsotnet · 2021-06-11T01:56:54Z

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

What this PR does / why we need it: In BigQueryOfflineStore.get_historical_features, Feast calls entity_df_job.result(), which loads the entire entity dataframe into memory. We only need to wait for the job to finish and for the schema to be returned, without changing the existing query (its results are used for the join with offline feature data). This PR changes the call to entity_df_job.result(max_results=0), which:

- Executes the same query in BigQuery, so the table that gets joined does not change
- Does not load any of the query rows into memory
- Still allows the schema to be read from the resulting object
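The change described above can be sketched roughly as follows. This is an illustrative sketch, not the code from the PR: the helper name `get_entity_df_schema` is hypothetical, and the function assumes only that its argument behaves like a `google.cloud.bigquery` `QueryJob`, meaning `result()` accepts `max_results` and returns an object with a `schema` list of fields exposing `name` and `field_type`. Note that the follow-up PR #1642 referenced later in this thread suggests that calling `.result()` could still lead to OOM, so treat this as a picture of the intent rather than a guaranteed fix.

```python
from typing import Any, Dict


def get_entity_df_schema(entity_df_job: Any) -> Dict[str, str]:
    """Wait for a query job to finish and return {column name: field type}.

    Hypothetical helper for illustration. `entity_df_job` is assumed to
    behave like a google.cloud.bigquery QueryJob.
    """
    # Per the PR description, max_results=0 still waits for the query to
    # complete but requests no rows, so the result set is not materialized
    # on the client.
    row_iterator = entity_df_job.result(max_results=0)

    # The returned iterator still carries the schema of the query result.
    return {field.name: field.field_type for field in row_iterator.schema}
```

With a real client, the job would typically come from something like `client.query(entity_df_sql)`, where `entity_df_sql` stands in for the entity dataframe query.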

I checked all other places in the codebase where we call .result(), but none of them needed fixing, for a couple of different reasons (happy to elaborate if anyone is curious).

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

Fix BigQuery entity dataframe SQL query results being completely loaded in memory

codecov-commenter · 2021-06-11T01:59:37Z

Codecov Report

Merging #1638 (2e66974) into master (0d7e858) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1638      +/-   ##
==========================================
+ Coverage   83.59%   83.60%   +0.01%     
==========================================
  Files          67       67              
  Lines        6017     6027      +10     
==========================================
+ Hits         5030     5039       +9     
- Misses        987      988       +1

| Flag | Coverage Δ | |
|---|---|---|
| integrationtests | `83.52% <100.00%> (+0.01%)` | ⬆️ |
| unittests | `76.43% <0.00%> (+0.02%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| sdk/python/feast/infra/offline_stores/bigquery.py | `92.48% <100.00%> (ø)` | |
| sdk/python/tests/test_e2e_local.py | `100.00% <0.00%> (ø)` | |
| sdk/python/feast/infra/offline_stores/file.py | `96.70% <0.00%> (+0.03%)` | ⬆️ |
| sdk/python/feast/errors.py | `66.66% <0.00%> (+0.87%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
achals

/lgtm

feast-ci-bot · 2021-06-11T06:21:04Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achals, tsotnet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [achals,tsotnet]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Don't load entire bigquery query results in memory

2e66974

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

tsotnet added the kind/bug label Jun 11, 2021

tsotnet requested a review from woop June 11, 2021 01:56

tsotnet requested review from achals and a team as code owners June 11, 2021 01:56

feast-ci-bot added release-note approved size/XS labels Jun 11, 2021

achals approved these changes Jun 11, 2021

feast-ci-bot assigned achals Jun 11, 2021

feast-ci-bot added the lgtm label Jun 11, 2021

feast-ci-bot merged commit 731bca7 into feast-dev:master Jun 11, 2021

tsotnet mentioned this pull request Jun 11, 2021

Don't use .result() in BigQueryOfflineStore, since it still leads to OOM #1642

Merged

tsotnet pushed a commit that referenced this pull request Jun 17, 2021

Don't load entire bigquery query results in memory (#1638)

90ada3c

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
