Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Rework get_online_features helper functions #5060

Merged

Conversation

wckdman
Copy link
Contributor

@wckdman wckdman commented Feb 15, 2025

What this PR does / why we need it:

Is some use cases, for example in recommender systems, we need to retrieve features for hundreds and thousands of entities. As stated in the issue #3596, current implementation has a significant overhead.

I ran a benchmark test for 1000 entities with profiling and noticed, that _populate_response_from_feature_data and _convert_rows_to_protobuf functions are quite heavy.

A have several proposals for improvement:

  • _populate_response_from_feature_data has 3 calls of apply_list_mapping where we calculate output_len = sum(len(item) for item in mapping_indexes) which is time consuming. We could calculate output_len only once, when we deduplicate entities. And we could combine all 3 vectors into response in one loop
  • We get feature_data from _convert_rows_to_protobuf and start re-formatting it with zip(zip)s. We could construct feature_data exactly in the desired format and get rid of this unpacking.

This test should cover all the above changes.

Which issue(s) this PR fixes:

Misc

For experimentation I used current benchmark tooling with some adjustments. What I changed:

  • tested different amount of entities from 10 to 2000
  • increased number of iterations to 3 and number of rounds to 50
  • tested against redis and sqlite

Here's results for the latest release (time in ms):

Number of entities OnlineStore Min Max Mean StdDev Median
10 sqlite 4.4021 (1.0) 5.0044 (1.0) 4.6377 (1.0) 0.1576 (1.0) 4.6037 (1.0)
10 redis 5.9276 (1.35) 7.7548 (1.55) 6.4771 (1.40) 0.3828 (2.43) 6.4391 (1.40)
100 sqlite 8.3776 (1.0) 10.5403 (1.0) 8.9139 (1.0) 0.4123 (1.0) 8.8378 (1.0)
100 redis 10.6544 (1.27) 13.3852 (1.27) 11.6403 (1.31) 0.4281 (1.04) 11.6137 (1.31)
1000 sqlite 49.6249 (1.0) 104.2753 (1.0) 84.5442 (1.0) 24.7145 (1.00) 101.4559 (1.0)
1000 redis 59.1902 (1.19) 115.1285 (1.10) 92.3902 (1.09) 24.6651 (1.0) 110.0397 (1.08)
2000 sqlite 142.5302 (1.0) 208.4495 (1.0) 178.5956 (1.0) 26.7520 (1.04) 176.8854 (1.01)
2000 redis 168.0299 (1.18) 241.9603 (1.16) 195.3997 (1.09) 25.6117 (1.0) 174.3306 (1.0)

And here's results for proposed changes. Performance for 1000 and more entities is better:

Number of entities OnlineStore Min Max Mean StdDev Median
10 sqlite 4.4178 (1.0) 5.1714 (1.0) 4.6655 (1.0) 0.1594 (1.0) 4.6369 (1.0)
10 redis 6.1177 (1.38) 8.8817 (1.72) 6.6482 (1.42) 0.4161 (2.61) 6.5609 (1.41)
100 sqlite 8.3808 (1.0) 9.7838 (1.0) 8.8807 (1.0) 0.2814 (1.0) 8.9061 (1.0)
100 redis 11.3189 (1.35) 61.7919 (6.32) 12.8694 (1.45) 7.0757 (25.15) 11.8019 (1.33)
1000 sqlite 51.3373 (1.0) 105.2218 (1.0) 75.3780 (1.0) 25.4122 (1.03) 53.5322 (1.0)
1000 redis 62.3018 (1.21) 116.3195 (1.11) 88.0296 (1.17) 24.7718 (1.0) 67.4150 (1.26)
2000 sqlite 139.8804 (1.0) 201.7754 (1.0) 155.1063 (1.0) 18.3177 (1.0) 148.7958 (1.0)
2000 redis 163.8357 (1.17) 222.4737 (1.10) 179.5300 (1.16) 21.5222 (1.17) 168.8162 (1.13)

Signed-off-by: Artem Petrov <58334441+wckdman@users.noreply.github.com>
Signed-off-by: Artem Petrov <58334441+wckdman@users.noreply.github.com>
@wckdman wckdman force-pushed the rework-redis-get-online-features branch from 60e3686 to 4201b72 Compare February 15, 2025 22:22
@@ -1244,35 +1242,34 @@ def _get_entity_key_protos(

def _convert_rows_to_protobuf(
requested_features: List[str],
read_rows: List[Tuple[Optional[datetime], Optional[Dict[str, ValueProto]]]],
read_rows: List[Tuple[datetime | None, dict[str, ValueProto] | None]],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the Optional syntax to b e consistent with the rest of the codebase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted this.
Thank you!

Signed-off-by: Artem Petrov <58334441+wckdman@users.noreply.github.com>
Copy link
Member

@franciscojavierarceo franciscojavierarceo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@franciscojavierarceo franciscojavierarceo merged commit 6bf7516 into feast-dev:master Feb 16, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants