
get_online_features forces online stores to make multiple network calls #3259

Open
@chhabrakadabra


Is your feature request related to a problem? Please describe.

When get_online_features is asked for features from multiple feature views, the FeatureStore._get_online_features implementation groups the features by the feature view they belong to and calls the online store's online_read method once per group, sequentially.
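To make the current behavior concrete, here is a minimal sketch of the grouping and the per-group calls. It is not Feast's actual code: CountingStore, group_by_feature_view and get_online_features_sequential are hypothetical names, and the online_read signature is simplified. The only thing borrowed from Feast is the "feature_view:feature" reference format.

```python
from collections import defaultdict
from typing import Dict, List


class CountingStore:
    """Hypothetical stand-in for an online store; counts simulated round trips."""

    def __init__(self) -> None:
        self.calls = 0

    def online_read(
        self, feature_view: str, entity_keys: List[str], features: List[str]
    ) -> List[Dict[str, str]]:
        # Each call represents one network round trip to the store.
        self.calls += 1
        return [
            {f: f"{feature_view}/{key}/{f}" for f in features}
            for key in entity_keys
        ]


def group_by_feature_view(feature_refs: List[str]) -> Dict[str, List[str]]:
    """Split 'feature_view:feature' references into one list per feature view."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ref in feature_refs:
        view, feature = ref.split(":", 1)
        grouped[view].append(feature)
    return dict(grouped)


def get_online_features_sequential(
    store: CountingStore, feature_refs: List[str], entity_keys: List[str]
) -> Dict[str, List[Dict[str, str]]]:
    """One online_read per feature view, issued one after another."""
    results = {}
    for view, features in group_by_feature_view(feature_refs).items():
        results[view] = store.online_read(view, entity_keys, features)
    return results


store = CountingStore()
rows = get_online_features_sequential(
    store,
    ["driver_stats:conv_rate", "driver_stats:acc_rate", "driver_ratings:avg_rating"],
    ["driver:1001"],
)
print(store.calls)  # one round trip per feature view
```

With two feature views in the request, the store is hit twice even though both views share the same entity key.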

This means that an online store able to fetch the data for all the feature views in one network call never gets the chance to do so. For example, in the upcoming Bigtable online store implementation, data for different feature views that share the same entities is colocated. I strongly suspect that fetching all the data in one go would be faster than making multiple sequential network calls.

Describe the solution you'd like

One solution would be to give the OnlineStore a separate method (perhaps called online_read_all) that receives all the feature views and their requested features in one shot, instead of having that information split across multiple calls to online_read. If an OnlineStore does not implement online_read_all, we can fall back to calling online_read once per feature view, sequentially. This way, we don't break the API of the current online stores.
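A rough sketch of how the fallback could look, assuming the default lives on the base class. Everything here is hypothetical: the OnlineStore class below is a simplified stand-in rather than Feast's real interface, online_read_all is only the name proposed above, and PerViewStore and ColocatedStore exist purely for illustration.

```python
from typing import Dict, List

Row = Dict[str, str]


def _fake_rows(view: str, entity_keys: List[str], features: List[str]) -> List[Row]:
    """Fabricate one row per entity key so the example needs no real backend."""
    return [{f: f"{view}/{key}/{f}" for f in features} for key in entity_keys]


class OnlineStore:
    """Hypothetical, simplified base class (not Feast's real signatures)."""

    def __init__(self) -> None:
        self.network_calls = 0

    def online_read(
        self, feature_view: str, entity_keys: List[str], features: List[str]
    ) -> List[Row]:
        raise NotImplementedError

    def online_read_all(
        self, requests: Dict[str, List[str]], entity_keys: List[str]
    ) -> Dict[str, List[Row]]:
        # Default: fall back to one online_read per feature view, so stores
        # that never override this method keep working unchanged.
        return {
            view: self.online_read(view, entity_keys, features)
            for view, features in requests.items()
        }


class PerViewStore(OnlineStore):
    """Existing-style store: implements only online_read."""

    def online_read(self, feature_view, entity_keys, features):
        self.network_calls += 1
        return _fake_rows(feature_view, entity_keys, features)


class ColocatedStore(PerViewStore):
    """Store whose data is colocated, so one round trip can serve every view."""

    def online_read_all(self, requests, entity_keys):
        self.network_calls += 1  # a single simulated network call
        return {
            view: _fake_rows(view, entity_keys, features)
            for view, features in requests.items()
        }


requests = {"driver_stats": ["conv_rate"], "driver_ratings": ["avg_rating"]}
per_view, colocated = PerViewStore(), ColocatedStore()
per_view_rows = per_view.online_read_all(requests, ["driver:1001"])
colocated_rows = colocated.online_read_all(requests, ["driver:1001"])
print(per_view.network_calls, colocated.network_calls)
```

Both stores return the same rows; only the number of simulated round trips differs. Putting the default on the base class means FeatureStore could always call online_read_all without checking which stores support it.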

Additionally, the FeatureStore._get_online_features method could perform all the online_read calls concurrently. This way, all the existing online stores could get a performance lift without any changes on their side.
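One possible way to issue the reads concurrently is a thread pool, sketched below. This is only an illustration: read_concurrently and slow_online_read are hypothetical names, the sleep stands in for network latency, and the approach assumes the store's client is safe to call from multiple threads, which would have to be verified per online store.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, str]


def slow_online_read(
    feature_view: str, entity_keys: List[str], features: List[str]
) -> List[Row]:
    """Hypothetical online_read; the sleep simulates one network round trip."""
    time.sleep(0.05)
    return [
        {f: f"{feature_view}/{key}/{f}" for f in features}
        for key in entity_keys
    ]


def read_concurrently(
    online_read: Callable[[str, List[str], List[str]], List[Row]],
    requests: Dict[str, List[str]],
    entity_keys: List[str],
    max_workers: int = 8,
) -> Dict[str, List[Row]]:
    """Submit one online_read per feature view and wait for all of them."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            view: pool.submit(online_read, view, entity_keys, features)
            for view, features in requests.items()
        }
        # result() re-raises any exception raised inside the worker thread.
        return {view: future.result() for view, future in futures.items()}


requests = {
    "driver_stats": ["conv_rate"],
    "driver_ratings": ["avg_rating"],
    "driver_trips": ["trips_today"],
}
concurrent_rows = read_concurrently(slow_online_read, requests, ["driver:1001"])
print(list(concurrent_rows))
```

The reads still cost one round trip each, so this reduces latency rather than load; a store that can serve everything in one call would still do better with a single combined read.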

Describe alternatives you've considered

Nothing is currently broken, so the alternative is to just not do anything.

Alternatively, a worse solution would be to pass the existing online_read method a hint about the other feature views and their requested features. The online_read method could then make a single API call for all of it, cache the result somewhere, and return only the relevant parts. I'm not too keen on this idea though, since a cleaner API option is still available.

Another alternative would be for any of the online stores to just get data for all the feature views ahead of time (for the same entity keys) in one network call and cache it. I'm not 100% sure if this would actually lead to a reasonable performance gain in all scenarios since there's inherent waste here.

Additional context

N/A
