Substrait-based on demand feature views #3945
Comments
This can be a really good feature.
That's a great question, thanks. If we're talking about how it should be stored in the registry:
If we're talking about the user-facing interface, I agree that we can make
The reason we can't do this now is that ibis follows NEP 29 and dropped Python 3.8 support a while ago. If we were to depend on ibis right now, it would be on version
Is your feature request related to a problem? Please describe.
On demand feature views (ODFVs) as implemented right now are very limited. The only way to specify an ODFV is through a Python function that takes a pandas DataFrame as input and outputs another pandas DataFrame. This leads to problems for both offline and online interfaces:
Describe the solution you'd like
Allow constructing ODFVs as Substrait plans. Substrait is a protobuf-based serialization format for relational algebra operations. It is meant to be used as a cross-language and cross-engine format for sharing logical or physical execution plans. It has a number of producers (tools that can generate Substrait) and consumers (engines that can run Substrait) in different languages.
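To make the producer/consumer split concrete, here is a minimal standard-library sketch of the idea, not of Substrait itself. It assumes a toy plan format serialized as JSON (standing in for Substrait's protobuf), and the names produce_plan, consume_plan, conv_rate and acc_rate are hypothetical and do not come from Feast or Substrait. The point is only that the transformation travels as opaque bytes, so the side that runs it needs no knowledge of the tool that wrote it.

```python
import json
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def produce_plan() -> bytes:
    """Producer side: describe a projection as data and serialize it to bytes.

    JSON is an assumption for illustration; real Substrait plans are protobuf.
    """
    plan = {
        "project": [
            # Hypothetical output column computed from two input columns.
            {"name": "conv_plus_acc", "op": "add", "args": ["conv_rate", "acc_rate"]},
        ]
    }
    return json.dumps(plan).encode("utf-8")


# The consumer owns the operator implementations, not the producer.
_OPS: Dict[str, Callable[[Any, Any], Any]] = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def consume_plan(plan_bytes: bytes, rows: List[Row]) -> List[Row]:
    """Consumer side: decode the plan bytes and evaluate them row by row."""
    plan = json.loads(plan_bytes.decode("utf-8"))
    output = []
    for row in rows:
        new_row = {}
        for expr in plan["project"]:
            args = [row[column] for column in expr["args"]]
            new_row[expr["name"]] = _OPS[expr["op"]](*args)
        output.append(new_row)
    return output


plan_bytes = produce_plan()
rows = [{"conv_rate": 1, "acc_rate": 2}, {"conv_rate": 3, "acc_rate": 4}]
result = consume_plan(plan_bytes, rows)
print(result)
```

Because the plan is data rather than a Python function, the same bytes could in principle be handed to a consumer written in another language, which is the property the proposal relies on.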
The example code in my PoC implementation looks something like this:
The Substrait plan object that Feast accepts is bytes and introduces no external dependency. I'm using ibis and ibis-substrait to generate the plan. Right now that's the most practical way to generate a Substrait plan in Python with a DataFrame-like API, but this could have been any other Substrait producer.
Describe alternatives you've considered
An obvious alternative to Substrait is SQL-based ODFVs, but using SQL has a number of important downsides:
Having said that, it probably makes sense to support both Substrait-based and SQL-based ODFVs, because at the moment it might be easier for SQL-based logic to be incorporated inside offline store engines.