Feature: Implement Databend UDF Server (OSPP) #11517

Closed
xudong963 opened this issue May 19, 2023 · 3 comments
Labels
C-feature Category: feature ospp-2023

Comments

@xudong963
Member

xudong963 commented May 19, 2023

Summary

  1. Writing an RFC for the UDF server.
  2. Implementing an MVP that provides the functionality to create, delete, and execute UDFs, and writing relevant tests.
  3. Creating an example using Python.

Mentor @xudong963

@xudong963 xudong963 added C-feature Category: feature ospp-2023 labels May 19, 2023
@manulpatel

Hello @xudong963! I would like to implement this feature as a part of OSPP. Could you please give some hints on any specifics to be included in the proposal?

@Jacob953

Jacob953 commented Jun 4, 2023

Based on the explanation given on bilibili, this proposal can be divided into three parts.

Modify the Databend kernel:

  • Extend the CREATE FUNCTION syntax to support the Python UDF Server.
  • Transfer data between the Databend kernel and the Python UDF Server using Arrow Flight.
  • Implement the communication between expression evaluation and the Python UDF Server to obtain calculation results.
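
The data flow described in these bullets can be sketched roughly as follows. This is only an in-process illustration under loud assumptions: JSON bytes stand in for Arrow record batches, a plain callable stands in for the Arrow Flight transport, and `server_handle` and `kernel_call` are hypothetical names, not part of Databend.

```python
import json
from typing import Callable, Dict, List, Sequence


def server_handle(request: bytes, functions: Dict[str, Callable]) -> bytes:
    """Server side: decode a batch, apply the function row by row, encode the result column."""
    message = json.loads(request)
    func = functions[message["func_name"]]
    columns = message["columns"]
    result = [func(*row) for row in zip(*columns)]
    return json.dumps({"result": result}).encode("utf-8")


def kernel_call(
    func_name: str,
    columns: Sequence[Sequence],
    send: Callable[[bytes], bytes],
) -> List:
    """Kernel side: ship the argument columns and read back one result column."""
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all argument columns must have the same length")
    request = json.dumps(
        {"func_name": func_name, "columns": [list(col) for col in columns]}
    ).encode("utf-8")
    response = send(request)  # assumption: stand-in for a network round trip
    return json.loads(response)["result"]


# Hypothetical function table held by the server.
functions = {"add": lambda x, y: x + y}

result = kernel_call(
    "add",
    [[1, 2, 3], [10, 20, 30]],
    lambda req: server_handle(req, functions),
)
print(result)  # [11, 22, 33]
```

Sending whole columns per request, rather than one row at a time, is what makes a columnar transport such as Arrow Flight attractive here: the per-call overhead is paid once per batch.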

Implement the Python UDF Server:

  • Define the configuration of the Python UDF Server, such as its address and port.
  • Implement the registration mechanism to recognize and call functions in the Python UDF Server.
  • Design and implement the mechanism to start the Python UDF Server.
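
A registration mechanism of the kind mentioned above could look something like this. It is a minimal sketch, assuming functions are looked up by their Python name; `FunctionRegistry` and its methods are hypothetical and not an existing Databend API.

```python
from typing import Callable, Dict, List


class FunctionRegistry:
    """Hypothetical registry mapping function names to Python callables."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable] = {}

    def register(self, func: Callable) -> Callable:
        """Register func under its own name; usable as a decorator."""
        name = func.__name__
        if name in self._functions:
            raise ValueError(f"function {name!r} is already registered")
        self._functions[name] = func
        return func

    def call(self, name: str, *args):
        """Look up a registered function by name and invoke it."""
        if name not in self._functions:
            raise KeyError(f"unknown function {name!r}")
        return self._functions[name](*args)

    def names(self) -> List[str]:
        """Return the registered function names in sorted order."""
        return sorted(self._functions)


registry = FunctionRegistry()


@registry.register
def gcd(x, y):
    while y != 0:
        x, y = y, x % y
    return x


print(registry.call("gcd", 12, 18))  # 6
```

Rejecting duplicate names at registration time keeps lookups unambiguous when the kernel later refers to a function by name only.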

Finally, design and run test cases to verify that the functionality and performance of the modified Databend kernel and the Python UDF Server meet the expected requirements.

@gitccl
Contributor

gitccl commented Jun 7, 2023

I plan to implement the Databend UDF Server in three parts:

  1. Implement the Python UDF Server:
  • Use Arrow Flight to communicate between Databend and the Python UDF Server.
  • Implement the Python UDF Server so that it receives requests from Databend, calls the registered function to perform the calculation, and returns the result.
  • Provide users with a Python SDK so that they can easily write a Python UDF Server. Users only need to provide the server address, register custom functions, and start the server. For example:
    from databend.udf import udf, UdfServer
    
    # Define a scalar function
    @udf(input_types=['INT', 'INT'], result_type='INT')
    def gcd(x, y):
        while y != 0:
            (x, y) = (y, x % y)
        return x
    
    # Start a UDF server
    if __name__ == '__main__':
        server = UdfServer(location="0.0.0.0:8815")
        server.add_function(gcd)
        server.serve()
  2. Implement the management of the UDF Server in Databend:
  • Extend the syntax and functionality of CREATE FUNCTION to support creating a UDF Server and storing its information in the meta service.
  • Extend ALTER FUNCTION and DROP FUNCTION so that Databend supports modifying and deleting a UDF Server.
  3. Implement the execution of the UDF Server in Databend:
  • The call to UDF Server is resolved into a ScalarExpr::UDFServerCall in the Binder phase, where the UDFServerCall struct is defined as follows:

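    pub struct UDFServerCall {
        // Source location of the call in the SQL text.
        pub span: Span,
        // Name of the function registered on the UDF server.
        pub func_name: String,
        // Address of the UDF server that hosts the function.
        pub server_addr: String,
        // Argument expressions passed to the function.
        pub arguments: Vec<ScalarExpr>,
    }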
  • Have the Databend kernel call the Python UDF Server and obtain the execution result in the Evaluator execution phase.
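
The bind-then-evaluate flow in the third part can be sketched in Python as follows. This is an illustration only, not Databend code: the dataclass mirrors the `UDFServerCall` fields above (without `span`), `meta` is a hypothetical name-to-address map standing in for the meta service, `fake_call_server` replaces the real network call, and evaluation is scalar here rather than columnar.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Constant:
    """A literal argument expression."""
    value: int


@dataclass
class UDFServerCall:
    """Python mirror of the proposed struct (span omitted for brevity)."""
    func_name: str
    server_addr: str
    arguments: List["Expr"]


Expr = Union[Constant, UDFServerCall]


def bind_call(name: str, arguments: List[Expr], meta: Dict[str, str]) -> UDFServerCall:
    """Binder step: resolve a function name through the meta map."""
    if name not in meta:
        raise LookupError(f"no UDF server registered for {name!r}")
    return UDFServerCall(func_name=name, server_addr=meta[name], arguments=arguments)


def evaluate(expr: Expr, call_server: Callable[[str, str, list], int]) -> int:
    """Evaluator step: evaluate arguments first, then dispatch to the server."""
    if isinstance(expr, Constant):
        return expr.value
    args = [evaluate(arg, call_server) for arg in expr.arguments]
    return call_server(expr.server_addr, expr.func_name, args)


# Assumption: the server address matches the SDK example; nothing listens here.
meta = {"gcd": "0.0.0.0:8815"}


def fake_call_server(addr: str, name: str, args: list) -> int:
    """Stand-in for the remote call; runs the function locally."""
    return {"gcd": math.gcd}[name](*args)


inner = bind_call("gcd", [Constant(54), Constant(36)], meta)
expr = bind_call("gcd", [Constant(12), inner], meta)
value = evaluate(expr, fake_call_server)
print(value)  # 6
```

Storing `server_addr` in the bound expression means the evaluator needs no further catalog lookup at execution time.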
