
[Feature] About an idea to add a UDF management module for StreamPark #1782

Open
green241 opened this issue Oct 9, 2022 · 3 comments

Comments

@green241
Contributor

green241 commented Oct 9, 2022

Search before asking

  • I had searched in the feature and found no similar feature requirement.

Description

I have been using StreamPark for a while now and have had a pretty good experience in terms of ease of use and stability. StreamPark itself supports UDFs, but there doesn't seem to be a unified UDF management menu, so I recommend adding one. Its main purposes would be:

  1. Implement unified management of the UDFs users create (providing the main APIs for manipulating UDF objects).

  2. Currently, StreamPark has users upload UDF jars when creating jobs, but in actual use they may run into problems such as not knowing which UDFs were created before, what each identifier is, and so on. A UDF management module would solve these problems.
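To make the first point concrete, here is a minimal sketch of what such a management module might expose. All class, field, and method names below are hypothetical illustrations, not StreamPark's actual API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of a UDF management module; names are illustrative only.
public class UdfRegistry {

    // Metadata recorded for each registered UDF.
    public static class UdfMeta {
        public final long id;            // primary key, referenced as udfId elsewhere
        public final String identifier;  // function name used in SQL, e.g. "my_upper"
        public final String className;   // fully qualified UDF class inside the jar
        public final String jarPath;     // where the jar is stored on HDFS

        public UdfMeta(long id, String identifier, String className, String jarPath) {
            this.id = id;
            this.identifier = identifier;
            this.className = className;
            this.jarPath = jarPath;
        }
    }

    private final List<UdfMeta> udfs = new ArrayList<>();
    private long nextId = 1;

    // Register a new UDF and return its generated id.
    public long register(String identifier, String className, String jarPath) {
        long id = nextId++;
        udfs.add(new UdfMeta(id, identifier, className, jarPath));
        return id;
    }

    // Look up a UDF by id, so a job can resolve its jar location.
    public Optional<UdfMeta> findById(long id) {
        return udfs.stream().filter(u -> u.id == id).findFirst();
    }

    // List all UDFs, e.g. to populate a drop-down box in the job editor.
    public List<UdfMeta> listAll() {
        return new ArrayList<>(udfs);
    }
}
```

With a registry like this, the job creation page could call listAll() to fill a drop-down, so a user sees each UDF's identifier and jar location instead of re-uploading jars blindly.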

Usage Scenario

Note:

  1. This feature is initially implemented only for SQL jobs in Yarn Application mode;
  2. The JARs are stored on HDFS.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@green241
Copy link
Contributor Author

green241 commented Oct 10, 2022

The current plan is mainly based on the Yarn Application mode, so the following outlines the main implementation idea.

  1. When creating a job, select the required UDFs (e.g., a drop-down box showing the UDFs available to the current user, associated by udfId);

  2. When starting a job, query the storage paths of the selected UDFs by udfId (there can be more than one), join those storage paths into a single string, and finally pass it to yarn.provided.lib.dirs when submitting the job to achieve dynamic loading.
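Step 2 above can be sketched as follows. Flink's yarn.provided.lib.dirs option takes a semicolon-separated list of directories; lookupUdfDir here is a hypothetical stand-in for the real lookup against the UDF metadata store, and the path layout is an assumption:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: resolve storage paths for the selected udfIds and join them into the
// semicolon-separated value expected by Flink's yarn.provided.lib.dirs.
public class ProvidedLibDirsBuilder {

    // Hypothetical lookup: udfId -> HDFS directory holding that UDF's jar.
    static String lookupUdfDir(long udfId) {
        return "hdfs:///streampark/udfs/" + udfId;
    }

    // Join the directories of all selected UDFs into one config value.
    public static String build(List<Long> udfIds) {
        StringBuilder sb = new StringBuilder();
        for (long id : udfIds) {
            if (sb.length() > 0) {
                sb.append(';'); // Flink separates multiple provided lib dirs with ';'
            }
            sb.append(lookupUdfDir(id));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dirs = build(Arrays.asList(3L, 7L));
        // The resulting string would then be set on the submission config
        // (e.g. -Dyarn.provided.lib.dirs=<dirs>) so YARN ships the jars with the job.
        System.out.println(dirs);
    }
}
```

Because the directories are passed at submission time rather than bundled into the job jar, adding or removing a UDF only changes this config value, which is what makes the loading "dynamic".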

@datayangl

Actually, your plan is quite similar to Zeppelin's way of managing UDFs. I would like to contribute: first, an overall design for UDF management; second, the stages to implement it.

@green241
Contributor Author

Hi datayangl,

  • So nice, you are warmly welcome.
  • After the next version, we can discuss this in the weekly meeting, including the proposal, solutions, etc.
