[FEATURE] Support input with multiple categories #502

Open
joshuali925 opened this issue Oct 26, 2022 · 1 comment
Labels
enhancement New feature or request feature

Comments

@joshuali925
Member

Is your feature request related to a problem?
This is for the AD command in PPL. Sometimes the input has a tag/category field, and the data within each category should be analyzed separately. For example:

[
  {"category": "A", "value": 1},
  {"category": "A", "value": 2},
  {"category": "B", "value": 20},
  {"category": "B", "value": 40}
]

The user would like AD to evaluate the data by category, but source=nyc_taxi | fields category, value | AD doesn't work.

What solution would you like?
Support source=nyc_taxi | fields category, value | AD category_field='category', or something similar, so that the input is grouped by category and each group is then predicted separately. The category value itself is irrelevant to the prediction.
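To make the requested behavior concrete, here is a minimal Python sketch of the grouping step only. It is not how the AD command is implemented; the function name `group_by_category` and its parameters are hypothetical, and the field names come from the example input above.

```python
from collections import defaultdict


def group_by_category(rows, category_field="category", value_field="value"):
    """Split rows into per-category lists of values, preserving input order.

    Hypothetical helper for illustration; not part of PPL or ml-commons.
    """
    groups = defaultdict(list)
    for row in rows:
        # The category is used only as a grouping key, never as a model feature.
        groups[row[category_field]].append(row[value_field])
    return dict(groups)


rows = [
    {"category": "A", "value": 1},
    {"category": "A", "value": 2},
    {"category": "B", "value": 20},
    {"category": "B", "value": 40},
]

groups = group_by_category(rows)
print(groups)
```

Each resulting list would then be passed to the detector on its own, so values from category B never influence the result for category A.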

What alternatives have you considered?
Temporary workaround in PPL: opensearch-project/sql#952

Do you have any additional context?
I'll try to work on this before the 2.4 release, but I'm not sure I'll have the time.

@ylwu-amzn
Collaborator

@joshuali925, hi Joshua, this is a good feature. If you have the bandwidth, you're welcome to contribute!
I can see some challenges in supporting such categorized input data.

  1. Memory pressure. Multiple categories bring more data, which increases memory usage. I think we should limit the number of categories per request.
  2. Latency. If we run the categories one by one, latency grows linearly with the number of categories. We can run several categories in parallel to speed things up.
  3. How to support multiple category fields, for example {"category1": "A", "category2": "B", "value": 1}. Maybe we can start by supporting only one category field.
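A rough Python sketch of the first two points, assuming the input has already been grouped by a single category field. Everything here is hypothetical: `MAX_CATEGORIES`, `run_per_category`, and `placeholder_detect` are illustrative names, and `placeholder_detect` is only a stand-in for the real model (it returns each value's distance from the group mean).

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CATEGORIES = 10  # hypothetical per-request limit, to bound memory usage


def placeholder_detect(values):
    """Stand-in for the real per-category model; NOT the actual AD algorithm."""
    mean = sum(values) / len(values)
    return [abs(v - mean) for v in values]


def run_per_category(groups, max_categories=MAX_CATEGORIES, max_workers=4):
    """Reject requests with too many categories, then score groups in parallel."""
    if len(groups) > max_categories:
        raise ValueError(
            f"too many categories: {len(groups)} > {max_categories}"
        )
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per category; max_workers caps how many run at once.
        futures = {
            category: pool.submit(placeholder_detect, values)
            for category, values in groups.items()
        }
        return {category: future.result() for category, future in futures.items()}


results = run_per_category({"A": [1, 2], "B": [20, 40]})
print(results)
```

For the third point, one possible extension would be to group on a tuple of field values, such as (category1, category2), but starting with a single field keeps the limit check and the request syntax simpler.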
