Move the ml_parameters from XContent to the request parameters to avoid the conflict with search XContent input. #71

spbjss · 2021-08-18T05:05:36Z

Description

The training and prediction APIs extends the RestSearchAction, which will check the XContent of the request and try to build search condition based on the XContent.
If we define the ml_parameters through XContent, the SearchSourceBuilder will run failed to build the search source with parsing_exception "Unknown key for a START_OBJECT in [ml_parameters]."
Move the ml_parameters from XContent to the request parameters to avoid the conflict with search XContent input.

For example, before this PR, below is how would the ml_parameters looks like:

curl -H "Content-type: application/json" -XGET 'http://localhost:9200/_plugins/_ml/_train?algorithm=kmeans&index=test_data&q=k1:*&pretty=true' -d '
{
    "ml_parameters": {
        "paramName1": "value1",
        "paramName2": 123
    }
}'

After received this request, the RestSearchAction will try to build the search condition based on:

{
    "ml_parameters": {
        "paramName1": "value1",
        "paramName2": 123
    }
}

The RestSearchAction will run failed because of it's not a valid search expression, thus the client will get parsing_exception with the reason: "Unknown key for a START_OBJECT in [ml_parameters]"

Issues Resolved

#70

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Alex <pengsun@amazon.com>

the conflict with search XContent input. Signed-off-by: Alex <pengsun@amazon.com>

ylwu-amzn · 2021-08-19T20:35:06Z

Can you add some input examples to show why we have to move to request parameters?

wqixian · 2021-08-19T23:14:22Z

Hi Alex, as we discussed, could you also take a look in MLPredictionTaskRequest and make sure the list of ml parameters is handled properly? In writeTo(), if the list is null, writeList() would cause an error. This might also apply to any tasks with the ML parameters list. Thanks!

spbjss · 2021-08-25T02:38:13Z

Hi Alex, as we discussed, could you also take a look in MLPredictionTaskRequest and make sure the list of ml parameters is handled properly? In writeTo(), if the list is null, writeList() would cause an error. This might also apply to any tasks with the ML parameters list. Thanks!

Sure, I will submit a separate PR to fix the issue you mentioned.

spbjss · 2021-08-25T02:46:50Z

Can you add some input examples to show why we have to move to request parameters?

Example added to the PR description.

ylwu-amzn · 2021-08-25T17:17:26Z

General question not related with this fix: is it enough to support queries with URL request parameters only? In AD, we support filter in detector. Not sure if we can use similar way to let user pass filter in http body.

spbjss · 2021-08-25T17:36:08Z

General question not related with this fix: is it enough to support queries with URL request parameters only? In AD, we support filter in detector. Not sure if we can use similar way to let user pass filter in http body.

Oh, after this change:

The ml_parameters (which is required by the models) will be the HTTP request parameters.
The search queries will support two different way:

a. XContent of the HTTP request, e.g.

{
    "query": {
        "range" : {
            "feet": {
                "gte": 4000.0,
                "lte": 4500.0
            }
        }
    },
    "_source" : ["feet","price"]
}'

b. HTTP request parameter, for example: &q=k1:*
curl -H "Content-type: application/json" -XGET 'http://localhost:9200/_plugins/_ml/_train?algorithm=kmeans&index=test_data&q=k1:*&pretty=true'

For example, we could run the following request with ml_parameters in HTTP request parameter, and with query in the HTTP request XContent (JSON string)

curl -XGET 'http://localhost:9200/_plugins/_ml/_train?algorithm=kmeans&index=test_data&pretty=true&ml_parameters={"paramName1":"value1","paramName2":123}' -d '
{
    "query": {
        "range" : {
            "feet": {
                "gte": 4000.0,
                "lte": 4500.0
            }
        }
    },
    "_source" : ["feet","price"]
}'

ylwu-amzn · 2021-08-25T21:33:09Z

General question not related with this fix: is it enough to support queries with URL request parameters only? In AD, we support filter in detector. Not sure if we can use similar way to let user pass filter in http body.

Oh, after this change:

The ml_parameters (which is required by the models) will be the HTTP request parameters.

The search queries will support two different way:

a. XContent of the HTTP request, e.g.
{
    "query": {
        "range" : {
            "feet": {
                "gte": 4000.0,
                "lte": 4500.0
            }
        }
    },
    "_source" : ["feet","price"]
}'
b. HTTP request parameter, for example: &q=k1:*
curl -H "Content-type: application/json" -XGET 'http://localhost:9200/_plugins/_ml/_train?algorithm=kmeans&index=test_data&q=k1:*&pretty=true'

For example, we could run the following request with ml_parameters in HTTP request parameter, and with query in the HTTP request XContent (JSON string)
curl -XGET 'http://localhost:9200/_plugins/_ml/_train?algorithm=kmeans&index=test_data&pretty=true&ml_parameters={"paramName1":"value1","paramName2":123}' -d '
{
    "query": {
        "range" : {
            "feet": {
                "gte": 4000.0,
                "lte": 4500.0
            }
        }
    },
    "_source" : ["feet","price"]
}'

As we discussed offline, suggest to use such JSON format for input

{
  "filter_query": {
    "query": {
      "range": {
        "feet": {
          "gte": 4000,
          "lte": 4500
        }
      }
    },
    "_source": [
      "feet",
      "price"
    ]
  },
  "ml_parameters": {
    "paramName1": "value1",
    "paramName2": 123
  }
}

We can write custom logic to parse the input, if we see filter_query, we can parse it to OpenSearch query; if the filed name is ml_parameters, we can parse it to ML parameter.

Put ml_parameters in URL parameter has some cons

Not scalable, if some model needs more parameters, it may exceed the URL length limit.
Not so secure as the ML parameters will be visible if put in URL parameters.

Approve this PR now to unblock your testing. You can fix this later.

…id the conflict with search XContent input. (#71) (#77) * Create JvmService instance on demand. Signed-off-by: Alex <pengsun@amazon.com> * Move the ml_parameters from XContent to the request parameters to avoid the conflict with search XContent input. Signed-off-by: Alex <pengsun@amazon.com> Co-authored-by: Alex <pengsun@amazon.com> Co-authored-by: Alex <pengsun@amazon.com>

…id the conflict with search XContent input. (#71) * Create JvmService instance on demand. Signed-off-by: Alex <pengsun@amazon.com> * Move the ml_parameters from XContent to the request parameters to avoid the conflict with search XContent input. Signed-off-by: Alex <pengsun@amazon.com> Co-authored-by: Alex <pengsun@amazon.com>

Alex and others added 3 commits August 10, 2021 16:48

Create JvmService instance on demand.

db1a026

Signed-off-by: Alex <pengsun@amazon.com>

Move the ml_parameters from XContent to the request parameters to avoid

4af2aac

the conflict with search XContent input. Signed-off-by: Alex <pengsun@amazon.com>

Merge branch 'opensearch-project:develop' into develop

4a5cdd4

spbjss requested review from kaituo, ylwu-amzn, ohltyler and wqixian August 18, 2021 05:05

ylwu-amzn approved these changes Aug 25, 2021

View reviewed changes

spbjss merged commit 5bbf8d9 into opensearch-project:develop Aug 25, 2021

spbjss mentioned this pull request Sep 3, 2021

Move the ml_parameters from XContent to the request parameters to avo… #77

Merged

5 tasks

ylwu-amzn added the refactor label Mar 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move the ml_parameters from XContent to the request parameters to avoid the conflict with search XContent input. #71

Move the ml_parameters from XContent to the request parameters to avoid the conflict with search XContent input. #71

spbjss commented Aug 18, 2021 •

edited

Loading

ylwu-amzn commented Aug 19, 2021

wqixian commented Aug 19, 2021

spbjss commented Aug 25, 2021

spbjss commented Aug 25, 2021

ylwu-amzn commented Aug 25, 2021

spbjss commented Aug 25, 2021 •

edited

Loading

ylwu-amzn commented Aug 25, 2021 •

edited

Loading

Move the ml_parameters from XContent to the request parameters to avoid the conflict with search XContent input. #71

Move the ml_parameters from XContent to the request parameters to avoid the conflict with search XContent input. #71

Conversation

spbjss commented Aug 18, 2021 • edited Loading

Description

Issues Resolved

Check List

ylwu-amzn commented Aug 19, 2021

wqixian commented Aug 19, 2021

spbjss commented Aug 25, 2021

spbjss commented Aug 25, 2021

ylwu-amzn commented Aug 25, 2021

spbjss commented Aug 25, 2021 • edited Loading

ylwu-amzn commented Aug 25, 2021 • edited Loading

spbjss commented Aug 18, 2021 •

edited

Loading

spbjss commented Aug 25, 2021 •

edited

Loading

ylwu-amzn commented Aug 25, 2021 •

edited

Loading