Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move the ml_parameters from XContent to the request parameters to avoid the conflict with search XContent input. #71

Merged
merged 3 commits into from
Aug 25, 2021

Conversation

spbjss
Copy link
Contributor

@spbjss spbjss commented Aug 18, 2021

Description

The training and prediction APIs extends the RestSearchAction, which will check the XContent of the request and try to build search condition based on the XContent.
If we define the ml_parameters through XContent, the SearchSourceBuilder will run failed to build the search source with parsing_exception "Unknown key for a START_OBJECT in [ml_parameters]."
Move the ml_parameters from XContent to the request parameters to avoid the conflict with search XContent input.

For example, before this PR, below is how would the ml_parameters looks like:

curl -H "Content-type: application/json" -XGET 'http://localhost:9200/_plugins/_ml/_train?algorithm=kmeans&index=test_data&q=k1:*&pretty=true' -d '
{
    "ml_parameters": {
        "paramName1": "value1",
        "paramName2": 123
    }
}' 

After received this request, the RestSearchAction will try to build the search condition based on:

{
    "ml_parameters": {
        "paramName1": "value1",
        "paramName2": 123
    }
}

The RestSearchAction will run failed because of it's not a valid search expression, thus the client will get parsing_exception with the reason: "Unknown key for a START_OBJECT in [ml_parameters]"

Issues Resolved

#70

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Alex and others added 3 commits August 10, 2021 16:48
Signed-off-by: Alex <pengsun@amazon.com>
the conflict with search XContent input.

Signed-off-by: Alex <pengsun@amazon.com>
@ylwu-amzn
Copy link
Collaborator

Can you add some input examples to show why we have to move to request parameters?

@wqixian
Copy link

wqixian commented Aug 19, 2021

Hi Alex, as we discussed, could you also take a look in MLPredictionTaskRequest and make sure the list of ml parameters is handled properly? In writeTo(), if the list is null, writeList() would cause an error. This might also apply to any tasks with the ML parameters list. Thanks!

@spbjss
Copy link
Contributor Author

spbjss commented Aug 25, 2021

Hi Alex, as we discussed, could you also take a look in MLPredictionTaskRequest and make sure the list of ml parameters is handled properly? In writeTo(), if the list is null, writeList() would cause an error. This might also apply to any tasks with the ML parameters list. Thanks!

Sure, I will submit a separate PR to fix the issue you mentioned.

@spbjss
Copy link
Contributor Author

spbjss commented Aug 25, 2021

Can you add some input examples to show why we have to move to request parameters?

Example added to the PR description.

@ylwu-amzn
Copy link
Collaborator

General question not related with this fix: is it enough to support queries with URL request parameters only? In AD, we support filter in detector. Not sure if we can use similar way to let user pass filter in http body.

@spbjss
Copy link
Contributor Author

spbjss commented Aug 25, 2021

General question not related with this fix: is it enough to support queries with URL request parameters only? In AD, we support filter in detector. Not sure if we can use similar way to let user pass filter in http body.

Oh, after this change:

  1. The ml_parameters (which is required by the models) will be the HTTP request parameters.
  2. The search queries will support two different way:

a. XContent of the HTTP request, e.g.

{
    "query": {
        "range" : {
            "feet": {
                "gte": 4000.0,
                "lte": 4500.0
            }
        }
    },
    "_source" : ["feet","price"]
}'

b. HTTP request parameter, for example: &q=k1:*
curl -H "Content-type: application/json" -XGET 'http://localhost:9200/_plugins/_ml/_train?algorithm=kmeans&index=test_data&q=k1:*&pretty=true'

For example, we could run the following request with ml_parameters in HTTP request parameter, and with query in the HTTP request XContent (JSON string)

curl -XGET 'http://localhost:9200/_plugins/_ml/_train?algorithm=kmeans&index=test_data&pretty=true&ml_parameters={"paramName1":"value1","paramName2":123}' -d '
{
    "query": {
        "range" : {
            "feet": {
                "gte": 4000.0,
                "lte": 4500.0
            }
        }
    },
    "_source" : ["feet","price"]
}'

@ylwu-amzn
Copy link
Collaborator

ylwu-amzn commented Aug 25, 2021

General question not related with this fix: is it enough to support queries with URL request parameters only? In AD, we support filter in detector. Not sure if we can use similar way to let user pass filter in http body.

Oh, after this change:

  1. The ml_parameters (which is required by the models) will be the HTTP request parameters.
  2. The search queries will support two different way:

a. XContent of the HTTP request, e.g.

{
    "query": {
        "range" : {
            "feet": {
                "gte": 4000.0,
                "lte": 4500.0
            }
        }
    },
    "_source" : ["feet","price"]
}'

b. HTTP request parameter, for example: &q=k1:*
curl -H "Content-type: application/json" -XGET 'http://localhost:9200/_plugins/_ml/_train?algorithm=kmeans&index=test_data&q=k1:*&pretty=true'

For example, we could run the following request with ml_parameters in HTTP request parameter, and with query in the HTTP request XContent (JSON string)

curl -XGET 'http://localhost:9200/_plugins/_ml/_train?algorithm=kmeans&index=test_data&pretty=true&ml_parameters={"paramName1":"value1","paramName2":123}' -d '
{
    "query": {
        "range" : {
            "feet": {
                "gte": 4000.0,
                "lte": 4500.0
            }
        }
    },
    "_source" : ["feet","price"]
}'

As we discussed offline, suggest to use such JSON format for input

{
  "filter_query": {
    "query": {
      "range": {
        "feet": {
          "gte": 4000,
          "lte": 4500
        }
      }
    },
    "_source": [
      "feet",
      "price"
    ]
  },
  "ml_parameters": {
    "paramName1": "value1",
    "paramName2": 123
  }
}

We can write custom logic to parse the input, if we see filter_query, we can parse it to OpenSearch query; if the filed name is ml_parameters, we can parse it to ML parameter.

Put ml_parameters in URL parameter has some cons

  1. Not scalable, if some model needs more parameters, it may exceed the URL length limit.
  2. Not so secure as the ML parameters will be visible if put in URL parameters.

Approve this PR now to unblock your testing. You can fix this later.

@spbjss spbjss merged commit 5bbf8d9 into opensearch-project:develop Aug 25, 2021
spbjss added a commit that referenced this pull request Sep 3, 2021
…id the conflict with search XContent input. (#71) (#77)

* Create JvmService instance on demand.

Signed-off-by: Alex <pengsun@amazon.com>

* Move the ml_parameters from XContent to the request parameters to avoid
the conflict with search XContent input.

Signed-off-by: Alex <pengsun@amazon.com>

Co-authored-by: Alex <pengsun@amazon.com>

Co-authored-by: Alex <pengsun@amazon.com>
jackiehanyang pushed a commit that referenced this pull request Nov 16, 2021
…id the conflict with search XContent input. (#71)

* Create JvmService instance on demand.

Signed-off-by: Alex <pengsun@amazon.com>

* Move the ml_parameters from XContent to the request parameters to avoid
the conflict with search XContent input.

Signed-off-by: Alex <pengsun@amazon.com>

Co-authored-by: Alex <pengsun@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants