
[BUG] action pre_process_function does not appear to do anything at all #2986

Closed
svdasein opened this issue Sep 25, 2024 · 4 comments
Labels
bug Something isn't working


svdasein commented Sep 25, 2024

What is the bug?
It seems that no matter what I put in the action section of my model definition for pre_process_function, the effect is the same: none at all. In fact, I have been unable to get anything to error out even when I assign garbage to the property; it is as if the setting is simply never used.

How can one reproduce the bug?

  1. Set up an Ollama-based instance so you have an external model to talk to.
  2. Pull the nomic-embed-text model.
  3. In Dev Tools, run this:
POST /_plugins/_ml/models/_register
{
  "name": "nomic-embed-text",
  "function_name": "remote",
  "description": "Embedding model on myhost under Ollama",
  "connector": {
    "name": "myhostllama",
    "description": "Ollama on MyHost",
    "version": "1.0",
    "protocol": "http",
    "credential": {},
    "parameters": {

      "input_type": "text_docs",
      "model": "nomic-embed-text"
    },
    "actions": [
      {
        "action_type": "predict",
        "method": "POST",
        "headers": {
          "content-type": "application/json"
        },
        "url": "http://myhost:11434/api/embed",
        "request_body": """{"model": "${parameters.model}", "input": "${parameters.input}" }""",
        "pre_process_function": "connector.pre_process.idon'texist.embedding"
      }
    ]
  }
}

(You will note that it happily takes this input and registers the model despite the pre_process_function being meaningless)
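To illustrate, here is a hedged sketch of the kind of registration-time check that would catch a meaningless function name. The function names below are built-in pre-process functions from the ml-commons documentation; the allow-list approach itself is my assumption of how validation could work, not existing behavior (today the plugin accepts any string).

```python
# Hypothetical sketch: an allow-list check the register call could perform.
# Not current ml-commons behavior — the plugin accepts any string today.
KNOWN_PRE_PROCESS_FUNCTIONS = {
    "connector.pre_process.openai.embedding",
    "connector.pre_process.cohere.embedding",
    "connector.pre_process.default.embedding",
}

def validate_action(action: dict) -> None:
    fn = action.get("pre_process_function")
    if fn is None:
        return  # the field is optional
    # Custom Painless scripts are also legal; assume built-ins are the only
    # values starting with "connector." (an assumption for this sketch).
    if fn.startswith("connector.") and fn not in KNOWN_PRE_PROCESS_FUNCTIONS:
        raise ValueError("unknown pre_process_function: %s" % fn)

try:
    validate_action({"pre_process_function": "connector.pre_process.idon'texist.embedding"})
except ValueError as e:
    print(e)  # rejected at registration time instead of silently accepted
```

With a check like this, the register call above would fail fast instead of installing a model whose pre_process_function can never run.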
[screenshot]

  5. Now try some queries:
    a)
POST /_plugins/_ml/models/<modelid>/_predict
{
  "parameters": {
    "input": "scary swedish cult"
  }
}

--- This works for me but only because nothing is trying to coerce the input string into an array
b)

POST /_plugins/_ml/models/<modelid>/_predict
{
  "parameters": {
    "input": 
      ["scary swedish cult"]
  }
}

--- This does not work for me - once it's in array format the framework complains that it has generated invalid json
c)

GET /rb-movies/_search
{
  "_source": {
    "excludes": [
      "overviewVector"
    ]
  },
  "query": {
    "bool": {
      "should": [
        {
          "neural": {
            "overviewVector": {
              "query_text": "scary swedish cult",
              "model_id": "<modelid>"",
              "k": 100
            }
          }
        }
      ]
    }
  }
}

--- This does not work for me either - the input ends up in array format and the framework complains that it has generated invalid json
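A minimal sketch of why queries b) and c) fail while a) works. This is my approximation of the connector's `${parameters.x}` substitution, not the actual ml-commons code: an array value spliced inside a quoted placeholder stops being valid JSON.

```python
import json

# Approximate the connector's ${parameters.x} substitution (a sketch, not
# the real ml-commons implementation).
def render(request_body: str, parameters: dict) -> str:
    out = request_body
    for name, value in parameters.items():
        # Non-string values are serialized to JSON text before splicing.
        text = value if isinstance(value, str) else json.dumps(value)
        out = out.replace("${parameters.%s}" % name, text)
    return out

template = '{"model": "${parameters.model}", "input": "${parameters.input}" }'

# Query a): a plain string slots neatly between the quotes -> valid JSON.
ok = render(template, {"model": "nomic-embed-text", "input": "scary swedish cult"})
json.loads(ok)  # parses fine

# Queries b)/c): an array lands inside the quotes -> no longer valid JSON.
bad = render(template, {"model": "nomic-embed-text", "input": ["scary swedish cult"]})
try:
    json.loads(bad)
except json.JSONDecodeError:
    print("invalid JSON after substitution")
```

The inner quotes of the serialized array (`["scary swedish cult"]`) collide with the quotes already in the template, which matches the "generated invalid json" error the framework reports.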

What is the expected behavior?
I expect a few things to be different:

  1. If I submit garbage when I define a model, I expect some kind of message indicating that I have done so.
  2. I would not expect a definition containing garbage to be accepted and installed.
  3. When I execute a query, I expect the framework to attempt to run my input through the specified function so that it produces altered (valid) json.

What is your host/environment?

  • OS: Ubuntu 22.04.2 LTS
  • Version: opensearch 2.17.0
  • Plugins: ml

Do you have any screenshots?

Here is what I see on the receiving end when I run 5)a)
[screenshot]

[screenshot]

When I run one of the queries that attempts to coerce an array, the transaction never actually leaves opensearch; instead I see this:
[screenshot]

Do you have any additional context?
Note that I had been attempting to fix this for more than a day before realizing that nothing I put in pre_process_function makes any difference, because nothing appears to even look at or refer to that function.

Once I figured that out I tried deleting and re-registering the model - no difference. I have tried quite a large number of things over the last day and a half, with two types of results:

  1. The ${parameters.whatIputHere} macro does not expand because whatIputHere isn't recognized by the framework, so the literal string "${parameters.whatIputHere}" ends up in my query (you can see that string in the packet capture on the model side).
  2. It blows up with an exception in Dev Tools and never actually attempts the transaction over the network.
@svdasein svdasein added bug Something isn't working untriaged labels Sep 25, 2024
ylwu-amzn (Collaborator) commented Sep 26, 2024

Had a call with Dave via Slack Huddle.

If the model input is an array of strings, the placeholder in the connector request_body should not be wrapped in quotes. The connector currently has "request_body": """{"model": "${parameters.model}", "input": "${parameters.input}" }""". For an array-of-strings input, change it to "input": ${parameters.input} by removing the quotes. Only wrap ${parameters.input} in quotes when the parameter is a String.
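To make the quote rule concrete, here are the two request_body variants spliced by hand (a sketch; the real substitution happens server-side in the connector):

```python
import json

# The serialized array value that gets spliced into the template.
array_input = json.dumps(["scary swedish cult"])   # '["scary swedish cult"]'

quoted   = '{"model": "m", "input": "%s" }' % array_input   # quotes kept
unquoted = '{"model": "m", "input": %s }' % array_input     # quotes removed

try:
    json.loads(quoted)
except json.JSONDecodeError:
    print("quoted placeholder + array input -> invalid JSON")

print(json.loads(unquoted))  # {'model': 'm', 'input': ['scary swedish cult']}
```

With the quotes removed, the array's own brackets and quotes land in the document as structure rather than inside a string literal, so the result parses.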

Explained the predict request process:

  1. Parse the predict parameters into a json string first, then substitute the ${parameters.xx} placeholders. The parameter name in the predict request must match the name used in the request_body template placeholder. For example, you can define "request_body": """{"model": "${parameters.model}", "input": ${parameters.my_input} }""" and then use these predict parameters:
POST /_plugins/_ml/models/<modelid>/_predict
{
  "parameters": {
    "my_input":  ["scary swedish cult"]
  }
}
  2. Check whether the request body after substitution is valid json. If not, throw an exception; if yes, run the pre_process_function.
  3. Send the substituted and pre-processed request body to the model.
  4. If a post_process_function is configured, process the model response and return it to the client side.
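The predict flow described above can be sketched as follows. This is written under my own assumptions about the internals and is not the real ml-commons implementation; `send` stands in for the HTTP call to the model endpoint.

```python
import json

# Sketch of the predict pipeline: substitute -> validate -> pre-process ->
# send -> post-process. An approximation, not the actual ml-commons code.
def predict(request_body_template, parameters, send,
            pre_process=None, post_process=None):
    # Step 1: substitute ${parameters.xx} placeholders into the template.
    body = request_body_template
    for name, value in parameters.items():
        text = value if isinstance(value, str) else json.dumps(value)
        body = body.replace("${parameters.%s}" % name, text)
    # Step 2: the substituted body must be valid JSON, else throw.
    # Only after this check does the pre_process_function run.
    payload = json.loads(body)
    if pre_process:
        payload = pre_process(payload)
    # Step 3: send the substituted, pre-processed body to the model.
    response = send(payload)
    # Step 4: apply the post_process_function to the response, if configured.
    return post_process(response) if post_process else response

result = predict(
    '{"model": "${parameters.model}", "input": ${parameters.my_input} }',
    {"model": "nomic-embed-text", "my_input": ["scary swedish cult"]},
    send=lambda payload: payload,  # stand-in for the HTTP call to Ollama
)
```

This ordering explains the bug report: when substitution produces invalid JSON, the exception at step 2 fires before the pre_process_function is ever consulted, so a garbage function name is never even reached.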

Suggestions from Dave: we should enhance the validation. The documentation is not clear, which is why he thought the pre-process function would be invoked first.

ylwu-amzn (Collaborator) commented:
Created another issue for enhancing the validation part: #2993

svdasein (Author) commented:

Removing the quotation marks in the request_body string around the parameters.input expansion solved the problem.

[screenshot]

Also note: for models under Ollama, the Cohere post-processing logic correctly parses the response. Closing - thanks for the help!

svdasein (Author) commented:

(closing)
