Skip to content

Fix bug in rule_query that resulted in errors when performing text_ex… #105311

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

kderusso
Copy link
Member

@kderusso kderusso commented Feb 8, 2024

Query rules has a bug where a rule_query with an underlying organic text_expansion query will fail with the reason "failed to create query: text_expansion should have been rewritten to another query type".

The reason for this is that the text_expansion query will throw this as it expects to be rewritten as a boolean query of weighted tokens. However, when we change it to a pinned query during the rewrite phase, it never gets re-written again when the pinned query tries to generate a query.

I also did some cleanup in this PR to simplify pinned documents - previously we allowed a list of String ids or Item documents specifying an index and an id. Since we now support documents with a null index that behave the same as a String id we can simplify the logic to determine what needs to be pinned.

Here's an example script to test the functionality. It requires an index pipeline set up using ELSER on a field called text.

// Index some sample data
POST /search-example/_doc/1?pipeline=search-example@ml-inference
{
  "text": "What is the weather in Jamaica?"
}
POST /search-example/_doc/2?pipeline=search-example@ml-inference
{
  "text": "Why is the sky blue?"
}
// Here we create a hidden document, with a different field. 
// This will not be returned by search results but will be returned by query rules 
POST /search-example/_doc/hidden
{
  "name": "Rule query pinned doc"
}

// Ruleset - since we did some cleanup, we'll specify two rules: one that specifies IDs and one that specifies docs.
PUT /_query_rules/rules
{
  "ruleset_id": "rules",
  "rules": [
    {
      "rule_id": "pin_hidden_id",
      "type": "pinned",
      "criteria": [
        {
          "type": "exact",
          "metadata": "foo",
          "values": [
            "bar"
          ]
        }
      ],
      "actions": {
        "ids": [
          "hidden"
        ]
      }
    },
    {
      "rule_id": "pin_hidden_doc",
      "type": "pinned",
      "criteria": [
        {
          "type": "exact",
          "metadata": "foo",
          "values": [
            "baz"
          ]
        }
      ],
      "actions": {
        "docs": [
          {
            "_index": "search-example",
            "_id": "hidden"
          }
        ]
      }
    }
  ]
}

// Simple search examples verifying the rule query works with ID matches, docs matches, and no matches
POST search-example/_search
{
  "query": {
    "rule_query": {
      "organic": {
        "query_string": {
          "query": "blue",
          "default_field": "text"
        }
      },
      "ruleset_id": "rules",
      "match_criteria": {
        "foo": "bar"
      }
    }
  }, "_source": [ "text", "name" ]
}

POST search-example/_search
{
  "query": {
    "rule_query": {
      "organic": {
        "query_string": {
          "query": "blue",
          "default_field": "text"
        }
      },
      "ruleset_id": "rules",
      "match_criteria": {
        "foo": "baz"
      }
    }
  }, "_source": [ "text", "name" ]
}

POST search-example/_search
{
  "query": {
    "rule_query": {
      "organic": {
        "query_string": {
          "query": "blue",
          "default_field": "text"
        }
      },
      "ruleset_id": "rules",
      "match_criteria": {
        "foo": "nomatch"
      }
    }
  }, "_source": [ "text", "name" ]
}

// Example showing a rule query with a text_expansion query, this previously threw an error and now works 
POST search-example/_search
{
  "query": {
    "rule_query": {
      "organic": {
        "text_expansion": {
          "ml.inference.text_expanded.predicted_value": {
            "model_id": ".elser_model_2",
            "model_text": "Why is the sky blue?",
            "boost": 1
          }
        }
      },
      "ruleset_id": "rules",
      "match_criteria": {
        "foo": "bar"
      }
    }
  }, "_source": [ "text", "name" ]
}

…pansion queries. Clean up the rule_query logic to be a bit simpler to read.
@kderusso kderusso added >bug :EnterpriseSearch/Application Enterprise Search Team:Enterprise Search Meta label for Enterprise Search team labels Feb 8, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @kderusso, I've created a changelog YAML for you.

}

@Override
public void testCacheability() throws IOException {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This had to be overwritten to avoid throwing indoToQuery

@kderusso kderusso marked this pull request as ready for review February 8, 2024 22:02
@kderusso kderusso requested a review from a team February 8, 2024 22:02
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ent-search-eng (Team:Enterprise Search)

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment regarding the cleanup. Can you open a separate pr and focus this one on fixing the bug? That should also simplify the backport.

}

public RuleQueryBuilder(StreamInput in) throws IOException {
super(in);
organicQuery = in.readNamedWriteable(QueryBuilder.class);
matchCriteria = in.readGenericMap();
rulesetId = in.readString();
pinnedIds = in.readOptionalStringCollectionAsList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot be removed without a backward compatibility check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♀️ Thanks for the catch!

if (pinnedDocsSupplier != null) {
throw new IllegalStateException("pinnedDocsSupplier must be null, can't serialize suppliers, missing a rewriteAndFetch?");
}

out.writeNamedWriteable(organicQuery);
out.writeGenericMap(matchCriteria);
out.writeString(rulesetId);
out.writeOptionalStringCollection(pinnedIds);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks good to me, but I will abstain from approving pending resolution of @jimczi's comments

@kderusso kderusso requested review from Mikep86 and jimczi February 9, 2024 15:37
@@ -13,23 +13,16 @@
import static org.elasticsearch.xpack.searchbusinessrules.PinnedQueryBuilder.Item;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to your changes, but do you think the name Item still fits our purposes? My guess is that when we added the pinned query, we did not think this class will get reused, so having it called Item has not an issue.
But now when I see Item referenced outside of the pinned query, I have to stop and think where that comes from.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. Item is nice because it's short, but PinnedDocument would probably be a better name in retrospect since Document would be too overloaded. Or PinnedDoc if we thought PinnedDocument was way too long.

I'd be open to a quick rename PR as a followup.

@kderusso
Copy link
Member Author

kderusso commented Feb 9, 2024

Closing in favor of #105365 and #105368

@kderusso kderusso closed this Feb 9, 2024
@kderusso kderusso deleted the kderusso/fix-rule-query-with-text-expansion branch July 8, 2024 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :EnterpriseSearch/Application Enterprise Search Team:Enterprise Search Meta label for Enterprise Search team v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants