Skip to content

Improve error message for search.check_ccs_compatibility=true mode flag #97059

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 28, 2023

Conversation

quux00
Copy link
Contributor

@quux00 quux00 commented Jun 23, 2023

The code to have the better error message is already present. But the current tests are not set up in a way to reveal that. This commit adjusts the testing FailBeforeCurrentQueryBuilder to demonstrate that the useful error messages are present.

To show this via manual testing, I changed the MatchAllQueryBuilder to return a TransportVersion in the future from its getMinimalSupportedVersion method.

When I start elasticsearch with -E search.check_ccs_compatibility=true and do a match_all query, the response is:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "[class org.elasticsearch.action.search.SearchRequest] is not compatible with version 8500020 and the 'search.check_ccs_compatibility' setting is enabled."
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "[class org.elasticsearch.action.search.SearchRequest] is not compatible with version 8500020 and the 'search.check_ccs_compatibility' setting is enabled.",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "[match_all] was released first in version 8511132, failed compatibility check trying to send it to node with version 8500020"
    }
  },
  "status": 400
}

The reason in the cause_by section shows which query failed and why (which version it was introduced in) compared to the version being used by the cluster being queried.

Closes #85487

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v8.10.0 labels Jun 23, 2023
@quux00 quux00 requested review from javanna and cbuescher June 23, 2023 21:30
@quux00 quux00 added >non-issue :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team labels Jun 23, 2023
@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Jun 23, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@quux00
Copy link
Contributor Author

quux00 commented Jun 23, 2023

One problem I see with the new Transport Versions and this error message is that it has become opaque to users what ES release version that maps to.

Possible solutions:

  1. provide a mapping of TransportVersion's to ES release versions (either in the error message or link to public docs with that info?)
  2. in the GET _remote/info provide information about what TransportVersion the remote clusters are using.

Right now, it just reports:

{
  "remote2": {
    "connected": true,
    "mode": "sniff",
    "seeds": [
      "127.0.0.1:9302"
    ],
    "num_nodes_connected": 1,
    "max_connections_per_cluster": 3,
    "initial_connect_timeout": "30s",
    "skip_unavailable": true
  },
  "remote1": {
    "connected": true,
    "mode": "sniff",
    "seeds": [
      "127.0.0.1:9301"
    ],
    "num_nodes_connected": 1,
    "max_connections_per_cluster": 3,
    "initial_connect_timeout": "30s",
    "skip_unavailable": false
  }
}

@cbuescher cbuescher self-assigned this Jun 26, 2023
@cbuescher
Copy link
Member

problem I see with the new Transport Versions and this error message

I'm not concerned about that. The ccs flag itself is mostly meant for internal, pre-release usage anyway, and the details in the error mostly for ourselves getting more information for debugging. The extension to GET _remote/info might be a useful thing for error analysis anyway, probably a separate PR though.

ex.getCause().getMessage()
);

String expectedCause = "[fail_before_current_version] was released first in version XXXXXXX, failed compatibility check trying to"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you get the failing "future" version from the FailBeforeCurrentVersionQuery test query builder? I guess getting the failing node version is more difficult, so maybe no way to get around the string replace operation here? Would be nice to avoid it but not a blocker for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this in a new commit.

Doing that forced me to realize there is one minor problem here.

The error message has the format:

[<query_type>] was released first in version X, failed compatibility check trying to send it to node with version Y

In production settings, the version "Y" will be correct - the actual TransportVersion being used.

But for the check_ccs_compatibility=true condition, it won't actually report current TransportVersion - it will report TransportVersion.MINIMUM_CCS_VERSION, because that's what we pass into the TransportSearchHelper check method here: https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/search/TransportSearchHelper.java#L108

It is correct to pass in TransportVersion.MINIMUM_CCS_VERSION to do the minimal check, but the error message is then wrong - we aren't checking current version, we are checking the previous release version. So that error message could be confusing to kibana testers.

But we can't change the error message to say "minimal version" because in production it reports actual version.

So I just left it as is.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, I left a small question around the tests but nothing that should block from my side.

quux00 added 2 commits June 27, 2023 12:37
The code to have the better error message is already present.
But the current tests are not set up in a way to reveal that.
This adjusts the testing FailBeforeCurrentQueryBuilder to demonstrate it.

To show this via manual testing, I changed the MatchAllQueryBuilder to return
a TransportVersion in the future from its getMinimalSupportedVersion method.

When I start elasticsearch with -E search.check_ccs_compatibility=true and
do a match_all query, the response is:

```
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "[class org.elasticsearch.action.search.SearchRequest] is not compatible with version 8500020 and the 'search.check_ccs_compatibility' setting is enabled. YY"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "[class org.elasticsearch.action.search.SearchRequest] is not compatible with version 8500020 and the 'search.check_ccs_compatibility' setting is enabled. YY",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "[match_all] was released first in version 8511132, failed compatibility check trying to send it to node with version 8500020"
    }
  },
  "status": 400
}
```

The underlying caused_by shows the exact query (match_all) that was incompatible.
… expected error message, including Transport versions
@quux00 quux00 force-pushed the ccs/improve-force-fail-error-message branch from b8da0a5 to 0b45cd5 Compare June 27, 2023 17:58
@quux00 quux00 merged commit 84c295c into elastic:main Jun 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.10.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve error message for ccs_force_fail mode flag
4 participants