Fix Response for version conflict with conflicts=proceed#149404

Open
joshua-adams-1 wants to merge 1 commit into
elastic:mainfrom
joshua-adams-1:tf-delete-by-query


@joshua-adams-1
Contributor

Fixes the YAML test x-pack:qa:multi-project:core-rest-tests-with-multiple-projects:yamlRestTest org.elasticsearch.multiproject.test.CoreWithMultipleProjectsClientYamlTestSuiteIT.test {yaml=delete_by_query/10_basic/Response for version conflict with conflicts=proceed}

Closes: #145253

@joshua-adams-1 joshua-adams-1 self-assigned this May 19, 2026
@joshua-adams-1 joshua-adams-1 added >test Issues or PRs that are addressing/adding tests :Distributed/Reindex Issues relating to reindex that are not caused by issues further down labels May 19, 2026
@joshua-adams-1
Contributor Author

As a note to the reviewer, more information can be found in #145253.

The test runs as follows:

  1. Creates an index, indexes a single document, and refreshes this index so that the document is available to search. This gives the document a seq_no == 0.
  2. Indexes a second document with the same id as the first, so that the internal seq_no is updated to 1. Does not refresh so that search does not find this document.
  3. Attempts a delete by query for the single document. The delete by query uses a scroll search to fetch the document, but since we did not refresh after the second index, the copy with seq_no == 0 is returned. This causes a version conflict (which the test expects), and we assert on it.

The flakiness occurs in the CoreWithMultipleProjectsClientYamlTestSuiteIT test runner, which uses a two-node cluster. Other runners such as ReindexClientYamlTestSuiteIT, where the cluster topology is a single node, are succeeding. I suspect that using more than one node is somehow causing either:

  1. Delays
  2. A refresh under the hood

which is causing the DBQ to succeed where we expect it to fail.
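
To make the reasoning above concrete, here is a toy Python model of the scenario. It is only a sketch under stated assumptions: `ToyShard` and `run_scenario` are hypothetical names, the model is a single in-memory shard rather than Elasticsearch itself, and the "unexpected refresh" branch stands in for the suspected cause, which has not been confirmed. The `deleted` and `version_conflicts` keys mirror the counters in the delete by query response.

```python
class ToyShard:
    """Toy single-shard model: writes bump seq_no immediately, search only sees the last refresh."""

    def __init__(self):
        self.live = {}        # doc id -> current seq_no (checked when the delete is applied)
        self.searchable = {}  # doc id -> seq_no as of the last refresh (what search returns)
        self.next_seq_no = 0

    def index(self, doc_id):
        # Indexing assigns the next seq_no but does not make the write searchable.
        self.live[doc_id] = self.next_seq_no
        self.next_seq_no += 1
        return self.live[doc_id]

    def refresh(self):
        # A refresh publishes the current state to search.
        self.searchable = dict(self.live)

    def delete_by_query(self):
        """Delete every searchable doc; count conflicts instead of aborting (conflicts=proceed)."""
        result = {"deleted": 0, "version_conflicts": 0}
        for doc_id, seen_seq_no in list(self.searchable.items()):
            if self.live.get(doc_id) != seen_seq_no:
                # Search returned a stale copy, so the delete is rejected.
                result["version_conflicts"] += 1
            else:
                del self.live[doc_id]
                result["deleted"] += 1
        return result


def run_scenario(refresh_before_delete):
    shard = ToyShard()
    shard.index("1")     # step 1: seq_no == 0
    shard.refresh()      # step 1: make it searchable
    shard.index("1")     # step 2: seq_no == 1, deliberately not refreshed
    if refresh_before_delete:
        shard.refresh()  # assumption: a refresh the test did not ask for
    return shard.delete_by_query()  # step 3


if __name__ == "__main__":
    print(run_scenario(refresh_before_delete=False))
    print(run_scenario(refresh_before_delete=True))
```

In this model the test's expectation holds only when nothing refreshes between step 2 and step 3. With the extra refresh, the search sees seq_no == 1 and the delete goes through, which matches the failure mode described above.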

My proposed solution is to use a single shard with no replicas. This removes the flakiness where a live replica shard is created and updated with the latest indexed document before the DBQ has a chance to run, and it should "simulate" running on a single node, which we know works. As a note, other tests such as '^delete_by_query/50_wait_for_active_shards/can override wait_for_active_shards' have been disabled from running on multi-node clusters since they assume the test runs on a single node. My test fix uses the same principle but avoids dropping the test coverage (which was proposed by #149381).
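
For reference, the settings in question look roughly like the sketch below. The actual change lives in the YAML test and is not reproduced here; this Python snippet only builds an equivalent create-index request body. `number_of_shards` and `number_of_replicas` are the standard index settings, while `INDEX_NAME` and `single_copy_index_body` are hypothetical names used for illustration.

```python
import json

# Hypothetical index name; the name used by the YAML test is not shown in this thread.
INDEX_NAME = "test"


def single_copy_index_body():
    """Create-index body that pins the index to one primary shard and zero replicas."""
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,    # a single primary, so only one shard copy handles writes
                "number_of_replicas": 0,  # no replica copy on the second node
            }
        }
    }


if __name__ == "__main__":
    print(f"PUT /{INDEX_NAME}")
    print(json.dumps(single_copy_index_body(), indent=2))
```

With zero replicas there is only one copy of the shard, so the search phase of the DBQ and the delete both hit the same copy regardless of how many nodes are in the cluster.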

@joshua-adams-1 joshua-adams-1 marked this pull request as ready for review May 19, 2026 15:24
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label May 19, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@joshua-adams-1 joshua-adams-1 enabled auto-merge (squash) May 19, 2026 15:24

Labels

:Distributed/Reindex Issues relating to reindex that are not caused by issues further down Team:Distributed Meta label for distributed team. >test Issues or PRs that are addressing/adding tests v9.5.0


2 participants