Backport test fixes from `main` to `v4.0.x` #13378

dumbbell · 2025-02-20T13:37:03Z

The fixes come from the following pull requests:

They are backported together to reduce the number of pull requests and the load on CI. Also, CI would likely fail much more often if any one of the fixes were missing.

There is still work to do to fix all test flakes, but backporting these will already bring an improvement for the v4.0.x branch.

[Why]
The `force_reset` command simply removes the local node's files on disk. With Ra, this can't work because the rest of the cluster does not know about the forced-reset node, so the leader keeps sending `append_entry` commands to it. If the forced-reset node restarts and receives these messages, one of two things happens. If it is on an older Raft term, it joins the cluster again. If it is on the same Raft term, it hits an assertion and exits.

[How]
We can't really support this scenario and it has little value. The command therefore now returns an error if someone attempts a `force_reset` on a node running Khepri.

This also deprecates the command: once Mnesia support is removed, the command will be removed at the same time. This is noted in the rabbitmqctl.8 manpage.

(cherry picked from commit c78aec7)
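The following is a minimal sketch of the guard this commit message describes. It is written in Python for illustration only. The real check lives in RabbitMQ's Erlang code, and `Node`, `metadata_store`, `ForceResetUnsupported` and `force_reset` below are hypothetical names, not RabbitMQ APIs. The sketch only shows the decision: refuse the reset when Khepri is in use, otherwise fall back to the deprecated Mnesia behaviour of wiping local files.

```python
from dataclasses import dataclass, field


class ForceResetUnsupported(Exception):
    """Hypothetical error returned when force_reset targets a Khepri node."""


@dataclass
class Node:
    # Hypothetical model of a node: its metadata store and its local data files.
    name: str
    metadata_store: str  # "mnesia" or "khepri"
    local_files: list = field(default_factory=list)


def force_reset(node: Node) -> None:
    # With Khepri, the other Ra members still consider this node part of the
    # cluster and keep sending it append_entry commands, so wiping its files
    # would leave it inconsistent. Refuse instead.
    if node.metadata_store == "khepri":
        raise ForceResetUnsupported(
            f"force_reset is not supported on {node.name}: it runs Khepri"
        )
    # Deprecated Mnesia path: simply remove the node's local files.
    node.local_files.clear()
```

With a Mnesia node, `force_reset` empties `local_files`. With a Khepri node, it raises and leaves the files untouched.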

[Why] We hit some transient errors with the previous order when doing mixed-version testing. Swapping the nodes seems to fix the problem. (cherry picked from commit 5cbda4c)

... are being used at the same time.

[Why]
Depending on which node clusters with which, a node running an older version of the Khepri Ra machine may not be able to apply Ra commands and could get stuck. There is no real solution, and this is clearly an unsupported scenario: an old node won't always be able to join a newer cluster.

[How]
In the test suites, we skip clustering tests if we detect that multiple Khepri Ra machine versions are being used.

(cherry picked from commit 1f1a135)
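The skip logic can be sketched as follows. This is a Python illustration under stated assumptions. The real test suites are Erlang Common Test suites, and the `maybe_skip_clustering` helper and its input (a mapping from node name to the Khepri Ra machine version that node reports) are hypothetical. The returned tuple loosely mirrors Common Test's `{skip, Reason}` return value.

```python
def maybe_skip_clustering(machine_versions: dict):
    """Hypothetical helper: return a skip tuple if nodes disagree on the
    Khepri Ra machine version, or None if clustering tests can run.

    machine_versions maps node name -> Khepri Ra machine version.
    """
    distinct = set(machine_versions.values())
    if len(distinct) > 1:
        # Mixed versions: an old node may be unable to apply newer Ra
        # commands, so clustering tests are skipped rather than run.
        return ("skip", f"multiple Khepri Ra machine versions: {sorted(distinct)}")
    return None
```

A suite would call this before its clustering test cases and skip them whenever the result is not `None`.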

[Why]
During mixed-version testing, the old node might not be able to join or rejoin a cluster if the other nodes run a newer Khepri machine version.

[How]
The old node is used as the cluster seed node and is never touched otherwise. Other nodes are restarted or join the cluster later.

(cherry picked from commit e76233a)

… with Khepri [Why] This test plays with the Mnesia database explicitly. (cherry picked from commit f088c4f)

[Why] We see nodes trying to use busy ports in CI from time to time. (cherry picked from commit e76c227)

... in retry_if_coordinator_unavailable(). (cherry picked from commit ee0b5b5)

(cherry picked from commit b7c9e64)

(cherry picked from commit 64b68e5)

This may help debug nodes that try to open busy ports. (cherry picked from commit a5f30ea)

dumbbell self-assigned this Feb 20, 2025

mergify bot added the bazel and make labels Feb 20, 2025

dumbbell changed the base branch from main to v4.0.x February 20, 2025 13:45

dumbbell force-pushed the backport-test-fixes-from-main branch 2 times, most recently from 63f1d8a to 5e1538b, February 20, 2025 14:52

dumbbell added 10 commits February 20, 2025 20:14

rabbit_stream_queue_SUITE: Swap uses of node 2 and 3 in format

d1a1f97

[Why] We hit some transient errors with the previous order when doing mixed-version testing. Swapping the nodes seems to fix the problem. (cherry picked from commit 5cbda4c)

clustering_management_SUITE: Skip start_with_invalid_schema_in_path…

d32de91

… with Khepri [Why] This test plays with the Mnesia database explicitly. (cherry picked from commit f088c4f)

Increase the TCP ports range used by parallel-ct-set-*

c4eb581

[Why] We see nodes trying to use busy ports in CI from time to time. (cherry picked from commit e76c227)

rabbit_stream_queue_SUITE: Fix recursion issue

7f6a797

... in retry_if_coordinator_unavailable(). (cherry picked from commit ee0b5b5)

amqp_auth_SUITE: Handle error in init_per_group/2

a1f918a

(cherry picked from commit b7c9e64)

unit_credit_flow_SUITE: Greatly reduce time trap

3c0d892

(cherry picked from commit 64b68e5)

GitHub workflows: List open TCP ports

02c7b04

This may help debug nodes that try to open busy ports. (cherry picked from commit a5f30ea)

dumbbell force-pushed the backport-test-fixes-from-main branch from bfa8721 to 02c7b04, February 20, 2025 19:15

dumbbell marked this pull request as ready for review February 20, 2025 21:27

dumbbell merged commit 6955665 into v4.0.x Feb 20, 2025
270 checks passed

dumbbell deleted the backport-test-fixes-from-main branch February 20, 2025 21:27

dumbbell added this to the 4.0.7 milestone Feb 20, 2025
