-
Notifications
You must be signed in to change notification settings - Fork 63
Region snapshot replacement for read-only regions #7435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reuse the region snapshot replacement machinery to replace read-only regions. This is done by storing a replacement type in the region snapshot replacement record such that either a region snapshot _or_ a read-only region can be the subject of this type of replacement. The procedure for both types is the same so all the code can be reused. A future commit will rename region snapshot replacement (and all references) to "read-only target replacement" to reflect that the machinery now applies to both region snapshots and read-only regions. This will be a mostly mechanical set of changes that can be reviewed separately with much less scrutiny. Right now manually requesting a region replacement with omdb is done through the region replacement manual request, not the region snapshot replacement manual request. This will change in that future commit to be part of a read-only target replacement request.
leftwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks pretty good, just some general questions.
I've been replacing read only regions in snapshots for about a week now, no issues found.
nexus/src/app/background/tasks/region_snapshot_replacement_garbage_collect.rs
Outdated
Show resolved
Hide resolved
addresses test flake due to slow systems being at the start of execution, then bailing out of the wait for condition, then querying for the state and seeing that it's in the middle of execution
I set every poll interval related to the background task and crucible polling to 500 milliseconds, and this is way too long for computers. Cut that by 10 to 50 milliseconds. Also, more `wait_for_all_replacements` calls to make sure replacement tasks are finished. This should address timeout test flakes in CI.
| request.volume_id, | ||
| ) | ||
| } else { | ||
| String::from("UNKNOWN") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would happen if someone changed how step_invalidated was assigned and did not add a place for it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, exactly
leftwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything in the code, and I've done 26 test runs and not seen the test flake.
I also did more than 7 sled expunges and as long as I don't have running instances on the sled I expunged, things have been working correclty.
…ents also, run all replacement tasks _before_ checking if there are any replacements left, because the replacement tasks add replacements!
Reuse the region snapshot replacement machinery to replace read-only regions. This is done by storing a replacement type in the region snapshot replacement record such that either a region snapshot or a read-only region can be the subject of this type of replacement. The procedure for both types is the same so all the code can be reused.
A future commit will rename region snapshot replacement (and all references) to "read-only target replacement" to reflect that the machinery now applies to both region snapshots and read-only regions. This will be a mostly mechanical set of changes that can be reviewed separately with much less scrutiny. Right now manually requesting a region replacement with omdb is done through the region replacement manual request, not the region snapshot replacement manual request. This will change in that future commit to be part of a read-only target replacement request.
Fixes #6172