Skip to content

Conversation

annrpom
Copy link
Contributor

@annrpom annrpom commented Sep 24, 2025

We now persist a stable identifier to both the OPTIONS file and a file within the secondary directory (failover_identifier). If recovery finds that the secondary directory does not contain a matching identifier, we abort recovery indicating that the secondary seems incorrect / corrupt.

Fixes: #4416

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@annrpom annrpom marked this pull request as ready for review September 26, 2025 15:50
@annrpom annrpom requested a review from a team as a code owner September 26, 2025 15:50
@annrpom annrpom requested a review from xxmplus September 26, 2025 15:50
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbowens reviewed 4 of 9 files at r1, all commit messages.
Reviewable status: 4 of 9 files reviewed, 2 unresolved discussions (waiting on @xxmplus)


wal/failover_manager.go line 642 at r1 (raw file):

			o.SecondaryIdentifier = identifier
		} else {
			o.SecondaryIdentifier = existingIdentifier

instead of mutating the Options from inside the WAL manager initialization, maybe we should expose a ValidateOrInitWALDir func from the wal package that we'd use like:

walDir, err = wal.ValidateOrInitWALDir(walDir)

and we could use this during Open to validate and initialize them before initializing the WAL manager; wdyt?


options.go line 1381 at r1 (raw file):

	// write to the primary WAL stalls. The Lock field may be set during setup to
	// preacquire the lock on the secondary directory.
	Secondary wal.Dir

Sorry this is expanding the scope of the change, but I think maybe we should embed an ID field into the wal.Dir. If the user disables WAL failover and passes the directory as a WALRecoveryDir, we can again validate it.

We now persist a stable identifier to both the OPTIONS file and a file within
the secondary directory (failover_identifier). If recovery finds that the
secondary directory does not contain a matching identifier, we abort
recovery indicating that the secondary seems incorrect / corrupt.

Fixes: cockroachdb#4416
@annrpom annrpom force-pushed the validate-recovery-dir branch from a56b334 to 9f02811 Compare October 8, 2025 17:48
Copy link
Contributor Author

@annrpom annrpom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 10 files reviewed, 2 unresolved discussions (waiting on @jbowens and @xxmplus)


options.go line 1381 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Sorry this is expanding the scope of the change, but I think maybe we should embed an ID field into the wal.Dir. If the user disables WAL failover and passes the directory as a WALRecoveryDir, we can again validate it.

done i think


wal/failover_manager.go line 642 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

instead of mutating the Options from inside the WAL manager initialization, maybe we should expose a ValidateOrInitWALDir func from the wal package that we'd use like:

walDir, err = wal.ValidateOrInitWALDir(walDir)

and we could use this during Open to validate and initialize them before initializing the WAL manager; wdyt?

done

@annrpom annrpom requested a review from jbowens October 9, 2025 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

wal: validate recovery directory through a stable identifier

3 participants