Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Adapt test_database_no_disk_space() to newer libraft versions #462

Merged
merged 1 commit into from
Feb 2, 2024

Conversation

freeekanayaka
Copy link
Contributor

Starting from raft 0.21.0 (which will be released shortly), errors due to lack of disk space will be correctly reported not only in single-node situations, but also in case enough voting nodes run out of disk space so that no further entry can get committed until at least one of them recovers (currently the client would hit a timeout error, with now way for it to know what happened: the command might also eventually succeed far later in the future if disk space is recovered at a later point in time).

Although this is an improvement, it has the slight downside that for single-node situations there will be a short lag before the node actually notices that disk space has recovered and new entries can be committed again. Special-casing the single node situation would be tricky and I'd prefer to avoid that.

This lag is currently about 5 seconds by default, but it could be lowered to any amount if desired. Basically, every 5 seconds the raft engine will retry to allocate space for new entries, so decreasing the lag will increase the retry frequency.

If 5 seconds seems too convervative, it could be perhaps be lowered to 1 second?

In any case, the change in this PR will be needed, since there's no guarantee that running:

rm "${BIG_FILE}"

and then immediately running:

incus config set c "user.prop${i}" - < "${DATA}";

will succeed, so a small retry loop is needed.

@stgraber
Copy link
Member

stgraber commented Feb 2, 2024

5s seems quite reasonable to me

@freeekanayaka
Copy link
Contributor Author

BTW, since libraft is now going to be able to inform consuming code of the amount of reserved disk space that a certain node still has (i.e. the amount of space available in successfully created open segments), cowsql/go-cowsql could be modified so that if a voting node is running out of disk space, the engine will try to transfer its voting rights to another node.

@stgraber
Copy link
Member

stgraber commented Feb 2, 2024

Ah that would be good. If the consumer (incus) can be notified somehow, we'd also be able to issue a warning through our warnings API.

@freeekanayaka
Copy link
Contributor Author

Ah that would be good. If the consumer (incus) can be notified somehow, we'd also be able to issue a warning through our warnings API.

Yes, we can surface this info all the way up to Incus.

Starting from raft 0.21.0 (which will be released shortly), errors due to lack
of disk space will be correctly reported not only in single-node situations, but
also in case enough voting nodes run out of disk space so that no further entry
can get committed until at least one of them recovers (currently the client
would hit a timeout error, with now way for it to know what happened: the
command might also eventually succeed far later in the future if disk space is
recovered at a later point in time).

Although this is an improvement, it has the slight downside that for single-node
situations there will be a short lag before the node actually notices that disk
space has recovered and new entries can be committed again. Special-casing
the single node situation would be tricky and I'd prefer to avoid that.

This lag is currently about 5 seconds by default, but it could be lowered to any
amount if desired. Basically, every 5 seconds the raft engine will retry to
allocate space for new entries, so decreasing the lag will increase the retry
frequency.

If 5 seconds seems too convervative, it could be perhaps be lowered to 1 second?

In any case, the change in this PR will be needed, since there's no guarantee
that running:

rm "${BIG_FILE}"

and then **immediately** running:

incus config set c "user.prop${i}" - < "${DATA}";

will succeed, so a small retry loop is needed.

Signed-off-by: Free Ekanayaka <free@ekanayaka.io>
@stgraber stgraber merged commit 2cb70c7 into lxc:main Feb 2, 2024
25 checks passed
@freeekanayaka freeekanayaka deleted the tweak-database-no-space-test branch February 2, 2024 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants