test: Adapt test_database_no_disk_space() to newer libraft versions #462

freeekanayaka · 2024-02-02T14:58:26Z

Starting from raft 0.21.0 (which will be released shortly), errors due to lack of disk space will be correctly reported not only in single-node situations, but also when enough voting nodes run out of disk space that no further entry can get committed until at least one of them recovers. Currently the client would hit a timeout error, with no way for it to know what happened; the command might even succeed much later, if disk space is eventually recovered.

Although this is an improvement, it has the slight downside that for single-node situations there will be a short lag before the node actually notices that disk space has recovered and new entries can be committed again. Special-casing the single node situation would be tricky and I'd prefer to avoid that.

This lag is currently about 5 seconds by default, but it could be lowered to any amount if desired. Basically, every 5 seconds the raft engine will retry to allocate space for new entries, so decreasing the lag will increase the retry frequency.

If 5 seconds seems too conservative, perhaps it could be lowered to 1 second?

In any case, the change in this PR will be needed, since there's no guarantee that running:

rm "${BIG_FILE}"

and then immediately running:

incus config set c "user.prop${i}" - < "${DATA}";

will succeed, so a small retry loop is needed.
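For illustration, a retry loop of this kind could look roughly like the sketch below. It is not the actual diff in this PR: the `retry` helper, its attempt count and delay, and the `flaky_write` stand-in (which simulates a write that fails until disk space is noticed as recovered) are all hypothetical names chosen for the example.

```shell
# Hypothetical helper: run a command until it succeeds, giving up after
# max_attempts tries and sleeping delay seconds between tries.
retry() {
    max_attempts="$1"
    delay="$2"
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "${attempt}" -ge "${max_attempts}" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${delay}"
    done
    return 0
}

# Stand-in for the real write (assumption): fails on the first two calls,
# as if the node had not yet noticed that disk space was recovered.
calls=0
flaky_write() {
    calls=$((calls + 1))
    [ "${calls}" -ge 3 ]
}

retry 10 0 flaky_write
echo "succeeded after ${calls} calls"
```

In the real test the delay would presumably be about a second, with enough attempts to cover the 5 second lag. Since the `incus config set` call reads from `${DATA}` on stdin, the redirection would need to live inside a small wrapper function passed to `retry`, so that the file is reopened on every attempt.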

stgraber · 2024-02-02T15:05:13Z

5s seems quite reasonable to me

freeekanayaka · 2024-02-02T15:06:20Z

BTW, since libraft is now going to be able to inform consuming code of the amount of reserved disk space that a certain node still has (i.e. the amount of space available in successfully created open segments), cowsql/go-cowsql could be modified so that if a voting node is running out of disk space, the engine will try to transfer its voting rights to another node.

stgraber · 2024-02-02T15:07:52Z

Ah that would be good. If the consumer (incus) can be notified somehow, we'd also be able to issue a warning through our warnings API.

freeekanayaka · 2024-02-02T15:30:35Z

> Ah that would be good. If the consumer (incus) can be notified somehow, we'd also be able to issue a warning through our warnings API.

Yes, we can surface this info all the way up to Incus.

Signed-off-by: Free Ekanayaka <free@ekanayaka.io>

freeekanayaka requested a review from stgraber as a code owner February 2, 2024 14:58

freeekanayaka force-pushed the tweak-database-no-space-test branch from 8d52fe5 to 6c5b52a Compare February 2, 2024 15:36

stgraber approved these changes Feb 2, 2024


stgraber merged commit 2cb70c7 into lxc:main Feb 2, 2024

freeekanayaka deleted the tweak-database-no-space-test branch February 2, 2024 17:00
