Skip to content

Conversation

@kolyshkin
Copy link
Collaborator

@kolyshkin kolyshkin commented May 14, 2021

A backport of opencontainers/runc#2941 to rhaos-4.6, might help some BZs. Original description follows.


I hate to keep adding those kludges, but lately TestFreeze (and
TestSystemdFreeze) from libcontainer/integration fails a lot. The
failure comes and goes, and is probably this is caused by a slow host
allocated for the test, and a slow VM on top of it.

To remediate, add a small sleep on every 25th iteration in between
asking the kernel to freeze and checking its status.

In the worst case scenario (failure to freeze), this adds about 0.4 ms
(40 x 10 us) to the duration of the call.

It is hard to measure how this affects CI as GHA plays a roulette when
allocating a node to run the test on, but it seems to help. With
additional debug info, I saw somewhat frequent "frozen after 24 retries"
or "frozen after 49 retries", meaning it succeeded right after the added
sleep.

While at it, rewrite/improve the comments.

Signed-off-by: Kir Kolyshkin kolyshkin@gmail.com
(cherry picked from commit 524abc5)
Signed-off-by: Kir Kolyshkin kolyshkin@gmail.com

I hate to keep adding those kludges, but lately TestFreeze (and
TestSystemdFreeze) from libcontainer/integration fails a lot. The
failure comes and goes, and is probably this is caused by a slow host
allocated for the test, and a slow VM on top of it.

To remediate, add a small sleep on every 25th iteration in between
asking the kernel to freeze and checking its status.

In the worst case scenario (failure to freeze), this adds about 0.4 ms
(40 x 10 us) to the duration of the call.

It is hard to measure how this affects CI as GHA plays a roulette when
allocating a node to run the test on, but it seems to help. With
additional debug info, I saw somewhat frequent "frozen after 24 retries"
or "frozen after 49 retries", meaning it succeeded right after the added
sleep.

While at it, rewrite/improve the comments.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 524abc5)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@haircommander
Copy link
Collaborator

LGTM

@haircommander haircommander merged commit 23384e2 into projectatomic:rhaos-4.6 May 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants