Skip to content

Conversation

@kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Jul 29, 2021

As reported in issue #3119, there is a race in nsexec logging
that can lead to garbled json received by log forwarder, which
complains about it with a "failed to decode" error.

This happens because dprintf (used since the very beginning of nsexec
logging introduced in commit ba3cabf) relies on multiple write(2)
calls, and with additional logging added by 64bb59f (appeared in rc94)
a race is possible between runc init parent and its children.

The fix is to prepare a string and write it using a single call to
write(2).

Fixes: #3119

Practically, though, debug logging is not needed in 99.9% of all cases,
and so PR #3114 (commit b1bc762) eliminates the possibility
of it having this race in the first place. Yet I think this fix makes sense.

1.0 backport: #3130

@kolyshkin
Copy link
Contributor Author

I think this is a candidate for backporting to 1.0 -- while the race is rare, the consequences are fatal.

@kolyshkin kolyshkin added this to the 1.1.0 milestone Aug 2, 2021
@kolyshkin kolyshkin added the backport/1.0-todo A PR in main branch which needs to be backported to release-1.0 label Aug 2, 2021
@kolyshkin kolyshkin changed the title libct/nsenter: fix logging race in nsexec libct/nsenter: fix logging race in nsexec (regression in rc94) Aug 4, 2021
As reported in issue 3119, there is a race in nsexec logging
that can lead to garbled json received by log forwarder, which
complains about it with a "failed to decode" error.

This happens because dprintf (used since the very beginning of nsexec
logging introduced in commit ba3cabf) relies on multiple write(2)
calls, and with additional logging added by 64bb59f a race is
possible between runc init parent and its children.

The fix is to prepare a string and write it using a single call to
write(2).

[v2: NULLify json on error from asprintf]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin kolyshkin mentioned this pull request Aug 11, 2021
@mrunalp mrunalp merged commit b114862 into opencontainers:master Aug 11, 2021
@kolyshkin kolyshkin added backport/1.0-done A PR in main branch which has been backported to release-1.0 and removed backport/1.0-todo A PR in main branch which needs to be backported to release-1.0 labels Nov 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/logging area/nsenter backport/1.0-done A PR in main branch which has been backported to release-1.0 kind/bug regression

Projects

None yet

Development

Successfully merging this pull request may close these issues.

nsexec logging race

3 participants