Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workround for podman readlink error #1905

Merged
merged 4 commits into from
Apr 20, 2022
Merged

workround for podman readlink error #1905

merged 4 commits into from
Apr 20, 2022

Conversation

luluz66
Copy link
Contributor

@luluz66 luluz66 commented Apr 20, 2022

When we detect that podman failed with the storage corruption warning
and readlink no such file error, we will remove the image and set the
command result to unavailable error to trigger a retry.

When we detect that podman failed with the storage corruption warning
and readlink no such file error, we will remove the image and set the
command result to unavailable error to trigger a retry.
@luluz66 luluz66 force-pushed the podman-readlink-error branch from 95ca7b5 to 9cd41b6 Compare April 20, 2022 01:14
@luluz66 luluz66 marked this pull request as ready for review April 20, 2022 01:15
@luluz66 luluz66 changed the title podman readlink error workround for podman readlink error Apr 20, 2022
Comment on lines 148 to 151
err = removeImage(ctx, c.image)
if err != nil {
log.Warningf("Failed to remove corrupted image: %s", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can inline:

if err := removeImage(ctx, c.image); err != nil {
log.Warningf("Failed to remove corrupted image: %s", err)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -141,6 +143,13 @@ func (c *podmanCommandContainer) Run(ctx context.Context, command *repb.Command,
podmanRunArgs = append(podmanRunArgs, c.image)
podmanRunArgs = append(podmanRunArgs, command.Arguments...)
result = runPodman(ctx, "run", nil, nil, podmanRunArgs...)
if isStorageCorrupted(result) {
result.Error = status.UnavailableError("podman run failed because of storage corruption")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: how about just "a storage corruption occurred"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 176 to 182
if isStorageCorrupted(createResult) {
createResult.Error = status.UnavailableError("podman create failed because of storage corruption")
err = removeImage(ctx, c.image)
if err != nil {
log.Warningf("Failed to remove corrupted image: %s", err)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you move this to a new helper function you can use in both places?

func maybeCleanupCorruptedImages()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -164,6 +173,13 @@ func (c *podmanCommandContainer) Create(ctx context.Context, workDir string) err
if err = createResult.Error; err != nil {
return status.UnavailableErrorf("failed to create container: %s", err)
}
if isStorageCorrupted(createResult) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to happen before the check on line 173?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.

Copy link
Member

@bduffany bduffany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one nit

if !storageErrorRegex.MatchString(string(result.Stderr)) {
return nil
}
result.Error = status.UnavailableError("a storage corruption occurred for podman")
Copy link
Member

@bduffany bduffany Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set result.ExitCode = commandutil.NoExitCode here -- the convention for CommandResult is that ExitCode>=0 if and only if Error==nil, since an Error means that the process itself did not run normally and return a valid exit code in the range 0-255. In this case podman is the thing returning the exit code, and podman is sort of an implementation detail that we shouldn't expose in the Run()/Exec() API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for the explanation!

@luluz66 luluz66 merged commit b0b586c into master Apr 20, 2022
@luluz66 luluz66 deleted the podman-readlink-error branch April 20, 2022 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants