Skip to content

Conversation

@mtrmac
Copy link
Collaborator

@mtrmac mtrmac commented Oct 21, 2022

When we reload the store in a reader, drop the lock, lock for writing, reload again to trigger cleanup, and then lock again for reading and proceed with the originally intended operation.

This should fix #1136 , although that’s not yet proven. (It only took 6 weeks of work to get this far…)

Built on top of #1399 and #1406 .

mtrmac added a commit to mtrmac/buildah that referenced this pull request Oct 21, 2022
> go mod edit -replace github.com/containers/storage=github.com/mtrmac/storage@cleanup-in-readers
> make vendor

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
@mtrmac
Copy link
Collaborator Author

mtrmac commented Oct 21, 2022

@mtrmac mtrmac force-pushed the cleanup-in-readers branch from 4682b8d to 28cc4ec Compare October 27, 2022 07:09
Copy link
Member

@rhatdan rhatdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mtrmac
Copy link
Collaborator Author

mtrmac commented Oct 28, 2022

This is still a draft: I’d like to take the time to manually set up the inconsistent state and actually verify that this fixes (the current variant of) #1136 .

@mtrmac mtrmac closed this Oct 28, 2022
@mtrmac mtrmac reopened this Oct 28, 2022
@mtrmac mtrmac force-pushed the cleanup-in-readers branch from 5f161a7 to c2a3d16 Compare November 1, 2022 01:27
mtrmac added a commit to mtrmac/buildah that referenced this pull request Nov 1, 2022
> go mod edit -replace github.com/containers/storage=github.com/mtrmac/storage@cleanup-in-readers
> make vendor

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
@rhatdan
Copy link
Member

rhatdan commented Nov 4, 2022

@mtrmac mtrmac force-pushed the cleanup-in-readers branch from d7eb381 to 9c473b9 Compare November 4, 2022 18:02
mtrmac added a commit to mtrmac/buildah that referenced this pull request Nov 4, 2022
> go mod edit -replace github.com/containers/storage=github.com/mtrmac/storage@cleanup-in-readers
> make vendor

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
@mtrmac
Copy link
Collaborator Author

mtrmac commented Nov 4, 2022

So this was completely ineffective (calling reloadIfChanged the second time after a change does nothing). At least based on the test case in #1136 , it seems to work now.

Waiting for another round of downstream tests : containers/buildah#4368 containers/podman#16253 ; if they pass, I’m calling this ready for review.

@mtrmac mtrmac marked this pull request as ready for review November 4, 2022 19:36
@mtrmac mtrmac force-pushed the cleanup-in-readers branch from 9c473b9 to 2d1cbd1 Compare November 8, 2022 13:19
@rhatdan
Copy link
Member

rhatdan commented Nov 8, 2022

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM

Note that I am far from being familiar with the low-level details.

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@flouthoc flouthoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@flouthoc
Copy link
Collaborator

@mtrmac containers/buildah#4368 containers/podman#16253 did not run test the most recent push on this PR

@mtrmac
Copy link
Collaborator Author

mtrmac commented Nov 10, 2022

@flouthoc There were no changes to this PR’s code, but, sure, let me rebase and re-test to be sure.

@mtrmac mtrmac force-pushed the cleanup-in-readers branch from 2d1cbd1 to c5fb207 Compare November 10, 2022 15:05
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
... instead of combining control flow branches; we will change
behavior on some of them.

Should not change behavior.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
... as an error value instead of just a boolean.  That
will allow extending the logic to more kinds of inconsistencies,
and consolidates the specifics of the inconsistency knowledge
into a single place.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
- In read-write stores, fail even readers if there are incomplete
  layers.  Right now that would increase observed failures (OTOH
  with better consistency), but we'll fix that again soon
- Decide whether to save read-write stores strictly based on
  the need to clean up, instead of cleaning up opportunistically
  (which is less predictable).
- Correctly return the right error, depending on whether there
  are duplicate layers

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
... instead of failing for duplicate names, and instead of
ignoring the "incomplete" state of layers.

Try up to 3 times if other writers are creating inconsistent
state in the meantime.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
@mtrmac mtrmac force-pushed the cleanup-in-readers branch from c5fb207 to 7f6a670 Compare November 10, 2022 15:14
mtrmac added a commit to mtrmac/buildah that referenced this pull request Nov 10, 2022
> go mod edit -replace github.com/containers/storage=github.com/mtrmac/storage@cleanup-in-readers
> make vendor

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
mtrmac added a commit to mtrmac/buildah that referenced this pull request Nov 10, 2022
> go mod edit -replace github.com/containers/storage=github.com/mtrmac/storage@cleanup-in-readers
> make vendor

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
mtrmac added a commit to mtrmac/libpod that referenced this pull request Nov 10, 2022
[NO NEW TESTS NEEDED]

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
@TomSweeneyRedHat
Copy link
Member

LGTM
and happy green test buttons.

@mtrmac
Copy link
Collaborator Author

mtrmac commented Nov 10, 2022

@flouthoc Rebased tests passed again.

This PR now includes #1428 , alternatively it could be rebased on top with no changes, but the Buildah/Podman vendoring tests wouldn’t match commit IDs. I don’t care which way this PR is merged.

Copy link
Collaborator

@flouthoc flouthoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Crash resiliency: deleting incomplete layers doesn’t reliably happen

7 participants