Description
Consider this sequence of events, with the overlay graph driver.
- The user initiates a pull of an image which contains 2 layers, `parentLayer` and `childLayer`.
  - While creating `parentLayer`, the WIP layer object is recorded in `layers.json` with `incompleteFlag`.
  - Afterwards, during `ApplyDiff`, the pull process is forcibly killed (so that it can’t do its own cleanup).
  - Result: `layers.json` contains a record of the layer, with `incompleteFlag`; the overlay graph driver contains an incomplete/inconsistent layer, but a `$parentLayer/link` file and an `l/$link` symbolic link exist. This is all as expected.
- The user initiates a pull of the same image again.
  - (Just like the first time), the pull first checks for pre-existing layers in storage, via `Store.Layer(parentLayer)`. This locks the `layerStore` read-only first. Thus, the first `layerStore.ReloadIfChanged` does trigger a `layerStore.Load()`, but that does not clean up incomplete layers. But `layerStore.lockFile.lw` was updated to match the lock file contents.
  - Consequently, the record of the incomplete layer continues to exist, and `Store.Layer` reports that `parentLayer` exists.
  - Pull proceeds, assuming that `parentLayer` exists, and starts creating `childLayer`.
  - While creating `childLayer`, the `layerStore` is locked read-write, but because nothing has changed on disk and `layerStore.lockFile.lw` matches (within the same process), `layerStore.ReloadIfChanged` does nothing: it does not enter `layerStore.Load()`, so the “delete incomplete layers” code is not reached. Consequently, `parentLayer` continues to exist in incomplete state.
  - This allows creation of `childLayer` to succeed. `$childLayer/lower` is created, and includes the short link from `parentLayer/link`.
  - Result: The whole pull is reported as successful. The image, though, contains an incomplete layer, with incomplete/inconsistent contents.
- Next, the user does something that doesn’t start with a read-only lock of `layerStore`. That finally triggers `layerStore.Load` to delete incomplete layers, and now `parentLayer` is deleted, resulting in a broken parent link from `childLayer` to `parentLayer`.
  - For example, `podman run theSameImage` works for this purpose. That deletes the layer and fails with `Error: layer not known` (with a currently unclear call stack).
- One more `podman run theSameImage` causes the missing layer to be noticed, with
  `ERRO[0000] Image theSameImage exists in local storage but may be corrupted: layer not known`
  - … and that triggers a re-pull.
  - This re-pull correctly detects that `parentLayer` is missing, and creates it afresh, with a new `$parentLayer/link` value.
  - But, `childLayer` is not missing, and the previous one is just reused. `$childLayer/lower` continues to contain the old `$parentLayer/link` value.
- Finally, when trying to actually use `childLayer`, this manifests in
WARN[0093] Can't read link "/var/lib/containers/storage/overlay/l/UDGNJ5CR2MQ2QQDGYYK2W4WCBR" because it does not exist. A storage corruption might have occurred, attempting to recreate the missing symlinks. It might be best wipe the storage to avoid further errors due to storage corruption.
Error: readlink /var/lib/containers/storage/overlay/l/UDGNJ5CR2MQ2QQDGYYK2W4WCBR: no such file or directory
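
The sequence above can be replayed with a small toy model. This is only a sketch of the described behavior, not the real containers/storage code: the `store` and `layer` types, the integer write counter standing in for `layerStore.lockFile.lw`, the `LINK%d` names, and all method names are hypothetical. The two failing `podman run` invocations are collapsed into a single write-locked reload followed by a re-pull.

```go
package main

import "fmt"

// layer is a hypothetical stand-in for one layers.json record.
type layer struct {
	incomplete bool   // models incompleteFlag
	link       string // models $layer/link (the name under l/)
	lower      string // models $layer/lower: the parent's link at creation time
}

// store is a toy model of layerStore plus the overlay l/ directory.
type store struct {
	layers    map[string]*layer
	links     map[string]bool // which l/$link symlinks exist
	diskWrite int             // models the lock file contents
	lastSeen  int             // models layerStore.lockFile.lw in the current process
	nextLink  int
}

// load deletes incomplete layers, but only when the store is locked for writing.
func (s *store) load(writable bool) {
	if !writable {
		return
	}
	for id, l := range s.layers {
		if l.incomplete {
			delete(s.links, l.link)
			delete(s.layers, id)
			s.diskWrite++
		}
	}
}

// reloadIfChanged skips load entirely when this process has already seen the last write.
func (s *store) reloadIfChanged(writable bool) {
	if s.lastSeen == s.diskWrite {
		return
	}
	s.load(writable)
	s.lastSeen = s.diskWrite
}

// exists models Store.Layer: a lookup under a read-only lock.
func (s *store) exists(id string) bool {
	s.reloadIfChanged(false)
	return s.layers[id] != nil
}

// create models layer creation under a read-write lock.
func (s *store) create(id, parent string, incomplete bool) {
	s.reloadIfChanged(true)
	s.nextLink++
	l := &layer{incomplete: incomplete, link: fmt.Sprintf("LINK%d", s.nextLink)}
	if p := s.layers[parent]; p != nil {
		l.lower = p.link // the child records the parent's current short link
	}
	s.layers[id] = l
	s.links[l.link] = true
	s.diskWrite++
	s.lastSeen = s.diskWrite
}

// pull models a fresh process that creates only the layers it believes are missing.
func (s *store) pull() {
	s.lastSeen = 0 // a new process has not seen any write yet
	if !s.exists("parent") {
		s.create("parent", "", false)
	}
	if !s.exists("child") {
		s.create("child", "parent", false)
	}
}

// scenario replays the steps from the description.
func scenario() *store {
	s := &store{layers: map[string]*layer{}, links: map[string]bool{}, diskWrite: 1}
	s.create("parent", "", true) // first pull, killed during ApplyDiff
	s.pull()                     // second pull: incomplete parent is reused
	s.lastSeen = 0               // new process (podman run) ...
	s.reloadIfChanged(true)      // ... whose first lock is read-write: parent is deleted
	s.pull()                     // re-pull: parent recreated, stale child reused
	return s
}

func main() {
	s := scenario()
	child := s.layers["child"]
	fmt.Println("parent link:", s.layers["parent"].link)
	fmt.Println("child lower:", child.lower)
	fmt.Println("lower target exists:", s.links[child.lower])
}
```

In this model the child ends up with a `lower` entry naming a link that no longer exists, while the recreated parent carries a different link, which mirrors the `readlink` failure shown above.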