Skip to content

lockfile.Locked() must not be used to make any decisions #1388

@mtrmac

Description

@mtrmac

Looking into how #1324 crashed:

github.com/containers/storage/pkg/lockfile.(*lockfile).Touch
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/pkg/lockfile/lockfile_unix.go:247 +0x198

failed because layerStore.lockfile was not held.

But we got there via

github.com/containers/storage.(*layerStore).saveLayers
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/layers.go:498 +0x1d4
github.com/containers/storage.(*layerStore).Load
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/layers.go:422 +0x534

which can only happen if layerStore.lockfile.Locked()!

My guess at what has happened (apart from the mystery of graphLock, #1324 (comment) ):

  • There was a layerStore instance for the path in question.
  • Somehow, we got another layerStore instance. Most likely, store.graphLock.TouchedSince(…) triggered a re-load, which abandons the current store.layerStore and created a new instance, without writing for the old one to go quiescent. That’s should be fine, the code is supposed to be reliable against out-of-process writers, so two independent in-process writers is easier.
  • The two layerStore instances get layerStore.lockfile via GetLockfile(path), and both receive the same pkg/lockfile.lockfile instance. That’s as expected, and fine.
  • But now, Locked() is no longer useful! It shows that one of the two owners of that lock object has locked it for writing, but it does not say that it’s the current one.

So, with A and B, both living in the same process, owning the same lock, this becomes possible:

  • A locks the lock
  • B checks .Locked() and thinks that B owns it
  • B starts to make destructive changes
  • At this point A and B are racing and creating invalid state, although out-of-process readers and writers are still excluded
  • A can unlock the lock
  • B continues to make destructive changes, racing against other processes.

AFAICS Locked is fundamentally not useful in Go (at least not without a concept of a goroutine ID, which is intentionally not available).

So, following the precedent of #1376 , I plan to break the API and remove that method.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions