-
Notifications
You must be signed in to change notification settings - Fork 601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
daemon, snapstate: move ensureCore from daemon/api.go into snapstate.go #3727
Conversation
This is prereq work for auto-installing missing base snapd and missing default-provider for content snaps.
overlord/managers_test.go
Outdated
"timestamp": time.Now().Format(time.RFC3339), | ||
} | ||
headers["snap-id"] = fakeSnapID(headers["snap-name"].(string)) | ||
a, err := ms.storeSigning.RootSigning.Sign(asserts.SnapDeclarationType, headers, nil, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ms.storeSigning is enough . RootSigning should be rarely needed (mostly to sign main keys)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something, but when I just use the "storeSigning" I get an error like this:
FAIL: managers_test.go:105: mgrsSuite.SetUpTest
managers_test.go:174:
c.Assert(err, IsNil)
... value *errors.errorString = &errors.errorString{s:"no matching public key \"XcSmSI47H8EdldbP1BMKBUXs_5Jr0Ck2wSecjcExc5KMEAzsfZocbpaoSvQAW7_9\" for signature by \"can0nical\""} ("no matching public key \"XcSmSI47H8EdldbP1BMKBUXs_5Jr0Ck2wSecjcExc5KMEAzsfZocbpaoSvQAW7_9\" for signature by \"can0nical\"")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usually that means the test is missing to add the store key in the local database, I would need to look more carefully. But SnapDeclarations are not signed by the root key so it's better to keep things realistic and use the store key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something like this:
err := assertstate.Add(st, ms.storeSigning.StoreAccountKey(""))
c.Assert(err, IsNil)
overlord/snapstate/snapstate.go
Outdated
return nil, err | ||
} | ||
// some other change aleady installs core | ||
if snapsup.Name() == defaultCoreSnapName { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is problematic because we could end up with this snap but without core if that other core install failed, or try to run things too early in this install that might need core to be there. I fear we need to give up instead for now or construct a cross change dependency which we never tried
overlord/snapstate/snapstate.go
Outdated
} | ||
|
||
// check that core is not installed already | ||
_, err := CoreInfo(st) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be done before the check over all Changes
overlord/snapstate/snapstate.go
Outdated
ts := state.NewTaskSet() | ||
if flags&doNotAutoInstallMissingBases == 0 { | ||
var err error | ||
ts, err = ensureCore(st, snapsup.Name(), snapsup.UserID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is too simple, because during an InstallMany or UpdateMany we will get here many times but the change doesn't exist yet, so will create many tasks to install core unless I'm confused
Closing for now as this needs some re-thinking for {Install,Update}Many. |
This will automatically create Install tasks to fetch prerequisites.
Reopen for more early feedback from Samuele (and everyone else who is interessted) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some initial comments
overlord/snapstate/handlers.go
Outdated
// the state lock is not enough to ensure this is not run in | ||
// parallel as it is dropped in snapstate.Install() below | ||
m.prerequisitesLock.Lock() | ||
defer m.prerequisitesLock.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we tend to use SetBlocked callbacks to serialize tasks, not locks, so they don't even start
overlord/snapstate/handlers.go
Outdated
if snapsup.Name() == defaultCoreSnapName { | ||
// if something else installs core | ||
// we need to wait for it | ||
waitNextHelper(t.Change(), state.NewTaskSet(chg.Tasks()...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unless I'm confused this will wait for the full change there, even the actual installed non-core snaps, shouldn't we just wait for things in the core lane/core itself ?
also I thought the original plan was not to cross wait, but return with Retry here and wait that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are right, this is one step too far. I will do the Retry as originally planed (I was mostly curious about exploring the cross-change waits but this can wait :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this now.
} | ||
|
||
// not installed, nor queued for install -> install it | ||
ts, err := Install(st, defaultCoreSnapName, defaultBaseSnapsChannel, snap.R(0), snapsup.UserID, Flags{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need consider that people can still do "snap install core" and here we release the state lock, so this can return with a conflict I think and then we need to setup waiting instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added code for this now, I need to think a bit about a test.
overlord/snapstate/handlers.go
Outdated
func (m *SnapManager) undoPrerequisites(t *state.Task, _ *tomb.Tomb) error { | ||
// FIXME: undo any bases/content-providers once we start installing | ||
// those | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might use a gc mechanism instead, where once in a while we just remove unreferenced stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I removed it for now.
overlord/snapstate/handlers.go
Outdated
|
||
waitNextHelper := func(chg *state.Change, ts *state.TaskSet) { | ||
for _, t := range chg.Tasks() { | ||
t.Kind() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this a print at some point?
Codecov Report
@@ Coverage Diff @@
## master #3727 +/- ##
==========================================
+ Coverage 75.78% 75.83% +0.04%
==========================================
Files 414 414
Lines 35646 35687 +41
==========================================
+ Hits 27014 27062 +48
+ Misses 6729 6724 -5
+ Partials 1903 1901 -2
Continue to review full report at Codecov.
|
Add scenario so that some-snap gets installed and during the install (when unlocking for the store info call for core) an explicit "snap install core" happens. In this case the snapstate will return a change conflict. We handle this via a retry, ensure this is actually what happens.
for !chg.IsReady() { | ||
// nothing | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is strange, why would the changes affect first boot stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not yet figured out why this affects the test, I mean, I understand why settle() never finishes, but I have not tracked down why this was working before because we call ensureSeedYaml a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'll try to play with this locally myself tomorrow, it might be that the test was problematic already and some change here triggered something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played a bit, the "issue" seems to be that prerequisites not having an undo handler make it so that the change is ready earlier and so an ensure cycle triggers agains seed. I don't care too much either way, but I'm not a fan of the Overlord.Loop() in this approach. I think firstboot_test.go are the only tests that use a full overlord in *state/*_test.go. I have a plan since a while to have a overlord.Mock() that would give us a bit more control when we need to setup more than one manager, maybe it's time for me to look into that. It hopefully would help kill all the various settle helpers we have around as well but need to try.
overlord/snapstate/handlers.go
Outdated
const defaultBaseSnapsChannel = "stable" | ||
|
||
// timeout for tasks to check if the prerequisites are ready | ||
var prerequisitesRetryTimeout = 1 * time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be shorter, like 30s.
for _, t := range chg.Tasks() { | ||
t.WaitAll(ts) | ||
} | ||
chg.AddAll(ts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to set the task state to done directly like we do for link-snap, otherwise there's non-zero chance that we crash in the middle of exiting the task and we will have the new tasks in the change but will rerun prerequisites again and it will conflict with itself, or we need to deal with that situation otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why we check for only one kind of conflict error/case
overlord/snapstate/handlers.go
Outdated
t.WaitAll(ts) | ||
} | ||
chg.AddAll(ts) | ||
// ensure this is not rerun again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe something more explicit // make sure that the new change is committed to the state together with marking this task done
ts, err := Install(st, defaultCoreSnapName, defaultBaseSnapsChannel, snap.R(0), snapsup.UserID, Flags{}) | ||
// something might have triggered an explicit install of core while | ||
// the state was unlocked -> deal with that here | ||
if _, ok := err.(changeDuringInstallError); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this check for changeConflictError too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think it is sensible to test for both, I have not yet managed to write a test that triggered the changeConflictError condition, I will think about this tomorrow some more.
overlord/snapstate/snapstate_test.go
Outdated
|
||
// Two changes are created, the first will fails, the second will | ||
// be fine. The order of what change runs first is random, the | ||
// first change will also insall core in its own lane. This test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
install
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you
This is prereq work for auto-installing missing base snapd and
missing default-provider for content snaps.
It moves the ensureCore() from daemon/api.go to overlord/snapstate/snapstate.go as part of doInstall().
There is a bit of TOCTOU in the check (below in line 437) if we need to download the core, if something else creates a change in parallel that also ensureCore() and both are not yet added to a change then the check if we have something else that already installs core will not see this (I think
We probably also want to pass the userID to "snapstate.InstallPath()" and "snapstate.Try()" as an additional parameter so that the download code can set this correctly. This is not done yet but I plan to do it here (with some more tests). Then this can be expanded to cover bases and content snaps.