Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon, snapstate: move ensureCore from daemon/api.go into snapstate.go #3727

Merged
merged 21 commits into from
Sep 6, 2017

Conversation

mvo5
Copy link
Contributor

@mvo5 mvo5 commented Aug 14, 2017

This is prereq work for auto-installing missing base snapd and
missing default-provider for content snaps.

It moves the ensureCore() from daemon/api.go to overlord/snapstate/snapstate.go as part of doInstall().

There is a bit of TOCTOU in the check (below in line 437) if we need to download the core, if something else creates a change in parallel that also ensureCore() and both are not yet added to a change then the check if we have something else that already installs core will not see this (I think

We probably also want to pass the userID to "snapstate.InstallPath()" and "snapstate.Try()" as an additional parameter so that the download code can set this correctly. This is not done yet but I plan to do it here (with some more tests). Then this can be expanded to cover bases and content snaps.

This is prereq work for auto-installing missing base snapd and
missing default-provider for content snaps.
@mvo5 mvo5 requested a review from pedronis August 14, 2017 14:32
"timestamp": time.Now().Format(time.RFC3339),
}
headers["snap-id"] = fakeSnapID(headers["snap-name"].(string))
a, err := ms.storeSigning.RootSigning.Sign(asserts.SnapDeclarationType, headers, nil, "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ms.storeSigning is enough . RootSigning should be rarely needed (mostly to sign main keys)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but when I just use the "storeSigning" I get an error like this:

FAIL: managers_test.go:105: mgrsSuite.SetUpTest

managers_test.go:174:
    c.Assert(err, IsNil)
... value *errors.errorString = &errors.errorString{s:"no matching public key \"XcSmSI47H8EdldbP1BMKBUXs_5Jr0Ck2wSecjcExc5KMEAzsfZocbpaoSvQAW7_9\" for signature by \"can0nical\""} ("no matching public key \"XcSmSI47H8EdldbP1BMKBUXs_5Jr0Ck2wSecjcExc5KMEAzsfZocbpaoSvQAW7_9\" for signature by \"can0nical\"")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually that means the test is missing to add the store key in the local database, I would need to look more carefully. But SnapDeclarations are not signed by the root key so it's better to keep things realistic and use the store key.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like this:

    err := assertstate.Add(st, ms.storeSigning.StoreAccountKey(""))
    c.Assert(err, IsNil)

return nil, err
}
// some other change aleady installs core
if snapsup.Name() == defaultCoreSnapName {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is problematic because we could end up with this snap but without core if that other core install failed, or try to run things too early in this install that might need core to be there. I fear we need to give up instead for now or construct a cross change dependency which we never tried

}

// check that core is not installed already
_, err := CoreInfo(st)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be done before the check over all Changes

ts := state.NewTaskSet()
if flags&doNotAutoInstallMissingBases == 0 {
var err error
ts, err = ensureCore(st, snapsup.Name(), snapsup.UserID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is too simple, because during an InstallMany or UpdateMany we will get here many times but the change doesn't exist yet, so will create many tasks to install core unless I'm confused

@mvo5
Copy link
Contributor Author

mvo5 commented Aug 15, 2017

Closing for now as this needs some re-thinking for {Install,Update}Many.

@mvo5 mvo5 closed this Aug 15, 2017
@mvo5
Copy link
Contributor Author

mvo5 commented Aug 18, 2017

Reopen for more early feedback from Samuele (and everyone else who is interessted)

@mvo5 mvo5 reopened this Aug 18, 2017
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some initial comments

// the state lock is not enough to ensure this is not run in
// parallel as it is dropped in snapstate.Install() below
m.prerequisitesLock.Lock()
defer m.prerequisitesLock.Unlock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we tend to use SetBlocked callbacks to serialize tasks, not locks, so they don't even start

if snapsup.Name() == defaultCoreSnapName {
// if something else installs core
// we need to wait for it
waitNextHelper(t.Change(), state.NewTaskSet(chg.Tasks()...))
Copy link
Collaborator

@pedronis pedronis Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless I'm confused this will wait for the full change there, even the actual installed non-core snaps, shouldn't we just wait for things in the core lane/core itself ?

also I thought the original plan was not to cross wait, but return with Retry here and wait that way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right, this is one step too far. I will do the Retry as originally planed (I was mostly curious about exploring the cross-change waits but this can wait :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this now.

}

// not installed, nor queued for install -> install it
ts, err := Install(st, defaultCoreSnapName, defaultBaseSnapsChannel, snap.R(0), snapsup.UserID, Flags{})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need consider that people can still do "snap install core" and here we release the state lock, so this can return with a conflict I think and then we need to setup waiting instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added code for this now, I need to think a bit about a test.

func (m *SnapManager) undoPrerequisites(t *state.Task, _ *tomb.Tomb) error {
// FIXME: undo any bases/content-providers once we start installing
// those
return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might use a gc mechanism instead, where once in a while we just remove unreferenced stuff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I removed it for now.


waitNextHelper := func(chg *state.Change, ts *state.TaskSet) {
for _, t := range chg.Tasks() {
t.Kind()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this a print at some point?

@codecov-io
Copy link

codecov-io commented Aug 21, 2017

Codecov Report

Merging #3727 into master will increase coverage by 0.04%.
The diff coverage is 84.53%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3727      +/-   ##
==========================================
+ Coverage   75.78%   75.83%   +0.04%     
==========================================
  Files         414      414              
  Lines       35646    35687      +41     
==========================================
+ Hits        27014    27062      +48     
+ Misses       6729     6724       -5     
+ Partials     1903     1901       -2
Impacted Files Coverage Δ
overlord/snapstate/snapstate.go 80.86% <100%> (+0.45%) ⬆️
overlord/snapstate/snapmgr.go 80.57% <100%> (+0.91%) ⬆️
daemon/api.go 72.5% <100%> (+0.05%) ⬆️
overlord/snapstate/handlers.go 65.6% <75%> (+0.49%) ⬆️
interfaces/sorting.go 98.71% <0%> (-1.29%) ⬇️
overlord/devicestate/devicemgr.go 75.43% <0%> (+0.69%) ⬆️
cmd/snap/cmd_aliases.go 95% <0%> (+1.66%) ⬆️
wrappers/binaries.go 79.54% <0%> (+6.81%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c9e7ddf...9966425. Read the comment docs.

Add scenario so that some-snap gets installed and during the
install (when unlocking for the store info call for core) an
explicit "snap install core" happens. In this case the snapstate
will return a change conflict. We handle this via a retry, ensure
this is actually what happens.
for !chg.IsReady() {
// nothing
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is strange, why would the changes affect first boot stuff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not yet figured out why this affects the test, I mean, I understand why settle() never finishes, but I have not tracked down why this was working before because we call ensureSeedYaml a lot.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll try to play with this locally myself tomorrow, it might be that the test was problematic already and some change here triggered something.

Copy link
Collaborator

@pedronis pedronis Aug 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played a bit, the "issue" seems to be that prerequisites not having an undo handler make it so that the change is ready earlier and so an ensure cycle triggers agains seed. I don't care too much either way, but I'm not a fan of the Overlord.Loop() in this approach. I think firstboot_test.go are the only tests that use a full overlord in *state/*_test.go. I have a plan since a while to have a overlord.Mock() that would give us a bit more control when we need to setup more than one manager, maybe it's time for me to look into that. It hopefully would help kill all the various settle helpers we have around as well but need to try.

const defaultBaseSnapsChannel = "stable"

// timeout for tasks to check if the prerequisites are ready
var prerequisitesRetryTimeout = 1 * time.Minute
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be shorter, like 30s.

for _, t := range chg.Tasks() {
t.WaitAll(ts)
}
chg.AddAll(ts)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to set the task state to done directly like we do for link-snap, otherwise there's non-zero chance that we crash in the middle of exiting the task and we will have the new tasks in the change but will rerun prerequisites again and it will conflict with itself, or we need to deal with that situation otherwise

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we check for only one kind of conflict error/case

t.WaitAll(ts)
}
chg.AddAll(ts)
// ensure this is not rerun again
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe something more explicit // make sure that the new change is committed to the state together with marking this task done

ts, err := Install(st, defaultCoreSnapName, defaultBaseSnapsChannel, snap.R(0), snapsup.UserID, Flags{})
// something might have triggered an explicit install of core while
// the state was unlocked -> deal with that here
if _, ok := err.(changeDuringInstallError); ok {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this check for changeConflictError too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think it is sensible to test for both, I have not yet managed to write a test that triggered the changeConflictError condition, I will think about this tomorrow some more.


// Two changes are created, the first will fails, the second will
// be fine. The order of what change runs first is random, the
// first change will also insall core in its own lane. This test
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

install

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, thank you!

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you

@pedronis pedronis merged commit 3954d21 into canonical:master Sep 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants