-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance daemon states management to improve usability and stability #152
Enhance daemon states management to improve usability and stability #152
Conversation
0a1baf6
to
474561b
Compare
2c733e2
to
154e065
Compare
Codecov ReportBase: 30.00% // Head: 30.25% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #152 +/- ##
==========================================
+ Coverage 30.00% 30.25% +0.25%
==========================================
Files 20 20
Lines 1650 1626 -24
==========================================
- Hits 495 492 -3
+ Misses 1090 1069 -21
Partials 65 65
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
pkg/filesystem/fs/fs.go
Outdated
if _, e := fs.getSharedDaemon(); e != nil { | ||
log.G(ctx).Warnf("initialize the shared nydus daemon") | ||
if d := fs.getSharedDaemon(); d != nil && !d.Connected { | ||
log.G(ctx).Infof("initialize the shared nydus daemon") | ||
fs.initSharedDaemon(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error needs to be handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pkg/filesystem/fs/fs.go
Outdated
if fs.mode == SharedInstance { | ||
if d.ID != daemon.SharedNydusDaemonID && sharedDaemonConnected { | ||
if err := d.SharedMount(); err != nil { | ||
return nil, errors.Errorf("failed to mount rafs when recovering, %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, errors.Wrapf(err, "failed to mount rafs when recovering")
should be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pkg/filesystem/fs/fs.go
Outdated
} | ||
} else { | ||
if !d.Connected { | ||
fs.manager.StartDaemon(d) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error needs to be handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
s.idxBySnapshotID[daemon.SnapshotID] = daemon | ||
|
||
if ok { | ||
return old |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to return old
? seems it doesn't to be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This is just a hint to callers. It is up to the caller if they need to check the daemon is previously inserted.
By returning old daemon pointer, caller has a chance to tell it is overwitten.
pkg/process/manager.go
Outdated
deleted = victim | ||
continue | ||
} | ||
ds = append(ds, victim) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the victim added to the survivor list? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The matching victim is already skipped. Please check it out on L83.
return old | ||
} | ||
|
||
func (s *DaemonStates) RemoveByDaemonID(id string) *daemon.Daemon { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seem it not to be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It is the original semantic that daemon can be removed by its snapshot ID. So I transformed it a bit.
pkg/process/manager.go
Outdated
return daemon | ||
} | ||
|
||
func (s *DaemonStates) GetBySnapshotID(id string, op func(s *DaemonStates, d *daemon.Daemon)) *daemon.Daemon { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to put s
as a parameter of the callback func?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. For current usage, the callback is to delete an item from daemons states cache. So it has to access the s as daemon states manager
} | ||
|
||
listed := make([]*daemon.Daemon, len(s.daemons)) | ||
copy(listed, s.daemons) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to copy one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It is protected by the mutex lock. If not copying the slice, we can't say the returned slice is consistent.
50684ef
to
f203e5d
Compare
Decouple daemon states cache from persistence storage in DB. So it is easier to do nydusd daemon recover. Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
f203e5d
to
f86c905
Compare
Several community users and nydus contributors have complained about nydus-snapshotter's usability and stability.
To solve the issues once and for all, this PR refactors nydusd daemons states management. It decouples in-memory states cache from DB persistence storage. Snapshotter never empties the DB when restarting but to rebuild the in-memory daemon states cache to get consistent with DB persistent storage. In addition, it will request API to restarted nydusd daemon to recover the API mounts.
Several E2E cases are accompanied.
Address: #142 #125