Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add fuse-manager #324

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

[WIP] Add fuse-manager #324

wants to merge 1 commit into from

Conversation

ilyee
Copy link
Contributor

@ilyee ilyee commented May 19, 2021

What is fusemanager

fuse-manager is a fuse daemon that handles all fuse processes, so fuse goroutines won't get killed when we restart stargz-snapshotter, and runnings containers don't need to be restarted.

By separating out fuse-manager, filesystem operations will all be handed by fuse-manager, and stargz-snapshotter is only responsable of snapshot operations.

How to run it

  1. Run fuse-manager with option fusestore (address for the fusemanager's store) and socket (address for the fusemanager's gRPC socket)
  2. Run stargz-snapshotter with option fusemanager-address (same with socket), and enable-fusemanager with 'true' to enable it

How it works

fuse-manager interacts with stargz-snapshotter through gRPC, its client implements Filesystem interface, so can be passed to snbase.NewSnapshotter to create a new snapshotter instance.

After running grpc server, fuse-manager waits for Init method to be called. When stargz-snapshotter starts(or restarts), Init will be called to pass fs related configs to fuse-manager, which will create a new fs instance for handling new incoming Mount/Check/Unmount requests.

@ilyee
Copy link
Contributor Author

ilyee commented May 19, 2021

This pr refers to issue #318

Copy link
Member

@ktock ktock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for PR!

snapshot/snapshot.go Outdated Show resolved Hide resolved
fs/fs.go Outdated Show resolved Hide resolved
fusemanager/fusemanager.go Outdated Show resolved Hide resolved
@ilyee ilyee force-pushed the fuse-manager branch 2 times, most recently from 7e7a0f0 to ccd8af2 Compare May 19, 2021 12:02
@ilyee ilyee changed the title [WIP] Add fuse-manager Add fuse-manager May 19, 2021
fusemanager/api/api.proto Show resolved Hide resolved
fusemanager/fusemanager.go Outdated Show resolved Hide resolved
fusemanager/fusestore.go Outdated Show resolved Hide resolved
cmd/containerd-stargz-grpc/main.go Outdated Show resolved Hide resolved
cmd/containerd-stargz-grpc/main.go Outdated Show resolved Hide resolved
cmd/fuse-manager/main.go Outdated Show resolved Hide resolved
cmd/containerd-stargz-grpc/main.go Outdated Show resolved Hide resolved
fusemanager/fusemanager.go Outdated Show resolved Hide resolved
fusemanager/fusemanager.go Outdated Show resolved Hide resolved
@ilyee ilyee changed the title Add fuse-manager [WIP] Add fuse-manager May 20, 2021
@ilyee ilyee force-pushed the fuse-manager branch 2 times, most recently from ca6c291 to 1745155 Compare May 20, 2021 05:13
Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • What's the purpose of this? Why not just reimplement reloading config?
  • s/fuse-manager/stargz-fuse-manager/
  • How is the fuse-manager daemon started?

@ilyee
Copy link
Contributor Author

ilyee commented May 20, 2021

@AkihiroSuda

  • What's the purpose of this? Why not just reimplement reloading config?

As mentioned in #318 , reloading config needs a restart of snapshotter, so we need to restart all running containers and restore stargz layers. And a separate fuse daemon will focuse on fuse goroutines so snapshotter don't need to handle their lifecycles, a config update will be done quickly.

  • How is the fuse-manager daemon started?

fuse-manager should be started before stargz-snapshotter.

Its startup command is like fuse-manager -fusestore=/var/lib/containerd-stargz-grpc/fusestore.db -socket=/run/containerd-stargz-grpc/fuse-manager.sock with this options:

  • socket: gRPC socket path
  • fusestore: path to backend storage of fs related configs with mountpoint as key to restore fuse mounts after restarting fuse-manager
  • log-level: logging level

And with option fusemanager-address we can enable stargz-snapshotter to use fuse-manager, otherwise it will run without fuse-manager as well.

@AkihiroSuda
Copy link
Member

fuse-manager should be started before stargz-snapshotter.

Needs systemd unit
https://github.com/containerd/stargz-snapshotter/tree/02bf457fbb2044ead1d4958e364010f42b43e6ea/script/config/etc/systemd/system

@AkihiroSuda
Copy link
Member

Or can we let stargz-snapshotter start stargz-fuse-manager in the background and detach it, just like containerd starts containerd-shim-runc-v2 processes in the background and detach them.

@ilyee
Copy link
Contributor Author

ilyee commented May 20, 2021

Or can we let stargz-snapshotter start stargz-fuse-manager in the background and detach it, just like containerd starts containerd-shim-runc-v2 processes in the background and detach them.

It makes sense, I'll try both and make an update

@ktock
Copy link
Member

ktock commented May 20, 2021

SGTM about the shim-like approach.

@ilyee ilyee force-pushed the fuse-manager branch 10 times, most recently from 6233527 to 4f1e4c6 Compare June 3, 2021 11:25
@ilyee
Copy link
Contributor Author

ilyee commented Jun 3, 2021

Now stargz-fuse-manager uses a shim-like approach to start

Please take a look if free @AkihiroSuda @ktock

Copy link
Member

@ktock ktock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good but some minor nits

cmd/containerd-stargz-grpc/main.go Outdated Show resolved Hide resolved
fusemanager/api/api.proto Outdated Show resolved Hide resolved
cmd/containerd-stargz-grpc/main.go Outdated Show resolved Hide resolved
@ktock
Copy link
Member

ktock commented Jun 23, 2021

@AkihiroSuda PTAL when you have time

Before we merge this, we need to

  1. merge [WIP] Add fuse-manager #324 Makefile: add make generate for proto code generation #341,
  2. rebase this PR on that patch and
  3. make sure everything work well (including make validate-generated in CI)

let me turn this PR into draft until we complete them for avoiding to accidentally merge this PR.

@ktock ktock marked this pull request as draft June 23, 2021 14:08
@ktock ktock self-assigned this Jun 23, 2021
Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a design doc to explain why we need this?

@ilyee ilyee force-pushed the fuse-manager branch 2 times, most recently from dca6dc9 to 33eaed8 Compare June 27, 2021 05:18
@ilyee
Copy link
Contributor Author

ilyee commented Jul 20, 2021

let me turn this PR into draft until we complete them for avoiding to accidentally merge this PR.

@ktock seems like these have been completed, can draft be removed?

@ilyee ilyee force-pushed the fuse-manager branch 2 times, most recently from ddfb938 to d52ebb4 Compare July 20, 2021 12:08
@ktock
Copy link
Member

ktock commented Jul 20, 2021

@ilyee Could you add changes for #324 (review) ?

@ilyee
Copy link
Contributor Author

ilyee commented Jul 20, 2021

@ilyee Could you add changes for #324 (review) ?

sure, I'll handle it

@ilyee ilyee force-pushed the fuse-manager branch 2 times, most recently from 0cdde0e to 87101f6 Compare August 13, 2021 10:10
@ilyee ilyee marked this pull request as ready for review August 13, 2021 15:34
@ilyee
Copy link
Contributor Author

ilyee commented Aug 13, 2021

@ilyee Could you add changes for #324 (review) ?

@ktock related doc is added in doc/overview.md, PTAL if free, thx

@AkihiroSuda
Copy link
Member

Needs rebase

@AkihiroSuda AkihiroSuda marked this pull request as draft August 26, 2021 01:40
Signed-off-by: Zuti He <ilyeeelihe@gmail.com>
@ilyee
Copy link
Contributor Author

ilyee commented Sep 6, 2021

Needs rebase

@AkihiroSuda done

@ilyee ilyee marked this pull request as ready for review September 7, 2021 03:56
Copy link
Member

@ktock ktock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comments.

We need to think about having tests for this.

@@ -58,14 +61,19 @@ const (
defaultLogLevel = logrus.InfoLevel
defaultRootDir = "/var/lib/containerd-stargz-grpc"
defaultImageServiceAddress = "/run/containerd/containerd.sock"
defaultFuseManagerAddress = "/run/containerd-stargz-grpc/fuse-namanger.sock"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This typo doesn't seem to be fixed (/run/containerd-stargz-grpc/fuse-manager.sock)

}
return runtime.NewImageServiceClient(conn), nil
}
f, _ := cri.NewCRIKeychain(ctx, connectCRI)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why CRI server is ignored here? fuse-manager should provide CRI proxy as well.

Comment on lines +165 to +168
if config.restoreSnapshots {
if err := o.restoreRemoteSnapshot(ctx); err != nil {
return nil, errors.Wrap(err, "failed to restore remote snapshot")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not change the default behaviour here. Restoring should be enabled by default and disabled by an option.

ctx := context.Background()
if err := o.cleanup(ctx, cleanupCommitted); err != nil {
if err := o.cleanup(ctx, o.cleanupCommitted); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not change the default behaviour here. cleanupCommitted should be enabled by default and disabled by an option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request impact/changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants