-
Notifications
You must be signed in to change notification settings - Fork 261
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's issue are you seeing?
shim_fs_ops::checkpoint is to return fs specific data.
Currently only the chroot fs provides checkpoint method and it seems okay to call checkpoint method once and to re-use cpdata.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @smherwig)
LibOS/shim/src/fs/shim_fs.c, line 586 at r1 (raw file):
new_mount->cpdata = NULL; entry->paddr = &new_mount->cpdata; mount->cpdata = NULL;
It seems a intentional design to keep cpdata as cached.
If you're seeing an issue, can you please elaborate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not using chrootfs; I'm implementing remote filesystems. The remote filesystems are not stateless, so each time Graphene forks, the Graphene client-side portion of the filesystem needs to send an RPC to the remote server to tell the server that it is forking. The server then creates a control block (basically a file descriptor table) for the child process and returns an identifier (similar to a PID). This identifier gets passed from the parent Graphene to the child during the migration. On the Graphene child's fs migrate callback, the child process then presents the ID to the remote fs server as a sort of bearer token so that it can attach to the appropriate control block.
The question is whether checkpoint/migrate should be a generic hook for fork; currently it's assuming specific properties of the filesystem.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @smherwig)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a problem with this PR. As @yamahata notes, the only FS which implements checkpoint()
callback is chroot. Looking at fs/chroot/fs.c: chroot_checkpoint()
, it is as trivial as setting a pointer to the correct memory region (in local memory of chroot). So I don't see any performance optimization achieved by caching the checkpoint data.
Also, the logic of @smherwig makes sense: the generic checkpoint method should call the checkpoint method of the specific FS. Though I would change the source code to better reflect this.
Reviewed 1 of 1 files at r1.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @smherwig and @yamahata)
LibOS/shim/src/fs/shim_fs.c, line 567 at r1 (raw file):
ADD_TO_CP_MAP(obj, off); if (!mount->cpdata &&
I think what @smherwig really wants is to call the checkpoint()
method of his specific FS always. Thus, I suggest:
mount->cpdata = NULL;
if (mount->fs_ops && mount->fs_ops->checkpoint) { ...
LibOS/shim/src/fs/shim_fs.c, line 586 at r1 (raw file):
Previously, yamahata wrote…
It seems a intentional design to keep cpdata as cached.
If you're seeing an issue, can you please elaborate it.
This line won't be needed with my proposed change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @yamahata)
LibOS/shim/src/fs/shim_fs.c, line 567 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I think what @smherwig really wants is to call the
checkpoint()
method of his specific FS always. Thus, I suggest:mount->cpdata = NULL; if (mount->fs_ops && mount->fs_ops->checkpoint) { ...
Done. I agree, setting cpdata to NULL here helps in the function's readability. I pushed the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @smherwig and @yamahata)
a discussion (no related file):
For final rebase: all commits must be squashed into one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @smherwig and @yamahata)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @dimakuv and @yamahata)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
For final rebase: all commits must be squashed into one.
I do suspect the goal was to cache some of the checkpointing work, but I think it would be better to push this down into the FS. I doubt there is a common case where a checkpoint will be untouched between two checkpoints, and the VFS level is the wrong point at which to try to interpret state or reason about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @dimakuv and @yamahata)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @dimakuv)
LibOS/shim/src/fs/shim_fs.c, line 586 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
This line won't be needed with my proposed change.
+1. the above two lines can be removed.
Other change looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @dimakuv and @yamahata)
LibOS/shim/src/fs/shim_fs.c, line 586 at r1 (raw file):
Previously, yamahata wrote…
+1. the above two lines can be removed.
Other change looks good to me.
@yamahata, but the line I wanted to remove was already removed. Do you want to remove these two lines?
new_mount->cpdata = NULL;
entry->paddr = &new_mount->cpdata;
I am not sure this will be safe. I believe these two lines should stay as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @yamahata from a discussion.
Reviewable status: complete! all files reviewed, all discussions resolved
With Isaku's consent, remove due to lack of time for reviewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: Intel)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks okay, but there can be a potential memory leakage issue if mount->cpdata
points to a malloc'ed data. Please add an issue regarding this problem.
Reviewable status: complete! all files reviewed, all discussions resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, see #744.
Reviewable status: complete! all files reviewed, all discussions resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
Retest please |
Retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3.
Reviewable status: complete! all files reviewed, all discussions resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
Previously, FS-specific checkpoint callback was called only the first time process forks. This commit fixes this bug by always calling FS-specific checkpoint (it is now the responsibility of the specific checkpoint callback to skip re-creating the checkpoint as perf optimization).
Retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4.
Reviewable status: complete! all files reviewed, all discussions resolved
Affected components
Description of the changes (reasons and measures)
The mount->cpdata field is not reset to NULL; hence, the checkpoint callback is only called the first time a process forks.
Fixes #595.
How to test this PR? (if applicable)
This change is