Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

bugfix: fix shim_fs.c's BEGIN_CP_FUNC #596

Merged
merged 1 commit into from
Jun 10, 2019

Conversation

smherwig
Copy link
Contributor

@smherwig smherwig commented Apr 8, 2019

Affected components

  • README and global configuration
  • Linux PAL
  • SGX PAL
  • FreeBSD PAL
  • Common PAL code
  • Library OS (i.e., SHIM), including GLIBC

Description of the changes (reasons and measures)

The mount->cpdata field is not reset to NULL; hence, the checkpoint callback is only called the first time a process forks.
Fixes #595.

How to test this PR? (if applicable)


This change is Reviewable

Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's issue are you seeing?
shim_fs_ops::checkpoint is to return fs specific data.
Currently only the chroot fs provides checkpoint method and it seems okay to call checkpoint method once and to re-use cpdata.

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @smherwig)


LibOS/shim/src/fs/shim_fs.c, line 586 at r1 (raw file):

            new_mount->cpdata = NULL;
            entry->paddr = &new_mount->cpdata;
            mount->cpdata = NULL;

It seems a intentional design to keep cpdata as cached.
If you're seeing an issue, can you please elaborate it.

Copy link
Contributor Author

@smherwig smherwig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not using chrootfs; I'm implementing remote filesystems. The remote filesystems are not stateless, so each time Graphene forks, the Graphene client-side portion of the filesystem needs to send an RPC to the remote server to tell the server that it is forking. The server then creates a control block (basically a file descriptor table) for the child process and returns an identifier (similar to a PID). This identifier gets passed from the parent Graphene to the child during the migration. On the Graphene child's fs migrate callback, the child process then presents the ID to the remote fs server as a sort of bearer token so that it can attach to the appropriate control block.

The question is whether checkpoint/migrate should be a generic hook for fork; currently it's assuming specific properties of the filesystem.

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @smherwig)

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a problem with this PR. As @yamahata notes, the only FS which implements checkpoint() callback is chroot. Looking at fs/chroot/fs.c: chroot_checkpoint(), it is as trivial as setting a pointer to the correct memory region (in local memory of chroot). So I don't see any performance optimization achieved by caching the checkpoint data.

Also, the logic of @smherwig makes sense: the generic checkpoint method should call the checkpoint method of the specific FS. Though I would change the source code to better reflect this.

Reviewed 1 of 1 files at r1.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @smherwig and @yamahata)


LibOS/shim/src/fs/shim_fs.c, line 567 at r1 (raw file):

        ADD_TO_CP_MAP(obj, off);

        if (!mount->cpdata &&

I think what @smherwig really wants is to call the checkpoint() method of his specific FS always. Thus, I suggest:

mount->cpdata = NULL;
if (mount->fs_ops && mount->fs_ops->checkpoint) { ...

LibOS/shim/src/fs/shim_fs.c, line 586 at r1 (raw file):

Previously, yamahata wrote…

It seems a intentional design to keep cpdata as cached.
If you're seeing an issue, can you please elaborate it.

This line won't be needed with my proposed change.

Copy link
Contributor Author

@smherwig smherwig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @yamahata)


LibOS/shim/src/fs/shim_fs.c, line 567 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I think what @smherwig really wants is to call the checkpoint() method of his specific FS always. Thus, I suggest:

mount->cpdata = NULL;
if (mount->fs_ops && mount->fs_ops->checkpoint) { ...

Done. I agree, setting cpdata to NULL here helps in the function's readability. I pushed the change.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @smherwig and @yamahata)

a discussion (no related file):
For final rebase: all commits must be squashed into one.


mkow
mkow previously approved these changes May 31, 2019
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @smherwig and @yamahata)

Copy link
Contributor

@donporter donporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @dimakuv and @yamahata)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

For final rebase: all commits must be squashed into one.

I do suspect the goal was to cache some of the checkpointing work, but I think it would be better to push this down into the FS. I doubt there is a common case where a checkpoint will be untouched between two checkpoints, and the VFS level is the wrong point at which to try to interpret state or reason about this.


donporter
donporter previously approved these changes May 31, 2019
Copy link
Contributor

@donporter donporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @dimakuv and @yamahata)

Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @dimakuv)


LibOS/shim/src/fs/shim_fs.c, line 586 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This line won't be needed with my proposed change.

+1. the above two lines can be removed.
Other change looks good to me.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @dimakuv and @yamahata)


LibOS/shim/src/fs/shim_fs.c, line 586 at r1 (raw file):

Previously, yamahata wrote…

+1. the above two lines can be removed.
Other change looks good to me.

@yamahata, but the line I wanted to remove was already removed. Do you want to remove these two lines?

            new_mount->cpdata = NULL;
            entry->paddr = &new_mount->cpdata;

I am not sure this will be safe. I believe these two lines should stay as-is.

dimakuv
dimakuv previously approved these changes Jun 7, 2019
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @yamahata from a discussion.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@dimakuv dimakuv dismissed yamahata’s stale review June 7, 2019 00:26

With Isaku's consent, remove due to lack of time for reviewing

@dimakuv dimakuv dismissed stale reviews from donporter, mkow, and themself via 83ff057 June 7, 2019 00:57
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

yamahata
yamahata previously approved these changes Jun 7, 2019
Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: Intel)

chiache
chiache previously approved these changes Jun 7, 2019
Copy link
Contributor

@chiache chiache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks okay, but there can be a potential memory leakage issue if mount->cpdata points to a malloc'ed data. Please add an issue regarding this problem.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

dimakuv
dimakuv previously approved these changes Jun 7, 2019
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, see #744.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@dimakuv dimakuv dismissed stale reviews from chiache, yamahata, and themself via bf22bd3 June 7, 2019 22:23
dimakuv
dimakuv previously approved these changes Jun 7, 2019
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

@dimakuv
Copy link

dimakuv commented Jun 7, 2019

Retest please

@dimakuv
Copy link

dimakuv commented Jun 7, 2019

Retest this please

chiache
chiache previously approved these changes Jun 8, 2019
Copy link
Contributor

@chiache chiache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@dimakuv dimakuv dismissed stale reviews from chiache and themself via 1489ae7 June 8, 2019 05:04
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

Previously, FS-specific checkpoint callback was called only the
first time process forks. This commit fixes this bug by always calling
FS-specific checkpoint (it is now the responsibility of the specific
checkpoint callback to skip re-creating the checkpoint as perf
optimization).
@dimakuv
Copy link

dimakuv commented Jun 10, 2019

Retest this please

Copy link
Contributor

@chiache chiache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@donporter donporter merged commit 15ca4a9 into gramineproject:master Jun 10, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

shim_fs.c's checkpoint callback only called on first first
6 participants