Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow instructors to share files via shared group id #1000

Merged
merged 20 commits into from
Aug 21, 2019

Conversation

rkdarst
Copy link
Contributor

@rkdarst rkdarst commented Aug 2, 2018

Hi,

I have been working on a setup where I wanted instructors to be able to share releasing files, collecting, and so on.nbgrader says that it doesn't support this, but I tried anyway. This PR contains the changes needed to make this work, which have been mostly minimal. Things are not perfect, and some of these changes may be not needed. I'm still testing locally and this is mainly to get some more expert opinions.

This adds a new option groupshared which enables the behavior. By default, no behavior should change.

These changes assume that the following are set by the environment (admins). They are satisfied in my setup because we run via kubespawner and I carefully control these:

  • instructors have a unique-per-course primary gid which students do not have. Alternatively, the setgid bit could be set (in practice I do both). If using setgid, you have to set that up yourself.
  • umasks are 0007
  • shared course and exchange directories, like normal.
  • Instructors won't do something wrong with permissions which breaks the security.

We expect these behaviors to happen when groupshared = True:

  • All info in the course directory is g+rw. Instructors can all edit files.
  • Multiple instructors can release and collect files from the exchange. Basically, when releasing, group permissions mirror user, not other. Because the submitted folder is set g+rwx, other instructors can also collect.
  • Don't insist on non-group writability in some places.

It actually seems to mostly work (though I am still doing internal tests) Remaining problems (I will keep updating this as I find more):

  • The groupshared set in two places. Set it only once. (I didn't see a common base class...)
  • when the release/ and release/{assignment_id} directories are created, it is g-w. Haven't been able to figure out why yet - it is not the umask, and I can't figure out where in the code the dirs are created. (help requested!)
  • sqlite3 hardcodes a create mode of 644. I hacked around this by touching gradebook.db and setting a proper mode first.
  • tests
  • Study 664 vs 444 as default permissions for autograde. (rkdarst)
  • Study what the problem with ensure_root was (rkdarst)

For now I am trying to just make something that will work, then I can work on improving it more. What do you think? Is this something that I should work on to merge upstream?

@rkdarst
Copy link
Contributor Author

rkdarst commented Aug 5, 2018

release/ and release/{assignment_id} were annoying to fix, because they were created from ipython_genutils which has a mode argument, but no way to pass it the mode unless two other packages are modified. Fixed by locally overriding, and just pushed again. Now all the problems I know about are solved... let's hope I don't find more.

@rkdarst
Copy link
Contributor Author

rkdarst commented Aug 6, 2018

I think this is coming close to completion (I'm finding less things that are missing). I'd like feedback on what you think. If it's worth taking it forward, I can ask about some questions relating to internal improvement it needs, and then start fixing docs and tests.

@jhamrick jhamrick added this to the 0.6.0 milestone Oct 6, 2018
@rkdarst
Copy link
Contributor Author

rkdarst commented Oct 7, 2018

I said I would comment on how my PRs went...

This has been working so far, it seems, because there haven't been any complaints about wrong permissions lately and files seem OK. Can't promise I got everything, though.

Things that are still needed:

  • documentation
  • make only one groupshared option, right now you have to set it twice on two different casses - I'm not sure how to do this.
  • tests? It will require a new type of test I guess.
  • Check if any of the logic can be simplified, especially the inline ternary operators

Currently, I will wait for a detailed review before starting work. But I can't promise to get back to it soon, so if someone else take it and fixes stuff, please go ahead. However, some of these things I could not do quickly so I encourage help!

Copy link
Member

@jhamrick jhamrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this looks good! I just want to try to understand the use case a bit better, though---is the idea that you have multiple instructors each with their own course directory, or who share a course directory? I guess if it's the latter, then you are not using the formgrader web interface since that requires that the user's notebook can access the files, which it wouldn't if those files are in another user's directory?

@default("permissions")
def _permissions_default(self):
return 444
return 664 if self.groupshared else 644
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think that the permissions should still be 444 when running the autograder, which is what this line does (it looks like for assign and feedback, it gets overwritten to 644). The reasoning behind that is to prevent instructors from modifying the autograded notebooks (mostly to prevent confusion, so students don't get back a notebook that is different from the one they submitted). So I'm fine changing the default here but we should also change it to 444 in the autograder then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one consideration was: if instructor1 autogrades, then it creates some files which instructor2 can't remove. So no one else can autograde an updated assignment. (This was a problem somewhere, but I may be remembering something wrong).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, hmm, I see. I think though that if the file has the right group ownership, then other people in the group should be able to remove the file, even if write permissions are disabled. At least, you do see a similar issue with single instructors---they can't directly remove the file but because they own it they can change the permissions and then remove (and I believe this happens in nbgrader somewhere under the hood if you use --force). So I would expect you could do something similar with group-level permissions, though I haven't tested it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to try without to see what the problem was. It could relate to subdirs (if you can write dir, you can remove file, but if you can't write subdir, you can't delete stuff within to get an empty dir to remove)...

I'll note to come back to this.

nbgrader/converters/base.py Outdated Show resolved Hide resolved
nbgrader/exchange/exchange.py Outdated Show resolved Hide resolved
nbgrader/converters/base.py Outdated Show resolved Hide resolved
nbgrader/converters/base.py Outdated Show resolved Hide resolved
nbgrader/converters/base.py Outdated Show resolved Hide resolved
nbgrader/converters/base.py Outdated Show resolved Hide resolved
nbgrader/exchange/exchange.py Outdated Show resolved Hide resolved

def start(self):
if sys.platform == 'win32':
self.fail("Sorry, the exchange is not available on Windows.")

self.ensure_root()
if not self.groupshared:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think this check is mainly to ensure that the current user can access the exchange, so I think it is probably still worthwhile to keep this check in even when self.groupshared is set. Unless there's some reason why this check doesn't work that I'm missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My commit message was groupshared: don't force exchange root +w for everyone. I can't seem to remember the exact reason, but... maybe it's because now that instructors have a special group, we don't need to be g+w for everyone. But this doesn't necessarily seem specific to groupshared.

One other thing I remembered was tring to reset permissions or force-create the directory. But, users can't chmod dirs owned by others, so it fails. This may be realated to me not setting others+write on the top-level exchange directory.

I should think about this more and see if it's really needed. I probably have to remove it and see what breaks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is the same comment as below (#1000 (comment)). Discussion about it is written there. I'll leave this sub-conversation open but do discussion down there.

@jhamrick
Copy link
Member

jhamrick commented Oct 7, 2018

Oh, hah, we both commented at the same time :)

This has been working so far, it seems, because there haven't been any complaints about wrong permissions lately and files seem OK. Can't promise I got everything, though.

Awesome---having use cases in an actual course is my favorite way to have nbgrader be tested, so if it seems to be working that's great.

documentation

I think as a start you could just write up the details of your particular use case and how you're using the option. If you don't have time to write official documentation right away what would be helpful would be to open a new issue with detailed information and then I or you, or someone else can later add it to the official docs.

make only one groupshared option, right now you have to set it twice on two different casses - I'm not sure how to do this.

See my review comments.

tests? It will require a new type of test I guess.

Yeah... the current tests don't really check permissions either because the require having different user accounts which I couldn't figure out how to do on Travis. Even though I'd prefer to have tests I'm not sure what the best way to test it is. Maybe at the very least just copy the existing tests and run them with the groupshared option just to make sure stuff doesn't break.

Check if any of the logic can be simplified, especially the inline ternary operators

I'll have another look with respect to this.

@rkdarst
Copy link
Contributor Author

rkdarst commented Oct 7, 2018

Big picture comment:

Overall I think this looks good! I just want to try to understand the use case a bit better, though---is the idea that you have multiple instructors each with their own course directory, or who share a course directory? I guess if it's the latter, then you are not using the formgrader web interface since that requires that the user's notebook can access the files, which it wouldn't if those files are in another user's directory?

I guess this is the hardest thing to convey, since it's quite different from what other poeple do. I control the whole system: user accounts, filesystem mounts, users, and groups. An artifical example (in reality done with kubernetes mounts and uid/gid setting):

addgroup instructors
adduser inst1 instructors
adduser inst2 instructors
mkdir /coursedir/
chgrp instructors /coursedir/
chmod 2770 /coursedir/

So, there is one course directory shared between two instructors (the second option). Because it is mode g+rwX, both people can access. Formgrader Just Worked(tm) with this setup. Because it is g+s, files inside automatically get g+w so both instructors can do course management.

Some of the changes to g+w were so that if one instructor does something, the other can come back and (auto)overwrite/delete it to run the same task again. There were other small implicit assumptions which needed updating.

@jhamrick
Copy link
Member

jhamrick commented Oct 7, 2018

Ok, got it. And when you say the formgrader just worked, were you running a separate notebook service with the formgrader from the course directory, that then both instructors had access to? Or did each instructor separately run a formgrader instance?

@rkdarst
Copy link
Contributor Author

rkdarst commented Oct 7, 2018 via email

@jhamrick
Copy link
Member

jhamrick commented Oct 7, 2018

Huh, I am surprised that works given #1012. I'm curious what the nbgrader_config.py looks like for your instructor accounts?

@rkdarst
Copy link
Contributor Author

rkdarst commented Oct 7, 2018

The nbgrader_config.py files are fun:

c = get_config()
c.CourseDirectory.root = "/course"
c.Exchange.course_id = "testcourse"
c.Exchange.multiuser = True
c.Exchange.groupshared = True
c.BaseConverter.groupshared = True
c.AssignmentList.assignment_dir = "/notebooks/"
c.Exchange.assignment_dir = "/notebooks/"                    # I think this is duplicated for backwards compatibilty with a previous version.  Or maybe not...
c.ExecutePreprocessor.timeout = 120
c.NbGrader.logfile = "/course/.nbgraber.log"

I give students an instructors access to the whole filesystem: notebook_dir="/". Per-user notebooks are mounted in /notebooks, and course dir is at /course. default_url is set so that you start in notebooks if not an insturctor, /course if an instructor. There is also an optional /coursedata directory, which people should be able to access.

I couldn't mount the course and coursedata directories as subpaths of /notebooks because of the way NFS works. I have root_squash on, so I couldn't get kubernetes to mkdir/mount within the user-only /notebooks directory. (Student directories have their real university UID, because they can also mount them via SMB. So I can't make student directories world-read/write).

@jhamrick
Copy link
Member

jhamrick commented Oct 7, 2018

Ahh, ok, I think I understand now. Thanks for all the clarifications!

I think the reason why it works for you is because you set notebook_dir to be the root of the filesystem---that way all the course directories are still under the root of the notebook server. I believe your setup wouldn't work if that weren't the case (as in #1012 ). I think it's generally recommended to not run the notebook server from the root of the filesystem, for security reasons, but since you're working with kubernetes I guess it's not so much a problem there.

Anyway, I undestand better now what your usecase is and I think that's definitely fine. We will just need to make sure we document this usecase so people understand what the different options are for. This could potentially go somewhere around here: https://nbgrader.readthedocs.io/en/master/configuration/jupyterhub_config.html#example-use-case-one-class-multiple-graders as an alternate way to setup a class with multiple graders.

@rkdarst
Copy link
Contributor Author

rkdarst commented Oct 7, 2018

I've just tried to correct the easy problems, but I am sure there is more. I am about to run out of time now, but once you give me a go-ahead I will rebase to new master with working tests.

One could ask "why am I going through all this?" I wanted to start with the vision of a shared unix system and make Jupyter fit to that, rather than the idea of a typical limited Jupyter setup. And for this, nbgrader is a side-feature. I think the result has been good.

@jhamrick
Copy link
Member

jhamrick commented Oct 7, 2018

once you give me a go-ahead I will rebase to new master with working tests.

👍

(though FYI tests are still a bit flaky on master, sigh...)

@rkdarst
Copy link
Contributor Author

rkdarst commented Oct 7, 2018

I think it's generally recommended to not run the notebook server from the root of the filesystem, for security reasons, but since you're working with kubernetes I guess it's not so much a problem there.

Well, I figured you're running arbitrary code anyway, so you can access everything if you try. I do the same on our HPC systems, but users run with the exact permissions they would have when the login with ssh anyway. I say let them learn Linux.

I might change to have a notebook_dir="/notebook_dir", and mount stuff at /notebook_dir/notebooks, /notebook_dir/course, /notebook_dir/coursedata. This will have the same problem fixed by the same solution, though, so I think it's important anyway.

Anyway, I undestand better now what your usecase is and I think that's definitely fine. We will just need to make sure we document this usecase so people understand what the different options are for. This could potentially go somewhere around here: https://nbgrader.readthedocs.io/en/master/configuration/jupyterhub_config.html#example-use-case-one-class-multiple-graders as an alternate way to setup a class with multiple graders.

Thanks. I'll see what I can do... but no time now.

@bbhopesh
Copy link

bbhopesh commented Oct 21, 2018

HI @rkdarst. I have exactly same use case where I have one gid for instructors group. Thanks a lot for doing this.
I was trying to work around this problem by setting configuration c.BaseConverter.permissions=660. This was not perfect though as this only affects .ipynb files and not intermediate directories like release/{assignment_id}.
Your code will be released with 0.6.0, in the meanwhile, what do you think will be best workaround for me to set correct permissions on release/{assignment_id}?

@bbhopesh
Copy link

HI @jhamrick . I ran into the same problem as #1012 . My work around was bit different from that of @rkdarst. I(the admin) creates a symlink to the shared nbgrader root directory in user's home and add this line to nbgrader_config.py c.CourseDirectory.root = os.path.expanduser('~/symlink_to_ngroot')

- When the `release` and `release/{assignment_id}` directories are
  created, they are hardcoded to be mode=644.
- These are set deep within ipython_genutils, which is called via
  nbconvert.  ipython_genutils.path.ensure_dir_exists has a mode
  argument, but would require modification of both ipython_genutils
  and nbconvert in order to pass an argument all the way to there.
- Thus, we have to manually change directory permissions after the
  fact.
- On my NFS system, the stat.st_uid result is misleading and you can
  actually change permissions despite what the uid says.
- Instead of separate assignment/exchange options, which was redundant
  and a bad design in the first place.
- Thanks to @jhamrick for the pointer in review.
- fixup: groupshared: fix permissions of subdirs of release/ directory
- Also be more intelligent about attempting to change modes when
  needed, don't rely only on uid.
@rkdarst
Copy link
Contributor Author

rkdarst commented Jul 19, 2019

OK, rebased to current master and checked all the pieces. But, still no automated tests and no manual tests from anyone other than me.

Most issues are resolved, with the exception of one related issue above. I'll probably move it to a different branch, but will wait for your opinion anyway.

@rkdarst
Copy link
Contributor Author

rkdarst commented Jul 30, 2019

I've now added tests for:

  • generate assignment
  • release assignment
  • collect
  • autograde
  • generate feedback
  • release feedback

Push incoming once they pass locally. Then... I think almost everything has been resolved for this.

@rkdarst rkdarst force-pushed the groupshared branch 3 times, most recently from 93ed21e to faf185d Compare July 30, 2019 16:53
@rkdarst
Copy link
Contributor Author

rkdarst commented Jul 30, 2019

Took some time to debug windows builds, but it's done now.

Looking at the CI output, the failure is codecov - but basically every line is hit except exception handling, which requires there to be a wrong UID owning a file so that os.chmod fails. This is such a corner case and difficult to test (would require OS switching users or emulating that) that it can probably be left out. Am I interpreting this correctly?

Then the remaining changes requested are discussed here: #1000 (review) . What do you think should be done?

Copy link
Member

@jhamrick jhamrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the failure is codecov

That's fine, it's sometimes hard to make codecov happy 😄

Thanks a lot for your work on this!

@jhamrick jhamrick changed the title [WIP] Allow instructors to share files via shared group id Allow instructors to share files via shared group id Aug 21, 2019
@jhamrick jhamrick merged commit ecb7aea into jupyter:master Aug 21, 2019
@rkdarst
Copy link
Contributor Author

rkdarst commented Aug 21, 2019

Thanks! This was quite a huge process, thanks for the review and hints.

Next up is documentation. Would you like this option to be somewhat hidden or given extensive documentation? I can think to add it to one of:

  • only under "configuration options" with an more extensive help string that serves as the full documentation of how to use it. This would make it only appear under "Configuration options" (relatively hidden).
  • under the "Advanced topic" section (promoted).

Of course whatever the case, it will be clearly indicated what the conditions and risks are.

@jhamrick
Copy link
Member

Thanks! This was quite a huge process, thanks for the review and hints.

Thank you for all the effort you put into getting this in! (And sorry it took so long on my end)

Next up is documentation. Would you like this option to be somewhat hidden or given extensive documentation?

I think it is probably fine to put it under the "Advanced" section so we can give a bit more extensive documentation, but I would maybe explicitly add a warning using one of the .. warning sphinx directives saying this feature is more experimental and that we are less likely to be able to provide help on debugging if someone can't get it to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants