Switch image for stat20 hub #4846

Open
andrewpbray opened this issue Aug 11, 2023 · 4 comments

Comments

@andrewpbray
Contributor

Hi team!

I've been working with @ryanlovett to build a docker image for the curriculum of Stat 20: https://hub.docker.com/repository/docker/stat20/stat20-docker/general. This is being used to compile a public and staff-facing website containing the lecture notes, slides, and assignments for the course, as well as the course documentation. All of those source docs are in this repo: https://github.com/stat20/stat20.

My question: would it make sense for the stat20 hub to pull this same image? It is only slightly larger than the image currently served to students for running RStudio and doing their assignments (just a few extra R packages, I think). Running both from the same image should simplify maintenance and help catch bugs that students might hit on the hub, since the curriculum will be regularly run through CI using that image.

Happy to hear your thoughts!

@ryanlovett
Collaborator

One issue is that all of the datahub images are managed within the berkeley-dsep-infra/datahub repo and merge rights are limited to tech infrastructure admins including me. When instructors and GSIs need any of those images to be updated, they can make pull requests which tend to be merged fairly quickly. But if the textbook were to use the stat20 hub image, there would be a new dependency on another group of people.

The other issue, and maybe more critical, is that datahub images are pushed to the Google Container Registry at gcr.io, rather than docker hub. This is done to improve pulling performance since the hubs run on Google Cloud. We'd have to see if we can make the images public so anyone could docker pull them. Alternatively I could find a way to push the images to docker hub during CI.
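As a rough sketch of what "push the images to docker hub during CI" could look like, the snippet below only builds the standard `docker pull` / `docker tag` / `docker push` command sequence for copying an image from one registry to another. The image names are hypothetical placeholders, not the real datahub image paths, and the actual CI setup in the datahub repo may do this quite differently.

```python
import shlex


def mirror_commands(source_image: str, target_image: str) -> list[list[str]]:
    """Return the docker CLI commands that copy an image between registries."""
    return [
        ["docker", "pull", source_image],               # fetch from the source registry
        ["docker", "tag", source_image, target_image],  # give it a name in the target registry
        ["docker", "push", target_image],               # upload under the new name
    ]


# Hypothetical image names, for illustration only (assumptions, not real paths).
SOURCE = "gcr.io/example-project/stat20-user-image:latest"
TARGET = "docker.io/example-org/stat20-user-image:latest"

if __name__ == "__main__":
    # Print the commands rather than running them, so this is safe to try anywhere.
    for cmd in mirror_commands(SOURCE, TARGET):
        print(shlex.join(cmd))
```

A CI job would need to be logged in to both registries before running these commands; that step is left out here.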

Another possibility is to have the stat20 datahub consume the stat20-docker image. At times there have been considerations to move the development of hub user images to external repos where other people have access, but it can become a management problem. Datahub images need a common set of libraries to function on a jupyterhub and those dependencies are seeded by a script in the datahub repo. I think the datahub staff would have to plan out distributed image management before we could use this approach.

I think the first approach could work and would ensure everything is on the same stack. We'd just want to make sure that changes can be integrated in a timely manner. And we'd have to resolve the container registry issue.

@ryanlovett
Collaborator

Another possibility is that people can do textbook development on the stat20 hub directly, without needing a local docker workflow. Changes to the image would still have to go through merge and CI in this repo.

We could also install https://github.com/jupyterhub/gh-scoped-creds, which lets some users of the hub push to the textbook repo without needing to set up PATs or SSH keys. This was used on the stat159 datahub. If you'd like, I could try setting up gh-scoped-creds on the stat20 hub to give you a sense of how it'd work.

@andrewpbray
Contributor Author

In light of our conversation today (most dev can be done outside the container), doing dev on the hub sounds better and better. The approach for most instructors would be: feel free to do your dev locally. If one of your PRs doesn't pass the checks, then (1) read the docs on how to file a PR to add the missing dependency to the image, and (2) if it's not clear what the problem is, switch over to doing dev directly on the hub (that is, log into RStudio on the hub, pull down the branch you're working on, and troubleshoot there).
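For the "pull down the branch and troubleshoot there" step, a minimal sketch of the git commands involved might look like the following. The branch name and working directory are hypothetical; only the repo URL comes from this thread, and the commands are printed rather than executed.

```python
import shlex


def checkout_commands(repo_url: str, branch: str, workdir: str = "stat20") -> list[list[str]]:
    """Return git commands to get a fresh clone and switch to a working branch."""
    return [
        ["git", "clone", repo_url, workdir],                # fresh clone into workdir
        ["git", "-C", workdir, "fetch", "origin", branch],  # make sure the branch is up to date
        ["git", "-C", workdir, "switch", branch],           # check out the branch locally
    ]


if __name__ == "__main__":
    # "my-feature-branch" is a made-up branch name for illustration.
    for cmd in checkout_commands("https://github.com/stat20/stat20", "my-feature-branch"):
        print(shlex.join(cmd))
```

These would be run from a terminal on the hub (for example the RStudio terminal); if the repo is already cloned there, only the fetch and switch steps apply.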

@ryanlovett
Collaborator

I think that'd be easiest. I spent some time with PATs today but didn't have as much time as I thought I would. I'll try to land that feature by Thursday.
