Description
This is something I'm working on so that SHPC integrates better with our cluster tech stack. I wanted to post about it here to get feedback on the overarching plan and to see whether it could become an "accessory" to SHPC once it's polished (nothing impacts core behavior, and it would likely live in its own repository).
OpenHPC is a collection of building blocks and a design philosophy for deploying HPC clusters. OHPC provides RPM-based software arranged into a particular structure that is easy to deploy across the nodes of a cluster (e.g., apps live in /opt/ohpc/pub/apps). OHPC is quite opinionated about how things are deployed, to make sure software stays consistent and compatible.
I think SHPC can fit into the OHPC ecosystem quite easily and provide another avenue for integrating container-based software into OHPC clusters. In addition, there are a LOT of OHPC-based clusters out there, and easy deployment on an OHPC system would mean a wider audience for SHPC. Here are the aspects I've thought of so far.
- RPM-based deployment of SHPC. I already have a working RPM for deploying SHPC on our clusters, though it's a bit "janky" (technical term ;) ) and needs to be cleaned up significantly.
- Integration of SHPC views into OHPC moduledeps. OHPC provides several combinations of compilers and MPI stacks (e.g., gnu9 with openmpi4). On our current clusters we load views for the various compiler / MPI stacks manually, but that loading could easily be stuck in the appropriate lmod files for each compiler / MPI stack (like here). Unfortunately those lmod files come from the official OHPC RPMs, so the modification can't really be made automatic; it may be easier to put the view logic in the SHPC RPM and have builds for different compiler / MPI stacks (even though SHPC itself isn't actually compiler / MPI dependent), but then we have redundancy. Clearly I need to think about this point more :) (See the first sketch after this list for one possible shape of the per-stack hook.)
- Setup of user-space views. SHPC is already really friendly for doing things in user space, but I'd like to figure out how to have it automatically set up user-space views when pulling containers (e.g., containers with no dependencies, or with an MPI or GPU dependency). That may require patching in the RPM. (The second sketch after this list shows the manual flow such automation would wrap.)
- OHPC-friendly registry. I'd like to be able to pull the appropriate MPI / GPU variant of a container based on the loaded view, without the user having to think about it. Ultimately, I'd like users to be able to control their own containers easily, so our admins don't have to manage them cluster-wide and users don't have to set up SHPC themselves or reason about MPI / GPU compatibility. (The last sketch after this list shows one way the variant selection could work.)
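
For the moduledeps point, here's a minimal sketch of what a per-stack hook installed by the SHPC RPM could look like. Everything in it is an assumption rather than something OHPC or SHPC defines today: the view location under /opt/ohpc/pub/apps/shpc/views, the `shpc-view` stub module name, and the list of stacks would all be site / packaging decisions.

```python
"""Sketch: per-stack hook the SHPC RPM could install into moduledeps."""
from pathlib import Path

VIEW_ROOT = Path("/opt/ohpc/pub/apps/shpc/views")   # assumed location for shared views
MODULEDEPS = Path("/opt/ohpc/pub/moduledeps")        # standard OHPC moduledeps tree
STACKS = [("gnu9", "openmpi4"), ("gnu9", "mpich")]   # example compiler / MPI pairs

LUA_STUB = """\
-- auto-generated: expose the SHPC view for the {stack} stack
prepend_path("MODULEPATH", "{view}")
"""

def write_stub(compiler: str, mpi: str) -> None:
    """Write a small modulefile that puts the matching view on MODULEPATH."""
    stack = f"{compiler}-{mpi}"
    stub_dir = MODULEDEPS / stack / "shpc-view"
    stub_dir.mkdir(parents=True, exist_ok=True)
    (stub_dir / "1.0.lua").write_text(
        LUA_STUB.format(stack=stack, view=VIEW_ROOT / stack)
    )

if __name__ == "__main__":
    for compiler, mpi in STACKS:
        write_stub(compiler, mpi)
```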
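
On the user-space point, what I have in mind would essentially wrap the flow below so it happens automatically on pull. This is only a sketch assuming the current `shpc view create` / `shpc view install` subcommands keep their shape; the view name and recipe are placeholders.

```python
"""Sketch: the manual user-space flow that 'auto view on pull' would wrap."""
import subprocess

def shpc(*args: str, check: bool = True) -> None:
    """Run an shpc CLI command (assumes shpc is on PATH)."""
    subprocess.run(["shpc", *args], check=check)

def install_into_user_view(recipe: str, view: str = "user") -> None:
    # Create the view if needed (this may complain if it already exists,
    # so a real wrapper would check first), then install the container
    # and link its module into the view.
    shpc("view", "create", view, check=False)
    shpc("install", recipe)
    shpc("view", "install", view, recipe)

if __name__ == "__main__":
    install_into_user_view("ghcr.io/autamus/samtools")  # placeholder recipe
```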
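
For the registry point, the selection logic could be as simple as mapping the loaded compiler / MPI stack to a container tag. Below is a sketch of the idea, assuming a hypothetical registry entry and tag naming scheme, and keying off the LMOD_FAMILY_* variables that Lmod sets when modulefiles declare a family (as the OHPC compiler / MPI modules do).

```python
"""Sketch: pick a container variant that matches the loaded OHPC stack."""
import os
import subprocess

def stack_suffix() -> str:
    """Build a suffix like 'gnu9-openmpi4' from Lmod family variables."""
    compiler = os.environ.get("LMOD_FAMILY_COMPILER", "generic")
    mpi = os.environ.get("LMOD_FAMILY_MPI")
    return f"{compiler}-{mpi}" if mpi else compiler

def pull_matching(recipe: str) -> None:
    """Install the tag of `recipe` that matches the currently loaded view."""
    tag = stack_suffix()  # e.g. gnu9-openmpi4
    subprocess.run(["shpc", "install", f"{recipe}:{tag}"], check=True)

if __name__ == "__main__":
    pull_matching("ohpc/osu-micro-benchmarks")  # hypothetical registry entry
```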
Any thoughts? Glaring pitfalls?
I'll update this issue as I get things working!