Description
P:5
I just got off a phone call with Geoffrey, and he likes that we use the virtual cluster command from cloudmesh.
I view this as a high-priority item, and several people may have to work on it. I know that Hyungro has worked on this before, so we need his input.
Here is how I see us getting there:
a)
- make sure vm rename works; we may need it to align the names of multiple vms across different IaaS
- make sure we can upload a key to multiple clouds under the same keyname
- make sure we can boot vms with the key
- make sure we can log into the vms
b) Mangirish/Gourav: we need a monitoring capability that lets us track the status of the vms in a virtual cluster
- e.g. we should be able to verify that we can
- ping
- login
- execute something
- run some defined test
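
As a starting point for discussion, the four checks above could be sketched roughly as follows. This is not existing cloudmesh code: the function names, the default user `ubuntu`, the `uname -a` probe, and the timeouts are all assumptions made for illustration. It only relies on the standard `ping` and `ssh` command-line tools being available on the machine that runs the checks.

```python
import subprocess
from typing import Callable, Dict, List


def check_commands(host: str, user: str = "ubuntu",
                   test_cmd: str = "true") -> Dict[str, List[str]]:
    """Build one command per check (names and defaults are hypothetical)."""
    # BatchMode avoids hanging on a password prompt; ConnectTimeout bounds the wait.
    ssh = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", f"{user}@{host}"]
    return {
        "ping": ["ping", "-c", "1", host],   # is the vm reachable at all
        "login": ssh + ["true"],             # can we open an ssh session
        "execute": ssh + ["uname", "-a"],    # can we run a simple command
        "test": ssh + [test_cmd],            # run some user-defined test
    }


def run_command(cmd: List[str]) -> bool:
    """Run a command and report success as a boolean."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=15)
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def check_vm(host: str, user: str = "ubuntu", test_cmd: str = "true",
             runner: Callable[[List[str]], bool] = run_command) -> Dict[str, bool]:
    """Return a pass/fail status for each check on a single vm."""
    commands = check_commands(host, user, test_cmd)
    return {name: runner(cmd) for name, cmd in commands.items()}
```

Calling `check_vm(ip)` for every ip address in a virtual cluster would give a simple status table. The `runner` argument is there so the checks can be tested without touching a real vm.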
c) Hyungro: we used to have the cm cluster command; we need to check if it's still working
- this one securely sets up keys among the nodes so we can log in from one node to another
- we may have different kinds of clusters
  a) one on a private network
  b) one on a public network
  c) heterogeneous clusters on multiple IaaS with the same OS and on a public network,
     so we can talk between nodes
- demonstrate how to create a cluster
- have some test to see if the cluster works
- demonstrate how to add new nodes to the cluster or remove dead nodes,
i.e. the cluster is dynamic
cm cluster -n 4 mycluster
creates a cluster mycluster
cm cluster add -n 3 mycluster
adds 3 more nodes to the cluster
cm cluster list
lists the cluster
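
To make the "dynamic cluster" idea concrete, here is a minimal in-memory sketch of the bookkeeping behind the three commands above. It is not how cm cluster is implemented; the `Cluster` class, its methods, and the `<cluster>-<index>` naming scheme are assumptions for illustration only, and no vms are actually booted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Hypothetical bookkeeping for a named, resizable virtual cluster."""
    name: str
    nodes: List[str] = field(default_factory=list)
    _next_index: int = 1  # indices are never reused, so node names stay unique

    def add(self, count: int) -> List[str]:
        """Register `count` new nodes and return their names."""
        new = [f"{self.name}-{i}"
               for i in range(self._next_index, self._next_index + count)]
        self._next_index += count
        self.nodes.extend(new)
        return new

    def remove(self, node: str) -> None:
        """Drop a (dead) node from the cluster."""
        self.nodes.remove(node)


# Mirrors: cm cluster -n 4 mycluster; cm cluster add -n 3 mycluster; cm cluster list
cluster = Cluster("mycluster")
cluster.add(4)
cluster.add(3)
print(cluster.nodes)
```

Never reusing an index is a deliberate choice in this sketch: a replacement for a dead node gets a fresh name, so logs and keys from the old node cannot be confused with the new one.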
d) Mangirish/Gourav/Fugang/Allan: Naturally we have yet another important concept, and that is setting up a batch system on
such a cluster once it becomes available.
Let's assume I have all ip addresses of a virtual cluster as defined above and I can log in from one node to the other.
The next task is to set up a SLURM cluster on these nodes. Ideally we will want Ansible to set up this cluster.
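
One possible first step, sketched under assumptions: turn the list of ip addresses into an Ansible INI inventory that a SLURM playbook could consume. The group names (`slurm_controller`, `slurm_workers`, `slurm`), the default user `ubuntu`, and the choice of the first ip as the controller are all hypothetical; the playbook itself is not shown and would still have to be written.

```python
from typing import List


def slurm_inventory(ips: List[str], user: str = "ubuntu") -> str:
    """Render an Ansible INI inventory for a SLURM cluster.

    Assumption: the first ip becomes the controller, the rest become workers.
    """
    if not ips:
        raise ValueError("need at least one ip address")
    controller, workers = ips[0], ips[1:]
    lines = [
        "[slurm_controller]",
        controller,
        "",
        "[slurm_workers]",
        *workers,
        "",
        # Parent group so a playbook can target every node at once.
        "[slurm:children]",
        "slurm_controller",
        "slurm_workers",
        "",
        "[slurm:vars]",
        f"ansible_user={user}",
    ]
    return "\n".join(lines) + "\n"


print(slurm_inventory(["10.0.0.1", "10.0.0.2", "10.0.0.3"]))
```

The output could be written to a file and passed to `ansible-playbook -i <file>` together with whatever SLURM playbook we settle on.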