NYU_cluster
You can find NYU's HPC documentation here. Click here to request an account.
If you are connected to the NYU network, you can ssh into the cluster directly:
ssh <NYU_NetID>@greene.hpc.nyu.edu
Otherwise, you first need to go through the gateway server:
ssh <NYU_NetID>@gw.hpc.nyu.edu
ssh <NYU_NetID>@greene.hpc.nyu.edu
You can set up an SSH tunnel by following this documentation.
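To avoid typing the two ssh commands every time, you can add a ProxyJump entry to your ~/.ssh/config. This is a sketch, assuming a reasonably recent OpenSSH (7.3+) on your machine; the host alias greene is an arbitrary name, and <NYU_NetID> is a placeholder for your own NetID:

```
Host greene
    HostName greene.hpc.nyu.edu
    User <NYU_NetID>
    ProxyJump <NYU_NetID>@gw.hpc.nyu.edu
```

With this in place, ssh greene should work from outside the NYU network as well.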
You can download and install Miniconda with:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
When it asks whether to add Miniconda to your .bashrc, say yes.
Do you wish the installer to prepend the Miniconda install location
to PATH in your /home/<netid>/.bashrc ? [yes|no]
Then source your ~/.bashrc:
source ~/.bashrc
and Conda should be available. You can check with conda list.
You can learn how to manage Conda environments here. On the Prince cluster, I noticed that conda activate myenv
needed to be replaced by source activate myenv
in sbatch files. I still need to check whether this is the case on Greene.
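As a sketch, an environment can also be described declaratively in an environment.yml file and created in one command; the name myenv and the package list below are arbitrary examples, not a recommendation:

```yaml
# environment.yml -- create the environment with:
#   conda env create -f environment.yml
name: myenv
dependencies:
  - python=3.9
  - numpy
  - pip
```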
- Submitting jobs with sbatch
- If you only want to access a node interactively:
srun --pty /bin/bash
or, to also request a GPU:
srun --gres gpu:1 --pty /bin/bash
- You can watch your jobs with
watch squeue -u <netid>
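As a hedged sketch, a minimal sbatch job script could look like the following; the job name, resource requests, environment name, and script name are all placeholders, and the source activate line follows the Prince convention noted earlier:

```shell
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16GB
#SBATCH --time=24:00:00
#SBATCH --gres=gpu:1

# Activate the Conda environment, then run the experiment.
source activate myenv
python main_exp.py
```

You would then submit it with sbatch myjob.sbatch.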
- Dask can be a very useful tool if you want to run a lot of similar jobs. Here is an example:
from dask_jobqueue import SLURMCluster
from dask.distributed import Client
import itertools
from mycode import main_exp
learning_rate = [1e-3, 1e-4]
batch_size = [10, 100, 1000]
n_layers = [1, 2, 3, 4]
parallel_args = []
for (lr, bs, layer) in itertools.product(learning_rate, batch_size, n_layers):
    args = {"lr": lr, "bs": bs, "layer": layer}
    parallel_args.append(args)
env_extra = ['source activate myenv']
if __name__=='__main__':
    cluster = SLURMCluster(job_extra=['--cpus-per-task=1', '--ntasks-per-node=1'],
                           env_extra=env_extra,  # run 'source activate myenv' on each worker
                           cores=1, processes=1,
                           memory='16GB',
                           walltime='96:00:00',
                           interface='ib0',
                           log_directory='log_dask',
                           local_directory='log_dask')
    n_workers = 4
    cluster.scale(n_workers)
    client = Client(cluster)
    print(client.cluster)
    results = [client.submit(main_exp, args) for args in parallel_args]
    print(results)
    print(client.gather(results))
- Link to solo12 Demo
All our open-source software is licensed under the BSD 3-clause license.
Copyright 2016-2020, New York University and Max Planck Gesellschaft.