berndbischl edited this page Nov 13, 2014 · 13 revisions

How to use BatchJobs on LIDO

  • Get a LIDO account for yourself. Look here and here
  • Send a mail to Sebastian Krey to be added to our lido-users mailing list. Bernd and Michel sometimes post relevant information there. If you have questions, MAIL TO THIS LIST and not to Bernd individually.
  • Understand how software and modules work on LIDO. Here is the list of installed stuff. Most useful software must be loaded via module commands.
    • module avail lists all available modules for the user.
    • module list lists all currently activated modules for the user.
    • module add module1 [module2 ...] activates the listed modules.
    • module remove module1 [module2 ...] deactivates the listed modules.
    • module purge removes all activated modules.
    • To be able to work with the queuing system, you have to load the torque and maui modules via module add torque maui. On the slaves, modules are loaded automatically via lido.tmpl (see below), so you don't need to do this there. I have these lines in my .bashrc; you probably want them as well:
case "`hostname`" in
  lidong[12])
    module add python/2.7.2
    module add torque maui
    module add subversion
    module add git

    module add binutils
    module add gotoblas/shared/64/1.26
    module add gcc/4.8.1
    module add R/3.0.1-gcc48-base

    alias myjobs='qstat -u $USER'
    TERM="xterm-256color"
    ;;
esac
  • Log into the LIDO head node lidong1.itmc.tu-dortmund.de via SSH.
  • Install BatchJobs and BatchExperiments from CRAN.
  • Understand what queues exist on LIDO, what resources exist and so on by reading this wiki page.
  • Read and understand the documentation header of /home/bischl/lido.tmpl, to understand what job resources are available and how they work: less /home/bischl/lido.tmpl
  • Read the configuration documentation. Then create a valid config file in your home directory, i.e. at ~/.BatchJobs.R. Here is a template:
cluster.functions = makeClusterFunctionsTorque("/home/bischl/lido.tmpl")
mail.start = "first+last"
mail.done = "first+last"
mail.error = "all"
mail.from = "<me@lidong1.itmc.tu-dortmund.de>"
mail.to = "<me@statistik.tu-dortmund.de>"
mail.control = list(smtpServer="mail.statistik.tu-dortmund.de")

default.resources = list(
  R = "R-3.0.1-gcc-4.8.1",
  modules = "",
  walltime = 3600L,
  memory = 2048L,
  # parcpus is mapped to the Torque resource 'nodes'. Prefer this name,
  # so you don't have to change anything when you use our SLURM cluster.
  parcpus = 1L
)

staged.queries = TRUE
debug = FALSE

If you want to use event emails, the sender address does not matter and does not need to exist, but your receiver address must of course be valid. I think you need an @statistik mail address; otherwise, figure out which SMTP server to use.

When LIDO installs new R versions, you should probably update the R version in the default resources. It should probably match the R version you use on the master node.
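
As a quick sanity check, you can verify that the config file exists and still references the shared template path rather than a private copy. The following is only a sketch: the helper name check_config is made up for illustration, and it performs a plain text search, not a validation of the R config.

```shell
# check_config is a hypothetical helper, not part of BatchJobs.
# It only searches for the template path; it does not parse the R config.
check_config() {
  conf="$1"
  tmpl="$2"
  if [ ! -f "$conf" ]; then
    echo "missing config: $conf"
    return 1
  fi
  if ! grep -F -q "$tmpl" "$conf"; then
    echo "template path $tmpl not found in $conf"
    return 1
  fi
  echo "ok: $conf references $tmpl"
}

# Paths as given in the template above; a failure is reported but not fatal.
check_config "$HOME/.BatchJobs.R" "/home/bischl/lido.tmpl" || true
```

The same helper should work for the SLURM setup if you pass the other template path as the second argument.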

  • DO NOT change the first line in the config template above and DO NOT COPY the lido.tmpl to your local home dir or create your own. It is very likely that you do not understand enough details of the system to do this properly. Copying it will prevent you from getting nice updates from Bernd.
  • Run a simple batchMap example. For the first try you should probably set debug = TRUE in the config, so you can better understand errors. If everything works, set debug back to FALSE.
  • On the bash console, this stuff is useful:
    • qstat will display all jobs.
    • qstat -u $USER will display your jobs (or define myjobs in .bashrc).
    • kill-all-jobs will kill ALL of your jobs. It is a python script by Bernd in ~bischl/bin.
    • show-queues displays a nice, alternative status overview of the queues and your jobs. It is not perfect but mainly gets the job done. It is a python script by Bernd in ~bischl/bin.
    • show-active-users displays a nice, alternative status overview of what users currently do. It is not perfect but mainly gets the job done. It is an R/shell script by Bernd in ~bischl/bin.
    • You can use the scripts in Bernd's bin folder by adding this line to your .bashrc: PATH=$PATH:/home/bischl/bin
  • R packages must be installed and managed by yourself.
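
The simple batchMap test mentioned in the list above could look roughly like the following sketch. It only writes a small R script to the current directory and prints the command to run it. The file name smoketest.R, the registry id, the registry directory and the squaring function are all illustrative assumptions, not something this page prescribes.

```shell
# Write a tiny BatchJobs smoke test to a file in the current directory.
# File name and registry id are illustrative choices.
cat > smoketest.R <<'EOF'
library(BatchJobs)
# A fresh registry; its files go to a new directory below the working directory.
reg = makeRegistry(id = "smoketest", file.dir = "smoketest-files")
# One job per input value; each job squares its input.
batchMap(reg, function(x) x^2, 1:3)
submitJobs(reg)
waitForJobs(reg)
showStatus(reg)
print(loadResults(reg))
EOF
echo "Run on the head node with: Rscript smoketest.R"
```

Run it from a directory on a file system that the slaves can see as well, since the registry directory is created relative to the working directory.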

If you ever need to update the Rmpi package, you should do this:

  • Download the new tar.gz from CRAN with wget.
  • Run module add with the required module. Look up the module name in lido.tmpl.
  • R CMD INSTALL Rmpi_0.6-5.tar.gz --configure-args=--with-mpi=/sysdata/shared/sfw/openmpi/gcc4.8.x/64/1.6.4
  • Of course, you need to adjust the names and paths in the last command.

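The Rmpi update steps above can be sketched as a dry run that only prints the commands, so you can review and adjust them before executing anything. The Rmpi version and the MPI path are taken from the example above. The function name print_rmpi_steps is hypothetical, the CRAN base URL is an assumption about where the tarball lives (older versions are under Archive/Rmpi/ instead), and the module name is deliberately left as a placeholder because it has to be looked up in lido.tmpl.

```shell
# Values to adjust; version and MPI path are copied from the example above.
RMPI_VERSION="0.6-5"
MPI_HOME="/sysdata/shared/sfw/openmpi/gcc4.8.x/64/1.6.4"
# Assumed CRAN location of the current release; older releases are archived
# under src/contrib/Archive/Rmpi/.
CRAN_SRC="https://cran.r-project.org/src/contrib"
TARBALL="Rmpi_${RMPI_VERSION}.tar.gz"

# print_rmpi_steps is a hypothetical helper: it prints the commands instead of
# running them. The module name stays a placeholder on purpose.
print_rmpi_steps() {
  echo "wget ${CRAN_SRC}/${TARBALL}"
  echo "module add <module name from lido.tmpl>"
  echo "R CMD INSTALL ${TARBALL} --configure-args=--with-mpi=${MPI_HOME}"
}

print_rmpi_steps
```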
How to use BatchJobs with our SLURM cluster

  • Read the additional documentation provided by Sebastian.
  • Send a mail to Sebastian Krey to be added to our lido-users mailing list and to get access to the cluster. Bernd and Michel sometimes post relevant information there. If you have questions, MAIL TO THIS LIST and not to Bernd individually.
  • Log into shell.statistik.tu-dortmund.de via SSH.
  • Get an interactive job for a few hours by typing: interactive.
  • Read and understand the documentation header of dortmund_fk_statistik.tmpl, to understand what job resources are available and how they work: less /opt/R/BatchJobs/dortmund_fk_statistik.tmpl
  • Read the configuration documentation. Then create a valid config file in your home directory, i.e. at ~/.BatchJobs.R. Here is a template:
cluster.functions = makeClusterFunctionsSLURM("/opt/R/BatchJobs/dortmund_fk_statistik.tmpl")
mail.start = "first+last"
mail.done = "first+last"
mail.error = "all"
mail.from = "<me@shell>"
mail.to = "<me@statistik.tu-dortmund.de>"
mail.control = list(smtpServer="mail.statistik.tu-dortmund.de")

default.resources = list(
  walltime = 3600L,
  memory = 512L,
  # parcpus is mapped to the SLURM resource 'ntasks'. Prefer this name,
  # so you don't have to change anything when you use our LIDO cluster.
  parcpus = 1L,
  ncpus = 1L
)

staged.queries = TRUE
max.concurrent.jobs = 450
debug = FALSE

If you want to use event emails, the sender address does not matter and does not need to exist, but your receiver address must of course be valid. You need an @statistik or Unimail address. Alternatively, figure out which SMTP server and login data to use for a different mail provider.

  • DO NOT change the first line in the config template above and DO NOT COPY the dortmund_fk_statistik.tmpl to your local home dir or create your own. It is very likely that you do not understand enough details of the system to do this properly. Copying it will prevent you from getting updates from Sebastian.
  • Run a simple batchMap example. For the first try you should probably set debug = TRUE in the config, so you can better understand errors. If everything works, set debug back to FALSE.
  • On the bash console, this stuff is useful:
    • squeue will display all jobs
    • squeue -u $USER will display your jobs
    • kill_all_jobs will kill ALL of your jobs (except for the interactive ones)
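
For a compact overview of your own jobs, the squeue output can be condensed to one count per job state. This is a minimal sketch: the helper name count_states is made up for illustration, and the squeue call is only attempted when squeue is actually available on the machine.

```shell
# count_states is a hypothetical helper: it reads one job state per line
# from stdin and prints STATE=COUNT, sorted by state name.
count_states() {
  sort | uniq -c | awk '{print $2 "=" $1}'
}

if command -v squeue >/dev/null 2>&1; then
  # -h drops the header line, -o "%T" prints only the job state.
  squeue -u "${USER:-$(id -un)}" -h -o "%T" | count_states
else
  echo "squeue not found; run this on the cluster" >&2
fi
```

With two running jobs and one pending job, the helper would print PENDING=1 and RUNNING=2 on separate lines.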