Tools for running the CMS Higgs to Two Photons Analysis on NanoAOD
An example workflow for Drell-Yan studies is included.
Each workflow can be a separate "processor" file that defines the mapping from NanoAOD to
the histograms we need. Workflow processors can be passed to the runner.py script
along with the fileset they should run over. Several executors can be chosen
(for now: iterative, which runs one by one; uproot/futures, which uses multiprocessing; and dask-slurm).
To run the example, run:
python runner.py --workflow dystudies
Example plots can be found in make_some_plots.ipynb, though we might want to make that more automatic in the end.
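The processor idea described above can be illustrated without any dependencies. The following is only a sketch of the pattern: map one chunk of events to histograms, then merge the per-chunk results the way an iterative executor would. It does not use coffea's actual API; the names `ToyDYProcessor` and `run_iterative`, the `mass` field, and the event values are all hypothetical.

```python
from collections import Counter


class ToyDYProcessor:
    """Toy stand-in for a workflow processor (hypothetical, not coffea's API)."""

    def __init__(self, bin_width=10.0):
        self.bin_width = bin_width

    def process(self, events):
        # Map one chunk of events to a histogram: lower bin edge -> count.
        hist = Counter()
        for event in events:
            lower_edge = (event["mass"] // self.bin_width) * self.bin_width
            hist[lower_edge] += 1
        return {"mass": hist}


def run_iterative(processor, fileset, chunksize=2):
    """Process every dataset chunk by chunk, one after another, and merge."""
    output = {}
    for dataset, events in fileset.items():
        merged = Counter()
        for start in range(0, len(events), chunksize):
            chunk = events[start:start + chunksize]
            # Counter.update adds the counts of this chunk to the running total.
            merged.update(processor.process(chunk)["mass"])
        output[dataset] = {"mass": merged}
    return output


# Made-up events standing in for NanoAOD content.
toy_fileset = {
    "DYJets-M50": [{"mass": 91.2}, {"mass": 88.7}, {"mass": 95.1}, {"mass": 60.3}],
}
result = run_iterative(ToyDYProcessor(), toy_fileset)
```

Because `process` only ever sees one chunk and returns something that can be merged, the same processor can in principle be handed to a different executor without changes.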
For installing Miniconda, see also https://hackmd.io/GkiNxag0TUmHnnCiqdND1Q#Local-or-remote
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Run and follow instructions on screen
bash Miniconda3-latest-Linux-x86_64.sh
NOTE: always make sure that conda, python, and pip point to the local Miniconda installation (check with which conda, which python, and which pip).
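The check in the note above can also be scripted. This is a minimal sketch, assuming Miniconda was installed to `~/miniconda3` (adjust the prefix if you chose a different location); the helper name `tools_under_prefix` is made up for illustration.

```python
import shutil
from pathlib import Path


def tools_under_prefix(prefix, tools=("conda", "python", "pip")):
    """Return {tool: (path found on PATH or None, True if it is under prefix)}."""
    root = Path(prefix).expanduser().resolve()
    report = {}
    for tool in tools:
        found = shutil.which(tool)  # same PATH lookup as `which <tool>`
        if found is None:
            report[tool] = (None, False)
            continue
        # Resolve symlinks, then check whether the prefix is an ancestor.
        inside = root in Path(found).resolve().parents
        report[tool] = (found, inside)
    return report


if __name__ == "__main__":
    # "~/miniconda3" is an assumed install location.
    for tool, (path, ok) in tools_under_prefix("~/miniconda3").items():
        print(f"{tool:8s} {'OK ' if ok else 'NOT under prefix'} {path}")
```

Note that `shutil.which` only sees executables on `PATH`, so a `conda` that exists purely as a shell function will not be found this way.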
You can either use the default environment (base) or create a new one:
# create new environment with python 3.7, e.g. environment of name `coffea`
conda create --name coffea python=3.7
# activate environment `coffea`
conda activate coffea
Install coffea, xrootd, and more:
conda install -c conda-forge coffea # pip install git+https://github.com/CoffeaTeam/coffea.git # for bleeding edge
conda install -c conda-forge xrootd
conda install -c conda-forge ca-certificates
conda install -c conda-forge ca-policy-lcg
conda install -c conda-forge dask-jobqueue
conda install -c anaconda bokeh
conda install -c conda-forge 'fsspec>=0.3.3'
conda install dask
See https://coffeateam.github.io/coffea/installation.html
See also https://hackmd.io/GkiNxag0TUmHnnCiqdND1Q#Remote-jupyter
- On your local machine, edit .ssh/config:
Host lxplus*
HostName lxplus7.cern.ch
User <your-user-name>
ForwardX11 yes
ForwardAgent yes
ForwardX11Trusted yes
Host *_f
LocalForward localhost:8800 localhost:8800
ExitOnForwardFailure yes
- Connect to remote with
ssh lxplus_f
- Start a jupyter notebook:
jupyter notebook --ip=127.0.0.1 --port 8800 --no-browser
- URL for notebook will be printed, copy and open in local browser
Scaling out can be notoriously tricky between different sites. Coffea's integration of slurm and dask makes this quite a bit easier, and for some sites the "native" implementation is sufficient, e.g. Condor@DESY.
However, some sites have certain restrictions for various reasons, in particular Condor @CERN and @FNAL.
Follow the setup instructions at https://github.com/CoffeaTeam/lpcjobqueue and run them from within the hgg-coffea directory that you have checked out.
After starting the singularity container, run a test with:
python runner.py --meta Era2017_legacy_v1.json --wf dystudies -d root://cmseos.fnal.gov//store/user/$USER/hgg_test/ --executor dask/lpc --samples filefetcher/dystudies.json --chunk=100000 --max=5
Only one port is available per node, so it's possible one has to try different nodes until hitting one with port 8786 open. Other than that, no additional configuration should be necessary.
python runner.py --wf dystudies --executor dask/lxplus
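Before launching, it can save time to check whether port 8786 is still free on the node you landed on. A minimal sketch, assuming that "open" above means "not already bound by another process"; the helper name `port_is_free` is made up for illustration.

```python
import socket


def port_is_free(port, host=""):
    """Return True if a TCP socket can be bound to (host, port) right now.

    An empty host means "all interfaces".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": someone else holds the port.
            return False
    return True


if __name__ == "__main__":
    if port_is_free(8786):
        print("Port 8786 is free on this node.")
    else:
        print("Port 8786 is taken, try another node.")
```

The check is only a snapshot: another user can still grab the port between the check and the start of the job.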
Coffea-casa is a JupyterHub-based analysis facility hosted at Nebraska. For more information and setup instructions see https://coffea-casa.readthedocs.io/en/latest/cc_user.html
After setting up and checking out this repository (either via the online terminal or the git widget utility), run with
python runner.py --wf dystudies --executor dask/casa
Authentication is handled automatically via a login auth token instead of a proxy. File paths need to replace the xrootd redirector with "xcache"; runner.py does this automatically.
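To make the path replacement concrete, here is a sketch of what such a rewrite could look like. It is not the code in runner.py; the function name `to_xcache`, the example redirector, and the file path are illustrative assumptions.

```python
def to_xcache(path, cache_host="xcache"):
    """Swap the host part of a root:// URL for cache_host.

    Paths that do not start with root:// (e.g. local files) are returned as-is.
    """
    prefix = "root://"
    if not path.startswith(prefix):
        return path
    # Split "<redirector>/<rest>" at the first slash after the scheme.
    _redirector, sep, rest = path[len(prefix):].partition("/")
    return f"{prefix}{cache_host}{sep}{rest}"


# Hypothetical input path, for illustration only.
remote = "root://cmsxrootd.fnal.gov//store/user/someone/file.root"
cached = to_xcache(remote)
```

With the input above, `cached` is `root://xcache//store/user/someone/file.root`; the double slash before `store` is kept because only the host is replaced.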
python runner.py --meta Era2017_legacy_v1.json --wf dystudies -d root://cmseos.fnal.gov//store/user/$USER/hgg_test/ --executor dask/lpc --samples filefetcher/dystudies.json --chunk=100000 --scaleout 1 --limit 2 --only DYJets-M50 --ts DummyTagger1 DummyTagger2