Home
The CWoLa method is applied to discriminate the hadronic decays of top quarks and antiquarks.
This framework was split from the analysis framework, and as such shares many attributes.
Follow these instructions to replicate the training procedure for any number of reasons (different physics objects, definitions, working points, samples, etc.).
Cheetah is set up to generate flat ntuples from the C++ framework that can be passed
into the Python framework using uproot.
The two languages are used because of the tools available in their respective ecosystems:
C++ (speed, ROOT libraries), Python (advanced ML tools).
A CMSSW environment is used to generate samples for training, while a
non-CMSSW environment is used to perform the actual training with
Keras + TensorFlow / PyTorch.
As such, the following guidelines reflect the author's setup.
Ideally, this setup can be easily extended for any user's purpose.
Using the CMSSW and C++ environment, Cheetah is used to prepare ntuples specifically for training. The input ntuples are flat ntuples prepared by the analysis. (The flat ntuples from the C++ framework can be passed into the Python framework using uproot.)
Cheetah builds the `Event` for each entry in the ROOT file.
Physics objects (AK8/AK4/leptons/MET) are defined as structs within the framework (`interface/physicsObjects.h`).
The `Event` object is passed to other classes (`histogrammer`, `eventSelection`, etc.) that need information from the event.
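The physics-object structs might look roughly like the sketch below; the field names are illustrative only, and the actual definitions in `interface/physicsObjects.h` may differ.

```cpp
// Illustrative sketch only: the real structs live in interface/physicsObjects.h
// and may differ in names and content.
struct Jet {                 // AK4 jet
    float pt, eta, phi, e;   // four-momentum components
    float bdisc;             // b-tagging discriminant
    int   index;             // position in the original collection
};

struct Ljet : public Jet {   // AK8 (large-R) jet
    float tau21, tau32;      // N-subjettiness ratios
    float softDropMass;
};

struct Lepton {
    float pt, eta, phi, e;
    int   charge;
    bool  isElectron;        // false -> muon
};

struct MET {
    float pt, phi;
};
```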
DESIGN PHILOSOPHY: Classes that use information from the event to generate new information, e.g., kinematic reconstruction, should be called from the `Event` class. To achieve this, pass structs of the necessary information to the external classes, then return the new object to the `Event` class. Thus, users can access all 'event-level' information from the `Event` class and do not need to instantiate extra tools in the running macros.
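A minimal sketch of this pattern, using simplified stand-in types rather than the real Cheetah interfaces, is shown below: the `Event` owns the external tool, hands it only the structs it needs, and stores the returned object.

```cpp
// Minimal sketch of the design pattern described above.
// All types and method names are simplified stand-ins, not the real Cheetah interfaces.
#include <vector>

struct Jet   { float pt, eta, phi, e; };   // AK4 jet (stand-in)
struct Ljet  { float pt, eta, phi, e; };   // AK8 jet (stand-in)
struct Ttbar { Ljet ak8; Jet ak4; };       // reconstructed AK8+AK4 system

// External tool: receives structs, returns a new object; it never sees the Event itself
class ttbarReco {
  public:
    Ttbar execute(const std::vector<Ljet>& ljets, const std::vector<Jet>& jets) const {
        Ttbar candidate{};
        if (!ljets.empty() && !jets.empty()) {
            candidate.ak8 = ljets.front();   // placeholder "reconstruction"
            candidate.ak4 = jets.front();
        }
        return candidate;
    }
};

class Event {
  public:
    // Called while building the event: pass structs out, keep the result here
    void buildTtbar() { m_ttbar = m_reco.execute(m_ljets, m_jets); }

    // Running macros access everything through the Event
    const Ttbar& ttbar() const { return m_ttbar; }

  private:
    ttbarReco m_reco;
    std::vector<Ljet> m_ljets;
    std::vector<Jet>  m_jets;
    Ttbar m_ttbar{};
};
```

The running macro then only needs something like `event.ttbar()`, which is the 'event-level access' goal described above.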
The running macros, which perform the event loop, are stored in the `bin/` directory.
These macros outline the basic setup of the configuration, file loop, TTree loop (if necessary), and event loop.
The event selection and information for output files are also declared in these macros.
Before running, confirm that the options in `config/training.txt` (or your custom configuration file) are appropriate!
To execute the framework:

    $ source setup.csh
    $ run_training config/training.txt
- The steering macro (see `bin/`) first initializes and sets the configurations
  - Declare settings/objects that are 'global' to all files being processed
- File loop
  - Prepare output that is file-specific
    - Initialize output file, cutflow histograms, efficiencies, etc.
- TTree loop (for input files that have physics information in multiple TTrees)
  - Declare objects that are 'global' to all events in the tree:
    - `Event` object
    - Output TTree
    - Histograms & efficiencies
- Event loop
  - Build the `Event` object (jets, leptons, extras, e.g., kinematic reconstruction)
  - Apply a selection(s), if desired
  - Save information to TTree & histograms
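A rough skeleton of a steering macro following this outline is sketched below. Only the ROOT calls are taken as given; the Cheetah-specific steps (configuration, `Event` building, selection, output trees) are left as comments since their exact signatures live in the real macros in `bin/`.

```cpp
// Skeleton only: the Cheetah class calls are indicated as comments, and the
// file/tree names passed in are placeholders.
#include <string>
#include <vector>
#include "TFile.h"
#include "TTree.h"

void run_training_skeleton(const std::vector<std::string>& filenames,
                           const std::vector<std::string>& treenames) {
    // Settings/objects 'global' to all files (configuration, event selection, ...)

    for (const std::string& fname : filenames) {                   // file loop
        TFile* inFile  = TFile::Open(fname.c_str());
        TFile* outFile = TFile::Open(("out_" + fname).c_str(), "RECREATE");
        // File-specific output: cutflow histograms, efficiencies, etc.

        for (const std::string& tname : treenames) {               // TTree loop
            TTree* tree = static_cast<TTree*>(inFile->Get(tname.c_str()));
            if (!tree) continue;
            // Objects 'global' to all events in this tree:
            // Event object, output TTree (miniTree), histograms & efficiencies

            const Long64_t nEntries = tree->GetEntries();
            for (Long64_t ii = 0; ii < nEntries; ++ii) {            // event loop
                tree->GetEntry(ii);
                // 1) build the Event (jets, leptons, kinematic reconstruction)
                // 2) apply the event selection(s), if desired
                // 3) save information to the output TTree & histograms
            }
        }

        outFile->Write();
        outFile->Close();
        inFile->Close();
    }
}
```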
Different classes are used to achieve this workflow, and each one can be modified or extended by the user (inherit from these classes to build your own; a sketch of this pattern appears after the tables below).
Class | About |
---|---|
`configuration` | Class that contains all information for organization. Multiple functions that return basic information as well |
`Event` | Class that contains all of the information from the event -> loads information from TTree and re-organizes information into structs & functions, calculates weights, etc. |
`eventSelection` | Class to apply custom event selection (defined by user) |
`histogrammer` | Class for generating histograms (interface between TH1/TH2 and Cheetah) |
`miniTree` | Class used exclusively for generating flat ntuples that are used in machine learning contexts |
`tools` | Collection of functions for doing simple tasks common to different aspects of Cheetah |
`truthMatching` | Class for determining the matching between truth and reconstructed objects |

Class | About |
---|---|
`deepLearning` | Class for handling the training/inference for machine learning tasks. The training aspect only prepares inputs as it is assumed training is done in a Python environment. |

Class | About |
---|---|
`ttbarReco` | General reconstruction of the AK8+AK4 system + quality criteria |
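As a sketch of the inherit-and-extend idea mentioned before the tables: the base class and virtual method below are hypothetical stand-ins, so check the actual headers in `interface/` for the real hooks before writing your own selection.

```cpp
// Hypothetical sketch: the real eventSelection base class and Event interface
// will differ; only the inheritance pattern is the point here.
#include <vector>

struct Ljet { float pt; };                 // minimal stand-in for an AK8 jet

struct Event {                             // minimal stand-in Event
    std::vector<Ljet> ljets;
};

class eventSelection {                     // stand-in base class
  public:
    virtual ~eventSelection() = default;
    virtual bool applySelection(const Event& event) const = 0;   // hypothetical hook
};

// User-defined selection: require at least one AK8 jet with pT > 400 GeV
class boostedSelection : public eventSelection {
  public:
    bool applySelection(const Event& event) const override {
        for (const Ljet& j : event.ljets)
            if (j.pt > 400.0f) return true;
        return false;
    }
};
```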
If you add directories to the framework, ensure they will be compiled by checking `BuildFile.xml` and `bin/BuildFile.xml`.
If there are issues, it may be necessary to clean the directory and re-compile everything: `scram b clean`.

It is also possible to submit batch jobs using the script `python/submitBatchJobs.py` with the text file `batch.txt`. For more information, please see the wiki page for batch jobs.
The actual training of the NN is performed in a Python environment outside of CMSSW using the packages
Asimov + HEP Plotter.
The uproot package loads information from the ROOT file, prepared in the previous step,
into a pandas DataFrame that is then easily used in the framework.
Script | Description |
---|---|
`python/runAsimov.py` | Steering script that determines what options are set and in what order to call the functions. |
`python/plotlabels.py` | Labels (colors and binning) for samples and variables |
The relevant Asimov classes are called to perform all the training and plot making (Asimov works as an interface between HEP data and Keras).
LWTNN: deep learning in C++ (use models generated from the Python tools in C++).

In `deepLearning.cxx`, a `std::map<std::string,double>` is created where the keys represent the different variables used in the training.
For each AK8 jet, the map is filled with new values and the lwtnn tool predicts the DNN score.
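For reference, the general lwtnn usage pattern looks roughly like the sketch below. This is not the exact code in `deepLearning.cxx`; the JSON file name and variable names are placeholders, and it assumes a simple feed-forward model exported from Keras with the lwtnn converters.

```cpp
// Sketch of typical lwtnn inference; file and variable names are placeholders.
#include <fstream>
#include <iostream>
#include <map>
#include <string>

#include "lwtnn/LightweightNeuralNetwork.hh"
#include "lwtnn/parse_json.hh"

int main() {
    // Network architecture + weights exported from the Keras training
    std::ifstream jsonFile("neural_net.json");
    lwt::JSONConfig config = lwt::parse_json(jsonFile);
    lwt::LightweightNeuralNetwork nn(config.inputs, config.layers, config.outputs);

    // One entry per training variable; the keys must match the input names
    // the network was exported with (names here are illustrative).
    std::map<std::string, double> dnnInputs {
        {"ljet_pt",    450.0},
        {"ljet_mass",  172.0},
        {"ljet_tau32",   0.45}
    };

    // compute() returns a map of output-node name -> score
    std::map<std::string, double> scores = nn.compute(dnnInputs);
    for (const auto& s : scores)
        std::cout << s.first << " = " << s.second << std::endl;

    return 0;
}
```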
For more information, please submit an issue or PR.