
Commit 998bebb

mytkom and alibuild authored and committed
PIDML: evaluate FSE + self-attention network (AliceO2Group#7162)
* PIDML evaluate FSE + self-attention network (#5)
* remove detector count setting and reorder network arguments (with NaNs if detector not available)
* update README.md
* markdownlint changes
* MegaLinter fixes (#6)
* fix include missing file, the same way it was before
* readd pLimits to ONNXinterface and pass it to ONNXmodel
* Please consider the following formatting changes (AliceO2Group#9)
* improve qaPidML according to new approach

---------

Co-authored-by: ALICE Builder <alibuild@users.noreply.github.com>
1 parent 9f4b844 commit 998bebb


7 files changed: +180 -127 lines changed


Tools/PIDML/KaonPidTask.cxx

Lines changed: 1 addition & 2 deletions
@@ -60,7 +60,6 @@ struct KaonPidTask {
   Configurable<std::string> cfgCCDBURL{"ccdb-url", "http://alice-ccdb.cern.ch", "URL of the CCDB repository"};
   Configurable<int> cfgPid{"pid", 321, "PID to predict"};
   Configurable<double> cfgCertainty{"certainty", 0.5, "Minimum certainty above which the model accepts a particular type of particle"};
-  Configurable<uint32_t> cfgDetector{"detector", kTPCTOFTRD, "What detectors to use: 0: TPC only, 1: TPC + TOF, 2: TPC + TOF + TRD"};
   Configurable<uint64_t> cfgTimestamp{"timestamp", 0, "Fixed timestamp"};
   Configurable<bool> cfgUseCCDB{"useCCDB", false, "Whether to autofetch ML model from CCDB. If false, local file will be used."};

@@ -85,7 +84,7 @@ struct KaonPidTask {
     if (cfgUseCCDB) {
       ccdbApi.init(cfgCCDBURL); // Initializes ccdbApi when cfgUseCCDB is set to 'true'
     }
-    pidModel = std::make_shared<PidONNXModel>(cfgPathLocal.value, cfgPathCCDB.value, cfgUseCCDB.value, ccdbApi, cfgTimestamp.value, cfgPid.value, static_cast<PidMLDetector>(cfgDetector.value), cfgCertainty.value);
+    pidModel = std::make_shared<PidONNXModel>(cfgPathLocal.value, cfgPathCCDB.value, cfgUseCCDB.value, ccdbApi, cfgTimestamp.value, cfgPid.value, cfgCertainty.value);

     histos.add("hChargePos", ";z;", kTH1F, {{3, -1.5, 1.5}});
     histos.add("hChargeNeg", ";z;", kTH1F, {{3, -1.5, 1.5}});

Tools/PIDML/README.md

Lines changed: 50 additions & 15 deletions
@@ -1,6 +1,8 @@
 # PID ML in O2

-Particle identification is essential in most of the analyses. The PID ML interface will help you to make use of the machine learning models to improve purity and efficiency of particle kinds for your analysis. A single model is tailored to a specific particle kind, e.g., pions with PID 211. For each track, the model returns a float value in [0, 1] which measures the ''certainty'' of the model that this track is of given kind.
+Particle identification is essential in most of the analyses.
+The PID ML interface will help you to make use of the machine learning models to improve purity and efficiency of particle kinds for your analysis.
+A single model is tailored to a specific particle kind, e.g., pions with PID 211. For each track, the model returns a float value in [0, 1] which measures the ''certainty'' of the model that this track is of given kind.

 ## PidONNXModel

@@ -11,12 +13,16 @@ This class represents a single ML model from an ONNX file. It requires the follo
 - CCDB Api instance created in an analysis task
 - timestamp of the input analysis data -- needed to choose appropriate model
 - PID to be checked
-- detector setup: what detectors should be used for identification. It is described by enum PidMLDetector. Currently available setups: TPC, TPC+TOF, TPC+TOF+TRD
 - minimum certainty for accepting a track to be of given PID
+- *p* limits array - specifying p limits for each detector configuration (TPC, TPC+TOF, TPC+TOF+TRD)

-Let's assume your `PidONNXModel` instance is named `pidModel`. Then, inside your analysis task `process()` function, you can iterate over tracks and call: `pidModel.applyModel(track);` to get the certainty of the model. You can also use `pidModel.applyModelBoolean(track);` to receive a true/false answer, whether the track can be accepted based on the minimum certainty provided to the `PidONNXModel` constructor.
+Let's assume your `PidONNXModel` instance is named `pidModel`.
+Then, inside your analysis task `process()` function, you can iterate over tracks and call: `pidModel.applyModel(track);` to get the certainty of the model.
+You can also use `pidModel.applyModelBoolean(track);` to receive a true/false answer, whether the track can be accepted based on the minimum certainty provided to the `PidONNXModel` constructor.

-You can check [a simple analysis task example](https://github.com/AliceO2Group/O2Physics/blob/master/Tools/PIDML/simpleApplyPidOnnxModel.cxx). It uses configurable parameters and shows how to calculate the data timestamp. Note that the calculation of the timestamp requires subscribing to `aod::Collisions` and `aod::BCsWithTimestamps`. For Hyperloop tests, you can set `cfgUseFixedTimestamp` to true with `cfgTimestamp` set to the default value.
+You can check [a simple analysis task example](https://github.com/AliceO2Group/O2Physics/blob/master/Tools/PIDML/simpleApplyPidOnnxModel.cxx).
+It uses configurable parameters and shows how to calculate the data timestamp. Note that the calculation of the timestamp requires subscribing to `aod::Collisions` and `aod::BCsWithTimestamps`.
+For Hyperloop tests, you can set `cfgUseFixedTimestamp` to true with `cfgTimestamp` set to the default value.

 On the other hand, it is possible to use locally stored models, and then the timestamp is not used, so it can be a dummy value. `processTracksOnly` presents how to analyze on local-only PID ML models.
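Reviewer sketch for the README text in this hunk: `applyModel(track)` is described as returning a certainty in [0, 1], and `applyModelBoolean(track)` as a true/false answer based on the minimum certainty given to the constructor. The block below illustrates only that relationship. `MockPidModel` and `countAccepted` are hypothetical names that do not exist in O2Physics, the ONNX inference is replaced by a precomputed certainty, and the "at or above the minimum" comparison is an assumption.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for PidONNXModel: only the thresholding step is modelled.
struct MockPidModel {
  double minCertainty; // minimum certainty, as passed to the constructor

  // Stand-in for applyModel(track): the argument is already the network
  // output, i.e. a certainty in [0, 1].
  constexpr float applyModel(float networkOutput) const { return networkOutput; }

  // Assumption: a track is accepted when its certainty is at or above the minimum.
  constexpr bool applyModelBoolean(float networkOutput) const
  {
    return applyModel(networkOutput) >= minCertainty;
  }
};

// Counts how many precomputed certainties pass the threshold, mimicking a
// loop over tracks inside process().
template <std::size_t N>
constexpr int countAccepted(const MockPidModel& model, const std::array<float, N>& certainties)
{
  int accepted = 0;
  for (std::size_t i = 0; i < N; ++i) {
    if (model.applyModelBoolean(certainties[i])) {
      ++accepted;
    }
  }
  return accepted;
}
```

In a real task the certainty would come from `pidModel.applyModel(track)` rather than from a hard-coded array.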

@@ -31,10 +37,10 @@ This is a wrapper around PidONNXModel that contains several models. It has the p
 Then, obligatory parameters for the interface:
 - a vector of int output PIDs
-- a 2-dimensional LabeledArray of *p*T limits for each PID, for each detector configuration. It describes the minimum *p*T values at which each next detector should be included for predicting given PID
+- a 2-dimensional LabeledArray of *p* limits for each PID, for each detector configuration. It describes the minimum *p* values at which each next detector should be included for predicting given PID
 - a vector of minimum certainties for each PID for accepting a track to be of this PID
 - boolean flag: whether to switch on auto mode. If true, then *p*T limits and minimum certainties can be passed as an empty array and an empty vector, and the interface will fill them with default configuration:
-  - *p*T limits: same values for all PIDs: 0.0 (TPC), 0.5 (TPC + TOF), 0.8 (TPC + TOF + TRD)
+  - *p* limits: same values for all PIDs: 0.0 (TPC), 0.5 (TPC + TOF), 0.8 (TPC + TOF + TRD)
   - minimum certainties: 0.5 for all PIDs

 You can use the interface in the same way as the model, by calling `applyModel(track)` or `applyModelBoolean(track)`. The interface will then call the respective method of the model selected with the aforementioned interface parameters.
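Reviewer sketch for the *p* limits described in this hunk: each limit is the minimum *p* at which the next detector is included, with defaults 0.0 (TPC), 0.5 (TPC + TOF) and 0.8 (TPC + TOF + TRD). The block below shows one way to map a momentum onto a detector configuration. `selectSetup`, `kNSetups` and `kDefaultPLimits` are hypothetical names, the limits are assumed to be in ascending order, and how `PidONNXModel` consumes the limits internally is not shown in this diff.

```cpp
#include <array>
#include <cstddef>

// Column order follows the README: TPC, TPC + TOF, TPC + TOF + TRD.
constexpr std::size_t kNSetups = 3;

// Default limits quoted in the README (same values for all PIDs).
constexpr std::array<double, kNSetups> kDefaultPLimits = {0.0, 0.5, 0.8};

// Returns the index of the richest detector setup whose lower limit does not
// exceed p, or -1 when p lies below every limit.
// Assumption: limits are sorted in ascending order.
constexpr int selectSetup(double p, const std::array<double, kNSetups>& limits)
{
  int chosen = -1;
  for (std::size_t j = 0; j < kNSetups; ++j) {
    if (p >= limits[j]) {
      chosen = static_cast<int>(j);
    }
  }
  return chosen;
}
```

With the default limits, `selectSetup(0.3, kDefaultPLimits)` yields 0 (TPC only), 0.5 yields 1 (TPC + TOF), and 1.2 yields 2 (TPC + TOF + TRD).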
@@ -48,20 +54,49 @@ There is again [a simple analysis task example](https://github.com/AliceO2Group/
 Currently, only models for run 285064 (timestamp interval: 1524176895000 - 1524212953000) are uploaded to CCDB, so you can use hardcoded timestamp 1524176895000 for tests.

 Both model and interface analysis examples can be run with a script:
+
+### Script for Run2 Converted to Run3 data
+```bash
+#!/bin/bash
+
+config_file="my-config.json"
+
+o2-analysis-tracks-extra-converter --configuration json://$config_file -b |
+  o2-analysis-timestamp --configuration json://$config_file -b |
+  o2-analysis-trackextension --configuration json://$config_file -b |
+  o2-analysis-trackselection --configuration json://$config_file -b |
+  o2-analysis-multiplicity-table --configuration json://$config_file -b |
+  o2-analysis-bc-converter --configuration json://$config_file -b |
+  o2-analysis-collision-converter --configuration json://$config_file -b |
+  o2-analysis-zdc-converter --configuration json://$config_file -b |
+  o2-analysis-pid-tof-base --configuration json://$config_file -b |
+  o2-analysis-pid-tof-beta --configuration json://$config_file -b |
+  o2-analysis-pid-tof-full --configuration json://$config_file -b |
+  o2-analysis-pid-tpc-full --configuration json://$config_file -b |
+  o2-analysis-pid-tpc-base --configuration json://$config_file -b |
+  o2-analysis-simple-apply-pid-onnx-model --configuration json://$config_file -b
+```
+Remember to set to `true` every setting which states that a helper task should process Run2 data.
+
+### Script for Run3 data
 ```bash
 #!/bin/bash

 config_file="my-config.json"

 o2-analysis-timestamp --configuration json://$config_file -b |
-  o2-analysis-trackextension --configuration json://$config_file -b |
-  o2-analysis-trackselection --configuration json://$config_file -b |
-  o2-analysis-multiplicity-table --configuration json://$config_file -b |
-  o2-analysis-fdd-converter --configuration json://$config_file -b |
-  o2-analysis-pid-tof-base --configuration json://$config_file -b |
-  o2-analysis-pid-tof-beta --configuration json://$config_file -b |
-  o2-analysis-pid-tof-full --configuration json://$config_file -b |
-  o2-analysis-pid-tpc-full --configuration json://$config_file -b |
-  o2-analysis-simple-apply-pid-onnx-model --configuration json://$config_file -b
+  o2-analysis-event-selection --configuration json://$config_file -b |
+  o2-analysis-trackselection --configuration json://$config_file -b |
+  o2-analysis-multiplicity-table --configuration json://$config_file -b |
+  o2-analysis-track-propagation --configuration json://$config_file -b |
+  o2-analysis-pid-tof-base --configuration json://$config_file -b |
+  o2-analysis-pid-tof-beta --configuration json://$config_file -b |
+  o2-analysis-pid-tof-full --configuration json://$config_file -b |
+  o2-analysis-pid-tpc-full --configuration json://$config_file -b |
+  o2-analysis-pid-tpc-base --configuration json://$config_file -b |
+  o2-analysis-simple-apply-pid-onnx-model --configuration json://$config_file -b
 ```
+Remember to set to `true` every setting which states that a helper task should process Run3 data.
+
+
 Replace "model" with "interface" in the last line if you want to run the interface workflow.

Tools/PIDML/pidOnnxInterface.h

Lines changed: 10 additions & 20 deletions
@@ -36,25 +36,25 @@ auto certainties_v = std::vector<double>{certainties, certainties + nPids};
 // default values for the cuts
 constexpr double cuts[nPids][nCutVars] = {{0.0, 0.5, 0.8}, {0.0, 0.5, 0.8}, {0.0, 0.5, 0.8}, {0.0, 0.5, 0.8}, {0.0, 0.5, 0.8}, {0.0, 0.5, 0.8}};
-
 // row labels
 static const std::vector<std::string> pidLabels = {
   "211", "321", "2212", "0211", "0321", "02212"};
 // column labels
 static const std::vector<std::string> cutVarLabels = {
   "TPC", "TPC + TOF", "TPC + TOF + TRD"};
+
 } // namespace pidml_pt_cuts

 struct PidONNXInterface {
-  PidONNXInterface(std::string& localPath, std::string& ccdbPath, bool useCCDB, o2::ccdb::CcdbApi& ccdbApi, uint64_t timestamp, std::vector<int> const& pids, o2::framework::LabeledArray<double> const& pTLimits, std::vector<double> const& minCertainties, bool autoMode) : mNPids{pids.size()}, mPTLimits{pTLimits}
+  PidONNXInterface(std::string& localPath, std::string& ccdbPath, bool useCCDB, o2::ccdb::CcdbApi& ccdbApi, uint64_t timestamp, std::vector<int> const& pids, o2::framework::LabeledArray<double> const& pLimits, std::vector<double> const& minCertainties, bool autoMode) : mNPids{pids.size()}, mPLimits{pLimits}
   {
     if (pids.size() == 0) {
       LOG(fatal) << "PID ML Interface needs at least 1 output pid to predict";
     }
     std::set<int> tmp;
     for (auto& pid : pids) {
       if (!tmp.insert(pid).second) {
-        LOG(fatal) << "PID M Interface: output pids cannot repeat!";
+        LOG(fatal) << "PID ML Interface: output pids cannot repeat!";
       }
     }

@@ -68,9 +68,7 @@ struct PidONNXInterface {
       minCertaintiesFilled = minCertainties;
     }
     for (std::size_t i = 0; i < mNPids; i++) {
-      for (uint32_t j = 0; j < kNDetectors; j++) {
-        mModels.emplace_back(localPath, ccdbPath, useCCDB, ccdbApi, timestamp, pids[i], (PidMLDetector)(kTPCOnly + j), minCertaintiesFilled[i]);
-      }
+      mModels.emplace_back(localPath, ccdbPath, useCCDB, ccdbApi, timestamp, pids[i], minCertaintiesFilled[i], mPLimits[i]);
     }
   }
   PidONNXInterface() = default;
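Reviewer sketch of the constructor logic visible in the hunks above: the PID list must be non-empty, repeated PIDs are rejected through `std::set::insert(...).second`, and after this commit exactly one model is created per PID instead of one per PID and detector setup. `MockModel` and `buildModels` are hypothetical names. The sketch throws exceptions where the real code calls `LOG(fatal)`, and the size check on `minCertainties` is an extra guard that does not appear in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for PidONNXModel: only the fields needed here.
struct MockModel {
  int pid;
  double minCertainty;
};

// Sketch of the constructor's validation and model creation.
// Assumption: errors are reported by throwing instead of LOG(fatal).
std::vector<MockModel> buildModels(const std::vector<int>& pids,
                                   const std::vector<double>& minCertainties)
{
  if (pids.empty()) {
    throw std::invalid_argument("at least 1 output pid is needed");
  }
  // Extra guard added for this sketch only; it is not shown in the diff.
  if (minCertainties.size() != pids.size()) {
    throw std::invalid_argument("one minimum certainty per pid is needed");
  }
  std::set<int> seen;
  for (int pid : pids) {
    // insert() returns {iterator, bool}; the bool is false for a repeated pid.
    if (!seen.insert(pid).second) {
      throw std::invalid_argument("output pids cannot repeat");
    }
  }
  std::vector<MockModel> models;
  models.reserve(pids.size());
  for (std::size_t i = 0; i < pids.size(); ++i) {
    models.push_back(MockModel{pids[i], minCertainties[i]});
  }
  assert(models.size() == pids.size()); // one model per PID
  return models;
}
```

Calling `buildModels({211, 321, 2212}, {0.5, 0.5, 0.5})` returns three entries, while a list such as `{211, 211}` is rejected.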
@@ -84,12 +82,8 @@ struct PidONNXInterface {
   float applyModel(const T& track, int pid)
   {
     for (std::size_t i = 0; i < mNPids; i++) {
-      if (mModels[i * kNDetectors].mPid == pid) {
-        for (uint32_t j = 0; j < kNDetectors; j++) {
-          if (track.pt() >= mPTLimits[i][j] && (j == kNDetectors - 1 || track.pt() < mPTLimits[i][j + 1])) {
-            return mModels[i * kNDetectors + j].applyModel(track);
-          }
-        }
+      if (mModels[i].mPid == pid) {
+        return mModels[i].applyModel(track);
       }
     }
     LOG(error) << "No suitable PID ML model found for track: " << track.globalIndex() << " from collision: " << track.collision().globalIndex() << " and expected pid: " << pid;
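Reviewer sketch of the simplified lookup in this hunk: with one model per PID, `applyModel(track, pid)` reduces to a linear search on `mPid`. `MockPidEntry`, `findCertainty` and `kExampleModels` are hypothetical names. The sketch uses `std::array` and `std::optional` (C++17) so that it can be checked at compile time, whereas the real interface stores a `std::vector<PidONNXModel>`. What the real method returns after logging the error lies outside the hunk, so the empty optional here is only this sketch's choice.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for one PidONNXModel entry.
struct MockPidEntry {
  int mPid;        // PDG code the model was trained for
  float certainty; // stand-in for the network output on one track
};

// Linear search for the model matching the requested pid.
// Returns an empty optional when no model is registered for that pid.
template <std::size_t N>
constexpr std::optional<float> findCertainty(const std::array<MockPidEntry, N>& models, int pid)
{
  for (std::size_t i = 0; i < N; ++i) {
    if (models[i].mPid == pid) {
      return models[i].certainty;
    }
  }
  return std::nullopt;
}

// Example data: a pion model and a kaon model with made-up certainties.
constexpr std::array<MockPidEntry, 2> kExampleModels = {{{211, 0.9f}, {321, 0.2f}}};
```

Compared with the removed code, no index arithmetic of the form `i * kNDetectors + j` is needed, because there is no longer a separate model for each detector setup.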
@@ -100,12 +94,8 @@ struct PidONNXInterface {
   bool applyModelBoolean(const T& track, int pid)
   {
     for (std::size_t i = 0; i < mNPids; i++) {
-      if (mModels[i * kNDetectors].mPid == pid) {
-        for (uint32_t j = 0; j < kNDetectors; j++) {
-          if (track.pt() >= mPTLimits[i][j] && (j == kNDetectors - 1 || track.pt() < mPTLimits[i][j + 1])) {
-            return mModels[i * kNDetectors + j].applyModelBoolean(track);
-          }
-        }
+      if (mModels[i].mPid == pid) {
+        return mModels[i].applyModelBoolean(track);
       }
     }
     LOG(error) << "No suitable PID ML model found for track: " << track.globalIndex() << " from collision: " << track.collision().globalIndex() << " and expected pid: " << pid;
@@ -116,12 +106,12 @@ struct PidONNXInterface {
   void fillDefaultConfiguration(std::vector<double>& minCertainties)
   {
     // FIXME: A more sophisticated strategy should be based on pid values as well
-    mPTLimits = o2::framework::LabeledArray{pidml_pt_cuts::cuts[0], pidml_pt_cuts::nPids, pidml_pt_cuts::nCutVars, pidml_pt_cuts::pidLabels, pidml_pt_cuts::cutVarLabels};
+    mPLimits = o2::framework::LabeledArray{pidml_pt_cuts::cuts[0], pidml_pt_cuts::nPids, pidml_pt_cuts::nCutVars, pidml_pt_cuts::pidLabels, pidml_pt_cuts::cutVarLabels};
     minCertainties = std::vector<double>(mNPids, 0.5);
   }

   std::vector<PidONNXModel> mModels;
   std::size_t mNPids;
-  o2::framework::LabeledArray<double> mPTLimits;
+  o2::framework::LabeledArray<double> mPLimits;
 };
 #endif // TOOLS_PIDML_PIDONNXINTERFACE_H_
