CMS - Adding cross-section values to the datasets #3454

katilp · 2023-10-23T08:46:10Z

Add the cross-section values to 2015 MC datasets, as computed with GenXSecAnalyzer by @Ari-mu-I

The values are in the logs, e.g.
(link updated) https://github.com/Ari-mu-l/OpenData/blob/main/GenXSecAnalyzer/logs/StandardModelPhysics/Drell-Yan/xsec_16414.log#L1629-L1643

They should also be available in /eos/user/s/sxiaohe/OpenData/MC2015/StandardModelPhysics/
(access is currently not permitted, so either ask for permissions or get the files through a git clone of the repository)

Follow the guidelines at https://cms-opendata-releaseguide.docs.cern.ch/adding_metadata/

- Agree with @jmhogan and @Ari-mu-I on which values to display
- Extract them from the logs
- Add them to the metadata; agree with @tiborsimko on the structure
  - (see the earlier draft in the 2015 script for the metadata structure, but note that the values are not reliable in the location where that code would read them; that is why the new logs have been generated)
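
The "extract them from the logs" step could be sketched as follows. This is only an illustration: it assumes the log contains a summary line of the form `After filter: final cross section = <value> +- <error> pb`, which should be checked against the linked logs, and the function name `extract_final_xsec` is hypothetical.

```python
import re

# Assumed format of the summary line; verify against the actual logs.
FINAL_XSEC = re.compile(
    r"After filter: final cross section\s*=\s*"
    r"(?P<value>[-+0-9.eE]+)\s*\+-\s*(?P<error>[-+0-9.eE]+)\s*pb"
)


def extract_final_xsec(log_text):
    """Return (value, error) as strings from the last matching line, or None."""
    matches = list(FINAL_XSEC.finditer(log_text))
    if not matches:
        return None
    last = matches[-1]
    return last.group("value"), last.group("error")
```

The values are kept as strings so that the formatting from the log is preserved when they are written to the metadata.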

jmhogan · 2023-10-24T14:23:04Z

Here's an idea for text on the record pages:

"For pp collisions at X TeV, this sample has a cross section of (TOTAL +/- UNCERTAINTY) pb, calculated using the method described HERE (link to #3455).

This cross section takes into account a matching efficiency of MATCH and a filtering efficiency of FILTER, based on generator settings and/or filters. If this sample was generated at NLO, it has FRACTION% events with negative weights."
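
As a rough sketch, the draft text could be filled in from a dictionary using the field names that appear later in this thread (`total_value`, `total_value_uncertainty`, `matching_efficiency`, `filter_efficiency`, `neg_weight_fraction`). The function name and the `energy_tev` and `method_url` parameters are hypothetical, and the sketch assumes the stored negative-weight value is already a percentage. Empty fields are simply left out of the sentence.

```python
def render_xsec_text(xsec, energy_tev, method_url):
    """Build the record-page text from a cross_section dict of strings."""
    parts = [
        f"For pp collisions at {energy_tev} TeV, this sample has a cross section of "
        f"({xsec['total_value']} +/- {xsec['total_value_uncertainty']}) pb, "
        f"calculated using the method described at {method_url}."
    ]
    # Mention only the efficiencies that are actually filled in.
    effs = []
    if xsec.get("matching_efficiency"):
        effs.append(f"a matching efficiency of {xsec['matching_efficiency']}")
    if xsec.get("filter_efficiency"):
        effs.append(f"a filtering efficiency of {xsec['filter_efficiency']}")
    if effs:
        parts.append(
            "This cross section takes into account "
            + " and ".join(effs)
            + ", based on generator settings and/or filters."
        )
    # Assumption: the stored value is already expressed in percent.
    if xsec.get("neg_weight_fraction"):
        parts.append(
            f"This sample has {xsec['neg_weight_fraction']}% events with negative weights."
        )
    return " ".join(parts)
```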

The README of this Github repository explains what you'll find in the json files: https://github.com/Ari-mu-l/OpenData/tree/main. There is more information available than I think we really need to put on the record pages right now, but we can iterate.

And the jsons are in CERNbox here: /eos/user/s/sxiaohe/OpenData/MC2015/ (Xiaohe is making sure this is public)

jmhogan · 2023-10-24T18:26:04Z

Here's a public CERNbox link for the json files: https://cernbox.cern.ch/s/EHpyrdJet939vGy

nancyhamdan · 2023-11-03T20:04:29Z

I developed the script to extract the cross-section values and add them to the metadata of their corresponding datasets (see cernopendata/data-curation#210), and I found the following using the script:

Total number of cross-section values json files: 544, Total number of amended datasets: 544
Total number of datasets amended using Format 1: 1
Total number of datasets amended using Format 2: 371
Total number of datasets amended using Format 3: 0
Total number of datasets amended using Format 4: 0
Total number of datasets amended using Format 5: 0
Total number of datasets amended using Format 6: 172

I could only amend 544 datasets using the 544 json files, but the open data portal lists a total of 546 datasets under the Standard Model Physics category of the CMS 2015 simulated datasets, so I think the json files could be missing two datasets.
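
One way to identify the two unmatched datasets would be a plain set difference between the dataset names listed on the portal and the names covered by the json files. This is a minimal sketch; the function name and the idea that both sides can be reduced to comparable name strings are assumptions.

```python
def find_unmatched(portal_datasets, json_datasets):
    """Return (datasets without a json file, json files without a dataset)."""
    portal = set(portal_datasets)
    found = set(json_datasets)
    return sorted(portal - found), sorted(found - portal)
```

The second list is worth checking too, since a naming mismatch would show up as one entry in each list rather than as a missing file.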

Only one json file follows Format 1 (explained in the README here), and that is the only format that includes the matching efficiency value. Also, the fraction of events with negative weights can only be found in Formats 1 and 3, but no json files follow Format 3, so almost all amended datasets would be missing these two values.
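
To quantify how many amended records end up without a given value, something like the following could be run over the records. It is only a sketch: the field names are those used in this thread, while the function name and the assumption that each record carries a `cross_section` dictionary of strings are illustrative.

```python
from collections import Counter

FIELDS = (
    "total_value",
    "total_value_uncertainty",
    "matching_efficiency",
    "filter_efficiency",
    "neg_weight_fraction",
)


def count_empty_fields(records):
    """Count, per field, how many records have it missing or empty."""
    empty = Counter()
    for record in records:
        xsec = record.get("cross_section", {})
        for field in FIELDS:
            if not xsec.get(field):
                empty[field] += 1
    return empty
```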

katilp · 2023-11-09T08:33:53Z

@riga, we will amend the json schema for the OD datasets with the cross-section fields. It would be a good moment to check that we follow the same naming conventions. On your side, will you use the naming from McM? I.e.

(updated this to show Nancy's PR)

    record["cross_section"]["total_value"] = cross_sections_json_data["totX_final"]
    record["cross_section"]["total_value_uncertainty"] = cross_sections_json_data["totX_final_err"]
    record["cross_section"]["matching_efficiency"] = ""
    record["cross_section"]["filter_efficiency"] = cross_sections_json_data["filterEff(weights)"]
    record["cross_section"]["neg_weight_fraction"] = ""

NB, we do not read them from McM but from the output of GenXSecAnalyzer, which we run ourselves for OD MC datasets.
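
For illustration, the mapping in the snippet above could also be expressed as a small lookup table, which keeps the naming convention in one place. This is a sketch and not the code from the PR: the json keys and record field names are taken from the snippet, while `FIELD_MAP`, `OPTIONAL_FIELDS` and `build_cross_section` are hypothetical names, and missing keys fall back to an empty string.

```python
# Record field -> key in the json produced from the GenXSecAnalyzer output.
FIELD_MAP = {
    "total_value": "totX_final",
    "total_value_uncertainty": "totX_final_err",
    "filter_efficiency": "filterEff(weights)",
}

# Fields that are left empty here, as in the snippet above.
OPTIONAL_FIELDS = ("matching_efficiency", "neg_weight_fraction")


def build_cross_section(cross_sections_json_data):
    """Return the cross_section dict for one record, with "" for missing values."""
    xsec = {field: "" for field in OPTIONAL_FIELDS}
    for field, key in FIELD_MAP.items():
        xsec[field] = str(cross_sections_json_data.get(key, ""))
    return xsec
```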

katilp · 2024-01-14T17:52:06Z

@nancyhamdan Please remind us if there's still something open/unclear on this issue. Thanks!

katilp · 2024-02-14T17:14:05Z

@tiborsimko : the script from cernopendata/data-curation#210 works fine.

What do we need to amend to display values and add the text drafted above?

Is it here?

katilp · 2024-02-19T14:42:49Z

Update the json schema with the cross-section values.

@jmhogan Should we foresee a machine-readable place for recommended cross-section values in the dataset json?
For the moment, there is no source to extract them from, but there might be in the future.

Now we have e.g.:

    "cross_section": {
      "filter_efficiency": "2.113e-03",
      "matching_efficiency": "",
      "neg_weight_fraction": "",
      "total_value": "1.657e+08",
      "total_value_uncertainty": "1.019e+05"
    },

for the generator-level values. We could foresee another set for the recommended values. It would not be filled now, but it would be ready for when we are able to get these values programmatically (ongoing work at CAT).
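
Since the values are stored as strings, a consumer of the json would need to convert them before doing arithmetic. A minimal sketch, with hypothetical helper names, treating empty strings as missing values:

```python
def as_floats(xsec):
    """Convert non-empty string fields to float; empty strings become None."""
    return {key: (float(value) if value else None) for key, value in xsec.items()}


def relative_uncertainty(xsec):
    """Return total_value_uncertainty divided by total_value."""
    values = as_floats(xsec)
    return values["total_value_uncertainty"] / values["total_value"]
```

For the example above, this gives a relative uncertainty of about 6.15e-04 on the total cross section.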

@tiborsimko, what would be a preferred json structure? Just two different objects?

jmhogan · 2024-02-21T19:13:15Z

@katilp in principle having that slot seems fine. There is probably no way to fill it other than some name-string-matching script...

Adds cross-section values to 2015 MC records. Modifies the record template templates/cernopendata_records_ui/records/record_detail.html so that the values get displayed with the text suggested in #3454.

Adds cross-section values to 2015 MC records. Adds cross section field information to JSON Schema. Modifies the detailed record template so that the values get displayed with the text suggested in #3454. Co-authored-by: Tibor Šimko <tibor.simko@cern.ch>

katilp · 2024-03-25T07:56:59Z

For the record:

PR "records: add xsec values for 2015 MC and update record template" (#3581) only added the xsec values for the processes under MC2015 -> StandardModelPhysics.
- The values for the processes under MC2015 -> HiggsPhysics still need to be added: run the scripts with input -c ../MC2015/HiggsPhysics as well (with MC2015 available locally).

Similarly, add values for MC2016:

- Get the directory MC2016 from https://cernbox.cern.ch/files/link/public/EHpyrdJet939vGy/MC2016 locally.
- Run the scripts with -c ../MC2016/HiggsPhysics and -c ../MC2016/StandardModelPhysics (but make sure that the final modified json gets the modifications from both runs).
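
The caveat about keeping the modifications from both runs could be handled by applying all inputs to the same set of records in one pass, rather than writing two independent outputs. The following is only a sketch of that idea and not the actual script: it assumes each input has already been loaded into a mapping from dataset title to its cross_section dictionary, and that records can be matched by a `title` field.

```python
import copy


def apply_cross_sections(records, *sources):
    """Return a copy of records with the cross sections of every source applied.

    Each source maps a dataset title to its cross_section dict, for example
    one mapping for HiggsPhysics and one for StandardModelPhysics.
    """
    updated = copy.deepcopy(records)
    for source in sources:
        for record in updated:
            xsec = source.get(record["title"])
            if xsec is not None:
                record["cross_section"] = xsec
    return updated
```

Because both sources modify the same copy, the result contains the changes from each of them, and the input records are left untouched.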

katilp assigned nancyhamdan and joudmas Oct 23, 2023

This was referenced Oct 23, 2023

Additional information to be added for CMS MC datasets #1137

Open

CMS - add a guide page for CMS cross-section values #3455

Closed

katilp added Topic: records Type: enhancement Experiment: CMS CMS: MC cross sections labels Oct 23, 2023

nancyhamdan mentioned this issue Nov 3, 2023

utils: update_fixtures_cross_sections.py cernopendata/data-curation#210

Merged

katilp mentioned this issue Feb 16, 2024

records: add xsec values for 2015 MC and update record template #3581

Merged

katilp unassigned nancyhamdan and joudmas Mar 25, 2024

katilp mentioned this issue Apr 8, 2024

CMS - debug cross-section utility script for case HiggsPhysics/StandardModelPhysics cernopendata/data-curation#231

Closed
