CMS - Adding cross-section values to the datasets #3454

Open
3 tasks done
katilp opened this issue Oct 23, 2023 · 9 comments

@katilp
Member

katilp commented Oct 23, 2023

Add the cross-section values to the 2015 MC datasets, as computed with GenXSecAnalyzer by @Ari-mu-l

The values are in the logs, e.g.
(link updated) https://github.com/Ari-mu-l/OpenData/blob/main/GenXSecAnalyzer/logs/StandardModelPhysics/Drell-Yan/xsec_16414.log#L1629-L1643

They should also be available in /eos/user/s/sxiaohe/OpenData/MC2015/StandardModelPhysics/
(access is not permitted by default, so either ask for permissions or get them by cloning the repository with git)

Follow the guidelines at https://cms-opendata-releaseguide.docs.cern.ch/adding_metadata/

  • Agree with @jmhogan and @Ari-mu-l on which values to display

  • Extract them from the logs

  • Add them to the metadata; agree with @tiborsimko on the structure
    - (see the earlier draft in the 2015 script for the metadata structure, but note that the values are not reliable at the location where that code would read them; that's why the new logs have been generated)
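The "extract them from the logs" step could look roughly like the Python sketch below. It assumes that the GenXSecAnalyzer summary in each log contains a line of the form `After filter: final cross section = <value> +- <error> pb`; that format is an assumption to check against the linked logs, and `parse_final_xsec` is a hypothetical helper, not part of any existing script.

```python
import re
from typing import Optional, Tuple

# Assumed summary-line format; verify against the actual GenXSecAnalyzer logs.
FINAL_XSEC_RE = re.compile(
    r"After filter: final cross section\s*=\s*"
    r"([0-9.eE+-]+)\s*\+-\s*([0-9.eE+-]+)\s*pb"
)


def parse_final_xsec(log_text: str) -> Optional[Tuple[float, float]]:
    """Return (value, uncertainty) in pb from the last matching line, or None."""
    matches = FINAL_XSEC_RE.findall(log_text)
    if not matches:
        return None
    value, error = matches[-1]  # keep the last summary if the log contains several
    return float(value), float(error)


# Illustrative input line (values taken from the example later in this thread).
sample = "After filter: final cross section = 1.657e+08 +- 1.019e+05 pb\n"
print(parse_final_xsec(sample))
```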

@jmhogan
Contributor

jmhogan commented Oct 24, 2023

Here's an idea for text on the record pages:

"For pp collisions at X TeV, this sample has a cross section of (TOTAL +/- UNCERTAINTY) pb, calculated using the method described HERE (link to #3455).

This cross section takes into account a matching efficiency of MATCH and a filtering efficiency of FILTER, based on generator settings and/or filters. If this sample was generated at NLO, it has FRACTION% events with negative weights."
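A minimal Python sketch of filling this text from a `cross_section` object, using the field names that appear later in this thread. `render_xsec_text` is hypothetical; it assumes an empty string means "not available" and drops the corresponding sentence, and it assumes the negative-weight fraction is stored in percent. The "HERE" link is left as a placeholder.

```python
from typing import Dict


def render_xsec_text(energy_tev: str, xs: Dict[str, str]) -> str:
    """Fill the drafted record-page text; empty fields are treated as not available."""
    parts = [
        f"For pp collisions at {energy_tev} TeV, this sample has a cross section of "
        f"({xs['total_value']} +/- {xs['total_value_uncertainty']}) pb, "
        "calculated using the method described HERE."
    ]
    efficiencies = []
    if xs.get("matching_efficiency"):
        efficiencies.append(f"a matching efficiency of {xs['matching_efficiency']}")
    if xs.get("filter_efficiency"):
        efficiencies.append(f"a filtering efficiency of {xs['filter_efficiency']}")
    if efficiencies:
        parts.append(
            "This cross section takes into account "
            + " and ".join(efficiencies)
            + ", based on generator settings and/or filters."
        )
    # Assumes the stored fraction is already expressed in percent.
    if xs.get("neg_weight_fraction"):
        parts.append(
            "If this sample was generated at NLO, it has "
            f"{xs['neg_weight_fraction']}% events with negative weights."
        )
    return " ".join(parts)


example = {
    "filter_efficiency": "2.113e-03",
    "matching_efficiency": "",
    "neg_weight_fraction": "",
    "total_value": "1.657e+08",
    "total_value_uncertainty": "1.019e+05",
}
print(render_xsec_text("13", example))
```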

The README of this GitHub repository explains what you'll find in the JSON files: https://github.com/Ari-mu-l/OpenData/tree/main. There is more information available than I think we really need to put on the record pages right now, but we can iterate.

And the JSON files are in CERNBox here: /eos/user/s/sxiaohe/OpenData/MC2015/ (Xiaohe is making sure this is public)

@jmhogan
Contributor

jmhogan commented Oct 24, 2023

Here's a public CERNBox link for the JSON files: https://cernbox.cern.ch/s/EHpyrdJet939vGy

@nancyhamdan
Member

nancyhamdan commented Nov 3, 2023

I developed a script that extracts the cross-section values and adds them to the metadata of the corresponding datasets (see this PR). Running the script, I found the following:

Total number of cross-section values json files: 544
Total number of amended datasets: 544
Total number of datasets amended using Format 1: 1
Total number of datasets amended using Format 2: 371
Total number of datasets amended using Format 3: 0
Total number of datasets amended using Format 4: 0
Total number of datasets amended using Format 5: 0
Total number of datasets amended using Format 6: 172
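As a quick consistency check on these numbers: each JSON file should match exactly one format, so the per-format counts should add up to the 544 files. A minimal Python check using the counts reported above:

```python
from collections import Counter

# Per-format counts as reported by the script output above.
amended_by_format = Counter({1: 1, 2: 371, 3: 0, 4: 0, 5: 0, 6: 172})
total_json_files = 544

# Every JSON file should fall into exactly one format, so the counts must add up.
assert sum(amended_by_format.values()) == total_json_files
print(amended_by_format.most_common(2))
```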

I could only amend 544 datasets with the 544 JSON files, but the open data portal lists a total of 546 datasets under the Standard Model Physics category of the CMS 2015 simulated datasets, so it seems that the JSON files are missing two datasets?
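To identify the two datasets, a set difference between the two lists is enough. This Python sketch assumes both lists have already been reduced to comparable dataset names (for example, names exported from the portal and names derived from the JSON files); how the JSON files map to dataset names is an assumption, and `compare_dataset_lists` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def compare_dataset_lists(
    portal: Iterable[str], from_json: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (listed on portal but without JSON, with JSON but not on portal), sorted."""
    portal_set, json_set = set(portal), set(from_json)
    missing = sorted(portal_set - json_set)
    unexpected = sorted(json_set - portal_set)
    return missing, unexpected


# Toy names for illustration only.
missing, unexpected = compare_dataset_lists(["A", "B", "C", "D"], ["A", "C"])
print(missing, unexpected)
```

Checking both directions also catches JSON files whose names do not match any portal record, which would otherwise hide behind equal counts.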

Only one JSON file follows Format 1 (explained in the README here), and Format 1 is the only format that includes the matching efficiency. The fraction of events with negative weights appears only in Formats 1 and 3, but no JSON files follow Format 3, so all amended datasets except one would be missing these two values.
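A small Python sketch for counting how many amended records lack each of these optional values; `count_missing` is a hypothetical helper, and it treats both an empty string and an absent key as missing.

```python
from typing import Dict, List

# The two values that only some JSON formats provide.
OPTIONAL_FIELDS = ("matching_efficiency", "neg_weight_fraction")


def count_missing(records: List[Dict]) -> Dict[str, int]:
    """Count records where each optional field is absent or an empty string."""
    counts = {field: 0 for field in OPTIONAL_FIELDS}
    for record in records:
        xs = record.get("cross_section", {})
        for field in OPTIONAL_FIELDS:
            if not xs.get(field):
                counts[field] += 1
    return counts


# Toy records for illustration only.
toy = [
    {"cross_section": {"matching_efficiency": "", "neg_weight_fraction": ""}},
    {"cross_section": {"matching_efficiency": "0.5", "neg_weight_fraction": "1.2"}},
    {},
]
print(count_missing(toy))
```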

@katilp
Member Author

katilp commented Nov 9, 2023

@riga, we will amend the JSON schema for the OD datasets with the cross-section fields. This is a good moment to check that we follow the same naming conventions. On your side, will you use the naming from McM? On our side, the fields are:

(updated this to show Nancy's PR)

    # Make sure the cross_section object exists before its fields are assigned.
    record.setdefault("cross_section", {})
    record["cross_section"]["total_value"] = cross_sections_json_data["totX_final"]
    record["cross_section"]["total_value_uncertainty"] = cross_sections_json_data["totX_final_err"]
    record["cross_section"]["matching_efficiency"] = ""  # not filled from these JSON files
    record["cross_section"]["filter_efficiency"] = cross_sections_json_data["filterEff(weights)"]
    record["cross_section"]["neg_weight_fraction"] = ""  # not filled from these JSON files

NB, we do not read them from McM but from the output of GenXSecAnalyzer, which we run ourselves for the OD MC datasets.

@katilp
Member Author

katilp commented Jan 14, 2024

@nancyhamdan Please remind us if there's still something open/unclear on this issue. Thanks!

@katilp
Member Author

katilp commented Feb 14, 2024

@tiborsimko : the script from cernopendata/data-curation#210 works fine.

What do we need to amend to display values and add the text drafted above?

Is it here?

@katilp
Member Author

katilp commented Feb 19, 2024

Update the json schema with the cross-section values.

@jmhogan Should we plan for machine-readable recommended cross-section values in the dataset JSON?
For the moment, there is no place to extract them from, but there might be in the future.

Now we have e.g.:

    "cross_section": {
      "filter_efficiency": "2.113e-03",
      "matching_efficiency": "",
      "neg_weight_fraction": "",
      "total_value": "1.657e+08",
      "total_value_uncertainty": "1.019e+05"
    },

for the generator-level values. We could plan another set for the recommended values: not to be filled now, but ready for when we are able to get these values programmatically (ongoing work at CAT).
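Since the values are stored as strings, with "" marking values that are not available, a consumer of the record JSON would need to convert them. A minimal Python sketch using the example above; `parse_cross_section` is hypothetical and maps empty strings to `None`.

```python
import json
from typing import Dict, Optional

# The example cross_section object from above, as it would appear in a record JSON.
record_json = """
{"cross_section": {
    "filter_efficiency": "2.113e-03",
    "matching_efficiency": "",
    "neg_weight_fraction": "",
    "total_value": "1.657e+08",
    "total_value_uncertainty": "1.019e+05"}}
"""


def parse_cross_section(record: Dict) -> Dict[str, Optional[float]]:
    """Convert the string fields to floats; empty strings become None."""
    return {
        key: float(value) if value != "" else None
        for key, value in record["cross_section"].items()
    }


xs = parse_cross_section(json.loads(record_json))
print(xs["total_value"], xs["matching_efficiency"])
```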

@tiborsimko, what would be a preferred json structure? Just two different objects?
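One possible reading of "just two different objects", sketched in Python: keep the existing generator-level `cross_section` object and add a second, initially empty object for recommended values. The key `recommended_cross_section` and the helper `best_cross_section` are hypothetical, for illustration only.

```python
import json
from typing import Dict, Tuple

# Hypothetical layout: the existing generator-level object plus a second,
# initially empty object for recommended values (key name is an assumption).
record = {
    "cross_section": {
        "filter_efficiency": "2.113e-03",
        "matching_efficiency": "",
        "neg_weight_fraction": "",
        "total_value": "1.657e+08",
        "total_value_uncertainty": "1.019e+05",
    },
    "recommended_cross_section": {
        "total_value": "",
        "total_value_uncertainty": "",
    },
}


def best_cross_section(rec: Dict) -> Tuple[str, str]:
    """Prefer the recommended value when filled, else fall back to the generator-level one."""
    recommended = rec.get("recommended_cross_section", {})
    if recommended.get("total_value"):
        return "recommended", recommended["total_value"]
    return "generator-level", rec["cross_section"]["total_value"]


print(json.dumps(record, indent=2))
print(best_cross_section(record))
```

Keeping the two objects separate means the generator-level values never get overwritten once recommended values become available.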

@jmhogan
Copy link
Contributor

jmhogan commented Feb 21, 2024

@katilp in principle having that slot seems fine. There is probably no way to fill it other than some name-string-matching script...
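A rough Python sketch of such a name-string-matching script: take the primary-dataset part of a CMS dataset name (`/Primary/Processed/TIER`) and look it up in a table of recommended values by longest prefix. The table, its keys, the placeholder values and the example dataset name are all hypothetical; no real recommended cross sections are implied.

```python
from typing import Dict, Optional

# Hypothetical lookup table: keys are process-name prefixes, values are
# placeholders standing in for recommended cross sections (not real numbers).
RECOMMENDED = {
    "DYJetsToLL": "PLACEHOLDER_DY_INCLUSIVE",
    "DYJetsToLL_M-50": "PLACEHOLDER_DY_M50",
}


def primary_dataset(name: str) -> str:
    """Return the primary-dataset part of '/Primary/Processed/TIER'."""
    return name.strip("/").split("/")[0]


def match_recommended(dataset: str, table: Dict[str, str]) -> Optional[str]:
    """Return the table value whose key is the longest prefix of the primary dataset."""
    primary = primary_dataset(dataset)
    candidates = [key for key in table if primary.startswith(key)]
    if not candidates:
        return None
    return table[max(candidates, key=len)]


example = "/DYJetsToLL_M-50_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/Processed/MINIAODSIM"
print(match_recommended(example, RECOMMENDED))
```

Plain prefix matching is fragile: a key such as `DYJetsToLL_M-5` would also match `DYJetsToLL_M-50`, so the table keys would need to be chosen carefully, or the match made on delimited tokens instead.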

tiborsimko pushed a commit that referenced this issue Mar 5, 2024
Adds cross-section values to 2015 MC records.

Modifies the record template
templates/cernopendata_records_ui/records/record_detail.html so that the
values get displayed with the text suggested in #3454.
tiborsimko added a commit that referenced this issue Mar 6, 2024
Adds cross-section values to 2015 MC records.

Adds cross section field information to JSON Schema.

Modifies the detailed record template so that the values get displayed
with the text suggested in #3454.

Co-authored-by: Tibor Šimko <tibor.simko@cern.ch>
@katilp
Member Author

katilp commented Mar 25, 2024

For the record:

Similarly, add values in MC2016
