CMS: updates for the MC provenance query 2016 #182

Open
katilp opened this issue Aug 4, 2023 · 5 comments
katilp commented Aug 4, 2023

The current script gets the provenance information as follows:

  • it starts the query from the dataset to be released
  • it goes to the preceding step in the processing chain using the input file of the current step
  • if the step is LHE it queries the generator information from gridpacks

As the processing scheme changed with the UL processing (there are no input datasets before AODSIM, as they were transient), this won't work anymore.

The query flow should be changed to go directly to the chain:

For an example dataset /ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2/NANOAODSIM:

On the web GUI:

Query by the output file name:

https://cms-pdmv.cern.ch/mcm/requests?produce=%2FADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8%2FRunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2%2FNANOAODSIM&page=0&shown=140737488355327

https://cms-pdmv.cern.ch/mcm/chained_requests?contains=EXO-RunIISummer20UL16NanoAODv9-00205&page=0

Then, for each request in the chain, open its page and get the dict.

On the command line

Using prep_id from DAS:

$ dasgoclient -query="dataset=/ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2/NANOAODSIM"  -json | jq .[].dataset[].prep_id
"EXO-RunIISummer20UL16NanoAODv9-00205"
"EXO-RunIISummer20UL16NanoAODv9-00205"
null
"EXO-RunIISummer20UL16NanoAODv9-00205"
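The jq output above repeats the same prep_id and contains a null, so a script needs to deduplicate and skip empty values. A minimal Python sketch follows, assuming the JSON has the shape implied by the jq filter `.[].dataset[].prep_id` (a list of records, each with a `dataset` list). The function name `unique_prep_ids` and the trimmed sample are illustrative only; real DAS records carry many more keys.

```python
import json


def unique_prep_ids(das_records):
    """Return the distinct non-null prep_id values in first-seen order."""
    seen = []
    for record in das_records:
        for entry in record.get("dataset", []):
            prep_id = entry.get("prep_id")
            if prep_id and prep_id not in seen:
                seen.append(prep_id)
    return seen


# Trimmed stand-in for the `dasgoclient ... -json` output (assumed shape).
sample = json.loads("""
[
  {"dataset": [{"prep_id": "EXO-RunIISummer20UL16NanoAODv9-00205"}]},
  {"dataset": [{"prep_id": "EXO-RunIISummer20UL16NanoAODv9-00205"}]},
  {"dataset": [{"prep_id": null}]},
  {"dataset": [{"prep_id": "EXO-RunIISummer20UL16NanoAODv9-00205"}]}
]
""")

print(unique_prep_ids(sample))
```

If more than one distinct prep_id comes back, the caller has to decide which one to follow; this sketch only reports them.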
  • get the chain from the dictionary:

    • from the web GUI: https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get/EXO-RunIISummer20UL16NanoAODv9-00205
    • command line: curl -s -k https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get/EXO-RunIISummer20UL16NanoAODv9-00205
    • get the chain(s):
      $ curl -s -k https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get/EXO-RunIISummer20UL16NanoAODv9-00205 | jq .results.member_of_chain
      [
        "EXO-chain_RunIISummer20UL16GEN_flowRunIISummer20UL16SIM_flowRunIISummer20UL16DIGIPremix_flowRunIISummer20UL16HLT_flowRunIISummer20UL16RECO_flowRunIISummer20UL16MiniAODv2_flowRunIISummer20UL16NanoAODv9-00109"
      ]
      
  • get the id of the chained request

    • from the web GUI: https://cms-pdmv.cern.ch/mcm/chained_requests?contains=EXO-RunIISummer20UL16NanoAODv9-00205&page=0&shown=15
    • command line:
      $ curl -L -s -b cookies.txt https://cms-pdmv.cern.ch/mcm/restapi/chained_requests/get/EXO-chain_RunIISummer20UL16GEN_flowRunIISummer20UL16SIM_flowRunIISummer20UL16DIGIPremix_flowRunIISummer20UL16HLT_flowRunIISummer20UL16RECO_flowRunIISummer20UL16MiniAODv2_flowRunIISummer20UL16NanoAODv9-00109 | jq .results.chain
      [
        "EXO-RunIISummer20UL16GEN-00123",
        "EXO-RunIISummer20UL16SIM-00521",
        "EXO-RunIISummer20UL16DIGIPremix-00521",
        "EXO-RunIISummer20UL16HLT-00521",
        "EXO-RunIISummer20UL16RECO-00521",
        "EXO-RunIISummer20UL16MiniAODv2-00213",
        "EXO-RunIISummer20UL16NanoAODv9-00205"
      ]
      
      • with auth-get-sso-cookie -u https://cms-pdmv.cern.ch/mcm -o cookies.txt (see also docs)
  • then, for each step in the chain, get the full dict or what is needed, e.g.

    $ curl -s -k https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get/EXO-RunIISummer20UL16GEN-00123 | jq .results.sequences[0].conditions
    "106X_mcRun2_asymptotic_v13"
    $ curl -s -k https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get/EXO-RunIISummer20UL16GEN-00123 | jq .results.cmssw_release
    "CMSSW_10_6_19_patch2"
    
  • An example with the LHE step:

    $  curl -s -k https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get/SMP-RunIISummer20UL16MiniAODv2-00038 | jq .results.member_of_chain
    [
      "SMP-chain_RunIISummer20UL16wmLHEGEN_flowRunIISummer20UL16SIM_flowRunIISummer20UL16DIGIPremix_flowRunIISummer20UL16HLT_flowRunIISummer20UL16RECO_flowRunIISummer20UL16MiniAODv2_flowRunIISummer20UL16NanoAODv9-00038"
    ]
    $  curl -L -s -b cookies.txt https://cms-pdmv.cern.ch/mcm/restapi/chained_requests/get/SMP-chain_RunIISummer20UL16wmLHEGEN_flowRunIISummer20UL16SIM_flowRunIISummer20UL16DIGIPremix_flowRunIISummer20UL16HLT_flowRunIISummer20UL16RECO_flowRunIISummer20UL16MiniAODv2_flowRunIISummer20UL16NanoAODv9-00038  | jq .results.chain
    [
      "SMP-RunIISummer20UL16wmLHEGEN-00237",
      "SMP-RunIISummer20UL16SIM-00056",
      "SMP-RunIISummer20UL16DIGIPremix-00053",
      "SMP-RunIISummer20UL16HLT-00056",
      "SMP-RunIISummer20UL16RECO-00056",
      "SMP-RunIISummer20UL16MiniAODv2-00038",
      "SMP-RunIISummer20UL16NanoAODv9-00038"
    ]
    
  • the gridpack location is in the fragment:

    $ curl -s -k https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get/SMP-RunIISummer20UL16wmLHEGEN-00237 | jq .results.fragment
    "\nimport FWCore.ParameterSet.Config as cms\n\nexternalLHEProducer = cms.EDProducer(\"ExternalLHEProducer\",\n    args = cms.vstring('/cvmfs/cms.cern.ch/phys_generator/gridpacks/slc7_amd64_gcc700/13TeV/madgraph/V5_2.6.5/XToJPsiG_JPsiToMuMu/bbH_HToJPsiG_JPsiToMuMu_slc7_amd64_gcc700_CMSSW_10_6_19_tarball.tar.xz'),\n    nEvents = cms.untracked.uint32(5000),\n    numberOfParameters = cms.uint32(1),\n    outputFile = cms.string('cmsgrid_final.lhe'),\n    scriptName = cms.FileInPath('GeneratorInterface/LHEInterface/data/run_generic_tarball_cvmfs.sh')\n)\n\n# Link to datacards\n# https://github.com/cms-sw/genproductions/tree/a46d2f429a051a1594209e3ebec3f5c75aea2ee2/bin/MadGraph5_aMCatNLO/cards/production/13TeV/XToJPsiG_JPsiToMuMu\n\nfrom Configuration.Generator.Pythia8CommonSettings_cfi import *\nfrom Configuration.Generator.MCTunes2017.PythiaCP5Settings_cfi import *\nfrom Configuration.Generator.PSweightsPythia.PythiaPSweightsSettings_cfi import *\n\ngenerator = cms.EDFilter(\"Pythia8HadronizerFilter\",\n    maxEventsToPrint = cms.untracked.int32(1),\n    pythiaPylistVerbosity = cms.untracked.int32(1),\n    filterEfficiency = cms.untracked.double(1.0),\n    pythiaHepMCVerbosity = cms.untracked.bool(False),\n    comEnergy = cms.double(13000.),\n    PythiaParameters = cms.PSet(\n        pythia8CommonSettingsBlock,\n        pythia8CP5SettingsBlock,\n        pythia8PSweightsSettingsBlock,\n        parameterSets = cms.vstring('pythia8CommonSettings',\n                                    'pythia8CP5Settings',\n
                'pythia8PSweightsSettings',\n                                    )\n    )\n)\n\nProductionFilterSequence = cms.Sequence(generator)\n"
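The command-line steps above (request, then `member_of_chain`, then the chained request's `chain`, then one dict per step) could be scripted roughly as below. This is a sketch, not the actual `mcm_store.py` code: the function names are hypothetical, the URLs and JSON keys are the ones used in the curl commands, and the fetcher is injected so that a cookie-aware one can be supplied for the `chained_requests` endpoint, which needs the SSO cookie. Note that the default fetcher verifies certificates, unlike `curl -k`.

```python
import json
import urllib.request

MCM = "https://cms-pdmv.cern.ch/mcm"


def http_get_json(url):
    """Plain GET returning parsed JSON; sufficient for the public API only."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def request_url(prep_id):
    return f"{MCM}/public/restapi/requests/get/{prep_id}"


def chained_request_url(chain_id):
    return f"{MCM}/restapi/chained_requests/get/{chain_id}"


def provenance_chain(prep_id, fetch=http_get_json, fetch_auth=None):
    """Return the prep_ids of all steps in the first chain containing prep_id.

    fetch_auth is the fetcher for the authenticated endpoint; it falls back
    to fetch, which only works if that fetcher already sends the SSO cookie.
    """
    fetch_auth = fetch_auth or fetch
    chains = fetch(request_url(prep_id))["results"]["member_of_chain"]
    if not chains:
        return []
    return fetch_auth(chained_request_url(chains[0]))["results"]["chain"]


def step_summary(prep_id, fetch=http_get_json):
    """Collect the fields queried above for one step of the chain."""
    results = fetch(request_url(prep_id))["results"]
    return {
        "prep_id": prep_id,
        "cmssw_release": results.get("cmssw_release"),
        "conditions": [s.get("conditions") for s in results.get("sequences", [])],
        "fragment": results.get("fragment"),
    }
```

Usage would be along the lines of `[step_summary(p) for p in provenance_chain(prep_id, fetch_auth=my_cookie_fetcher)]`, where `my_cookie_fetcher` is whatever the script uses to send the cookies from `auth-get-sso-cookie`.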
    

katilp commented Aug 15, 2023

Start with example datasets:

/ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2/NANOAODSIM
/BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1/MINIAODSIM

Expected changes in the scripts:

Called from interface.py:

  • option create-das-json-store does not need to query configs and parent, we get them all from mcm, in practice:
    • remove these lines for the parent loop
    • for the record, the dasgoclient "dataset" query should also return the prep_id, which can be used for the McM query
  • option create-mcm-store no longer needs to proceed through input/output datasets; we can now work entirely with prep_ids, in practice:
    • looping through parents can be replaced by querying the chain (= the list containing the prep_ids of the datasets in the provenance chain)
  • get-conf-files will then get the config file list only from McM; some changes in the code may be needed
  • create-records will have similar updates as were done in the collision record scripts

lhe_generators.py is called separately (see e.g. 2015 readme):

  • NB, the gridpack location for pre-2017 production has changed
    • the config files have e.g. /cvmfs/cms.cern.ch/phys_generator/gridpacks/slc6_amd64_gcc481/13TeV/madgraph/V5_2.2.2/SingleBp/Bpb_M900GeV_W9GeV_Zb_LH_tarball.tar.xz (in an example LHE config file) but the /cvmfs area has them separately:
      -bash-4.2$ ls /cvmfs/cms.cern.ch/phys_generator/gridpacks/
      13p6TeV  gridpacks     pre2017  slc6_amd64_gcc472  slc6_amd64_gcc530  slc7_amd64_gcc630  UL
      14TeV    lhe_merger    RunII    slc6_amd64_gcc481  slc6_amd64_gcc630  slc7_amd64_gcc700  untar
      2017     mg_amg_patch  RunIII   slc6_amd64_gcc491  slc6_amd64_gcc700  slc7_amd64_gcc820
      2018     PdmV          RunUL    slc6_amd64_gcc493  slc7_amd64_gcc10   slc7_amd64_gcc900
      -bash-4.2$ ls /cvmfs/cms.cern.ch/phys_generator/gridpacks/pre2017
      13TeV  14TeV
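Since the gridpack path written in a config or fragment may no longer exist at that location, one way to find the affected datasets is to extract the path and test it. The sketch below does only that; it does not guess the new location. The regular expression, the accepted archive extensions and the function names are assumptions made for illustration, not the logic in `lhe_generators.py`.

```python
import os
import re

# Quoted /cvmfs path ending in a typical archive extension (assumed list).
GRIDPACK_RE = re.compile(r"""['"](/cvmfs/[^'"]+?\.(?:tar\.xz|tar\.gz|tgz))['"]""")


def gridpack_paths(fragment):
    """Return all gridpack paths quoted in a fragment or config text."""
    return GRIDPACK_RE.findall(fragment)


def missing_gridpacks(fragment, exists=os.path.exists):
    """Return the quoted gridpack paths that do not exist on this machine."""
    return [path for path in gridpack_paths(fragment) if not exists(path)]


fragment = (
    "externalLHEProducer = cms.EDProducer(\"ExternalLHEProducer\",\n"
    "    args = cms.vstring('/cvmfs/cms.cern.ch/phys_generator/gridpacks/"
    "slc6_amd64_gcc481/13TeV/madgraph/V5_2.2.2/SingleBp/"
    "Bpb_M900GeV_W9GeV_Zb_LH_tarball.tar.xz'),\n"
    ")\n"
)
print(gridpack_paths(fragment))
```

The `exists` parameter is injectable so the check can be tested on a machine without /cvmfs mounted.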
      


katilp commented Oct 19, 2023

  • Remove the parent loop in das_json_store.py (step 2) and get the information for the "top" dataset only

  • Remove the parent query from mcm_store.py (step 3) and loop over the steps in the production chain

  • Restructure the output so that, in a new directory chain, each top dataset has a subdirectory per step, each with its own dict and scripts subdirectories

    $ tree inputs/mcm-store/
    inputs/mcm-store/
    ├── chain
    │   ├── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM
    │   │   ├── EXO-RunIISummer20UL16DIGIPremix-00521
    │   │   │   ├── dict
    │   │   │   │   └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
    │   │   │   └── scripts
    │   │   │       └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
    │   │   ├── EXO-RunIISummer20UL16GEN-00123
    │   │   │   ├── dict
    │   │   │   │   └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
    │   │   │   └── scripts
    │   │   │       └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
    │   │   ├── EXO-RunIISummer20UL16HLT-00521
    │   │   │   ├── dict
    │   │   │   │   └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
    │   │   │   └── scripts
    │   │   │       └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
    │   │   ├── EXO-RunIISummer20UL16MiniAODv2-00213
    │   │   │   ├── dict
    │   │   │   │   └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
    │   │   │   └── scripts
    │   │   │       └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
    │   │   ├── EXO-RunIISummer20UL16NanoAODv9-00205
    │   │   │   ├── dict
    │   │   │   │   └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
    │   │   │   └── scripts
    │   │   │       └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
    │   │   ├── EXO-RunIISummer20UL16RECO-00521
    │   │   │   ├── dict
    │   │   │   │   └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
    │   │   │   └── scripts
    │   │   │       └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
    │   │   └── EXO-RunIISummer20UL16SIM-00521
    │   │       ├── dict
    │   │       │   └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
    │   │       └── scripts
    │   │           └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
    │   └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM
    │       ├── SMP-RunIISummer20UL16DIGIPremix-00053
    │       │   ├── dict
    │       │   │   └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
    │       │   └── scripts
    │       │       └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
    │       ├── SMP-RunIISummer20UL16HLT-00056
    │       │   ├── dict
    │       │   │   └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
    │       │   └── scripts
    │       │       └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
    │       ├── SMP-RunIISummer20UL16MiniAODv2-00038
    │       │   ├── dict
    │       │   │   └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
    │       │   └── scripts
    │       │       └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
    │       ├── SMP-RunIISummer20UL16NanoAODv9-00038
    │       │   ├── dict
    │       │   │   └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
    │       │   └── scripts
    │       │       └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
    │       ├── SMP-RunIISummer20UL16RECO-00056
    │       │   ├── dict
    │       │   │   └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
    │       │   └── scripts
    │       │       └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
    │       ├── SMP-RunIISummer20UL16SIM-00056
    │       │   ├── dict
    │       │   │   └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
    │       │   └── scripts
    │       │       └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
    │       └── SMP-RunIISummer20UL16wmLHEGEN-00237
    │           ├── dict
    │           │   └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
    │           └── scripts
    │               └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
    ├── dict
    │   ├── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
    │   └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
    └── scripts
        ├── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
        └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
    
  • Query only McM in conf_store.py (step 4) and use the chain/dataset/step subdir as an input to the functions

  • Update the provenance query in dataset_records.py (step 5) similarly, taking into account the changes above

  • Clean away the remaining DAS back-up queries, which would not work as DAS does not have a dataset for every provenance step

  • Update the LHE part
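The directory layout shown in the tree output above can be expressed as a small path helper. This is a sketch inferred from the tree only: the dataset name has `/` replaced by `@`, and each step directory holds a `dict/<dataset>.json` and a `scripts/<dataset>.sh`. The function names are hypothetical and need not match those in `mcm_store.py`.

```python
from pathlib import Path


def dataset_dirname(dataset):
    """Turn /A/B/C into @A@B@C, as seen in the tree listing."""
    return dataset.replace("/", "@")


def step_paths(store, dataset, prep_id):
    """Return (dict_path, script_path) for one step of a dataset's chain."""
    name = dataset_dirname(dataset)
    step_dir = Path(store) / "chain" / name / prep_id
    return step_dir / "dict" / f"{name}.json", step_dir / "scripts" / f"{name}.sh"


dict_path, script_path = step_paths(
    "inputs/mcm-store",
    "/ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8/"
    "RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2/NANOAODSIM",
    "EXO-RunIISummer20UL16GEN-00123",
)
print(dict_path.as_posix())
print(script_path.as_posix())
```

Passing the step directory (chain/dataset/step) to the later stages, as proposed for conf_store.py, then only requires the dataset name and the prep_id of the step.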


katilp commented Oct 19, 2023

For the record, the DIGIPremix step has a 22 MB config file containing the list of files in the pile-up Premix datasets.
For the two test datasets that I use, the files differ only in naming and in the number of events:

$ ls -l inputs/config-store/
total 44823
-rw-r--r--. 1 kati zh     5917 Oct 19 14:29 086c69c1b826c78c43be2aa70d80e01e.configFile
-rw-r--r--. 1 kati zh     8671 Oct 19 14:29 160526781ab6242177672ffc68eb5568.configFile
-rw-r--r--. 1 kati zh     4319 Oct 19 14:29 481ced9502ea985a73dc7bca8c9ea7a9.configFile
-rw-r--r--. 1 kati zh     4349 Oct 19 14:29 528bf7046404f48fa330df88a6a92123.configFile
-rw-r--r--. 1 kati zh 22907660 Oct 19 14:29 528bf7046404f48fa330df88a6a9594b.configFile
-rw-r--r--. 1 kati zh     4521 Oct 19 14:29 528bf7046404f48fa330df88a6a99098.configFile
-rw-r--r--. 1 kati zh     4850 Oct 19 14:29 528bf7046404f48fa330df88a6a9a53b.configFile
-rw-r--r--. 1 kati zh     9324 Oct 19 14:29 70368b76504c9adbeb8bd6f29a1b6dee.configFile
-rw-r--r--. 1 kati zh    11520 Oct 19 14:29 80266517fa91333a47ed2d1cc3eeddf0.configFile
-rw-r--r--. 1 kati zh    12957 Oct 19 14:29 c8dc83abb237e289eae3cfefea871409.configFile
-rw-r--r--. 1 kati zh     4349 Oct 19 14:29 edf4aef02c2af29980365f11a8f78f77.configFile
-rw-r--r--. 1 kati zh 22907660 Oct 19 14:29 edf4aef02c2af29980365f11a8faa478.configFile
-rw-r--r--. 1 kati zh     4521 Oct 19 14:29 edf4aef02c2af29980365f11a8fade0c.configFile
-rw-r--r--. 1 kati zh     4850 Oct 19 14:29 edf4aef02c2af29980365f11a8fbd0b0.configFile

with

-bash-4.2$ diff inputs/config-store/528bf7046404f48fa330df88a6a9594b.configFile inputs/config-store/edf4aef02c2af29980365f11a8faa478.configFile
5c5
< # with command line options: --python_filename TOP-RunIISummer20UL16DIGIPremix-00281_1_cfg.py --eventcontent PREMIXRAW --customise Configuration/DataProcessing/Utils.addMonitoring --datatier GEN-SIM-DIGI --fileout file:TOP-RunIISummer20UL16DIGIPremix-00281.root --pileup_input dbs:/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX --conditions 106X_mcRun2_asymptotic_v13 --step DIGI,DATAMIX,L1,DIGI2RAW --procModifiers premix_stage2 --nThreads 4 --geometry DB:Extended --filein file:TOP-RunIISummer20UL16SIM-00281.root --datamix PreMix --era Run2_2016 --runUnscheduled --no_exec --mc -n 5807
---
> # with command line options: --python_filename TOP-RunIISummer20UL16DIGIPremix-00291_1_cfg.py --eventcontent PREMIXRAW --customise Configuration/DataProcessing/Utils.addMonitoring --datatier GEN-SIM-DIGI --fileout file:TOP-RunIISummer20UL16DIGIPremix-00291.root --pileup_input dbs:/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX --conditions 106X_mcRun2_asymptotic_v13 --step DIGI,DATAMIX,L1,DIGI2RAW --procModifiers premix_stage2 --nThreads 4 --geometry DB:Extended --filein file:TOP-RunIISummer20UL16SIM-00291.root --datamix PreMix --era Run2_2016 --runUnscheduled --no_exec --mc -n 5081
29c29
<     input = cms.untracked.int32(5807)
---
>     input = cms.untracked.int32(5081)
35c35
<     fileNames = cms.untracked.vstring('file:TOP-RunIISummer20UL16SIM-00281.root'),
---
>     fileNames = cms.untracked.vstring('file:TOP-RunIISummer20UL16SIM-00291.root'),
64c64
<     annotation = cms.untracked.string('--python_filename nevts:5807'),
---
>     annotation = cms.untracked.string('--python_filename nevts:5081'),
76c76
<     fileName = cms.untracked.string('file:TOP-RunIISummer20UL16DIGIPremix-00281.root'),
---
>     fileName = cms.untracked.string('file:TOP-RunIISummer20UL16DIGIPremix-00291.root'),

This is a 22 MB file; if stored for 40k MC datasets, it would take about 880 GB of disk space, so we need to do it differently...
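One possible way to avoid storing near-identical copies, given as an illustration only and not as the approach chosen in this issue, is to hash the config after masking the parts that the diff above shows to vary: the request serial number and the event count. The patterns below are assumptions derived from that single diff and are specific to the RunIISummer20UL16 campaign names; other configs may vary in other places.

```python
import hashlib
import re

# Patterns taken from the differing lines in the diff above (assumed complete).
RULES = [
    (re.compile(r"(RunIISummer20UL16\w+?-)\d{5}"), r"\g<1>NNNNN"),
    (re.compile(r"(-n )\d+"), r"\g<1>N"),
    (re.compile(r"(input = cms\.untracked\.int32\()\d+"), r"\g<1>N"),
    (re.compile(r"(nevts:)\d+"), r"\g<1>N"),
]


def normalized_digest(text):
    """SHA-256 of the config text with request-specific tokens masked."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


old = (
    "# --python_filename TOP-RunIISummer20UL16DIGIPremix-00281_1_cfg.py --mc -n 5807\n"
    "    input = cms.untracked.int32(5807)\n"
    "    annotation = cms.untracked.string('--python_filename nevts:5807'),\n"
)
new = (
    "# --python_filename TOP-RunIISummer20UL16DIGIPremix-00291_1_cfg.py --mc -n 5081\n"
    "    input = cms.untracked.int32(5081)\n"
    "    annotation = cms.untracked.string('--python_filename nevts:5081'),\n"
)
print(normalized_digest(old) == normalized_digest(new))
```

Files with the same digest could then share a single stored copy, with the per-request values kept separately.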


katilp commented Oct 20, 2023

To do:

  • order the provenance steps LHEGEN/GEN, SIM, DIGI2RAW, HLT, RECO, PAT, NANO
  • mini/nano relations: indicate them in the dataset record
  • do not query the full provenance for both,
    • build the stores only for nano, it should be enough
      • mcm_store/chain done only for nano
      • config files retrieved only once
    • modify the dataset record build so that mini finds the corresponding nano through the parent-child relation from the cache (it now has only the top dataset)
      • add a condition on dataset type in get_all_generator_text
      • add a similar condition for the pile-up
  • update https://cms-pdmv.cern.ch/mcm/ to https://cms-pdmv-prod.web.cern.ch/mcm
  • unify lhe_generators.py with the rest, i.e. integrate it into interface.py
  • update dataset/version/gt info in utils.py
  • add the container image field
  • make runtime caches e.g. RUNNUMBER_CACHE = { } etc
  • take the data reprocessing year from pdmv_submission_date
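For the first item (ordering the provenance steps), a sort key derived from the campaign part of the prep_id may be enough, since a directory listing returns the steps alphabetically. The sketch below is an assumption about how the step names in the list map onto prep_id tokens (DIGI2RAW to `DIGI`, PAT to `MiniAOD`, NANO to `NanoAOD`); the names `STEP_ORDER` and `step_rank` are hypothetical.

```python
# Tokens searched for in the campaign name, in processing order.
# "LHEGEN" must come before "GEN" so that wmLHEGEN is not ranked as plain GEN.
STEP_ORDER = ["LHEGEN", "GEN", "SIM", "DIGI", "HLT", "RECO", "MiniAOD", "NanoAOD"]


def step_rank(prep_id):
    """Rank a prep_id such as EXO-RunIISummer20UL16SIM-00521 by its step."""
    campaign = prep_id.split("-")[1]
    for rank, token in enumerate(STEP_ORDER):
        if token in campaign:
            return rank
    return len(STEP_ORDER)  # unknown steps sort last


steps = [
    "EXO-RunIISummer20UL16DIGIPremix-00521",
    "EXO-RunIISummer20UL16GEN-00123",
    "EXO-RunIISummer20UL16HLT-00521",
    "EXO-RunIISummer20UL16MiniAODv2-00213",
    "EXO-RunIISummer20UL16NanoAODv9-00205",
    "EXO-RunIISummer20UL16RECO-00521",
    "EXO-RunIISummer20UL16SIM-00521",
]
for prep_id in sorted(steps, key=step_rank):
    print(prep_id)
```

If the order of the `chain` list returned by McM is stored alongside the dicts, that order could be used directly instead of this heuristic.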


katilp commented Jan 16, 2024

Updates to LHE generator search

Check which inputs are passed to the job in runcmsgrid.sh

  • madgraph - case: /cvmfs/cms.cern.ch/phys_generator/gridpacks/UL/13TeV/slc7_amd64_gcc10/MadGraph5_aMCatNLO/
    Run command (in runcmsgrid.sh):
    cd $LHEWORKDIR/process
    [...]
    ./run.sh $submitting_event $run_random_seed
    
    process/run.sh:
    DIR='./madevent'
    [... or else...]
    [...]
    ${DIR}/bin/gridrun $num_events $seed $gran
    
    etc... gets complex
    Check what happened in gridpack_generation.log and check with GEN conveners.
  • powheg - case: /cvmfs/cms.cern.ch/phys_generator/gridpacks/slc6_amd64_gcc630/13TeV/Powheg
    Run command:
    cat powheg.input
    ../pwhg_main &> log_${process}_${seed}.txt; test $? -eq 0 || fail_exit "pwhg_main error: exit code not 0"
    
    Input: powheg.input
  • jhugen - case: /cvmfs/cms.cern.ch/phys_generator/gridpacks/UL/13TeV/jhugen/
    Run command:
    cd JHUGenerator/
    
    ./JHUGen $(cat ../JHUGen.input) VegasNc2=${nevt} Seed=${rnum} DataFile=undecayed &&
    ./JHUGen $(cat ../JHUGen_decay.input) Seed=${rnum} ReadLHE=undecayed.lhe Seed=${rnum} DataFile=Out
    
    Inputs: JHUGen.input, JHUGen_decay.input
  • jhugen - case: /cvmfs/cms.cern.ch/phys_generator/gridpacks/UL/13TeV/slc7_amd64_gcc820/JHUGen
    Run command:
    cd JHUGenerator/
    
    ./JHUGen $(cat ../JHUGen.input) VegasNc2=${nevt} Seed=${rnum} DataFile=undecayed &&
    
    Input: JHUGen.input
  • phantom - case: /cvmfs/cms.cern.ch/phys_generator/gridpacks/2017/13TeV/phantom/
    Run command:
    cat r_GEN.in | sed -e s/EVENTSNUM/${nevt}/ > r_tempo.in
    cat r_tempo.in | sed -e s/RANDOMSEED/${rnum}/ > r.in
    rm r_tempo.in
    ./phantom_1_3_p1_slc6_amd64_gcc630/phantom.exe >& log_GEN.txt
    
    Input: r_GEN.in
  • mcfm - case: /cvmfs/cms.cern.ch/phys_generator/gridpacks/2017/13TeV/mcfm
    Run command: ../Bin/mcfm readInput.DAT |& tee log
    Input: ./readInput.DAT

Reminder:

tar -tvf <gridpack name>.tgz: lists the contents of the archive
tar -xf <gridpack name>.tgz : extracts all files
tar -xf <gridpack name>.tgz <file name>: extracts one file only

Note:

  • Files may have been archived with a preceding ./
  • inputs mostly *.dat or *.input (powheg, jhugen)
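The tar reminders and the note about the preceding `./` translate to Python's standard `tarfile` module roughly as follows. This is a sketch with hypothetical function names: it tries the member name both with and without the `./` prefix, and lists candidate inputs by the `.dat` and `.input` suffixes mentioned above (compared case-insensitively so that `readInput.DAT` is found). Inputs with other suffixes, such as `r_GEN.in`, would need to be requested by name.

```python
import tarfile


def read_gridpack_file(gridpack, filename):
    """Return the bytes of one file in the archive, or None if it is absent."""
    with tarfile.open(gridpack, "r:*") as archive:  # "r:*" detects compression
        for candidate in (filename, f"./{filename}"):
            try:
                member = archive.getmember(candidate)
            except KeyError:
                continue
            extracted = archive.extractfile(member)
            if extracted is not None:
                return extracted.read()
    return None


def list_input_files(gridpack, suffixes=(".dat", ".input")):
    """List archive members whose names end with one of the given suffixes."""
    with tarfile.open(gridpack, "r:*") as archive:
        return [n for n in archive.getnames() if n.lower().endswith(suffixes)]
```

Reading a single member this way avoids unpacking the whole gridpack, which matters when many gridpacks are scanned.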
