Commit dee5fe2

cms-2016-simulated-datasets: richer dataset sample
Uses 1k CMS 2016 MC dataset for a richer dataset sample. Enriches the documentation. Adds output files to global `.gitignore` file.
1 parent f6a86a7 commit dee5fe2

4 files changed: 4223 additions and 64 deletions

.gitignore

Lines changed: 33 additions & 29 deletions
@@ -1,43 +1,34 @@
-# Environments
+*.err
+*.pyc
 .env
 .venv
-env/
-venv/
-
-*.pyc
-*.err
 cms-2010-collision-datasets/outputs/*.json
 cms-2010-simulated-datasets/outputs/*.json
+cms-2011-collision-datasets-runb-update/inputs/config-store
+cms-2011-collision-datasets-runb-update/inputs/das-json-config-store
+cms-2011-collision-datasets-runb-update/inputs/das-json-store
+cms-2011-collision-datasets-runb-update/outputs/*.json
 cms-2011-collision-datasets/code/das.py
 cms-2011-collision-datasets/inputs/das-json-store
 cms-2011-collision-datasets/outputs/*.xml
-cms-2011-collision-datasets-runb-update/inputs/das-json-store
-cms-2011-collision-datasets-runb-update/inputs/das-json-config-store
-cms-2011-collision-datasets-runb-update/inputs/config-store
-cms-2011-collision-datasets-runb-update/outputs/*.json
 cms-2011-hlt-triggers/outputs/*.html
 cms-2011-hlt-triggers/outputs/*.xml
 cms-2011-l1-triggers/outputs/*.xml
 cms-2011-simulated-datasets/inputs/das-json-store
 cms-2011-simulated-datasets/outputs/*.xml
-cms-2012-collision-datasets/inputs/das-json-store
-cms-2012-collision-datasets/outputs/*.json
-cms-2012-collision-datasets-update/inputs/das-json-store
-cms-2012-collision-datasets-update/inputs/das-json-config-store
 cms-2012-collision-datasets-update/inputs/config-store
+cms-2012-collision-datasets-update/inputs/das-json-config-store
+cms-2012-collision-datasets-update/inputs/das-json-store
 cms-2012-collision-datasets-update/outputs/*.json
+cms-2012-collision-datasets/inputs/das-json-store
+cms-2012-collision-datasets/outputs/*.json
 cms-2012-event-display-files/inputs/ig/
 cms-2012-event-display-files/outputs/*.json
 cms-2012-simulated-datasets/inputs/config-store
 cms-2012-simulated-datasets/inputs/das-json-store
+cms-2012-simulated-datasets/outputs/*.json
 cms-2012-simulated-datasets/outputs/create-config-store.sh
 cms-2012-simulated-datasets/outputs/create-das-json-store.sh
-cms-2012-simulated-datasets/outputs/*.json
-cms-2013-hlt-triggers/outputs
-cms-2013-simulated-datasets-hi/inputs/das-json-store
-cms-2013-simulated-datasets-hi/inputs/mcm-store
-cms-2013-simulated-datasets-hi/inputs/config-store
-cms-2013-simulated-datasets-hi/outputs/
 cms-2013-collision-datasets-hi-ppref/inputs/config-store
 cms-2013-collision-datasets-hi-ppref/inputs/das-json-config-store
 cms-2013-collision-datasets-hi-ppref/inputs/das-json-store
@@ -46,25 +37,38 @@ cms-2013-collision-datasets-hi/inputs/config-store
 cms-2013-collision-datasets-hi/inputs/das-json-config-store
 cms-2013-collision-datasets-hi/inputs/das-json-store
 cms-2013-collision-datasets-hi/outputs/*.json
-cms-2015-collision-datasets/inputs/das-json-store
-cms-2015-collision-datasets/inputs/das-json-config-store
-cms-2015-collision-datasets/outputs/*.json
+cms-2013-hlt-triggers/outputs
+cms-2013-simulated-datasets-hi/inputs/config-store
+cms-2013-simulated-datasets-hi/inputs/das-json-store
+cms-2013-simulated-datasets-hi/inputs/mcm-store
+cms-2013-simulated-datasets-hi/outputs/
 cms-2015-collision-datasets-hi-ppref/inputs/config-store
-cms-2015-collision-datasets-hi-ppref/inputs/das-json-store
 cms-2015-collision-datasets-hi-ppref/inputs/das-json-config-store
+cms-2015-collision-datasets-hi-ppref/inputs/das-json-store
 cms-2015-collision-datasets-hi-ppref/outputs/*.json
+cms-2015-collision-datasets/inputs/das-json-config-store
+cms-2015-collision-datasets/inputs/das-json-store
+cms-2015-collision-datasets/outputs/*.json
+cms-2015-simulated-datasets/inputs/config-store
 cms-2015-simulated-datasets/inputs/das-json-store
 cms-2015-simulated-datasets/inputs/mcm-store
-cms-2015-simulated-datasets/inputs/config-store
-cms-2015-simulated-datasets/outputs/
 cms-2015-simulated-datasets/lhe_generators
-cod2-to-cod3/outputs/*.json
-opera-2017-multiplicity-studies/outputs/opera-events.json
+cms-2015-simulated-datasets/outputs/
+cms-2016-simulated-datasets/cookies.txt
+cms-2016-simulated-datasets/inputs/config-store
+cms-2016-simulated-datasets/inputs/das-json-store
+cms-2016-simulated-datasets/inputs/mcm-store
+cms-2016-simulated-datasets/lhe_generators
+cms-2016-simulated-datasets/outputs/
 cms-YYYY-simulated-datasets/cache
 cms-YYYY-simulated-datasets/outputs/*.csv
 cms-YYYY-simulated-datasets/outputs/*.err
 cms-YYYY-simulated-datasets/outputs/*.json
 cod2-to-cod3/outputs/*.json
+cod2-to-cod3/outputs/*.json
+env/
+opera-2017-multiplicity-studies/outputs/opera-events.json
 opera-2017-multiplicity-studies/outputs/opera-events.json
-opera-2019-neutrino-induced-charm/outputs/opera-events.json
 opera-2019-electron-neutrinos/outputs/opera-events.json
+opera-2019-neutrino-induced-charm/outputs/opera-events.json
+venv/
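
In the reordered `.gitignore` shown above, `cod2-to-cod3/outputs/*.json` and `opera-2017-multiplicity-studies/outputs/opera-events.json` each appear on two adjacent lines of the new file (67-68 and 70-71). This suggests the entries were sorted without removing repeats. Repeated patterns do not change what Git ignores, but they add noise. The following is a minimal sketch of how such repeats could be detected. `duplicate_patterns` is a hypothetical helper and not part of the repository. It compares lines after stripping whitespace, which only approximates Git's own pattern handling.

```python
from collections import Counter
from typing import List


def duplicate_patterns(gitignore_text: str) -> List[str]:
    """Return ignore patterns that occur more than once, in first-seen order.

    Assumption: blank lines and lines starting with '#' are skipped, and
    patterns are compared after stripping surrounding whitespace.
    """
    patterns = [
        line.strip()
        for line in gitignore_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    counts = Counter(patterns)
    # dict.fromkeys keeps the first occurrence of each pattern, in order.
    return [p for p in dict.fromkeys(patterns) if counts[p] > 1]


if __name__ == "__main__":
    # Tail of the new .gitignore as it appears in this diff.
    sample = "\n".join(
        [
            "cod2-to-cod3/outputs/*.json",
            "cod2-to-cod3/outputs/*.json",
            "env/",
            "opera-2017-multiplicity-studies/outputs/opera-events.json",
            "opera-2017-multiplicity-studies/outputs/opera-events.json",
            "opera-2019-electron-neutrinos/outputs/opera-events.json",
        ]
    )
    for pattern in duplicate_patterns(sample):
        print("duplicate:", pattern)
```

Running the helper on the full file before committing would flag both repeated entries.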

cms-2016-simulated-datasets/README.md

Lines changed: 41 additions & 26 deletions
@@ -3,10 +3,11 @@
 This directory contains helper scripts used to prepare CMS 2016 open data
 release regarding MC simulated datasets.
 
-
-- `code/` folder contains the python code.
+- `code/` folder contains the python code;
 - `inputs/` folder contains input text files with the list of datasets for each
-year and input files.
+year and input files;
+- `outputs/` folder contains generated JSON records to be included as the CERN
+Open Data portal fixtures.
 
 Every step necessary to produce the final `*.json` files is handled by the
 `cmc-mc/interface.py` script. Details about it can be queried with the command:
@@ -15,61 +16,75 @@ Every step necessary to produce the final `*.json` files is handled by the
 $ python3 code/interface.py --help
 ```
 
-Make sure to start voms-proxy before creating cache
+Please make sure to get the VOMS proxy file before running these scripts:
+
 ```console
 $ voms-proxy-init --voms cms --rfc --valid 190:00
 ```
 
-Set the eos path with
+Please make sure to set the EOS instance to EOSPUBLIC before running these scripts:
 
 ```console
 $ export EOS_MGM_URL=root://eospublic.cern.ch
 ```
+Please make sure to have a valid `userkey.nodes.pem` certificate present in
+`$HOME/.globus`. If not, you have to run the following on top of the regular
+CMS certificate documentation:
+
+```console
+$ cd $HOME/.globus
+$ ls userkey.nodes.pem
+$ openssl pkcs12 -in myCert.p12 -nocerts -nodes -out userkey.nodes.pem # if not present
+$ cd -
+```
 
-Warning: creating the full local cache might take a long time!
+Warning: Creating the full local cache might take a long time.
 
 First step is to create EOS file index cache:
 
 ```console
-$ python3 ./code/interface.py --create-eos-indexes ../cms-YYYY-simulated-datasets/inputs/CMS-2016-mc-datasets.txt
+$ time python3 ./code/interface.py --create-eos-indexes inputs/CMS-2016-mc-datasets.txt
 ```
 
-This requires the file to be in place in their final location.
-
-For early testing, on lxplus, all steps can be run without the EOS file index cache with the flag `--ignore-eos-store`.
-
-To build sample records (with a limited number of datasets in the input file) do the following:
+This requires the data files to be placed in their final location. However, for
+early testing on LXPLUS, all steps can be run without the EOS file index cache
+by means of adding the command-line option `--ignore-eos-store` to the commands below.
 
+We can now build sample records by doing:
 
 ```console
-$ python3 ./code/interface.py --create-das-json-store --ignore-eos-store DATASET_LIST
+$ time python3 ./code/interface.py --create-das-json-store --ignore-eos-store inputs/CMS-2016-mc-datasets.txt
 
 $ auth-get-sso-cookie -u https://cms-pdmv.cern.ch/mcm -o cookies.txt
-$ python3 ./code/interface.py --create-mcm-store --ignore-eos-store DATASET_LIST
+$ time python3 ./code/interface.py --create-mcm-store --ignore-eos-store inputs/CMS-2016-mc-datasets.txt
 
-$ openssl pkcs12 -in myCert.p12 -nocerts -nodes -out userkey.nodes.pem # if not present
-$ python3 ./code/interface.py --get-conf-files --ignore-eos-store DATASET_LIST
+$ time python3 ./code/interface.py --get-conf-files --ignore-eos-store inputs/CMS-2016-mc-datasets.txt
 
-$ python3 code/lhe_generators.py
+$ time python3 code/lhe_generators.py
 
-$ python3 ./code/interface.py --create-records --ignore-eos-store DATASET_LIST
-$ python3 ./code/interface.py --create-conffiles-records --ignore-eos-store DATASET_LIST
+$ time python3 ./code/interface.py --create-records --ignore-eos-store inputs/CMS-2016-mc-datasets.txt
+$ time python3 ./code/interface.py --create-conffiles-records --ignore-eos-store inputs/CMS-2016-mc-datasets.txt
 ```
 
-Note that to build the test records an (empty) input file for DOI's and a recid info file must be present in the inputs directory.
-Each step builds a subdirectory with a cache (`das-json-store`, `mcm-store` and `config-store`). They are large, do not upload them to the repository.
+Note that to build the test records an (empty) input file for DOIs and a recid
+info file must be present in the inputs directory.
 
-The output json file for dataset records go to the `outputs` directory.
+Each step builds a subdirectory with a cache (`das-json-store`, `mcm-store` and
+`config-store`). They are large, do not upload them to the repository, respect
+the `.gitignore`.
 
+The output JSON files for the dataset records will be generated in the
+`outputs` directory.
 
 ## lhe_generators
 
 
 ```console
 python3 code/lhe_generators.py 2> errors > output &
 ```
-- This will get lhe generator parameters from gridpacks for datasets listed in `./inputs/CMS-2016-mc-datasets.txt`
-- It works on lxplus or with mounted EOS
-- number of threads is set to 20 which is ideal for lxplus
 
-> :warning: There are many cases with various steps to get generator parameters for LHE -see [#97](https://github.com/cernopendata/data-curation/issues/97)-. Thus, in some few cases, the script MIGHT not work as expected so make sure to read it, check errors, and make any necessary tweaks
+- This will get lhe generator parameters from gridpacks for datasets listed in `./inputs/CMS-2016-mc-datasets.txt`.
+- It works on LXPLUS or with mounted EOS.
+- Number of threads is set to 20 which is ideal for LXPLUS.
+
+> :warning: There are many cases with various steps to get generator parameters for LHE -see [#97](https://github.com/cernopendata/data-curation/issues/97)-. Thus, in some few cases, the script MIGHT not work as expected so make sure to read it, check errors, and make any necessary tweaks
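
The updated README lists two prerequisites that are easy to forget (the `EOS_MGM_URL` setting and the `userkey.nodes.pem` file) and then a fixed sequence of `interface.py` steps. The following is a minimal dry-run sketch of that sequence, assuming the flags and paths exactly as written in the README. `preflight_problems` and `build_commands` are hypothetical helpers and not part of the repository. The sketch does not check the VOMS proxy or the `cookies.txt` file obtained with `auth-get-sso-cookie`, and it only prints commands without executing them.

```python
import os
from pathlib import Path
from typing import List, Mapping

EOS_PUBLIC = "root://eospublic.cern.ch"
DATASET_LIST = "inputs/CMS-2016-mc-datasets.txt"

# interface.py steps in README order; None marks the lhe_generators.py step.
# Per the README, auth-get-sso-cookie is run before --create-mcm-store.
INTERFACE_STEPS = [
    "--create-das-json-store",
    "--create-mcm-store",
    "--get-conf-files",
    None,
    "--create-records",
    "--create-conffiles-records",
]


def preflight_problems(env: Mapping[str, str], home: Path) -> List[str]:
    """Return human-readable problems with the README prerequisites."""
    problems = []
    if env.get("EOS_MGM_URL") != EOS_PUBLIC:
        problems.append(f"EOS_MGM_URL should be {EOS_PUBLIC}")
    if not (home / ".globus" / "userkey.nodes.pem").is_file():
        problems.append("missing $HOME/.globus/userkey.nodes.pem")
    return problems


def build_commands(
    dataset_list: str = DATASET_LIST, ignore_eos_store: bool = True
) -> List[List[str]]:
    """Build the argv lists for the sample-record steps, in README order."""
    commands = []
    for flag in INTERFACE_STEPS:
        if flag is None:
            commands.append(["python3", "code/lhe_generators.py"])
            continue
        cmd = ["python3", "./code/interface.py", flag]
        if ignore_eos_store:
            cmd.append("--ignore-eos-store")
        cmd.append(dataset_list)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for problem in preflight_problems(os.environ, Path.home()):
        print("warning:", problem)
    for cmd in build_commands():
        print(" ".join(cmd))  # dry run: print only, nothing is executed
```

Passing `ignore_eos_store=False` drops the `--ignore-eos-store` option, which matches the case where the EOS file index cache has already been created.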
