Commit d643ff3

jade-2023-raw-datasets: prepare logbook records
Initial version of the JADE logbook record preparation script. To be improved by MPP in the coming weeks.
1 parent a5abd78 commit d643ff3

7 files changed, with 189 additions and 23 deletions
.gitignore

Lines changed: 24 additions & 23 deletions
@@ -8,36 +8,31 @@ venv/
 *.err
 cms-2010-collision-datasets/outputs/*.json
 cms-2010-simulated-datasets/outputs/*.json
+cms-2011-collision-datasets-runb-update/inputs/config-store
+cms-2011-collision-datasets-runb-update/inputs/das-json-config-store
+cms-2011-collision-datasets-runb-update/inputs/das-json-store
+cms-2011-collision-datasets-runb-update/outputs/*.json
 cms-2011-collision-datasets/code/das.py
 cms-2011-collision-datasets/inputs/das-json-store
 cms-2011-collision-datasets/outputs/*.xml
-cms-2011-collision-datasets-runb-update/inputs/das-json-store
-cms-2011-collision-datasets-runb-update/inputs/das-json-config-store
-cms-2011-collision-datasets-runb-update/inputs/config-store
-cms-2011-collision-datasets-runb-update/outputs/*.json
 cms-2011-hlt-triggers/outputs/*.html
 cms-2011-hlt-triggers/outputs/*.xml
 cms-2011-l1-triggers/outputs/*.xml
 cms-2011-simulated-datasets/inputs/das-json-store
 cms-2011-simulated-datasets/outputs/*.xml
-cms-2012-collision-datasets/inputs/das-json-store
-cms-2012-collision-datasets/outputs/*.json
-cms-2012-collision-datasets-update/inputs/das-json-store
-cms-2012-collision-datasets-update/inputs/das-json-config-store
 cms-2012-collision-datasets-update/inputs/config-store
+cms-2012-collision-datasets-update/inputs/das-json-config-store
+cms-2012-collision-datasets-update/inputs/das-json-store
 cms-2012-collision-datasets-update/outputs/*.json
+cms-2012-collision-datasets/inputs/das-json-store
+cms-2012-collision-datasets/outputs/*.json
 cms-2012-event-display-files/inputs/ig/
 cms-2012-event-display-files/outputs/*.json
 cms-2012-simulated-datasets/inputs/config-store
 cms-2012-simulated-datasets/inputs/das-json-store
+cms-2012-simulated-datasets/outputs/*.json
 cms-2012-simulated-datasets/outputs/create-config-store.sh
 cms-2012-simulated-datasets/outputs/create-das-json-store.sh
-cms-2012-simulated-datasets/outputs/*.json
-cms-2013-hlt-triggers/outputs
-cms-2013-simulated-datasets-hi/inputs/das-json-store
-cms-2013-simulated-datasets-hi/inputs/mcm-store
-cms-2013-simulated-datasets-hi/inputs/config-store
-cms-2013-simulated-datasets-hi/outputs/
 cms-2013-collision-datasets-hi-ppref/inputs/config-store
 cms-2013-collision-datasets-hi-ppref/inputs/das-json-config-store
 cms-2013-collision-datasets-hi-ppref/inputs/das-json-store
@@ -46,25 +41,31 @@ cms-2013-collision-datasets-hi/inputs/config-store
 cms-2013-collision-datasets-hi/inputs/das-json-config-store
 cms-2013-collision-datasets-hi/inputs/das-json-store
 cms-2013-collision-datasets-hi/outputs/*.json
-cms-2015-collision-datasets/inputs/das-json-store
-cms-2015-collision-datasets/inputs/das-json-config-store
-cms-2015-collision-datasets/outputs/*.json
+cms-2013-hlt-triggers/outputs
+cms-2013-simulated-datasets-hi/inputs/config-store
+cms-2013-simulated-datasets-hi/inputs/das-json-store
+cms-2013-simulated-datasets-hi/inputs/mcm-store
+cms-2013-simulated-datasets-hi/outputs/
 cms-2015-collision-datasets-hi-ppref/inputs/config-store
-cms-2015-collision-datasets-hi-ppref/inputs/das-json-store
 cms-2015-collision-datasets-hi-ppref/inputs/das-json-config-store
+cms-2015-collision-datasets-hi-ppref/inputs/das-json-store
 cms-2015-collision-datasets-hi-ppref/outputs/*.json
+cms-2015-collision-datasets/inputs/das-json-config-store
+cms-2015-collision-datasets/inputs/das-json-store
+cms-2015-collision-datasets/outputs/*.json
+cms-2015-simulated-datasets/inputs/config-store
 cms-2015-simulated-datasets/inputs/das-json-store
 cms-2015-simulated-datasets/inputs/mcm-store
-cms-2015-simulated-datasets/inputs/config-store
-cms-2015-simulated-datasets/outputs/
 cms-2015-simulated-datasets/lhe_generators
-cod2-to-cod3/outputs/*.json
-opera-2017-multiplicity-studies/outputs/opera-events.json
+cms-2015-simulated-datasets/outputs/
 cms-YYYY-simulated-datasets/cache
 cms-YYYY-simulated-datasets/outputs/*.csv
 cms-YYYY-simulated-datasets/outputs/*.err
 cms-YYYY-simulated-datasets/outputs/*.json
 cod2-to-cod3/outputs/*.json
+cod2-to-cod3/outputs/*.json
+jade-2023-raw-datasets/outputs/*.json
+opera-2017-multiplicity-studies/outputs/opera-events.json
 opera-2017-multiplicity-studies/outputs/opera-events.json
-opera-2019-neutrino-induced-charm/outputs/opera-events.json
 opera-2019-electron-neutrinos/outputs/opera-events.json
+opera-2019-neutrino-induced-charm/outputs/opera-events.json

README.rst

Lines changed: 1 addition & 0 deletions
@@ -58,6 +58,7 @@ Specific data ingestion and curation campaigns:
 - `cms-run2-hlt-triggers <cms-run2-hlt-triggers>`_ -- helper scripts for the CMS Run2 data release (HLT triggers)
 - `cms-run2-ultra-legacy-production <cms-run2-ultra-legacy-production>`_ - helper scripts for CMS Run2 ultra-legacy production
 - `cod2-to-cod3 <cod2-to-cod3>`_ - record migration from version 2 to version 3
+- `jade-2023-first-release <jade-2023-first-release>`_ - helper scripts for the initial release of JADE data
 - `opera-2017-multiplicity-studies <opera-2017-multiplicity-studies>`_ - helper scripts for the release of OPERA multiplicity studies
 - `opera-2019-electron-neutrinos <opera-2019-electron-neutrinos>`_ - helper scripts for the release of OPERA electron neutrino events
 - `opera-2019-neutrino-induced-charm <opera-2019-neutrino-induced-charm>`_ - helper scripts for the release of OPERA charm events

jade-2023-raw-datasets/README.rst

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+========================
+jade-2023-raw-datasets
+========================
+
+This directory contains helper scripts used to prepare the initial release of
+JADE data in 2023. Includes raw datasets with accompanying logbooks and notes.

jade-2023-raw-datasets/code/create_logbook_records.py

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
+#!/usr/bin/env python
+
+"""
+Create JADE logbook records.
+"""
+
+import json
+import re
+
+recid_start = 26100
+year_published = "2023"
+
+
+def create_record(recid, path, size, checksum):
+    """Create record for the given JADE logbook."""
+
+    rec = {}
+
+    try:
+        logbook_number = re.match(r"^.*Log([0-9]+)\.pdf", path).groups(0)[0]
+    except Exception:
+        logbook_number = "FIXME"
+
+    logbook_period = "1970"
+
+    rec["abstract"] = {}
+    rec["abstract"][
+        "description"
+    ] = f"This is JADE logbook {logbook_number} from {logbook_period}. FIXME add more description."
+
+    rec["accelerator"] = "DESY-PETRA"
+
+    rec["collaboration"] = {}
+    rec["collaboration"]["name"] = "JADE collaboration"
+    rec["collaboration"]["recid"] = "451"
+
+    rec["collections"] = [
+        "JADE-Logbooks",
+    ]
+
+    rec["date_created"] = [
+        logbook_period,
+    ]
+    rec["date_published"] = year_published
+
+    rec["distribution"] = {}
+    rec["distribution"]["formats"] = [
+        "pdf",
+    ]
+    rec["distribution"]["number_files"] = 1
+    rec["distribution"]["size"] = size
+
+    rec["experiment"] = "JADE"
+
+    rec["files"] = []
+    rec["files"].append(
+        {
+            "checksum": "adler32:" + checksum,
+            "size": size,
+            "uri": path,
+        }
+    )
+
+    rec["license"] = {}
+    rec["license"]["attribution"] = "CC0"
+
+    rec["publisher"] = "CERN Open Data Portal"
+
+    rec["recid"] = str(recid)
+
+    rec["title"] = f"JADE logbook number {logbook_number}"
+
+    rec["type"] = {}
+    rec["type"]["primary"] = "Supplementaries"
+    rec["type"]["secondary"] = [
+        "Logbook",
+    ]
+
+    return rec
+
+
+def create_records():
+    """Create records."""
+    with open("./inputs/eos-file-information-logbooks.txt", "r") as f:
+        records = []
+        recid = recid_start
+        for line in f.readlines():
+            match = re.match(r"^path=(.*) size=(.*) checksum=(.*)$", line.strip())
+            if match:
+                path, size, checksum = match.groups()
+                size = int(size)
+                records.append(create_record(recid, path, size, checksum))
+                recid += 1
+    return records
+
+
+def print_records(records):
+    """Print records."""
+    print(
+        json.dumps(
+            records,
+            indent=2,
+            sort_keys=True,
+            ensure_ascii=False,
+            separators=(",", ": "),
+        )
+    )
+
+
+def main():
+    "Do the job."
+
+    records = create_records()
+    print_records(records)
+
+
+if __name__ == "__main__":
+    main()
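
The two regular expressions used by the script can be exercised in isolation. The following is a minimal sketch and not part of the commit: the helper names `parse_listing_line` and `logbook_number` are hypothetical, and the sketch uses an explicit `None` check with `group(1)` where the script uses `groups(0)[0]` inside a `try` block. The sample line is the first entry of the logbook listing in this commit.

```python
import re

# Patterns copied from create_logbook_records.py.
LISTING_RE = re.compile(r"^path=(.*) size=(.*) checksum=(.*)$")
LOGBOOK_RE = re.compile(r"^.*Log([0-9]+)\.pdf")


def parse_listing_line(line):
    """Return (path, size, checksum) for one listing line, or None if it does not match."""
    match = LISTING_RE.match(line.strip())
    if match is None:
        return None
    path, size, checksum = match.groups()
    return path, int(size), checksum


def logbook_number(path):
    """Return the digits after 'Log' in the file name, or 'FIXME' as the script does."""
    match = LOGBOOK_RE.match(path)
    return match.group(1) if match else "FIXME"


line = (
    "path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log01.pdf "
    "size=37913903 checksum=5537cb30"
)
path, size, checksum = parse_listing_line(line)
print(logbook_number(path), size, checksum)  # prints: 01 37913903 5537cb30
```

Lines that do not match the listing pattern yield `None`, which mirrors how the script silently skips them.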

jade-2023-raw-datasets/inputs/eos-file-information-logbooks.txt

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log01.pdf size=37913903 checksum=5537cb30
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log02.pdf size=38811120 checksum=8eb74619
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log03.pdf size=50652976 checksum=970d45ef
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log04.pdf size=50162033 checksum=dbcf6160
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log05.pdf size=47007176 checksum=8af84e12
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log06.pdf size=48306806 checksum=da5c1a3e
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log07.pdf size=50320784 checksum=ae36da39
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log08.pdf size=41087938 checksum=50e06402
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log09.pdf size=41056855 checksum=8c6b28a5
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log10.pdf size=40359158 checksum=a3936655
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log11.pdf size=41105169 checksum=20a2305c
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log12.pdf size=39146329 checksum=9e547ac6
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log13.pdf size=38893971 checksum=8004225d
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log14.pdf size=43056879 checksum=ed017f53
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log15.pdf size=41640363 checksum=8f39d465
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log16.pdf size=39921243 checksum=9f2a1c0f
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log17.pdf size=43588114 checksum=9efd7aa2
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log18.pdf size=40612426 checksum=3b083c0b
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log19.pdf size=42436620 checksum=996c06fe
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log20.pdf size=40799092 checksum=ea108b6a
+path=root://eospublic.cern.ch//eos/opendata/jade/documentation/logbooks/Log21.pdf size=17001411 checksum=5d91f901
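
The record script stores each listed checksum with an `adler32:` prefix, which suggests the values are Adler-32 sums written as eight hexadecimal digits. Under that assumption, a local copy of a logbook could be checked against the listing with the sketch below. The function name `adler32_hex` is hypothetical, and the demo runs on a throwaway file, not on a real logbook.

```python
import os
import tempfile
import zlib


def adler32_hex(path, chunk_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as 8 lowercase hex digits."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as f:
        # Read in chunks so that large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return f"{value & 0xFFFFFFFF:08x}"


# Demo on a temporary file containing the bytes b"Wikipedia".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Wikipedia")
try:
    print(adler32_hex(tmp.name))  # prints: 11e60398
finally:
    os.remove(tmp.name)
```

For a downloaded `Log01.pdf`, the result would be compared with `5537cb30` from the listing; a mismatch would point to a corrupted transfer or to a different checksum convention than the one assumed here.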

jade-2023-raw-datasets/run.sh

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+#!/bin/sh
+
+## 1) create EOS file indexes:
+
+# lxplus> eos find --xurl --size --checksum /eos/opendata/jade/upload/documentation/LogBooks | grep pdf > ./inputs/eos-file-information-logbooks.txt
+
+## 2) create JADE logbook records
+
+mkdir -p outputs
+python ./code/create_logbook_records.py > ./outputs/jade-logbooks.json
+
+## 3) check the validity of resulting JSON files
+
+jsonlint -q ./outputs/*.json
+
+## 4) copy them to CERN Open Data fixtures directory
+
+\cp outputs/*.json ../../opendata.cern.ch/cernopendata/modules/fixtures/data/records
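
Beyond the syntax check in the validity step, a content-level check can catch numbering mistakes before the files are copied. The sketch below is an illustration only and is not part of the commit: the function name `check_records` is hypothetical, and it assumes the output is a JSON list whose `recid` values run consecutively from the script's `recid_start` of 26100.

```python
import json


def check_records(text, recid_start=26100):
    """Parse JSON text and check that record ids are consecutive from recid_start."""
    records = json.loads(text)  # raises json.JSONDecodeError on malformed input
    recids = [int(rec["recid"]) for rec in records]
    expected = list(range(recid_start, recid_start + len(records)))
    if recids != expected:
        raise ValueError(f"unexpected recids: {recids}")
    return len(records)


sample = '[{"recid": "26100"}, {"recid": "26101"}]'
print(check_records(sample))  # prints: 2
```

In practice the text would be read from `./outputs/jade-logbooks.json` after the record creation step has run.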

run-tests.sh

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ check_black() {
 cms-YYYY-run-numbers/code/*.py \
 cms-2013-collision-datasets-hi-ppref/code/*.py \
 cms-2015-collision-datasets-hi-ppref/code/*.py \
+jade-2023-raw-datasets/code/*.py \
 utils/*.py
 }
