Commit 3550670

Merge pull request #3 from columnflow/sync_overleaf
Adding Selector Section
2 parents c802e1a + 5400b2a commit 3550670

3 files changed

+129
-29
lines changed

sections/calibrator.tex

Lines changed: 16 additions & 16 deletions
@@ -14,22 +14,22 @@ \section{Writing a Calibrator}\label{sec:calibrator}
1414
% TODO: include code here?
1515
%\begin{lstlisting}[language=python]
1616
% # coding: utf-8
17-
%
17+
%
1818
% """
1919
% Jet energy calibration methods.
2020
% """
21-
%
21+
%
2222
% from columnflow.calibration import Calibrator, calibrator
2323
% from columnflow.calibration.cms.jets import jec, jer
2424
% from columnflow.util import maybe_import
25-
%
25+
%
2626
% ak = maybe_import("awkward")
27-
%
28-
%
27+
%
28+
%
2929
% # custom jec calibrator that only runs nominal correction
3030
% jec_nominal = jec.derive("jec_nominal", cls_dict={"uncertainty_sources": []})
31-
%
32-
%
31+
%
32+
%
3333
% @calibrator(
3434
% uses={jec_nominal},
3535
% produces={jec_nominal},
@@ -42,29 +42,29 @@ \section{Writing a Calibrator}\label{sec:calibrator}
4242
% """
4343
% # correct jet energy scale
4444
% events = self[jec_nominal](events, **kwargs)
45-
%
45+
%
4646
% # jet energy resolution smearing (MC only)
4747
% if self.dataset_inst.is_mc:
4848
% events = self[jer](events, **kwargs)
49-
%
49+
%
5050
% return events
51-
%
52-
%
51+
%
52+
%
5353
% @jet_energy.init
5454
% def jet_energy_init(self: Calibrator) -> None:
5555
% # return immediately if dataset object has not been loaded yet
5656
% if not getattr(self, "dataset_inst", None):
5757
% return
58-
%
58+
%
5959
% # add columns produced by JER smearing calibrator (MC only)
6060
% if self.dataset_inst.is_mc:
6161
% self.uses.add(jer)
6262
% self.produces.add(jer)
63-
%
63+
%
6464
%\end{lstlisting}
6565

66-
First the relevant modules are imported.
67-
Note that \code{awkward} is loaded with the \code{maybe\_import} mechanism.
66+
First some modules are imported. Note that \code{awkward} is loaded with
67+
the \code{maybe\_import} mechanism.
6868
This is necessary due to the encapsulated structure of the underlying software stack.
6969
In the scope of this exercise, we don't want to consider all the different sources of uncertainties that are associated with jet calibration yet.
7070
Therefore, we use the \code{derive} mechanism of \CCSPStlye{TaskArrayFunctions} to define a new class called \code{jec\_nominal}, which inherits from the original \code{jec} \CCSPStlye{Calibrator} but overwrites the corresponding class member variable.
@@ -81,4 +81,4 @@ \section{Writing a Calibrator}\label{sec:calibrator}
8181

8282
\begin{exercise}{Writing a Calibrator}%[h4l/calibration/jet.py]
8383
Familiarize yourself with how the \code{jet\_energy} \CCSPStlye{Calibrator} works.
84-
\end{exercise}
84+
\end{exercise}

sections/producer.tex

Lines changed: 8 additions & 9 deletions
@@ -3,7 +3,7 @@ \section{Writing a Producer}\label{sec:producer}
33
The \CCSPStlye{Producer} class is used to calculate higher-level variables and define new array columns to be written to disk.
44
The corresponding task is called \CCSPStlye{cf.ProduceColumns} (see Ref.~\cite{cf_repo} for detailed information).
55
Naturally, we only want to compute these new variables for the relevant events for our analysis.
6-
Thus, the producers are executed after the selection step in the task graph.
6+
Thus, the producers are executed after the selection step in the task graph.
77

88
The \code{H4L} analysis includes three exemplary \CCSPStlye{Producer}s in \code{h4l/production/example.py}.
99
You will notice that the script starts by importing all relevant modules, including CMS specific ones.
@@ -14,21 +14,21 @@ \section{Writing a Producer}\label{sec:producer}
1414
This class requires the transverse jet momentum \code{Jet.pt}, which must be added to its \code{uses} set.
1515
Additionally, it produces two new array columns, the total jet transverse momentum \code{ht} and the number of jets in an event \code{n\_jet}, which are both added to its \code{produces} set.
1616
Each of these new variables is computed and then added to the \code{events} array with the \code{set\_ak\_column} function.
17-
This is necessary to make these variables available outside of the \CCSP{Producer}, e.g.\ for writing the information to disk.
17+
This is necessary to make these variables available outside of the \CCSPStlye{Producer}, e.g.\ for writing the information to disk.
1818
Note that for the case of \code{n\_jet}, we specified that the column element must be an \code{int} value.
1919

20-
The second \CCSPStlye{Producer} class \code{cutflow\_features} allows us to define and store features to be used for cutflow plots. Here, in addition to \code{Jet.pt} we also require \code{mc\_weight} and \code{category\_ids} to be added to the \code{uses} set. Note that both of these are \CCSPStlye{Producer}s themselves which you can find by following the import path at the beginning of the script.
20+
The second \CCSPStlye{Producer} class \code{cutflow\_features} allows us to define and store features to be used for cutflow plots. Here, in addition to \code{Jet.pt} we also require \code{mc\_weight} and \code{category\_ids} to be added to the \code{uses} set. Note that both of these are \CCSPStlye{Producer}s themselves which you can find by following the import path at the beginning of the script.
2121

2222
The \CCSPStlye{Producer} class \code{mc\_weight} reads in the \code{genWeight} column and, if existent, the \code{LHEWeight} column, both stored in \code{events}. Since these columns are required, they are both added to the \code{uses} set of \code{mc\_weight}. By extension, when we call \code{mc\_weight} in our \code{uses} set, we are calling these columns as well. The \code{mc\_weight} class simply decides which one of these weights to use and saves the decision as a new column, also named \code{mc\_weight}, which is included in its \code{produces} set. At this point, we also have the option to add the \code{mc\_weight} class to our own \code{produces} set. In this way, the new column also gets created and saved to disk.
2323

24-
Meanwhile, the \code{category\_ids} class assigns each event an array of category ids, which it stores as a new column also named \code{category\_ids}. Thus, we must also add this class to our \code{produces} set. The topic of defining categories is discussed in detail in the Section \ref{sec:categories}. Now that we have access to both these \CCSPStlye{Producer} classes, the \code{cutflow\_features} class can use them to attribute MC weights (if the dataset passed to it is tagged as an MC dataset) and category ids to \code{events}. It then creates a new column in the updated \code{events} object named \code{cutflow.jet1\_pt} which saves the transverse momentum of the most energetic jet in each event stored in \code{Jet.pt\text{[:,0]}}. If the event does not contain jets, it instead saves an \code{EMPTY\_FLOAT} value.
24+
Meanwhile, the \code{category\_ids} class assigns each event an array of category ids, which it stores as a new column also named \code{category\_ids}. Thus, we must also add this class to our \code{produces} set. The topic of defining categories is discussed in detail in the Section \ref{sec:categories}. Now that we have access to both these \CCSPStlye{Producer} classes, the \code{cutflow\_features} class can use them to attribute MC weights (if the dataset passed to it is tagged as an MC dataset) and category ids to \code{events}. It then creates a new column in the updated \code{events} object named \code{cutflow.jet1\_pt} which saves the transverse momentum of the most energetic jet in each event stored in \code{Jet.pt\text{[:,0]}}. If the event does not contain jets, it instead saves an \code{EMPTY\_FLOAT} value.
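The column logic described for the example producers (summed jet momentum, jet multiplicity, and a leading-jet column with a placeholder for jet-less events) can be sketched in plain Python. This is only an illustration under stated assumptions: the real producers operate on awkward arrays and use \code{set\_ak\_column}, whereas the function name \code{jet\_features}, the list-of-lists input, and the numeric value given to \code{EMPTY\_FLOAT} below are hypothetical stand-ins.

```python
# Stand-in for columnflow's EMPTY_FLOAT placeholder; the numeric value used
# here is an assumption chosen for illustration only.
EMPTY_FLOAT = -99999.0


def jet_features(jet_pt_per_event):
    """Compute ht, n_jet and the leading-jet pt for each event.

    jet_pt_per_event: list of lists, one inner list of jet pt values (GeV)
    per event, assumed to be sorted by descending pt.
    """
    features = []
    for jet_pts in jet_pt_per_event:
        features.append({
            # scalar sum of all jet transverse momenta in the event
            "ht": float(sum(jet_pts)),
            # jet multiplicity, stored explicitly as an integer
            "n_jet": int(len(jet_pts)),
            # leading-jet pt, or the placeholder if the event has no jets
            "jet1_pt": jet_pts[0] if jet_pts else EMPTY_FLOAT,
        })
    return features


# three toy events: two jets, no jets, one jet
events = [[50.0, 30.0], [], [20.0]]
result = jet_features(events)
```

The second toy event has no jets, so its \code{jet1\_pt} entry falls back to the placeholder instead of raising an index error, which mirrors the role of \code{EMPTY\_FLOAT} described above.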
2525

2626
The last \CCSPStlye{Producer} class defined is \code{example} and follows the same structure as the two previously explained \CCSPStlye{Producer}s.
2727
First, it starts by creating the \code{cutflow.jet1\_pt} column by using the \CCSPStlye{Producer} class \code{features} called at \code{\text{events=self[features](events, **kwargs)}}.
2828
It then applies category ids and deterministic seeds to the updated \code{events} object.
2929
Lastly, two additional modules are called in this example.
30-
First, the \code{normalization_weights} producer is used to reweight the cross section of simulated events to the values that are provided in the metadata database (in this case \code{cmsdb}).
31-
Additionally, the \code{muon_weights} producer applies scale factors for muons provided by the CMS Muon POG to facilitate a better compatibility of data and simulation.
30+
First, the \code{normalization\_weights} producer is used to reweight the cross section of simulated events to the values that are provided in the metadata database (in this case \code{cmsdb}).
31+
Additionally, the \code{muon\_weights} producer applies scale factors for muons provided by the CMS Muon POG to facilitate a better compatibility of data and simulation.
3232

3333
\begin{exercise}{Understanding some basic Producers}
3434
Familiarize yourself with the \CCSPStlye{Producer} classes mentioned above.
@@ -38,7 +38,6 @@ \section{Writing a Producer}\label{sec:producer}
3838
Note that to perform four-vector calculations, you need to import \code{attach\_coffea\_behavior} from \code{columnflow.production.util}.
3939
You will need to use kinematic information from both the \code{events.Electron} and \code{events.Muon} collections and create a new column which stores your calculated invariant mass.
4040

41-
\begin{exercise}{Writing a Producer}{h4l/production/invariant_mass.py}
42-
Write a \CCSPStlye{Producer} class which computes the four lepton invariant mass.
41+
\begin{exercise}{Writing a Producer}[h4l/production/invariant\_mass.py]
42+
Write a \CCSPStlye{Producer} class which computes the four lepton invariant mass.
4343
\end{exercise}
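As a reference for the physics behind this exercise, the four-lepton invariant mass can be sketched with the standard library alone. This is not the intended solution, which would rely on \code{attach\_coffea\_behavior} and the \code{events.Electron} and \code{events.Muon} collections; the helper names \code{four\_vector} and \code{invariant\_mass} and the tuple layout \code{(pt, eta, phi, mass)} are assumptions made for illustration.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) into Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return energy, px, py, pz


def invariant_mass(leptons):
    """Invariant mass of the summed four-vectors of all given leptons."""
    e_tot = px_tot = py_tot = pz_tot = 0.0
    for pt, eta, phi, mass in leptons:
        e, px, py, pz = four_vector(pt, eta, phi, mass)
        e_tot += e
        px_tot += px
        py_tot += py
        pz_tot += pz
    m2 = e_tot**2 - (px_tot**2 + py_tot**2 + pz_tot**2)
    # guard against tiny negative values caused by floating-point rounding
    return math.sqrt(max(m2, 0.0))


# four massless leptons with pt = 10 GeV each, balanced in the transverse plane
leptons = [
    (10.0, 0.0, 0.0, 0.0),
    (10.0, 0.0, math.pi, 0.0),
    (10.0, 0.0, math.pi / 2, 0.0),
    (10.0, 0.0, -math.pi / 2, 0.0),
]
m4l = invariant_mass(leptons)
```

In this toy configuration the total momentum vanishes, so the invariant mass equals the summed energy of 40 GeV (up to rounding). In the actual producer the same sum would be carried out per event on array columns rather than in a Python loop.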
44-

sections/selector.tex

Lines changed: 105 additions & 4 deletions
@@ -4,13 +4,114 @@ \section{Writing a Selector}\label{sec:selector}
44
This is a crucial step in the workflow since the decision to keep or reject objects or even whole events is performed here.
55
Since the selection usually depends on quantities such as the four-momenta of the objects within the events, it is executed after the calibration.
66
The corresponding task is called \CCSPStlye{cf.SelectEvents}.
7+
78
For more information, please consider Ref.~\cite{cf_repo}.
89

9-
\begin{table}[t]
10-
\Caption{Selection criteria for leptons}{Shown are the selection criteria for electrons (muons) at the 'loose' and 'tight'}
10+
\renewcommand{\arraystretch}{1.5}
11+
\begin{table}[h!]
12+
\centering
13+
\begin{tabular}{|m{4cm}|m{5cm}|m{5.5cm}|}
14+
\hline
15+
& \textbf{Electrons} & \textbf{Muons} \\ \hline
16+
\textbf{Kinematic cuts} &
17+
\begin{itemize}[leftmargin=*]
18+
\item $p_T^e > 7$ GeV
19+
\item $|\eta^e| < 2.5$
20+
\end{itemize} &
21+
\begin{itemize}[leftmargin=*]
22+
\item $p_T^\mu > 5$ GeV
23+
\item $|\eta^\mu| < 2.5$
24+
\end{itemize} \\ \hline
25+
\textbf{Vertex cuts} &
26+
\begin{itemize}[leftmargin=*]
27+
\item $d_{xy} < 0.5$ cm
28+
\item $d_z < 1$ cm
29+
\item $SIP < 4$
30+
\end{itemize} &
31+
\begin{itemize}[leftmargin=*]
32+
\item $d_{xy} < 0.5$ cm
33+
\item $d_z < 1$ cm
34+
\item $SIP < 4$
35+
\end{itemize} \\ \hline
36+
\textbf{Isolation \& ID for \newline 'tight' working point} & Dedicated BDT targeting \newline prompt electrons. & Select only muons within \newline a well-defined cone ($R=0.35$). \\ \hline
37+
\end{tabular}
38+
\Caption{Selection criteria for leptons.}{Shown are the selection cuts for electrons/muons at the 'loose' working point, with the last row defining the extra requirement for the leptons to pass the 'tight' working point.}
39+
\label{leptonSelection}
1140
\end{table}
41+
1242
In this part of the tutorial, we will write selections for electrons and muons.
13-
\textbf{\underline{Loose Electrons}}
43+
In the script \code{h4l/selection/lepton.py} you can find the base structure to implement two \CCSPStlye{Selector} modules, \code{electron\_selection} and \code{muon\_selection}.
44+
Each of these objects uses the relevant event information for its implementation.
45+
46+
47+
\textbf{\underline{Electron Selection}}
48+
49+
For \code{electron\_selection}, the electron kinematic information is first loaded into the \code{uses} set.
50+
Then, information that is dependent on the nanoAOD version is loaded, in this case which MVA (Multi-Variate Analysis) flag should be used.
51+
Notice that we use the union operator \code{|} to append either \code{Electron.mvaFall17V2Iso} or \code{Electron.mvaHZZIso} to the set containing the kinematic variables.
52+
Lastly, to perform four-vector calculations, we also require \code{attach\_coffea\_behavior}, which is imported at the beginning of the script from \code{columnflow.selection.util}.
53+
54+
55+
This \CCSPStlye{Selector} object also has two more dependencies, \code{exposed} and \code{sandbox}.
56+
The first one determines whether or not the \CCSPStlye{Selector} object is available from the command line.
57+
As for \code{sandbox}, it specifies the software environment where this \CCSPStlye{Selector} will be executed. A detailed description can be found in \href{https://columnflow.readthedocs.io/en/latest/user_guide/sandbox.html}{Sandboxes and Their Use in Columnflow}.
58+
59+
\newpage
1460
\begin{itemize}
15-
\item
61+
\item {
62+
\textbf{\underline{Loose Electrons}} -- Within the main body of \code{electron\_selection}, all selections should be applied.
63+
Note that the minimum transverse momentum has already been specified in \code{min\_pt}.
64+
The actual value, in GeV, is set in \code{h4l/config/config\_h4l.py}, and depends on the argument \code{working\_point}.
65+
In the config file, a dictionary stores two possible values, $15$ for a \code{'tight'} working point (default value), and $7$ for a \code{'loose'} working point.
66+
Both the transverse momentum and the pseudorapidity selection criteria have already been applied in \code{default\_mask}.
67+
You should now complete the mask with the remaining selection criteria from Table \ref{leptonSelection}.
68+
}
69+
\item {
70+
\textbf{\underline{Tight Electrons}} -- Finally, you should also add a condition that applies the identification criteria when \code{working\_point} is set to \code{'tight'}.
71+
Both the fSCeta and BDT values are set in the function \code{return\_electron\_id\_cuts}, which can be found in \code{h4l/selection/util.py}.
72+
}
1673
\end{itemize}
74+
75+
After all selections have been applied, the final part of the module sorts all events by their momentum and applies the \code{default\_mask}.
76+
The indices of selected events are then stored in \code{selected\_electron\_idx}.
77+
The \code{electron\_selection} module finally returns both all events and a \CCSPStlye{SelectionResult} class instance.
78+
We initiate the \CCSPStlye{SelectionResult} instance by setting the \code{objects} and \code{aux} (i.e. auxiliary) arguments.
79+
Within \code{objects}, a nested dictionary saves \code{selected\_electron\_idx} as a value to an \code{Electron} key.
80+
The selection mask itself, \code{default\_mask}, is stored in \code{aux}.
81+
82+
\begin{exercise}{Writing a Selector -- Electron Selection}[h4l/selection/lepton\_solution.py]
83+
Referring to Table \ref{leptonSelection}, fill in the missing information in the \CCSPStlye{Selector} module \code{electron\_selection} defined in \code{h4l/selection/lepton.py}.
84+
\end{exercise}
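The loose electron criteria from the table, together with the final sorting and index bookkeeping, can be sketched per object in plain Python. This is a hedged illustration rather than the solution file: the real module builds a boolean mask on awkward arrays, while here each electron is a dictionary whose keys (\code{pt}, \code{eta}, \code{dxy}, \code{dz}, \code{sip3d}) are assumed names. Taking absolute values of the impact parameters is also an assumption, since the table quotes the cuts without them.

```python
def passes_loose_electron(ele, min_pt=7.0):
    """Loose working point: kinematic and vertex cuts from the table."""
    return (
        ele["pt"] > min_pt               # GeV
        and abs(ele["eta"]) < 2.5
        and abs(ele["dxy"]) < 0.5        # transverse impact parameter
        and abs(ele["dz"]) < 1.0         # cm
        and abs(ele["sip3d"]) < 4.0      # impact parameter significance
    )


# toy electrons of a single event (values invented for illustration)
electrons = [
    {"pt": 25.0, "eta": 1.2, "dxy": 0.01, "dz": 0.02, "sip3d": 1.5},
    {"pt": 5.0, "eta": 0.1, "dxy": 0.01, "dz": 0.02, "sip3d": 1.0},   # fails pt
    {"pt": 40.0, "eta": -0.3, "dxy": 0.02, "dz": 0.10, "sip3d": 2.0},
    {"pt": 30.0, "eta": 0.5, "dxy": 0.01, "dz": 0.05, "sip3d": 6.0},  # fails SIP
]

# one boolean per electron, analogous to default_mask
default_mask = [passes_loose_electron(ele) for ele in electrons]

# indices ordered by descending pt, then restricted to selected electrons
order = sorted(range(len(electrons)), key=lambda i: electrons[i]["pt"], reverse=True)
selected_electron_idx = [i for i in order if default_mask[i]]
```

Keeping the mask and the sorted indices as separate objects matches the description above, where the indices go into \code{objects} and the mask into \code{aux} of the \CCSPStlye{SelectionResult}.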
85+
86+
\vspace{0.8cm}
87+
88+
\textbf{\underline{Muon Selection}}
89+
90+
The \code{muon\_selection} module behaves very similarly. In this case, a dedicated software environment is not required.
91+
There is also no information dependent on the nanoAOD version.
92+
Besides the kinematic information, the \code{uses} set also loads muon quality criteria (e.g. if it is a global or tracker muon), identification and isolation information.
93+
94+
\begin{itemize}
95+
\item {
96+
\textbf{\underline{Loose Muons}} -- Within the main body of \code{muon\_selection}, a selection mask named \code{selected\_muon\_mask} is now defined.
97+
Similarly to the electron selection, the minimum transverse momentum and pseudorapidity are already defined.
98+
You should now expand this mask such that:
99+
\begin{enumerate}
100+
\item you require either a global or tracker muon (for tracker muons \code{nStations} should be positive);
101+
\item discard standalone muons if the reconstructed tracks are only present in the muon system (i.e. for standalone muons, you should require a positive number of \code{nTrackerLayers});
102+
\item apply the remaining selection criteria from Table \ref{leptonSelection}.
103+
\end{enumerate}
104+
}
105+
\item {
106+
\textbf{\underline{Tight Muons}} -- You should now add three conditions to \code{selected\_muon\_mask}:
107+
\begin{enumerate}
108+
\item enforce that the low momentum muons ($< 200$ GeV) are ParticleFlow candidates (use the variable \code{isPFcand});
109+
\item enforce that the high momentum muons ($\geq 200$ GeV) are ParticleFlow candidates OR have a positive \code{highPtId};
110+
\item use the variable \code{pfRelIso03\_all} to apply the condition in Table \ref{leptonSelection}.
111+
\end{enumerate}
112+
}
113+
\end{itemize}
114+
115+
\begin{exercise}{Writing a Selector -- Muon Selection}[h4l/selection/lepton\_solution.py]
116+
Again referring to Table \ref{leptonSelection}, fill in the missing information in the \CCSPStlye{Selector} module \code{muon\_selection} defined in \code{h4l/selection/lepton.py}.
117+
\end{exercise}
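The muon identification logic listed above can be sketched per object as follows. This is an illustration under assumptions, not the solution file: the dictionary keys mirror the variable names mentioned in the text, the function name \code{passes\_muon\_id} is hypothetical, and the isolation threshold \code{max\_rel\_iso=0.35} is one possible reading of the table entry applied to \code{pfRelIso03\_all}. The kinematic and vertex cuts are omitted because they follow the same pattern as for electrons.

```python
def passes_muon_id(mu, working_point="tight", max_rel_iso=0.35):
    """Muon type, ParticleFlow and isolation requirements (sketch)."""
    # loose: global muon, or tracker muon matched to at least one station
    is_good_type = bool(mu["isGlobal"] or (mu["isTracker"] and mu["nStations"] > 0))
    # discard muons whose track exists only in the muon system
    has_tracker_hits = mu["nTrackerLayers"] > 0
    loose = is_good_type and has_tracker_hits
    if working_point == "loose":
        return loose

    # tight: ParticleFlow requirement depends on the momentum regime
    if mu["pt"] < 200.0:
        pf_ok = bool(mu["isPFcand"])
    else:
        pf_ok = bool(mu["isPFcand"] or mu["highPtId"] > 0)

    # isolation threshold is an assumption, see the lead-in text
    return loose and pf_ok and mu["pfRelIso03_all"] < max_rel_iso


# toy muons (values invented for illustration)
base = {"isGlobal": True, "isTracker": False, "nStations": 2,
        "nTrackerLayers": 8, "isPFcand": True, "highPtId": 0,
        "pfRelIso03_all": 0.1}
low_pt_pf = dict(base, pt=30.0)
high_pt_non_pf = dict(base, pt=250.0, isPFcand=False, highPtId=2)
tracker_no_station = dict(base, pt=30.0, isGlobal=False, isTracker=True, nStations=0)
low_pt_non_pf = dict(base, pt=30.0, isPFcand=False, highPtId=2)

tight_flags = [passes_muon_id(mu) for mu in
               (low_pt_pf, high_pt_non_pf, tracker_no_station, low_pt_non_pf)]
```

The last toy muon passes the loose requirements but fails the tight ones, because below 200 GeV a positive \code{highPtId} does not replace the ParticleFlow requirement.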
