Merge pull request #3 from columnflow/sync_overleaf

pkausw · web-flow · commit 3550670366d8 · 2025-02-04T10:36:19.000+01:00
Adding Selector Section
diff --git a/sections/calibrator.tex b/sections/calibrator.tex
@@ -14,22 +14,22 @@ \section{Writing a Calibrator}\label{sec:calibrator}
 % TODO: include code here?
 %\begin{lstlisting}[language=python]
 %	# coding: utf-8
-%	
+%
 %	"""
 %	Jet energy calibration methods.
 %	"""
-%	
+%
 %	from columnflow.calibration import Calibrator, calibrator
 %	from columnflow.calibration.cms.jets import jec, jer
 %	from columnflow.util import maybe_import
-%	
+%
 %	ak = maybe_import("awkward")
-%	
-%	
+%
+%
 %	# custom jec calibrator that only runs nominal correction
 %	jec_nominal = jec.derive("jec_nominal", cls_dict={"uncertainty_sources": []})
-%	
-%	
+%
+%
 %	@calibrator(
 %	uses={jec_nominal},
 %	produces={jec_nominal},
@@ -42,29 +42,29 @@ \section{Writing a Calibrator}\label{sec:calibrator}
 %	"""
 %	# correct jet energy scale
 %	events = self[jec_nominal](events, **kwargs)
-%	
+%
 %	# jet energy resolution smearing (MC only)
 %	if self.dataset_inst.is_mc:
 %	events = self[jer](events, **kwargs)
-%	
+%
 %	return events
-%	
-%	
+%
+%
 %	@jet_energy.init
 %	def jet_energy_init(self: Calibrator) -> None:
 %	# return immediately if dataset object has not been loaded yet
 %	if not getattr(self, "dataset_inst", None):
 %	return
-%	
+%
 %	# add columns producs by JER smearing calibrator (MC only)
 %	if self.dataset_inst.is_mc:
 %	self.uses.add(jer)
 %	self.produces.add(jer)
-%	
+%
 %\end{lstlisting}
 
-First the relevant modules are imported.
-Note that \code{awkward} is loaded with the \code{maybe\_import} mechanism.
+First some modules are imported. Note that \code{awkward} is loaded with
+the \code{maybe\_import} mechanism.
 This is necessary due to the encapsulated structure of the underlying software stack.
 In the scope of this exercise, we don't want to consider all the different sources of uncertainties that are associated with jet calibration yet.
 Therefore, we use the \code{derive} mechanism of \CCSPStlye{TaskArrayFunctions} to define a new class called \code{jec\_nominal}, which inherits from the original \code{jec} \CCSPStlye{Calibrator} but overwrites the corresponding class member variable.
@@ -81,4 +81,4 @@ \section{Writing a Calibrator}\label{sec:calibrator}
 
 \begin{exercise}{Writing a Calibrator}%[h4l/calibration/jet.py]
 	Familiarize yourself with how the \code{jet\_energy} \CCSPStlye{Calibrator} works.
-\end{exercise}
+\end{exercise}
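Note on the derive mechanism described in the calibrator section: the following is a minimal, self-contained sketch of the idea only, namely creating a subclass under a new name and overriding selected class attributes. It does not reproduce columnflow's implementation; the class BaseCalibrator and the placeholder source names are hypothetical.

```python
class BaseCalibrator:
    """Hypothetical stand-in for a calibrator class (not the columnflow class)."""

    # placeholder values, assumed for illustration only
    uncertainty_sources = ["source_a", "source_b"]

    @classmethod
    def derive(cls, name, cls_dict=None):
        # build a new subclass whose class attributes are overridden by cls_dict
        return type(name, (cls,), dict(cls_dict or {}))


# mirrors the call quoted in the text: same behavior, but without uncertainty sources
jec_nominal = BaseCalibrator.derive("jec_nominal", cls_dict={"uncertainty_sources": []})
```

The derived class leaves the base class untouched, so both variants can be used side by side.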
diff --git a/sections/producer.tex b/sections/producer.tex
@@ -3,7 +3,7 @@ \section{Writing a Producer}\label{sec:producer}
 The \CCSPStlye{Producer} class is used to calculate higher-level variables and define new array columns to be written to disk.
 The corresponding task is called \CCSPStlye{cf.ProduceColumns} (see Ref.~\cite{cf_repo} for detailed information).
 Naturally, we only want to compute these new variables for the relevant events for our analysis.
-Thus, the producers are executed after the selection step in the task graph. 
+Thus, the producers are executed after the selection step in the task graph.
 
 The \code{H4L} analysis includes three exemplary \CCSPStlye{Producer}s in \code{h4l/production/example.py}.
 You will notice that the script starts by importing all relevant modules, including CMS specific ones.
@@ -14,21 +14,21 @@ \section{Writing a Producer}\label{sec:producer}
 This class requires the transverse jet momentum \code{Jet.pt}, which must be added to its \code{uses} set.
 Additionally, it produces two new array columns, the total jet transverse momentum \code{ht} and the number of jets in an event \code{n\_jet}, which are both added to its \code{produces} set.
 Each of these new variables is computed and then added to the \code{events} array with the \code{set\_ak\_column} function.
-This is necessary to make these variables available outside of the \CCSP{Producer}, e.g.\ for writing the information to disk.
+This is necessary to make these variables available outside of the \CCSPStlye{Producer}, e.g.\ for writing the information to disk.
 Note that for the case of \code{n\_jet}, we specified that the column element must be an \code{int} value.
 
-The second \CCSPStlye{Producer} class \code{cutflow\_features} allows us to define and store features to be used for cutflow plots. Here, in addition to \code{Jet.pt} we also require \code{mc\_weight} and \code{category\_ids} to be added to the \code{uses} set. Note that both of these are \CCSPStlye{Producer}s themselves which you can find by following the import path at the beginning of the script. 
+The second \CCSPStlye{Producer} class \code{cutflow\_features} allows us to define and store features to be used for cutflow plots. Here, in addition to \code{Jet.pt} we also require \code{mc\_weight} and \code{category\_ids} to be added to the \code{uses} set. Note that both of these are \CCSPStlye{Producer}s themselves, which you can find by following the import path at the beginning of the script.
 
 The  \CCSPStlye{Producer} class \code{mc\_weight} reads in the \code{genWeight} column and, if existent, the \code{LHEWeight} column, both stored in \code{events}. Since these columns are required, they are both added to the \code{uses} set of \code{mc\_weight}. By extension, when we call \code{mc\_weight} in our \code{uses} set, we are calling these columns as well. The \code{mc\_weight} class simply decides which one of these weights to use and saves the decision as a new column, also named \code{mc\_weight}, which is included in its \code{produces} set. At this point, we also have the option to add the \code{mc\_weight} class to our own \code{produces} set. In this way, the new column also gets created and saved to disk.
 
-Meanwhile, the \code{category\_ids} class assigns each event an array of category ids, which it stores as a new column also named \code{category\_ids}. Thus, we must also add this class to our \code{produces} set. The topic of defining categories is discussed in detail in the Section \ref{sec:categories}. Now that we have access to both these \CCSPStlye{Producer} classes, the \code{cutflow\_features} class can use them to attribute MC weights (if the dataset passed to it is tagged as an MC dataset) and category ids to \code{events}. It then creates a new column in the updated \code{events} object named \code{cutflow.jet1\_pt} which saves the transverse momentum of the most energetic jet in each event stored in \code{Jet.pt\text{[:,0]}}. If the event does not contain jets, it instead saves an \code{EMPTY\_FLOAT} value. 
+Meanwhile, the \code{category\_ids} class assigns each event an array of category ids, which it stores as a new column also named \code{category\_ids}. Thus, we must also add this class to our \code{produces} set. The topic of defining categories is discussed in detail in Section~\ref{sec:categories}. Now that we have access to both of these \CCSPStlye{Producer} classes, the \code{cutflow\_features} class can use them to attribute MC weights (if the dataset passed to it is tagged as an MC dataset) and category ids to \code{events}. It then creates a new column in the updated \code{events} object named \code{cutflow.jet1\_pt}, which saves the transverse momentum of the most energetic jet in each event stored in \code{Jet.pt\text{[:,0]}}. If the event does not contain jets, it instead saves an \code{EMPTY\_FLOAT} value.
 
 The last \CCSPStlye{Producer} class defined is \code{example} and follows the same structure as the  two previously explained \CCSPStlye{Producer}s.
 First, it starts by creating the \code{cutflow.jet1\_pt} column by using the \CCSPStlye{Producer} class \code{features} called at \code{\text{events=self[features](events, **kwargs)}}.
 It then applies category ids and deterministic seeds to the updated \code{events} object.
 Lastly, two additional modules are called in this example.
-First, the \code{normalization_weights} producer is used to reweight the cross section of simulated events to the values that are provided in the metadata database (in this case \code{cmsdb}).
-Additionally, the \code{muon_weights} producer applies scale factors for muons provided by the CMS Muon POG to facilitate a better compatibility of data and simulation.
+First, the \code{normalization\_weights} producer is used to reweight the cross section of simulated events to the values that are provided in the metadata database (in this case \code{cmsdb}).
+Additionally, the \code{muon\_weights} producer applies scale factors for muons provided by the CMS Muon POG to facilitate a better compatibility of data and simulation.
 
 \begin{exercise}{Understanding some basic Producers}
 	Familiarize yourself with the \CCSPStlye{Producer} classes mentioned above.
@@ -38,7 +38,6 @@ \section{Writing a Producer}\label{sec:producer}
 Note that to perform four-vector calculations, you need to import \code{attach\_coffea\_behavior} from \code{columnflow.production.util}.
 You will need to use kinematic information from both the \code{events.Electron} and \code{events.Muon} collections and create a new column which stores your calculated invariant mass.
 
-\begin{exercise}{Writing a Producer}{h4l/production/invariant_mass.py}
-	Write a \CCSPStlye{Producer} class which computes the four lepton invariant mass. 
+\begin{exercise}{Writing a Producer}[h4l/production/invariant\_mass.py]
+	Write a \CCSPStlye{Producer} class which computes the four lepton invariant mass.
 \end{exercise}
- 
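Note on the four-lepton invariant mass discussed in the producer section: the sketch below shows only the underlying arithmetic in plain Python, one event at a time. It is not a columnflow Producer and does not use attach_coffea_behavior; the function names and the (pt, eta, phi, mass) tuple layout are assumptions made for illustration.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) in GeV into Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return energy, px, py, pz


def invariant_mass(leptons):
    """Invariant mass of the summed four-vectors of all given leptons."""
    e_sum = px_sum = py_sum = pz_sum = 0.0
    for pt, eta, phi, mass in leptons:
        energy, px, py, pz = four_vector(pt, eta, phi, mass)
        e_sum += energy
        px_sum += px
        py_sum += py
        pz_sum += pz
    m_squared = e_sum**2 - px_sum**2 - py_sum**2 - pz_sum**2
    # guard against tiny negative values caused by floating point rounding
    return math.sqrt(max(m_squared, 0.0))


# four massless leptons with pt = 25 GeV each, evenly spread in phi at eta = 0
example_leptons = [(25.0, 0.0, k * math.pi / 2, 0.0) for k in range(4)]
m4l = invariant_mass(example_leptons)
```

In this balanced configuration the momenta cancel, so the invariant mass equals the summed energy of 100 GeV. In the actual exercise, the same sum would be carried out on the columnar electron and muon collections.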
diff --git a/sections/selector.tex b/sections/selector.tex
@@ -4,13 +4,114 @@ \section{Writing a Selector}\label{sec:selector}
 This is a crucial step in the workflow since the decision to keep or reject objects or even whole events is performed here.
 Since the selection usually depends on for example four-momenta of the objects within the events, it is executed after the calibration.
 The corresponding task is called \CCSPStlye{cf.SelectEvents}.
+
 For more information, please consider Ref.~\cite{cf_repo}.
 
-\begin{table}[t]
-	\Caption{Selection criteria for leptons}{Shown are the selection criteria for electrons (muons) at the 'loose' and 'tight'}
+\renewcommand{\arraystretch}{1.5}
+\begin{table}[h!]
+    \centering
+    \begin{tabular}{|m{4cm}|m{5cm}|m{5.5cm}|}
+    \hline
+    & \textbf{Electrons} & \textbf{Muons} \\ \hline
+    \textbf{Kinematic cuts} &
+    \begin{itemize}[leftmargin=*]
+    \item $p_T^e > 7$ GeV
+    \item $|\eta^e| < 2.5$
+    \end{itemize} &
+    \begin{itemize}[leftmargin=*]
+        \item $p_T^\mu > 5$ GeV
+        \item $|\eta^\mu| < 2.5$
+    \end{itemize} \\ \hline
+    \textbf{Vertex cuts} &
+    \begin{itemize}[leftmargin=*]
+        \item $d_{xy} < 0.5$ cm
+        \item $d_z < 1$ cm
+        \item $SIP < 4$
+    \end{itemize} &
+    \begin{itemize}[leftmargin=*]
+        \item $d_{xy} < 0.5$ cm
+        \item $d_z < 1$ cm
+        \item $SIP < 4$
+    \end{itemize} \\ \hline
+    \textbf{Isolation \& ID for \newline 'tight' working point} & Dedicated BDT targeting \newline prompt electrons. & Select only muons within \newline a well-defined cone ($R=0.35$). \\ \hline
+    \end{tabular}
+    \Caption{Selection criteria for leptons.}{Shown are the selection cuts for electrons/muons at the 'loose' working point, with the last row defining the extra requirement for the leptons to pass the 'tight' working point.}
+    \label{leptonSelection}
 \end{table}
+
 In this part of the tutorial, we will write selections for electrons and muons.
-\textbf{\underline{Loose Electrons}}
+In the script \code{h4l/selection/lepton.py} you can find the base structure to implement two \CCSPStlye{Selector} modules, \code{electron\_selection} and \code{muon\_selection}.
+Each of these objects uses the relevant event information for its implementation.
+
+
+\textbf{\underline{Electron Selection}}
+
+For \code{electron\_selection}, the electron kinematic information is first loaded into the \code{uses} set.
+Next, the columns that depend on the nanoAOD version are added, in this case the MVA (multivariate analysis) discriminator to be used.
+Notice that we use the set union operator \code{|} to add either \code{Electron.mvaFall17V2Iso} or \code{Electron.mvaHZZIso} to the set containing the kinematic variables.
+Lastly, to perform four-vector calculations, we also require \code{attach\_coffea\_behavior}, which is imported at the beginning of the script from \code{columnflow.selection.util}.
+
+
+This \CCSPStlye{Selector} object also sets two further attributes, \code{exposed} and \code{sandbox}.
+The first one determines whether or not the \CCSPStlye{Selector} object is available from the command line.
+As for \code{sandbox}, it specifies the software environment in which this \CCSPStlye{Selector} will be executed. A detailed description can be found in \href{https://columnflow.readthedocs.io/en/latest/user_guide/sandbox.html}{Sandboxes and Their Use in Columnflow}.
+
+\newpage
 \begin{itemize}
-    \item
+    \item {
+        \textbf{\underline{Loose Electrons}} -- Within the main body of \code{electron\_selection}, all selections should be applied.
+        Note that the minimum transverse momentum has already been specified in \code{min\_pt}.
+        The actual value, in GeV, is set in \code{h4l/config/config\_h4l.py}, and depends on the argument \code{working\_point}.
+        In the config file, a dictionary stores two possible values, $15$ for a \code{'tight'} working point (default value), and $7$ for a \code{'loose'} working point.
+        Both the transverse momentum and the pseudorapidity selection criteria have already been applied in \code{default\_mask}.
+        You should now complete the mask with the remaining selection criteria from Table \ref{leptonSelection}.
+    }
+    \item {
+        \textbf{\underline{Tight Electrons}} -- Finally, you should also add a condition that applies the identification criteria when \code{working\_point} is set to \code{'tight'}.
+        Both the fSCeta and BDT values are set in the function \code{return\_electron\_id\_cuts}, which can be found in \code{h4l/selection/util.py}.
+    }
 \end{itemize}
+
+After all selections have been applied, the final part of the module sorts the electrons in each event by their momentum and applies the \code{default\_mask}.
+The indices of the selected electrons are then stored in \code{selected\_electron\_idx}.
+Finally, the \code{electron\_selection} module returns both the \code{events} array and a \CCSPStlye{SelectionResult} instance.
+We initialize the \CCSPStlye{SelectionResult} instance by setting the \code{objects} and \code{aux} (i.e.\ auxiliary) arguments.
+Within \code{objects}, a nested dictionary saves \code{selected\_electron\_idx} as the value of an \code{Electron} key.
+The selection mask itself, \code{default\_mask}, is stored in \code{aux}.
+
+\begin{exercise}{Writing a Selector -- Electron Selection}[h4l/selection/lepton\_solution.py]
+	Referring to Table \ref{leptonSelection}, fill in the missing information in the \CCSPStlye{Selector} module \code{electron\_selection} defined in \code{h4l/selection/lepton.py}.
+\end{exercise}
+
+\vspace{0.8cm}
+
+\textbf{\underline{Muon Selection}}
+
+The \code{muon\_selection} module behaves very similarly. In this case, a dedicated software environment is not required.
+There is also no information dependent on the nanoAOD version.
+Besides the kinematic information, the \code{uses} set also loads muon quality criteria (e.g.\ whether it is a global or tracker muon) as well as identification and isolation information.
+
+\begin{itemize}
+    \item {
+        \textbf{\underline{Loose Muons}} -- Within the main body of \code{muon\_selection}, a selection mask named \code{selected\_muon\_mask} is defined.
+        Similarly to the electron selection, the minimum transverse momentum and pseudorapidity are already defined.
+        You should now expand this mask such that:
+        \begin{enumerate}
+            \item require either a global or a tracker muon (for tracker muons, \code{nStations} should be positive);
+            \item discard standalone muons whose reconstructed tracks are only present in the muon system (i.e.\ for standalone muons, require a positive number of \code{nTrackerLayers});
+            \item apply the remaining selection criteria from Table \ref{leptonSelection}.
+        \end{enumerate}
+    }
+    \item {
+        \textbf{\underline{Tight Muons}} -- You should now add three conditions to \code{selected\_muon\_mask}:
+        \begin{enumerate}
+            \item enforce that the low momentum muons ($< 200$ GeV) are ParticleFlow candidates (use the variable \code{isPFcand});
+            \item enforce that the high momentum muons ($\geq 200$ GeV) are ParticleFlow candidates OR have a positive \code{highPtId};
+            \item use the variable \code{pfRelIso03\_all} to apply the condition in Table \ref{leptonSelection}.
+        \end{enumerate}
+    }
+\end{itemize}
+
+\begin{exercise}{Writing a Selector -- Muon Selection}[h4l/selection/lepton\_solution.py]
+	Again referring to Table \ref{leptonSelection}, fill in the missing information in the \CCSPStlye{Selector} module \code{muon\_selection} defined in \code{h4l/selection/lepton.py}.
+\end{exercise}
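Note on the loose selection described in the selector section: the sketch below applies the kinematic and vertex cuts of the lepton selection table to a single event in plain Python, then builds momentum-sorted indices of the passing leptons. It is not the columnar awkward-array code of lepton.py; the dictionary layout, the field name sip3d for the SIP variable, the use of absolute values for the impact parameters, and sorting by transverse momentum are assumptions made for illustration.

```python
def loose_mask(leptons, min_pt, max_abs_eta=2.5):
    """Per-lepton booleans for the 'loose' kinematic and vertex cuts."""
    return [
        lep["pt"] > min_pt
        and abs(lep["eta"]) < max_abs_eta
        and abs(lep["dxy"]) < 0.5  # cm
        and abs(lep["dz"]) < 1.0  # cm
        and abs(lep["sip3d"]) < 4.0
        for lep in leptons
    ]


def selected_indices(leptons, mask):
    """Indices of passing leptons, ordered by descending transverse momentum."""
    order = sorted(range(len(leptons)), key=lambda i: leptons[i]["pt"], reverse=True)
    return [i for i in order if mask[i]]


# one hypothetical event with four electron candidates
electrons = [
    {"pt": 6.0, "eta": 0.3, "dxy": 0.01, "dz": 0.02, "sip3d": 1.0},   # fails pt
    {"pt": 20.0, "eta": 1.0, "dxy": 0.01, "dz": 0.02, "sip3d": 1.0},  # passes
    {"pt": 45.0, "eta": -2.0, "dxy": 0.10, "dz": 0.30, "sip3d": 2.5}, # passes
    {"pt": 30.0, "eta": 0.5, "dxy": 0.01, "dz": 0.10, "sip3d": 6.0},  # fails SIP
]

default_mask = loose_mask(electrons, min_pt=7.0)
selected_electron_idx = selected_indices(electrons, default_mask)
```

For muons, the same function could be called with min_pt=5.0; the muon type requirements and the tight working point conditions would be combined with this mask using additional boolean terms.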