Commit c802e1a

authored
Merge pull request #2 from columnflow/sync_overleaf
Sync overleaf
2 parents 7695f5e + 6974a85 commit c802e1a

1 file changed

+41
-2
lines changed

Diff for: sections/producer.tex

+41-2
@@ -1,5 +1,44 @@
11
\section{Writing a Producer}\label{sec:producer}
22

3-
The \CCSPStlye{Producer} class is used to calculate higher-level observables and define new columns to be written o disk. The corresponding task is called \CCSPStlye{cf.ProduceColumns} (see Ref.~\cite{cf_repo} for detailed info). Naturally, we only want to compute these new variables for the relevant events for our analysis. Thus, the producers are executed after the selection step.
3+
The \CCSPStlye{Producer} class is used to calculate higher-level variables and define new array columns to be written to disk.
4+
The corresponding task is called \CCSPStlye{cf.ProduceColumns} (see Ref.~\cite{cf_repo} for detailed information).
5+
Naturally, we only want to compute these new variables for the relevant events for our analysis.
6+
Thus, the producers are executed after the selection step in the task graph.
47

5-
In this part of the tutorial, we will write a producer which calculates the four lepton invariant mass.
8+
The \code{H4L} analysis includes three exemplary \CCSPStlye{Producer}s in \code{h4l/production/example.py}.
9+
You will notice that the script starts by importing all relevant modules, including CMS-specific ones.
10+
\columnflow provides \CCSPStlye{Producer}s to compute commonly used event information, such as MC, PDF or pileup weights, which can be found under \code{columnflow.production.cms}.
11+
We also load both \code{numpy} and \code{awkward} with the \code{maybe\_import} mechanism to account for differences between the software environments used in different parts of the task graph.
12+
13+
We start by defining a new \CCSPStlye{Producer} class named \code{features}.
14+
This class requires the transverse jet momentum \code{Jet.pt}, which must be added to its \code{uses} set.
15+
Additionally, it produces two new array columns, the total jet transverse momentum \code{ht} and the number of jets in an event \code{n\_jet}, which are both added to its \code{produces} set.
16+
Each of these new variables is computed and then added to the \code{events} array with the \code{set\_ak\_column} function.
17+
This is necessary to make these variables available outside of the \CCSPStlye{Producer}, e.g.\ for writing the information to disk.
18+
Note that for the case of \code{n\_jet}, we specified that the column element must be an \code{int} value.
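
As a rough illustration of what these two columns contain, the following sketch computes them for a few toy events using plain Python lists.
It is only a stand-in under stated assumptions: in \columnflow the same per-event reductions act on \code{awkward} arrays and the results are attached with \code{set\_ak\_column}; the toy values and the variable name \code{jet\_pt} are made up for this example.

```python
# Toy stand-in for a jagged Jet.pt column: one inner list per event (values in GeV).
# The numbers are invented for illustration only.
jet_pt = [
    [50.0, 30.0, 20.0],  # event with three jets
    [],                  # event without any jets
    [120.5],             # event with a single jet
]

# ht: scalar sum of the jet transverse momenta in each event (0.0 if there are no jets)
ht = [sum(pts, 0.0) for pts in jet_pt]

# n_jet: number of jets in each event, stored explicitly as an int
n_jet = [int(len(pts)) for pts in jet_pt]

print(ht)
print(n_jet)
```

Both results have exactly one entry per event, which is what allows them to be stored as new columns next to the existing ones.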
19+
20+
The second \CCSPStlye{Producer} class \code{cutflow\_features} allows us to define and store features to be used for cutflow plots. Here, in addition to \code{Jet.pt}, we also require \code{mc\_weight} and \code{category\_ids} to be added to the \code{uses} set. Note that both of these are \CCSPStlye{Producer}s themselves, which you can find by following the import path at the beginning of the script.
21+
22+
The \CCSPStlye{Producer} class \code{mc\_weight} reads in the \code{genWeight} column and, if it exists, the \code{LHEWeight} column, both stored in \code{events}. Since these columns are needed, they are both added to the \code{uses} set of \code{mc\_weight}. By extension, when we add \code{mc\_weight} to our own \code{uses} set, we request these columns as well. The \code{mc\_weight} class simply decides which of these weights to use and saves the chosen weight as a new column, also named \code{mc\_weight}, which is included in its \code{produces} set. At this point, we also have the option to add the \code{mc\_weight} class to our own \code{produces} set, so that the new column is created and saved to disk.
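
The way a nested entry in a \code{uses} or \code{produces} set pulls in the columns of another producer can be pictured with a small toy model.
The class \code{ToyProducer} and the helper \code{flatten} below are hypothetical and written only for this illustration; they are not \columnflow's actual implementation, which handles considerably more cases.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyProducer:
    """Hypothetical stand-in for a Producer (not columnflow's real class)."""

    name: str
    uses: frozenset = field(default_factory=frozenset)
    produces: frozenset = field(default_factory=frozenset)


def flatten(entries, attr):
    """Collect plain column names, following nested producers recursively."""
    columns = set()
    for entry in entries:
        if isinstance(entry, ToyProducer):
            # a producer inside the set contributes its own columns
            columns |= flatten(getattr(entry, attr), attr)
        else:
            columns.add(entry)
    return columns


# toy version of mc_weight, using the column names mentioned in the text
mc_weight = ToyProducer(
    name="mc_weight",
    uses=frozenset({"genWeight", "LHEWeight"}),
    produces=frozenset({"mc_weight"}),
)

# a producer that lists mc_weight itself in both of its sets
cutflow_features = ToyProducer(
    name="cutflow_features",
    uses=frozenset({mc_weight, "Jet.pt"}),
    produces=frozenset({mc_weight, "cutflow.jet1_pt"}),
)

print(sorted(flatten(cutflow_features.uses, "uses")))
print(sorted(flatten(cutflow_features.produces, "produces")))
```

Listing a producer in \code{uses} therefore requests everything that producer reads, and listing it in \code{produces} announces everything it writes.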
23+
24+
Meanwhile, the \code{category\_ids} class assigns each event an array of category ids, which it stores as a new column also named \code{category\_ids}. Thus, we must also add this class to our \code{produces} set. The topic of defining categories is discussed in detail in Section~\ref{sec:categories}. Now that we have access to both of these \CCSPStlye{Producer} classes, the \code{cutflow\_features} class can use them to attach MC weights (if the dataset passed to it is tagged as an MC dataset) and category ids to \code{events}. It then creates a new column in the updated \code{events} object named \code{cutflow.jet1\_pt}, which stores the transverse momentum of the leading jet in each event, taken from \code{Jet.pt\text{[:,0]}}. If an event does not contain any jets, it instead stores an \code{EMPTY\_FLOAT} value.
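
The fallback for events without jets can be sketched in plain Python as follows.
This is a simplified picture, not the code used in the analysis: the helper \code{leading\_pt} is hypothetical, the numerical value assigned to \code{EMPTY\_FLOAT} is an assumption made for this example, and the jets of each event are assumed to be ordered by decreasing transverse momentum.

```python
# Placeholder marking a missing value; the number is assumed for this illustration.
EMPTY_FLOAT = -99999.0


def leading_pt(jet_pt_per_event, fill=EMPTY_FLOAT):
    """Return the first jet pt of each event, or `fill` for events without jets.

    Assumes the jets in every event are sorted by decreasing pt.
    """
    return [pts[0] if pts else fill for pts in jet_pt_per_event]


# toy events: three jets, no jets, one jet (values invented, in GeV)
jet_pt = [[50.0, 30.0, 20.0], [], [120.5]]

jet1_pt = leading_pt(jet_pt)
print(jet1_pt)
```

Filling a placeholder keeps the new column rectangular, with exactly one value per event, even when the requested jet is missing.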
25+
26+
The last \CCSPStlye{Producer} class defined is \code{example} and follows the same structure as the two previously explained \CCSPStlye{Producer}s.
27+
First, it creates the \code{ht} and \code{n\_jet} columns by calling the \CCSPStlye{Producer} class \code{features} via \code{\text{events=self[features](events, **kwargs)}}.
28+
It then applies category ids and deterministic seeds to the updated \code{events} object.
29+
Lastly, two additional modules are called in this example.
30+
First, the \code{normalization\_weights} producer is used to reweight simulated events to the cross sections provided in the metadata database (in this case \code{cmsdb}).
31+
Additionally, the \code{muon\_weights} producer applies scale factors for muons provided by the CMS Muon POG to improve the agreement between data and simulation.
32+
33+
\begin{exercise}{Understanding some basic Producers}
34+
Familiarize yourself with the \CCSPStlye{Producer} classes mentioned above.
35+
\end{exercise}
36+
37+
In this \code{H4L} analysis, we want to calculate the four-lepton invariant mass, which should exhibit a peak near the Higgs boson mass for signal events.
38+
Note that to perform four-vector calculations, you need to import \code{attach\_coffea\_behavior} from \code{columnflow.production.util}.
39+
You will need to use kinematic information from both the \code{events.Electron} and \code{events.Muon} collections and create a new column which stores your calculated invariant mass.
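
As a reminder of the underlying kinematics, the invariant mass follows from the summed four-momentum, $m^2 = (\sum_i E_i)^2 - |\sum_i \vec{p}_i|^2$.
The sketch below evaluates this for a single event with the standard \code{math} module only.
It is not a \CCSPStlye{Producer} and does not use the coffea behavior mentioned above; the function names and the \code{(pt, eta, phi, mass)} tuple layout are hypothetical choices made for this illustration.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) into Cartesian components (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return energy, px, py, pz


def invariant_mass(leptons):
    """Invariant mass of a list of (pt, eta, phi, mass) tuples from one event."""
    e_sum = px_sum = py_sum = pz_sum = 0.0
    for lepton in leptons:
        energy, px, py, pz = four_vector(*lepton)
        e_sum += energy
        px_sum += px
        py_sum += py
        pz_sum += pz
    m2 = e_sum**2 - (px_sum**2 + py_sum**2 + pz_sum**2)
    # guard against tiny negative values caused by floating point rounding
    return math.sqrt(max(m2, 0.0))


# toy event: four massless leptons in the transverse plane, evenly spread in phi,
# so that their momenta cancel and the invariant mass equals the summed energy
toy_leptons = [(31.25, 0.0, k * math.pi / 2, 0.0) for k in range(4)]
m4l = invariant_mass(toy_leptons)
print(round(m4l, 6))
```

In the actual \CCSPStlye{Producer}, the same sum has to be carried out for all events at once on the \code{events.Electron} and \code{events.Muon} collections.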
40+
41+
\begin{exercise}{Writing a Producer}{h4l/production/invariant_mass.py}
42+
Write a \CCSPStlye{Producer} class which computes the four-lepton invariant mass.
43+
\end{exercise}
44+
