|
1 | 1 | \section{Writing a Producer}\label{sec:producer}
|
2 | 2 |
|
3 |
| -The \CCSPStlye{Producer} class is used to calculate higher-level observables and define new columns to be written o disk. The corresponding task is called \CCSPStlye{cf.ProduceColumns} (see Ref.~\cite{cf_repo} for detailed info). Naturally, we only want to compute these new variables for the relevant events for our analysis. Thus, the producers are executed after the selection step. |
| 3 | +The \CCSPStlye{Producer} class is used to calculate higher-level variables and define new array columns to be written to disk. |
| 4 | +The corresponding task is called \CCSPStlye{cf.ProduceColumns} (see Ref.~\cite{cf_repo} for detailed information). |
| 5 | +Naturally, we only want to compute these new variables for the events that are relevant to our analysis.
| 6 | +Thus, the producers are executed after the selection step in the task graph. |
4 | 7 |
|
5 |
| -In this part of the tutorial, we will write a producer which calculates the four lepton invariant mass. |
| 8 | +The \code{H4L} analysis includes three exemplary \CCSPStlye{Producer}s in \code{h4l/production/example.py}. |
| 9 | +You will notice that the script starts by importing all relevant modules, including CMS-specific ones.
| 10 | +\columnflow provides \CCSPStlye{Producer}s to compute commonly used event information, such as MC, PDF, or pileup weights, which can be found under \code{columnflow.production.cms}.
| 11 | +We also load both \code{numpy} and \code{awkward} with the \code{maybe\_import} mechanism to account for differences between the software environments used in different parts of the task graph.
| 12 | + |
| 13 | +We start by defining a new \CCSPStlye{Producer} class named \code{features}. |
| 14 | +This class requires the transverse jet momentum \code{Jet.pt}, which must be added to its \code{uses} set. |
| 15 | +Additionally, it produces two new array columns, the total jet transverse momentum \code{ht} and the number of jets in an event \code{n\_jet}, which are both added to its \code{produces} set. |
| 16 | +Each of these new variables is computed and then added to the \code{events} array with the \code{set\_ak\_column} function. |
| 17 | +This is necessary to make these variables available outside of the \CCSPStlye{Producer}, e.g.\ for writing the information to disk.
| 18 | +Note that for \code{n\_jet}, we specify that the column elements must be stored as \code{int} values.
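As a rough illustration of what the \code{features} producer computes, the following sketch mimics the two per-event reductions with plain Python lists instead of \code{awkward} arrays. The toy \code{jet\_pt} values and the helper name \code{compute\_features} are assumptions made for this illustration and are not part of the \code{H4L} code. In the actual producer, the equivalent reductions act on \code{events.Jet.pt} and the results are attached with \code{set\_ak\_column}.

```python
# Toy stand-in for events.Jet.pt: one inner list of jet pT values (GeV) per event.
jet_pt = [
    [50.0, 30.0, 20.0],  # event with three jets
    [],                  # event without jets
    [120.5],             # event with one jet
]


def compute_features(jet_pt):
    """Return (ht, n_jet) per event for a jagged list of jet pT values."""
    # ht: scalar sum of the jet transverse momenta in each event
    ht = [float(sum(pts)) for pts in jet_pt]
    # n_jet: jet multiplicity, stored explicitly as an integer per event
    n_jet = [int(len(pts)) for pts in jet_pt]
    return ht, n_jet


ht, n_jet = compute_features(jet_pt)
print(ht)     # [100.0, 0.0, 120.5]
print(n_jet)  # [3, 0, 1]
```

Note that an event without jets naturally yields \code{ht = 0} and \code{n\_jet = 0}, so no special treatment is needed for these two features.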
| 19 | + |
| 20 | +The second \CCSPStlye{Producer} class, \code{cutflow\_features}, allows us to define and store features to be used for cutflow plots. Here, in addition to \code{Jet.pt}, we also add \code{mc\_weight} and \code{category\_ids} to the \code{uses} set. Note that both of these are \CCSPStlye{Producer}s themselves, which you can find by following the import paths at the beginning of the script.
| 21 | + |
| 22 | +The \CCSPStlye{Producer} class \code{mc\_weight} reads in the \code{genWeight} column and, if it exists, the \code{LHEWeight} column, both stored in \code{events}. Since these columns are required, they are both added to the \code{uses} set of \code{mc\_weight}. By extension, when we list \code{mc\_weight} in our \code{uses} set, we request these columns as well. The \code{mc\_weight} class simply decides which one of these weights to use and saves the result as a new column, also named \code{mc\_weight}, which is included in its \code{produces} set. At this point, we also have the option to add the \code{mc\_weight} class to our own \code{produces} set. In this way, the new column is also created and saved to disk.
| 23 | + |
| 24 | +Meanwhile, the \code{category\_ids} class assigns each event an array of category ids, which it stores as a new column also named \code{category\_ids}. Thus, we must also add this class to our \code{produces} set. The topic of defining categories is discussed in detail in Section~\ref{sec:categories}. Now that we have access to both of these \CCSPStlye{Producer} classes, the \code{cutflow\_features} class can use them to attribute MC weights (if the dataset passed to it is tagged as an MC dataset) and category ids to \code{events}. It then creates a new column in the updated \code{events} object named \code{cutflow.jet1\_pt}, which saves the transverse momentum of the leading jet in each event, taken from \code{Jet.pt\text{[:,0]}}. If the event does not contain any jets, it instead saves an \code{EMPTY\_FLOAT} value.
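The fallback behaviour for events without jets can be pictured with a small sketch, again using plain Python lists in place of \code{awkward} arrays. The helper \code{leading\_jet\_pt} and the constant \code{EMPTY\_FLOAT\_PLACEHOLDER}, including its numeric value, are hypothetical and only serve this illustration. The actual \code{EMPTY\_FLOAT} constant is provided by \columnflow{}. The sketch also assumes that the jets of each event are ordered by decreasing transverse momentum, so that the first entry is the leading jet.

```python
# Hypothetical placeholder for illustration; the real EMPTY_FLOAT is defined by columnflow.
EMPTY_FLOAT_PLACEHOLDER = -99999.0


def leading_jet_pt(jet_pt, empty_value=EMPTY_FLOAT_PLACEHOLDER):
    """Return the pT of the first jet per event, or empty_value if the event has no jets."""
    return [pts[0] if len(pts) > 0 else empty_value for pts in jet_pt]


# Toy stand-in for events.Jet.pt, assumed to be sorted by decreasing pT within each event.
jet_pt = [[50.0, 30.0], [], [120.5]]
jet1_pt = leading_jet_pt(jet_pt)
print(jet1_pt)  # [50.0, -99999.0, 120.5]
```

Using a dedicated placeholder value keeps the column rectangular, i.e.\ it holds exactly one number per event, which simplifies histogramming for cutflow plots.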
| 25 | + |
| 26 | +The last \CCSPStlye{Producer} class defined is \code{example} and follows the same structure as the two previously explained \CCSPStlye{Producer}s. |
| 27 | +First, it creates the \code{ht} and \code{n\_jet} columns by calling the \CCSPStlye{Producer} class \code{features} via \code{\text{events = self[features](events, **kwargs)}}.
| 28 | +It then applies category ids and deterministic seeds to the updated \code{events} object. |
| 29 | +Lastly, two additional modules are called in this example. |
| 30 | +First, the \code{normalization\_weights} producer reweights simulated events according to the cross section values that are provided in the metadata database (in this case \code{cmsdb}).
| 31 | +Additionally, the \code{muon\_weights} producer applies scale factors for muons provided by the CMS Muon POG to improve the agreement between data and simulation.
| 32 | + |
| 33 | +\begin{exercise}{Understanding some basic Producers} |
| 34 | + Familiarize yourself with the \CCSPStlye{Producer} classes mentioned above. |
| 35 | +\end{exercise} |
| 36 | + |
| 37 | +In this \code{H4L} analysis, we want to calculate the four-lepton invariant mass, which should exhibit a peak near the Higgs boson mass for signal events.
| 38 | +Note that to perform four-vector calculations, you need to import \code{attach\_coffea\_behavior} from \code{columnflow.production.util}. |
| 39 | +You will need to use kinematic information from both the \code{events.Electron} and \code{events.Muon} collections and create a new column which stores your calculated invariant mass. |
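For reference, the invariant mass of a four-lepton system follows from the sum of the individual lepton four-momenta. In natural units, and with $E_i$ and $\vec{p}_i$ denoting the energy and three-momentum of lepton $i$ (notation introduced here for illustration only), it reads
\begin{equation}
  m_{4\ell} = \sqrt{\Big(\sum_{i=1}^{4} E_i\Big)^{2} - \Big|\sum_{i=1}^{4} \vec{p}_i\Big|^{2}}.
\end{equation}
In practice, this sum does not have to be written out by hand if the lepton collections behave as four-vectors, which is the purpose of attaching the coffea behavior.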
| 40 | + |
| 41 | +\begin{exercise}{Writing a Producer}{h4l/production/invariant_mass.py} |
| 42 | + Write a \CCSPStlye{Producer} class which computes the four-lepton invariant mass.
| 43 | +\end{exercise} |
| 44 | + |