Inclusive \(M_{\mathrm {T2}}\) search

The backgrounds in jets-plus-\(p_{\mathrm {T}} ^\text {miss}\) final states arise from three categories of SM processes.

The lost-lepton (LL) background: events with a lepton from a W boson decay where the lepton is either out of acceptance, not reconstructed, not identified, or not isolated. This background originates mostly from \({\text {W}} + \text {jets}\) and \({{\text {t}} {\bar{{{\text {t}}}}}} + \text {jets}\) events, with smaller contributions from rarer processes, such as diboson or \({{\text {t}} {\bar{{{\text {t}}}}}} {\text {V}} \) production.

The irreducible background: \({\text {Z}} + \text {jets}\) events, where the Z boson decays to neutrinos. This background is the most difficult to distinguish from the final states arising from potential signals. It is a major background in nearly all search regions, its importance decreasing with increasing \(N_{{\text {b}}}\).

The instrumental background: mostly multijet events with no genuine \(p_{\mathrm {T}} ^\text {miss}\). These events enter a search region due to either significant jet momentum mismeasurements or sources of anomalous noise. After the event selection described in Sect. 3.1, this background is subdominant compared to the others.

The backgrounds are estimated from data control regions. In the presence of BSM physics, these control regions could be affected by signal contamination. Although the expected signal contamination is typically negligible, its potential impact is accounted for in the interpretation of the results, as further described in Sect. 6.

Estimation of the background from events with leptonic W boson decays

The LL background is estimated from control regions with exactly one lepton candidate (e or \(\upmu \)) selected using the same triggers and preselection criteria used for the signal regions, with the exception of the lepton veto, which is inverted. The transverse mass \(M_{\mathrm {T}}\) determined using the lepton candidate and the \({\vec p}_{\mathrm {T}}^{\text {miss}}\) is required to satisfy \(M_{\mathrm {T}} <100\,\,\text {Ge}\text {V} \), in order to suppress the potential signal contamination of the control regions. Selected events are binned according to the same criteria as the search regions. The background in each signal bin, \(N^{\mathrm {SR}}_{\mathrm {LL}}\), is obtained by scaling the number of events in the control region, \(N^{\mathrm {CR}}_{1\ell }\), using transfer factors \(R^{0\ell /1\ell }_{\mathrm {MC}}\), as detailed below:

For events with \(N_{\mathrm {j}} =1\):

$$\begin{aligned}&N^{\mathrm {SR}}_{\mathrm {LL}} \left( p_{\mathrm {T}} ^{\text {jet}},N_{{\text {b}}} \right) \nonumber \\&\quad = N^{\mathrm {CR}}_{1\ell } \left( p_{\mathrm {T}} ^{\text {jet}},N_{{\text {b}}} \right) \, R^{0\ell /1\ell }_{\mathrm {MC}} \left( p_{\mathrm {T}} ^{\text {jet}},N_{{\text {b}}} \right) . \end{aligned}$$ (2)

For events with \(N_{\mathrm {j}} \ge 2\):

$$\begin{aligned}&N^{\mathrm {SR}}_{\mathrm {LL}} \left( \varOmega ,M_{\mathrm {T2}} \right) = N^{\mathrm {CR}}_{1\ell } \left( \varOmega ,M_{\mathrm {T2}} \right) \, \nonumber \\&\quad \times R^{0\ell /1\ell }_{\mathrm {MC}} \left( \varOmega ,M_{\mathrm {T2}} \right) \, k_{\mathrm {LL}} \left( M_{\mathrm {T2}} |\varOmega \right) , \end{aligned}$$ (3)

where:

$$\begin{aligned} \varOmega \equiv \left( H_{\mathrm {T}},N_{\mathrm {j}},N_{{\text {b}}} \right) . \end{aligned}$$ (4)

The single-lepton control regions have 1–2 times as many events as the corresponding signal regions. The factor \(R^{0\ell /1\ell }_{\mathrm {MC}}\) accounts for lepton acceptance and efficiency, as well as the expected contribution from the decay of W bosons to hadrons through an intermediate \(\uptau \) lepton. It is obtained from MC simulation, and corrected for the measured differences in the lepton efficiencies between data and simulation.

For events with \(N_{\mathrm {j}} \ge 2\), the factor \(k_{\mathrm {LL}}\) is one, except at high \(M_{\mathrm {T2}}\) values, where the single-lepton control sample has insufficient data to allow \(N^{\mathrm {CR}}_{1\ell }\) to be measured in each (\(H_{\mathrm {T}}\), \(N_{\mathrm {j}}\), \(N_{{\text {b}}}\), \(M_{\mathrm {T2}}\)) bin. In such cases, \(N^{\mathrm {CR}}_{1\ell }\) is integrated over the remaining \(M_{\mathrm {T2}}\) bins of the same (\(H_{\mathrm {T}}\), \(N_{\mathrm {j}}\), \(N_{{\text {b}}}\)) region, and the distribution in \(M_{\mathrm {T2}}\) across these bins is taken from simulation and applied through the factor \(k_{\mathrm {LL}}\).
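The scaling in Eqs. (2) and (3) can be illustrated with a minimal Python sketch. The function names and all numerical inputs are hypothetical and chosen only for illustration. The sketch assumes that, when the control region is integrated over several \(M_{\mathrm {T2}}\) bins, \(k_{\mathrm {LL}}\) is the fraction of the simulated yield in each of those bins, and that a single transfer factor applies to the merged bins; the actual analysis may differ in these details.

```python
def k_ll_from_simulation(mc_yields):
    """Fraction of the simulated LL yield in each merged M_T2 bin (assumed form of k_LL)."""
    total = sum(mc_yields)
    if total <= 0:
        raise ValueError("simulated yields must sum to a positive value")
    return [y / total for y in mc_yields]


def lost_lepton_estimate(n_cr, r_mc, k_ll=1.0):
    """Eq. (3): N_SR = N_CR * R_MC * k_LL, with k_LL = 1 when no extrapolation is needed."""
    return n_cr * r_mc * k_ll


# Hypothetical inputs, for illustration only.
n_cr_integrated = 40        # single-lepton events summed over the merged M_T2 bins
r_mc = 0.5                  # 0-lepton / 1-lepton transfer factor from simulation
mc_shape = [6.0, 3.0, 1.0]  # simulated LL yields in the merged M_T2 bins

k_ll = k_ll_from_simulation(mc_shape)
predictions = [lost_lepton_estimate(n_cr_integrated, r_mc, k) for k in k_ll]
```

By construction the \(k_{\mathrm {LL}}\) fractions sum to one, so the predictions summed over the merged bins reproduce the integrated estimate \(N^{\mathrm {CR}}_{1\ell }\,R^{0\ell /1\ell }_{\mathrm {MC}}\).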

The MC modeling of \(M_{\mathrm {T2}}\) is checked in data, in single-lepton events with either \(N_{{\text {b}}} =0\) or \(N_{{\text {b}}} \ge 1\), as shown in the left and right panels of Fig. 1, respectively. The predicted distributions in the comparison are obtained by summing all the relevant regions, after normalizing MC event yields to data and distributing events among the \(M_{\mathrm {T2}}\) bins according to the expectation from simulation.

Fig. 1 Distributions of the \(M_{\mathrm {T2}}\) variable in data and simulation for the single-lepton control region, after normalizing the simulation to data in bins of \(H_{\mathrm {T}}\), \(N_{\mathrm {j}}\), and \(N_{{\text {b}}}\), for events with no b-tagged jets (left), and events with at least one b-tagged jet (right). The hatched bands on the top panels show the MC statistical uncertainty, while the solid gray bands in the ratio plots show the systematic uncertainty in the \(M_{\mathrm {T2}}\) shape. The bins have different widths, denoted by the horizontal bars.

Uncertainties arising from the limited size of the control samples and from theoretical and experimental considerations are evaluated and propagated to the final estimate. The dominant uncertainty in \(R^{0\ell /1\ell }_{\mathrm {MC}}\) is due to the modeling of the lepton efficiency (for electrons, muons, and hadronically decaying \(\uptau \) leptons) and jet energy scale (JES), and is of order 15–20%. The uncertainty in the \(M_{\mathrm {T2}}\) extrapolation via \(k_{\mathrm {LL}}\), which is as large as 40%, arises primarily from the JES, the relative fractions of \({\text {W}} + \text {jets}\) and \({{\text {t}} {\bar{{{\text {t}}}}}} + \text {jets}\) events, and the choice of the renormalization (\(\mu _{\mathrm {R}}\)) and factorization (\(\mu _{\mathrm {F}}\)) scales used in the event generation.

The uncertainties in the LL background prediction are summarized in Table 2 together with their typical size ranges across the search bins.

Table 2 Summary of systematic uncertainties in the lost-lepton background prediction, together with their typical size ranges across the search bins.

Estimation of the background from \({\text {Z}} ({{\upnu }} \bar{{{\upnu }}})+\text {jets}\)

The \({\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}} \) background is estimated from a \({\text {Z}} \rightarrow \ell ^{+}\ell ^{-}\) (\(\ell = {\text {e}},{\upmu } \)) control sample selected using dilepton triggers. The trigger efficiency, measured from a sample of events in data with large \(H_{\mathrm {T}}\), is found to be greater than 97% in the selected kinematic range.

The leptons in the control sample are required to be of the same flavor and have opposite charge. The \(p_{\mathrm {T}}\) of the leading and trailing leptons must be at least 100 and 30\(\,\,\text {Ge}\text {V}\), respectively. Finally, the invariant mass of the lepton pair must be within 20\(\,\,\text {Ge}\text {V}\) of the Z boson mass.

After requiring that the \(p_{\mathrm {T}}\) of the dilepton system is at least 200\(\,\,\text {Ge}\text {V}\) (corresponding to the \(M_{\mathrm {T2}} >200\,\,\text {Ge}\text {V} \) requirement), the preselection requirements are applied based on kinematic variables recalculated after removing the dilepton system from the event to replicate the \({\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}} \) kinematic properties. For events with \(N_{\mathrm {j}} = 1\), one control region is defined for each bin of jet \(p_{\mathrm {T}}\). For events with at least two jets, the selected events are binned in \(H_{\mathrm {T}}\), \(N_{\mathrm {j}}\), and \(N_{{\text {b}}}\), but not in \(M_{\mathrm {T2}}\), to increase the dilepton event yield in each control region.

The contribution to each control region from flavor-symmetric processes, most importantly \({{\text {t}} {\bar{{{\text {t}}}}}}\) production, is estimated using different-flavor (DF) \({\text {e}} {\upmu } \) events obtained with the same selection criteria as same-flavor (SF) \({\text {e}} {\text {e}} \) and \({\upmu } {\upmu } \) events. The background in each signal bin is then obtained using transfer factors.

For events with \(N_{\mathrm {j}} =1\), according to:

$$\begin{aligned}&N^{\mathrm {SR}}_{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}} \left( p_{\mathrm {T}} ^{\text {jet}},N_{{\text {b}}} \right) = \Bigl [N^{\mathrm {CRSF}}_{\ell \ell } \left( p_{\mathrm {T}} ^{\text {jet}},N_{{\text {b}}} \right) \nonumber \\&\quad - N^{\mathrm {CRDF}}_{\ell \ell } \left( p_{\mathrm {T}} ^{\text {jet}},N_{{\text {b}}} \right) \, R^{\mathrm {SF}/\mathrm {DF}} \Bigr ] \nonumber \\&\quad \times R^{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}/{\text {Z}} \rightarrow \ell ^{+}\ell ^{-}}_{\mathrm {MC}} \left( p_{\mathrm {T}} ^{\text {jet}},N_{{\text {b}}} \right) . \end{aligned}$$ (5)

For events with \(N_{\mathrm {j}} \ge 2\), according to:

$$\begin{aligned}&N^{\mathrm {SR}}_{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}} \left( \varOmega ,M_{\mathrm {T2}} \right) = \Bigl [N^{\mathrm {CRSF}}_{\ell \ell } \left( \varOmega \right) \nonumber \\&\quad - N^{\mathrm {CRDF}}_{\ell \ell } \left( \varOmega \right) \, R^{\mathrm {SF}/\mathrm {DF}} \Bigr ] \nonumber \\&\quad \times R^{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}/{\text {Z}} \rightarrow \ell ^{+}\ell ^{-}}_{\mathrm {MC}} \left( \varOmega \right) \, k_{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}}\left( M_{\mathrm {T2}} ~|~\varOmega \right) , \end{aligned}$$ (6)

where \(\varOmega \) is defined in Eq. (4).

Here \(N^{\mathrm {CRSF}}_{\ell \ell }\) and \(N^{\mathrm {CRDF}}_{\ell \ell }\) are the numbers of SF and DF events in the control region, while \(R^{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}/{\text {Z}} \rightarrow \ell ^{+}\ell ^{-}}_{\mathrm {MC}}\) and \(k_{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}}\) are defined below. The factor \(R^{\mathrm {SF}/\mathrm {DF}}\) accounts for the difference in acceptance and efficiency between SF and DF events. It is determined as the ratio of the number of SF to DF events in a \({{\text {t}} {\bar{{{\text {t}}}}}}\)-enriched control sample, obtained with the same selection criteria as the \({\text {Z}} \rightarrow \ell ^{+}\ell ^{-}\) sample, but inverting the requirements on the \(p_{\mathrm {T}}\) and the invariant mass of the lepton pair. A measured value of \(R^{\mathrm {SF}/\mathrm {DF}}=1.06\pm 0.15\) is observed to be stable with respect to event kinematic variables, and is applied in all regions. Figure 2 (left) shows \(R^{\mathrm {SF}/\mathrm {DF}}\) measured as a function of the number of jets.

Fig. 2 (Left) Ratio \(R^{\mathrm {SF}/\mathrm {DF}}\) in data as a function of \(N_{\mathrm {j}}\). The solid black line enclosed by the red dashed lines corresponds to a value of \(1.06\pm 0.15\) that is observed to be stable with respect to event kinematic variables, while the two dashed black lines denote the statistical uncertainty in the \(R^{\mathrm {SF}/\mathrm {DF}}\) value. (Right) The shape of the \(M_{\mathrm {T2}}\) distribution in \({\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}} \) simulation compared to the one obtained from the \({\text {Z}} \rightarrow \ell ^{+}\ell ^{-}\) data control sample, in a region with \(1200<H_{\mathrm {T}} <1500\) \(\,\,\text {Ge}\text {V}\) and \(N_{\mathrm {j}} \ge 2\), inclusive in \(N_{{\text {b}}}\). The solid gray band on the ratio plot shows the systematic uncertainty in the \(M_{\mathrm {T2}}\) shape. The bins have different widths, denoted by the horizontal bars.

For events with \(N_{\mathrm {j}} =1\), an estimate of the \({\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}} \) background in each search bin is obtained from the corresponding dilepton control region via the factor \(R^{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}/{\text {Z}} \rightarrow \ell ^{+}\ell ^{-}}_{\mathrm {MC}}\), which accounts for the acceptance and efficiency to select the dilepton pair and the ratio of branching fractions for the \({\text {Z}} \rightarrow \ell ^{+}\ell ^{-}\) and \({\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}} \) decays. For events with at least two jets, an estimate of the \({\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}} \) background is obtained analogously in each (\(H_{\mathrm {T}}\), \(N_{\mathrm {j}}\), \(N_{{\text {b}}}\)) region, integrated over \(M_{\mathrm {T2}}\). The factor \(R^{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}/{\text {Z}} \rightarrow \ell ^{+}\ell ^{-}}_{\mathrm {MC}}\) is obtained from simulation, including corrections for the differences in the lepton efficiencies between data and simulation.
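As an illustration of Eq. (5), and of Eq. (6) before the \(M_{\mathrm {T2}}\) shape factor is applied, the following Python sketch subtracts the flavor-symmetric contribution and applies the transfer factor. The function name, the event counts, and the value used for \(R^{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}/{\text {Z}} \rightarrow \ell ^{+}\ell ^{-}}_{\mathrm {MC}}\) are hypothetical; only \(R^{\mathrm {SF}/\mathrm {DF}}=1.06\pm 0.15\) is taken from the text.

```python
def z_invisible_estimate(n_sf, n_df, r_sf_df, r_mc):
    """Eq. (5): (N_SF - N_DF * R_SF/DF) * R_MC for one control region."""
    return (n_sf - n_df * r_sf_df) * r_mc


# Hypothetical control-region counts and transfer factor, for illustration only.
n_sf = 120       # same-flavor (ee, mumu) events
n_df = 10        # different-flavor (e mu) events
r_mc = 5.0       # Z->nunu / Z->ll transfer factor from simulation (made-up value)
r_sf_df = 1.06   # measured value quoted in the text
r_sf_df_unc = 0.15

nominal = z_invisible_estimate(n_sf, n_df, r_sf_df, r_mc)
# Vary R_SF/DF within its quoted uncertainty to see its effect on the estimate.
shifted_up = z_invisible_estimate(n_sf, n_df, r_sf_df + r_sf_df_unc, r_mc)
shifted_down = z_invisible_estimate(n_sf, n_df, r_sf_df - r_sf_df_unc, r_mc)
```

A larger \(R^{\mathrm {SF}/\mathrm {DF}}\) removes more of the same-flavor yield, so the estimate decreases when the factor is shifted up. The size of the shift scales with the number of DF events in the control region.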

For events with \(N_{\mathrm {j}} \ge 2\), the factor \(k_{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}}\) accounts for the distribution in bins of \(M_{\mathrm {T2}}\) of the estimated background in each (\(H_{\mathrm {T}}\), \(N_{\mathrm {j}}\), \(N_{{\text {b}}}\)) region. This distribution is constructed using \(M_{\mathrm {T2}}\) shape templates from dilepton data and \({\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}} \) simulation in each (\(H_{\mathrm {T}}\), \(N_{\mathrm {j}}\), \(N_{{\text {b}}}\)) region. The templates obtained from data are used at low values of \(M_{\mathrm {T2}}\), where the amount of data is sufficient, while the templates from simulation are used at high values of \(M_{\mathrm {T2}}\).

Studies with simulated samples have demonstrated that the \(M_{\mathrm {T2}}\) shape described by the function \(k_{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}}\) is independent of \(N_{{\text {b}}}\) for a given \(H_{\mathrm {T}}\) and \(N_{\mathrm {j}}\) selection, and that the shape is also independent of \(N_{\mathrm {j}}\) for \(H_{\mathrm {T}} >1500\,\,\text {Ge}\text {V} \). The dilepton control sample supports this observation. Therefore, functions \(k_{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}}\) are obtained for each (\(H_{\mathrm {T}}\), \(N_{\mathrm {j}}\)) region, integrated over \(N_{{\text {b}}}\). For \(H_{\mathrm {T}} >1500\,\,\text {Ge}\text {V} \), only one function \(k_{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}}\) is constructed, integrating also over \(N_{\mathrm {j}}\).
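One possible way to combine the data and simulation templates into a single \(k_{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}}\) shape is sketched below in Python. The text does not spell out the combination, so the scheme is an assumption: data fractions are used below a chosen bin, and the data yield integrated over the remaining bins is distributed according to the simulated shape. The function name, the switch-over bin, and all yields are hypothetical.

```python
def hybrid_mt2_shape(data_counts, mc_yields, first_mc_bin):
    """Return per-bin fractions summing to one.

    Bins below first_mc_bin use the data fractions directly. From first_mc_bin
    onward, the integrated data fraction is split according to the simulated
    shape (an assumed scheme, not taken from the text).
    """
    n_data = sum(data_counts)
    tail_mc = sum(mc_yields[first_mc_bin:])
    if n_data <= 0 or tail_mc <= 0:
        raise ValueError("need positive data yield and positive simulated tail yield")
    tail_fraction = sum(data_counts[first_mc_bin:]) / n_data
    shape = [c / n_data for c in data_counts[:first_mc_bin]]
    shape += [tail_fraction * y / tail_mc for y in mc_yields[first_mc_bin:]]
    return shape


# Hypothetical dilepton data counts and simulated Z->nunu yields per M_T2 bin.
data_counts = [50, 30, 15, 4, 1]
mc_yields = [48.0, 31.0, 14.0, 5.0, 2.0]
k_z = hybrid_mt2_shape(data_counts, mc_yields, first_mc_bin=3)
```

With this choice the low-\(M_{\mathrm {T2}}\) bins carry only the data statistical uncertainty, while the relative population of the high-\(M_{\mathrm {T2}}\) bins depends on the simulation.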

The MC modeling of the \(M_{\mathrm {T2}}\) variable is validated in data using control samples enriched in \({\text {Z}} \rightarrow \ell ^{+}\ell ^{-}\) events, in each bin of \(H_{\mathrm {T}}\), as shown in the right panel of Fig. 2 for events with \(1200<H_{\mathrm {T}} <1500\,\,\text {Ge}\text {V} \).

The largest uncertainty in the estimate of the invisible Z background in most regions results from the limited size of the dilepton control sample. The dominant uncertainty of about 5% in the ratio \(R^{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}/{\text {Z}} \rightarrow \ell ^{+}\ell ^{-}}_{\mathrm {MC}}\) reflects the uncertainty in the differences between the lepton efficiencies in data and simulation. The uncertainty in the \(k_{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}}\) factor arises from data statistical uncertainty for bins at low values of \(M_{\mathrm {T2}}\), where the function \(k_{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}}\) is obtained from data, while for bins at high values of \(M_{\mathrm {T2}}\), where the function \(k_{{\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}}}\) is obtained from simulation, it is due to the uncertainties in the JES and the choice of the \(\mu _{\mathrm {R}}\) and \(\mu _{\mathrm {F}}\) scales. These can result in effects as large as 40%.

The uncertainties in the \({\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}} \) background prediction are summarized in Table 3 together with their typical size ranges across the search bins.

Table 3 Summary of systematic uncertainties in the \({\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}} \) background prediction, together with their typical size ranges across the search bins.

Estimation of the multijet background

The background from SM events composed solely of jets produced through the strong interaction (multijet events) is estimated from control regions in data selected using triggers that require \(H_{\mathrm {T}}\) to exceed thresholds ranging from 125 (180) to 900 (1050)\(\,\,\text {Ge}\text {V}\) in 2016 (2017–2018) data samples. In addition, events are required to have at least two jets with \(p_{\mathrm {T}} > 10\,\,\text {Ge}\text {V} \).

The rebalance and smear (R&S) method used to estimate the multijet background consists of two steps. First, multijet data events are rebalanced by adjusting the \(p_{\mathrm {T}}\) of the jets such that the resulting \(p_{\mathrm {T}} ^\text {miss}\) is approximately zero. This rebalancing is performed through a likelihood maximization, accounting for the jet energy resolution [100, 101]. The output of the rebalancing step is an inclusive sample of multijet events with approximately zero \(p_{\mathrm {T}} ^\text {miss}\) that are used as a seed for the second step, the smearing. In the smearing step, the \(p_{\mathrm {T}}\) of the rebalanced jets is smeared according to the jet response function, in order to model the instrumental effects that lead to nonzero \(p_{\mathrm {T}} ^\text {miss}\). The smearing step is repeated many times for each rebalanced event. The output of each smearing step is an independent sample of events, which serves to populate the tails of kinematic distributions such as \(p_{\mathrm {T}} ^\text {miss}\) and \(M_{\mathrm {T2}}\), and to obtain a more precise estimate of the multijet background than would be possible using only simulation.

The method makes use of jet response templates, i.e., distributions of the ratio of reconstructed jet \(p_{\mathrm {T}}\) to generator-level jet \(p_{\mathrm {T}}\). The templates are derived from simulation in bins of jet \(p_{\mathrm {T}}\) and \(\eta \), separately for b-tagged and non-b-tagged jets. Systematic uncertainties are assessed to cover for the modeling of the core and of the tails of the jet response templates.

Of all jets in the event, a jet qualifies for use in the R&S procedure if it has \(p_{\mathrm {T}} >10\,\,\text {Ge}\text {V} \) and, in the case that \(p_{\mathrm {T}} <100\,\,\text {Ge}\text {V} \), is not identified as a jet from pileup [131]. All other jets are left unchanged but are still used in the calculation of \({\vec p}_{\mathrm {T}}^{\text {miss}}\) and other jet-related quantities. An event with n qualifying jets is rebalanced by varying the \(p_{\mathrm {T}} ^\text {reb}\) of each jet, which is an estimate of the true jet \(p_{\mathrm {T}}\), to maximize the likelihood function

$$\begin{aligned} L = \prod _{i=1}^n \text {P} \left( p_{\text {T},i}^{\text {reco}} | p_{\text {T},i}^{\text {reb}} \right) \, G\left( \frac{p_{\text {T},\text {reb,x}}^\text {miss}}{\sigma _\text {T}^{\text {soft}}}\right) \, G\left( \frac{p_{\text {T},\text {reb,y}}^\text {miss}}{\sigma _\text {T}^{\text {soft}}}\right) , \end{aligned}$$ (7)

where

$$\begin{aligned} G(x) \equiv \mathrm {e}^{-x^2/2}, \end{aligned}$$ (8)

and

$$\begin{aligned} \vec {p}_{\text {T},\text {reb}}^{\text {miss}} \equiv {\vec p}_{\mathrm {T}}^{\text {miss}}- \sum _{i=1}^n \left( \vec {p}_{\text {T},i}^\text {reb} - \vec {p}_{\text {T},i}^\text {reco} \right) . \end{aligned}$$ (9)

The term \(\text {P} ( p_{\text {T},i}^{\text {reco}} | p_{\text {T},i}^{\text {reb}} )\) in Eq. (7) is the probability for a jet with \(p_{\mathrm {T}}\) of \(p_{\text {T},i}^{\text {reb}}\) to be assigned a \(p_{\mathrm {T}}\) of \(p_{\text {T},i}^{\text {reco}}\) after reconstruction. This probability is taken directly from the jet response templates. The two G(x) terms in Eq. (7) enforce an approximate balancing condition. The \(\vec {p}_{\text {T},\text {reb}}^{\text {miss}}\) terms in Eq. (7) represent the \({\vec p}_{\mathrm {T}}^{\text {miss}}\) after rebalancing, and are obtained by simply propagating the changes in jet \(p_{\mathrm {T}}\) from rebalancing to \({\vec p}_{\mathrm {T}}^{\text {miss}}\). For the balancing of the x and y components of the \({\vec p}_{\mathrm {T}}^{\text {miss}}\), we use \(\sigma _\text {T}^{\text {soft}}=20\) \(\,\,\text {Ge}\text {V}\), which is approximately the width of the distributions of the x and y components of \({\vec p}_{\mathrm {T}}^{\text {miss}}\) in minimum bias events. This parameter represents the inherent missing energy due to low-\(p_{\mathrm {T}}\) jets, unclustered energy, and jets from pileup that cannot be eliminated by rebalancing. A systematic uncertainty is assessed to cover for the effects of the variation of \(\sigma _\text {T}^\text {soft}\).
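The rebalancing of Eqs. (7)–(9) can be sketched in Python under strong simplifying assumptions. The analysis takes \(\text {P} ( p_{\text {T},i}^{\text {reco}} | p_{\text {T},i}^{\text {reb}} )\) from binned response templates; here it is replaced by a Gaussian with a fixed relative resolution evaluated at the reconstructed \(p_{\mathrm {T}}\), so that \(-\ln L\) is quadratic and can be minimized by plain gradient descent. The jet directions are held fixed, and the function names, the 10% resolution, and the example jets are hypothetical.

```python
import math


def rebalanced_met(met_xy, reco, reb, cos_phi, sin_phi):
    """Eq. (9): propagate the jet pT changes to the missing transverse momentum."""
    mx = met_xy[0] - sum((b - r) * c for r, b, c in zip(reco, reb, cos_phi))
    my = met_xy[1] - sum((b - r) * s for r, b, s in zip(reco, reb, sin_phi))
    return mx, my


def rebalance(jets, met_xy, rel_resolution=0.1, sigma_soft=20.0, n_iter=2000):
    """Minimize -ln L of Eq. (7) with a Gaussian response (assumption).

    jets is a list of (pt_reco, phi); returns (rebalanced pTs, rebalanced MET x/y).
    """
    reco = [pt for pt, _ in jets]
    cos_phi = [math.cos(phi) for _, phi in jets]
    sin_phi = [math.sin(phi) for _, phi in jets]
    sigma = [rel_resolution * pt for pt in reco]
    # Step size from an upper bound on the curvature of the quadratic -ln L.
    step = 1.0 / (max(1.0 / s ** 2 for s in sigma) + len(jets) / sigma_soft ** 2)
    reb = list(reco)
    for _ in range(n_iter):
        mx, my = rebalanced_met(met_xy, reco, reb, cos_phi, sin_phi)
        grad = [
            (b - r) / s ** 2 - (mx * c + my * sn) / sigma_soft ** 2
            for r, b, s, c, sn in zip(reco, reb, sigma, cos_phi, sin_phi)
        ]
        reb = [b - step * g for b, g in zip(reb, grad)]
    return reb, rebalanced_met(met_xy, reco, reb, cos_phi, sin_phi)


# Hypothetical three-jet event; the MET is taken as minus the vector sum of the jets.
jets = [(300.0, 0.0), (200.0, 2.8), (90.0, -2.6)]
met = (-sum(pt * math.cos(phi) for pt, phi in jets),
       -sum(pt * math.sin(phi) for pt, phi in jets))
reb_pts, met_reb = rebalance(jets, met)
```

The Gaussian balancing terms pull the rebalanced \(p_{\mathrm {T}} ^\text {miss}\) toward zero, while the response terms penalize large changes to well-measured jets, so the residual imbalance is small but generally nonzero.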

The rebalanced events are used as input to the smearing procedure, where the \(p_{\mathrm {T}}\) of each qualifying jet is rescaled by a random factor drawn from the corresponding jet response template, and all kinematic quantities are recalculated accordingly.
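The smearing step can be sketched in Python as follows. The response template is replaced by a Gaussian of 10% width, which is an assumption made only to keep the example short; the templates used in the analysis are binned in jet \(p_{\mathrm {T}}\) and \(\eta \) and have non-Gaussian tails. The function names and the rebalanced jets are hypothetical, and \(p_{\mathrm {T}} ^\text {miss}\) is computed from the jets alone.

```python
import math
import random


def smear_event(jets, rng, response_width=0.1):
    """Rescale each jet pT by a random response factor (Gaussian stand-in template)."""
    smeared = []
    for pt, phi in jets:
        factor = max(0.0, rng.gauss(1.0, response_width))  # no negative pT
        smeared.append((pt * factor, phi))
    return smeared


def missing_pt(jets):
    """Magnitude of minus the vector sum of the jet transverse momenta."""
    px = -sum(pt * math.cos(phi) for pt, phi in jets)
    py = -sum(pt * math.sin(phi) for pt, phi in jets)
    return math.hypot(px, py)


rng = random.Random(12345)  # fixed seed for reproducibility
rebalanced_jets = [(290.0, 0.0), (205.0, 2.9), (92.0, -2.6)]  # hypothetical seed event

# Each rebalanced event is smeared many times to populate the pTmiss tail.
met_values = [missing_pt(smear_event(rebalanced_jets, rng)) for _ in range(1000)]
tail_fraction = sum(m > 100.0 for m in met_values) / len(met_values)
```

Because every seed event is reused many times, the tails of distributions such as \(p_{\mathrm {T}} ^\text {miss}\) are populated far more densely than the number of seed events alone would allow.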

The background from multijet events is estimated by applying the signal region selection requirements to the above rebalanced and smeared sample, except that events are used only if \(p_{\text {T},\text {reb}}^\text {miss}<100\,\,\text {Ge}\text {V} \), to remove potential contamination from electroweak sources. This additional requirement is found in simulation to be fully efficient for multijet events. Hence, no correction is applied to the prediction.

Systematic uncertainties are summarized in Table 4 together with their typical size ranges across the search bins.

Table 4 Summary of systematic uncertainties in the multijet background prediction, together with their typical size ranges across the search bins.

The resulting background prediction is validated in data using control regions enriched in multijet events. The results of the validation in a control region selected by inverting the \(\varDelta \phi _{\text {min}}\) requirement are shown in Fig. 3. The electroweak backgrounds (LL and \({\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}} \)) in this control region are estimated from data using transfer factors from leptonic control regions as described above. In regions where the number of events in the data leptonic control regions is insufficient, the electroweak background is taken from simulation. The observation is found to agree with the prediction, within the uncertainties.

Fig. 3 Validation of the R&S multijet background prediction in control regions in data selected with \(\varDelta \phi _{\text {min}} <0.3\). Electroweak backgrounds (LL and \({\text {Z}} \rightarrow {{\upnu }} \bar{{{\upnu }}} \)) are estimated from data. In regions where the amount of data is insufficient to estimate the electroweak backgrounds, the corresponding yields are taken directly from simulation. The bins on the horizontal axis correspond to the (\(H_{\mathrm {T}}\), \(N_{\mathrm {j}}\), \(N_{{\text {b}}}\)) topological regions. The gray band on the ratio plot represents the total uncertainty in the prediction.

Search for disappearing tracks

In the search for disappearing tracks, the SM background consists of events with charged hadrons or leptons that interact in the tracker or are poorly reconstructed, as well as tracks built out of incorrect combinations of hits. The background is estimated from data, leveraging the orthogonal definition of STCs and selected STs (Sect. 3.2.2), as described by Eq. (10).

$$\begin{aligned} N_{\mathrm {ST}}^\text {est} = f_{\text {short}} \, N_{\mathrm {STC}}^\text {obs}, \end{aligned}$$ (10)

where \(N_{\mathrm {ST}}\) is the number of selected short tracks, \(N_{\mathrm {STC}}\) is the number of selected short track candidates, and \(f_{\text {short}}\) is defined as:

$$\begin{aligned} f_{\text {short}} = N_{\mathrm {ST}}^\text {obs} / N_{\mathrm {STC}}^\text {obs}. \end{aligned}$$ (11)

The \(f_{\text {short}}\) ratio is measured directly in data, in a control region of events selected using the same triggers and preselection criteria used for the signal regions, except that the selection on \(p_{\mathrm {T}} ^\text {miss}\) is relaxed to \(p_{\mathrm {T}} ^\text {miss} >30\,\,\text {Ge}\text {V} \) for all \(H_{\mathrm {T}}\) values, and the selection on \(M_{\mathrm {T2}}\) is shifted to \(60<M_{\mathrm {T2}} <100\,\,\text {Ge}\text {V} \). We exploit the empirical invariance of this ratio with respect to the \(H_{\mathrm {T}}\) and \(p_{\mathrm {T}} ^\text {miss}\) selection criteria, as observed in data control regions, to reduce the statistical uncertainty in the measurement. The \(f_{\text {short}}\) ratio is therefore measured in data separately for each \(N_{\mathrm {j}}\), track \(p_{\mathrm {T}}\), and track length category, and inclusively in \(H_{\mathrm {T}}\). The \(f_{\text {short}}\) values are measured separately in 2016 and 2017–2018 data, mainly to account for the upgrade of the CMS tracking detector after 2016. Since a reliable measurement in data of the \(f_{\text {short}}\) ratio for long (L) tracks is not achievable because of the insufficient number of events, the value measured in data for medium (M) length tracks is used instead, after applying a correction based on simulation:

$$\begin{aligned} f_{\text {short}} (\text {L})_\text {data}^\text {est} = f_{\text {short}} (\text {M})_\text {data} \, f_{\text {short}} (\text {L})_{\mathrm {MC}}/f_{\text {short}} (\text {M})_{\mathrm {MC}}. \end{aligned}$$ (12)

A systematic uncertainty in the measured values of \(f_{\text {short}}\) is assigned to cover for the empirically motivated assumption of its invariance with respect to \(H_{\mathrm {T}}\) and \(p_{\mathrm {T}} ^\text {miss}\). Its size is determined by varying the \(H_{\mathrm {T}}\) and \(p_{\mathrm {T}} ^\text {miss}\) selection requirements in data events with \(60<M_{\mathrm {T2}} <100\,\,\text {Ge}\text {V} \). For long tracks, a conservative systematic uncertainty of 100% is assigned, as a correction based on simulation is used and there are insufficient data to study the effect of \(H_{\mathrm {T}}\) and \(p_{\mathrm {T}} ^\text {miss}\) variations.

The \(f_{\text {short}}\) ratio is then used to predict the expected background in events with \(M_{\mathrm {T2}} >100\,\,\text {Ge}\text {V} \), as described in Eq. (10).
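The chain of Eqs. (10)–(12) can be written as a short Python sketch: \(f_{\text {short}}\) is measured in the control region with \(60<M_{\mathrm {T2}} <100\,\,\text {Ge}\text {V} \) and then multiplied by the number of short track candidates observed at \(M_{\mathrm {T2}} >100\,\,\text {Ge}\text {V} \). The function names and all counts are hypothetical and serve only as an illustration.

```python
def f_short(n_st_cr, n_stc_cr):
    """Eq. (11): ratio of short tracks to short track candidates in the control region."""
    if n_stc_cr <= 0:
        raise ValueError("control region needs at least one short track candidate")
    return n_st_cr / n_stc_cr


def f_short_long(f_medium_data, f_long_mc, f_medium_mc):
    """Eq. (12): medium-track data value corrected by the simulated L/M ratio."""
    return f_medium_data * f_long_mc / f_medium_mc


def predicted_short_tracks(f, n_stc_sr):
    """Eq. (10): expected short tracks from the observed candidates in the signal region."""
    return f * n_stc_sr


# Hypothetical counts, for illustration only.
f_m = f_short(n_st_cr=12, n_stc_cr=240)                        # measured in data (M tracks)
f_l = f_short_long(f_m, f_long_mc=0.02, f_medium_mc=0.04)      # estimate for L tracks
n_est_m = predicted_short_tracks(f_m, n_stc_sr=40)
n_est_l = predicted_short_tracks(f_l, n_stc_sr=8)
```

In this picture the statistical precision of the prediction is limited both by the control-region counts entering \(f_{\text {short}}\) and by the number of candidates observed in the signal region.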

In the presence of BSM physics, the above-defined control regions could be affected by signal contamination. Although the expected signal contamination is typically negligible, its potential impact is accounted for in the interpretation of the results, as further described in Sect. 6.

The background prediction is validated in data in an intermediate \(M_{\mathrm {T2}}\) region (\(100<M_{\mathrm {T2}} <200\,\,\text {Ge}\text {V} \)). No excess event yield is observed. The event categorization in this validation region is identical to the signal region, allowing for a bin-by-bin validation of the background prediction.

Figure 4 shows the result of the background prediction validation in 2016 data and in 2017–2018 data. We find good agreement between the observation and the background prediction in the validation region. An additional systematic uncertainty is assigned to cover for discrepancies exceeding statistical uncertainties. The uncertainties in the background prediction are summarized in Table 5 together with their typical size ranges across the search bins.

Fig. 4 Validation of the background prediction method in (upper) 2016 and (lower) 2017–2018 data with \(100<M_{\mathrm {T2}} <200\,\,\text {Ge}\text {V} \), for the disappearing tracks search. The red histograms represent the predicted backgrounds, while the black markers are the observed data counts. The cyan bands represent the statistical uncertainty in the prediction. The gray bands represent the total uncertainty in the prediction. The labels on the x axes are explained in Tables 24 and 25 of Appendix B.2. Regions whose predictions use the same measurement of \(f_{\text {short}}\) are grouped by the vertical dashed lines. Bins with no entry in the ratio have zero predicted background.