Extracting Angular Observables without a Likelihood
and Applications to Rare Decays
Abstract
Our goal is to obtain a complete set of angular observables arising in a generic multibody process. We show how this can be achieved without the need to carry out a likelihood fit of the angular distribution to the measured events. Instead, we apply the method of moments that relies both on the orthogonality of angular functions and the estimation of integrals by Monte Carlo techniques. The big advantage of this method is that the joint distribution of all observables can be easily extracted, even for very few events. The method of moments is shown to be robust against mismodeling of the angular distribution. Our main result is an explicit algorithm that accounts for systematic uncertainties from detectorresolution and acceptance effects. Finally, we present the necessary processdependent formulae needed for direct application of the method to several rare decays of interest.
I Introduction
Our initial motivation for studying what we wish to call the method of moments is the determination of angular observables in the rare FCNCmediated decay . However, the method we describe in the following is general and applies to all decay or scattering processes that can be formulated in terms of an orthogonal basis of angular functions. We find a previous work Dighe et al. (1999) that advocates this method chiefly for the determination of angular observables in nonleptonic decays but also mentions the applicability to semileptonic decays. Our aim is to improve upon this previous work by studying the uncertainties that are introduced by mismodeling of the angular distribution, and by working out a recipe to determine and unfold detector effects. The latter is crucial for the application of the method to real data. We show that the method of moments has several major advantages over the usual approach based on likelihood fits:

Likelihood fits have convergence problems for a small number of events, and can require reparametrizations and/or approximations for a successful fit to the signal PDF. As an example, see the LHCb analysis of the angular distribution in decays Aaij et al. (2013).
The method of moments does not require any such reparametrizations or approximations.

Likelihood fits can be unstable in case the underlying physical model is only partially known. This can lead to overestimating the number of physical parameters, and consequently inhibits the convergence of the fits. As an example, this type of problem occured in toy studies of the decay as reported in Egede and Reece (2008). It was subsequently solved when a missing symmetry relation between the angular observables was found and applied, thereby reducing the number of fit parameters.
In contrast, we will show that the method of moments does not require information on the correlations between model parameters as an input. Instead, it yields the correlations as an output, at the expense of somewhat larger uncertainties.

Mismodeling the underlying physics model can result in systematic bias in likelihood fits.
We will show that the method of moments is insensitive to a certain type of mismodeling; i.e. introducing a cutoff in a partialwave expansion of the signal PDF.

Using the method of moments, the joint probability distribution of the angular observables rapidly converges towards a multivariate Gaussian distribution. This allows an easy transfer of correlation information from the experiments to interested theorists.
We continue with basic definitions that pertain to angular observables, and our results in the subsequent sections. Let denote the set of all angles, and let denote the set of all other nonangular kinematic variables needed to fully specify the final state of the process under study. For example, may include invariant masses or centerofmass energies. We define an angular observable as a coefficient in the probability density function (PDF), , of the process by
(1) 
Here, the dependence on the decay angles has been explicitly factored out in terms of the angular functions . We assume there exists a dual basis of functions such that the orthonormality relations
(2) 
hold with representing the full angular phase space relevant to the process. For particle decays, is generally expressed in terms of the fully differential decay width,
(3) 
where is the total decay width. For a scattering process, one can similarly use
(4) 
where the total cross section is used for the
normalization. Since the determination of the total decay width or
total cross section can be quite difficult, we emphasize that
different normalizations for can be used. For instance, the total
decay width (or cross section) of the process of interest can be
replaced by the corresponding quantity of a controlchannel
process. This change of normalization is equivalent to a linear rescaling of
the angular observables ; thus ratios or similar suitable combinations
of the angular observables are not affected by a change of normalization.
Our method is an extension of the classical method of moments with orthogonal functions (James, 2006, sec. 8.2). The only difference is that conventionally the angular functions are assumed selfdual, . However, it suffices that the system of angular functions can be transformed into an orthonormal basis. We find it convenient to work in the basis of Legendre polynomials that are not selfdual. Our approach covers the selfdual case, provided that one replace appropriately. Using the ansatz
(5) 
the dual basis needs to be worked out case by case through solving the linear system of equations (2). For a selection of hadron decays with a quark in the initial state and two leptons in the final state, we list the dual bases in a series of appendices A through C. Note that a similar analysis was done in Dighe et al. (1999) for the decays and .
In the remainder of this letter we discuss how to obtain an angular observable in an experimental setup where each recorded event is (approximately) distributed according to . We establish the statistical basics in section II. Section III is dedicated to the impact of systematic effects such as mismodeling the underlying physics or detector acceptance effects. Numerical studies for one uniangular and one tripleangular distribution are provided in section IV.
Ii SampleBased Determination
The orthonormality relations eq. (2) imply that a single angular observable can be projected out of the full PDF as
(6) 
where denotes a dual basis of angular
functions, and represents the entire angular phase space. In general,
may differ from . This is the
case for our selection of applications in appendices A through
C.
It is sensible to refer to the angular observable as the
moment of the PDF . We emphasize that a relation of
type eq. (6) holds for any combination of a density
written as in eq. (1) and an orthonormal basis of angular
functions ; i.e., there is no unique
basis of angular functions. For the proof we refer to ref. Dighe et al. (1999).
Integration over the nonangular variables yields
(7) 
The remainder of this section describes the method of moments, in which we replace the analytical integration by Monte Carlo (MC) estimates. The central tenet of MC integration is the fact that the expectation value of some function under the probability density ,
(8) 
can be approximated by the consistent and unbiased estimator (James, 2006, sec. 8.2)
(9) 
due to the strong law of large numbers for , assuming
that the variates , , are distributed as
.
Throughout this letter we denote all MC estimators with a wide hat.
Application of eq. (9) then yields
(10) 
It is often of interest to obtain observables integrated over certain bins of . We define
(11)  
(12)  
(13) 
where the argument of the indicator function is to be interpreted componentwise. Application of eq. (9) immediately yields
(14) 
For notational simplicity, let us forget about the integration and consider only . In the limit , the central limit theorem (CLT) implies that the random vector
(15) 
follows a multivariate Gaussian distribution centered on the true value with the covariance estimated as
(16)  
In our physics applications, the parameter space is compact and each
is bounded. Hence the requisites for the most basic
version of the CLT to hold — finite mean and covariance of
— are automatically satisfied.
In our numerical analysis the sample covariance rapidly converges towards the true covariance
matrix; see also section IV.
Compared to the usual maximumlikelihood approach, we find for the method of moments:

The angular observable can be determined independently of any other observable . It is therefore much more robust to physics assumptions needed to define the full likelihood. In particular, this means one does not have to be specific regarding the form of newphysics contributions; in fact, one does not even need to be able to explicitly formulate the likelihood at all.

It is superior for a small number of samples . Likelihood fits tend to be numerically unstable if lots of parameters need to be estimated from sparse data. This is more severe if the mode of the likelihood is near the boundary of the physically allowed region Lehmann and Casella (1998). For some of these decays of interest, there are only events recorded per bin.

The estimate is unbiased for any . In contrast, the maximumlikelihood estimate has a bias of order Cox and Snell (1968). In practice, one should keep in mind the biasvariance tradeoff: it is a well known phenomenon that removing the bias usually leads to an increase in variance of the sampling distribution of the estimator and vice versa (James, 2006, sec. 7.3). From a Bayesian decisiontheory point of view, both contribute similarly to the expected loss associated with deciding on just one value of the unknown parameter. One should therefore not prefer the method of moments over likelihood fits just because the former reduces the bias (Jaynes and Bretthorst, 2003, sections 13.8,17.2). In fact, for the results discussed below in section IV, the likelihood fits — if they converge — exhibit a negligible bias and produce uncertainties – smaller than those from the method of moments.

The approximate multivariate Gaussian distribution of allows easier and more precise transfer of the information in the data to interested theorists for more accurate fits of standardmodel and newphysics parameters Altmannshofer and Straub (2013); DescotesGenon et al. (2013a); Beaujean et al. (2014), or for more precise predictions of optimized observables; see e.g., Egede et al. (2008, 2010); Bobeth et al. (2010); Becirevic and Schneider (2012); Bobeth et al. (2013); Matias et al. (2012); DescotesGenon et al. (2013b) for definitions of such optimized observables in decays, Faller et al. (2014) for application to the decay , and Böer et al. (2015) for observables in ). While the likelihood also approaches a multivariate Gaussian as , the two methods differ in their utility as input for theorists if is not well inside the physical region. For example, suppose there are two angular observables that are constrained to a triangular region by phasespace or unitarity arguments as
(17) It may (and often does) happen in practice that is close or even outside the allowed region such that a significant part of the probability mass covers unphysical values. In a Bayesian fit, one would take as the sampling distribution of the “data”, and simply set a uniform prior on the triangle in the plane defined by eq. (17) to have a well defined problem. This could be trivially combined with other independent information in a global fit. Someone with a different physics model might have to consider a different physical region, and could incorporate it just as easily.
For a likelihood fit, the constraint needs to be part of the analysis performed by the experimental collaboration, and the resulting likelihood as a function of may be distinctly not Gaussian. Communicating such a result has proved to be challenging due to technical reasons (such as data formats, size etc.) This leads to the undesirable situation that only the mode and standard errors are reported, and theorists often include the results as independent measurements with a Gaussian distribution and disregard the boundary problem as well as correlations altogether.
Iii Sources of Systematic Uncertainties
In section II, we assume that the PDF describes the underlying physics accurately, and that the experiment observes each event with perfect accuracy. In order to estimate systematic uncertainties, we lift these assumptions.
iii.1 Mismodeling due to Contributions by Higher Partial Waves
In several interesting processes we might only have an approximate
result for . In this section, we focus on one particular class of
mismodeling of the signal PDF: the angularmomentum cutoff in partialwave expansions. This mismodeling potentially affects a large number
of decays and scattering processes. For the sake of clarity we take
the interesting class of fourbody decays
as an example^{1}^{1}1This includes the rare mediated decay , and the suppressed decay . For both examples the PDF is known in
the smallwidth approximation and when assuming a pure wave
resonant final state. An extension to interference has been
studied for
Blake et al. (2013); Becirevic and Schneider (2012), and for Faller et al. (2014). For a first study
of ,
and interference, see Das et al. (2014)
.
Within existing analyses, the PDFs of these decays are usually expressed in terms of one or a few
partial waves of the dimeson system. However, the angular momentum of the dimeson system is unbounded
from above, and gives rise to an infinite set of angular observables.
For the selected class of decays, we can describe that problem as follows: The PDF has a fixed dependence on the dilepton helicity angle and the azimuthal angle . (See also appendix C for details on the angular distribution.) However, at the level of decay amplitudes the dimeson system can have an arbitrarily large total angular momentum ; only its third component is restricted to . It is then convenient to compute the angular distribution explicitly in terms of and , but leave the angular observables dependent on the remaining helicity angle of the dimeson:
(18) 
We advocate here that is a sensible procedure to perform an expansion in terms of Legendre polynomials with respect to the remaining angle . Here, the angularmomentum indices and follow from the usual rules for addition of the angular momenta and of the partialwave expansion of the underlying amplitude and its complex conjugate: , and .
For the decay at hand, we consider the partialwave expansion for the angular observables (see Lee et al. (1992) and appendix D)
(19) 
which reads
(20) 
The normalization factor is defined in eq. (64). Within our example we have (cf. also appendix C and Lee et al. (1992))
(21) 
The angular observables – as defined in eq. (20) – have
the merit of a well defined total angular momentum, and thus are physically
distinguishable. As a consequence of the orthogonality of the Legendre
polynomials, any mismodeling (or rather, lack of modelling) of higher
partialwave observables does not affect the method of moments as
discussed in the previous section. That is to say, adding further (orthogonal) terms
to the PDF only appends observables to , but does not change the leading
elements. The same applies to the covariance.
Unfortunately, this benefit on the experimental side is accompanied with a
theoretical draw back. Each observable consists of an
infinite sum of bilinears of partialwave amplitudes. It remains for
theoretical analyses to estimate or calculate the impact of partial waves
beyond the S and P wave contributions. (For a first
study has been carried out where contributions up to the D wave are
investigated Das et al. (2014)).
We wish to emphasize that detector acceptance effects systematically affect the expansion for any basis of angular functions, including the one suggested in this section. Nevertheless, the expansion in terms of Legendre polynomials as suggested above provides means to cope with these effects, as we discuss in the following subsection.
iii.2 Recipe for Including Detector Effects
Ascertaining a detector’s performance to detect signal events with accurate determination of the event’s angles is generally a difficult task. In the ideal case, one would have an explicit probabilistic model of the detector acceptance and could thus write down the full forward model from which the measured events arise. In practice, that is not feasible, and one is forced to simplify the model. The standard approach is to generate the true particle events , , from a PDF assumed to describe the bare physical process, and to propagate those particles through a detailed simulation of the detector. The observable traces that the particles leave in the detector are fed into reconstruction algorithms resulting in the detector events , with . In general, the distribution of the detected events is
(22) 
Here is the probability distribution of the true events, the normalization constant is given by
(23) 
The kernel is usually decomposed as
(24) 
where the PDF models the resolution effects and the unnormalized density is the detector acceptance function. Perfect resolution corresponds to
(25) 
In what follows, we propose a systematic method to unfold all effects of through MC simulations and the method of moments, using that can be expanded  at least formally  in Legendre polymials. For illustration, we proceed with the explicit example of a uniangular^{2}^{2}2The generalization of this section to multiangular PDFs is straighforward. It can be achieved by promoting to a vector, promoting the Legendre polynomials to products of independent polynomials or spherical harmonics, and promoting the indices to multiindices. PDF with . Let us define the PDF in terms of the Legendre polymials (i.e., ) and angular observables as
(26) 
where denotes an angularmomentumlike index associated with the observables. Normalization of is equivalent to choosing . Requiring implies for . More stringent relations between the might hold, but are of no concern here. For later use, we define
(27) 
and note that
(28) 
is a valid PDF. The dual basis of angular functions follows then from the normalization and orthogonality of the Legendre polynomials, and one therefore has
(29)  
(30) 
We now define the simulated raw moments as
(31) 
which are instrumental to our recipe. Monte Carlo estimators of these moments can be constructed from specifically crafted detector events , , where
(32) 
In words, for each it is required to generate events from a toy physical distribution , for which and all other observables are set to zero. Next, propagate these events through a detector simulation. The normalization is chosen such that . We emphasize that can be estimated as , where corresponds to the number of simulated true events. The estimators then read
(33) 
Linearity of the integral over and convergence of the expansion of in terms of Legendre polynomials ensures that
(34) 
We call the matrix the unfolding matrix, which is specific to the decay at hand. Given our definition of in eq. (27) it is easy to see that
(35) 
and its MC estimator can be obtained through the replacements .
In order to finally extract the angular observables from data, we use the measured raw moments. Their MC estimator — based on the detected events , — reads
(36) 
We then obtain MC estimators of the angular observables via
(37) 
Apparently we now face a circular dependence. On the one hand, the estimators and thus also are proportional to , the ratio of detected events over occuring events. On the other hand, depends by construction (see eq. (23)) on , and thus on the true value of the angular observables . This dependence is broken by the fact that the MC estimators need to fulfill the selfconsistency condition
(38) 
(For the processdependent conditions in the multiangular case see eq. (55) and
eq. (61).)
This selfconsistency condition is tightly related to the determination of
the branching ratio of the underlying decay, and we therefore suggest to
carry out a combined analysis for the determination of the branching ratio
and the extraction of the angular observables.
Moreover, in the applications to decays only ratios and similar independent combinations
of the angular observables are of interest.
We note that as determined from eq. (37) does not fully correspond to for arbitrary detector acceptance . Assuming an expansion of in terms of Legendre polynomials up to a given order , we have to calculate the raw moments up to . The MC estimators of the corrected angular observables then take the following structure:
(39) 
The method is consistent as long as the MC estimators for the ‘‘superfluous’’
observables are compatible with zero. ^{3}^{3}3This holds only for the
physical model of uniangular decay distributions. If the physical model
involves a partialwave expansion with cutoff, these superfluous
observables correspond to higher partial waves that have been suppressed in
the physical model; see appendix C for such a case. The value depends
on the setup of the particle detector under consideration, and remains to
be determined just as in studies that carry out a likelihood fit.
The accuracy of the unfolding process as outlined above critically depends on
both the accuracy of the detector simulation, as well as the uncertainties
induced by the MC estimates. For an experimental analysis, one would now turn
to the determination of the distribution of as a function of the
detector setup. This would involve the determination of both and
for a number of detector configurations, and subsequent profiling
or marginalization. While such considerations of any detector simulation are
beyond the scope of this work, we can, however, comment on the MCinduced
uncertainties. As usual, one needs to find a balance between compute time and
accuracy. For a uniangular distribution and MC samples, we find
that the error on the mean of each matrix element is .
This suggests that the soinduced systematic error can be driven below any
statistical uncertainty.
An alternative method to unfold detector effects is weighting the data on an
eventbyevent basis, with each weight corresponding to the inverse of the detection
efficiency.
Let us conclude this section by commenting that parts of the unfolding matrix are universal in a sense: They can be reused in analyses with a similar underlying decay. Therefore, computing resources spent on improving the accuracy of are not wasted.
Iv Toy Studies
We now study the performance of the proposed method. In order to do so, we simulate
individual events for two separate physical processes: one uniangular, and one triangular decay
distribution. We repeat the analysis for varying sample sizes, ranging from
to events. Our toy analyses are based on SM predictions for angular observables
in the decays and .
In order to faithfully investigate the performance of the method of moments, we repeat our
numerical studies for several bins in the kinematic range ,
as well as . Here, the bin width is chosen either
as or . This setup is meant to ensure that a wide spectrum of possible
values for the angular observables is investigated.
Our findings can be summarized as follows:

In all studied cases we observed not a single bias in the distribution of the of any observables ,
(40) Here refers to the true (input) value for the angular observables, refers to the mean of the pseudo measurement via the MC estimate, and refers to an estimator of the standard deviation of the pseudo measurement. All pull distributions obtained in our studies can be successfully fitted to a Gaussian distribution. Out of the large number of studied distributions, we only show the pull distributions for the observables and obtained from SMlike decays as representative examples. We generated toy studies with events per study in figure 2.

We study the MC estimate for the absolute uncertainty with respect to the angular observable as a function of the number of simulated events . As expected for a multivariate Gaussian distribution, we find that the absolute uncertainty is well fitted by
(41) with , regardless of the absolute size of . The latter can best be shown for the example of uncertainties of two observables. Taking again () and () for SMlike decays, we show the absolute uncertainty in figure 3. We find that the the method of moments yields uncertainties on that are roughly – larger than those obtained from maximumlikelihood fits and for the same number of events. However, we wish to note that said fits only produce a limited subset of the angular observables, and the statistical error of the fit is expected to increase with the number of fit parameters until their errors saturate the statistical errors of the methodofmoments estimators (Cowan, 1998, sec. 8).

We also compare the results as obtained by the method of moments with results obtained by a conventional likelihood fit. In particular, we study the correlation between the methodofmoment estimators and the maximumlikelihoodfit estimators. We run toy analyses, with simulated events per analysis. We show the joint distribution of the two estimators in figure 4. The two estimators are highly correlated. The distribution of the difference of the estimators exhibits now bias, which is due to the large number of simulated events. Still, we find that statistical uncertainty on the difference of the two estimators is sizeable and can easily become half as large as the statistical uncertainty of either estimator.
We emphasize that the above results have been obtained for a flat acceptance function. We also study the behavior of the unfolded angular observables. For simplicity we limit our study to the decay , with its three angular observables to . We express the acceptance function in terms of Legendre polynomials ,
(42) 
This acceptance function approximates the one used in a recent study of the
angular observables in decays Aaij et al. (2014).
Focusing on effects of the unfolding process itself, we use the analytical
expression for the unfolding matrix, which we compute from the raw moments as
defined in eq. (31). Simulting toy analyses with
up to simulated events each, we find that the previous bullet points
still hold; i.e., we do not find any bias, and the distribution is well
described by a multivariate Gaussian distribution. The latter only holds as
long as the number of events per experiment exceeds .
All of our toy studies, as summarized above, show consistently that the joint distribution of the angular observables converges rapidly towards a multivariate Gaussian distribution. We therefore propose to publish the results in the form of the physical components of and .
V Summary
We have carried out a combined analytical and numerical study of the method of
moments; a method for the extraction of angular observables from the angular
distribution of a general multibody process. We have studied the performance
of the method of moments using pseudo data derived from the SM predictions for
one uniangular decay () and one triangular decay (). From this, we find rapid convergence of the joint
likelihood of the angular observables towards a multivariate Gaussian. We draw
the conclusion that this method exhibits several benefits in the determination
of angular observables when compared with a maximumlikelihood fit.
First, we find no bias in the determinations of the angular
observables even for a small number of events. However, due to fewer
model assumptions, the uncertainty on the mean values increases by
roughly – compared
to likelihood fits.
Second, in the absence of detector effects, the method of moments does
not rely on model assumptions for the partialwave composition of the
PDF. This is explicitly shown for the case of higher
partial waves in multibody final states.
Third, we develop a systematic method for the determination of
detector effects that lead to dilution and mixing of the angular
observables. We present an algorithm to calculate the necessary
unfolding matrix, which is computationally feasible only when using
the method
of moments. The algorithm also accounts for higher partial waves.
Fourth, the joint distribution of the angular observables resulting
from the method of moments is well approximated by a multivariate
Gaussian distribution even for small number of events both
for the ideal uniform acceptance and a realistic example. This
facilitates the precise transfer of correlation information
to subsequent theoretical analyses.
Last but not least, the resulting distribution arises without the need
for additional model constraints. Thus more observables can be
inferred from the same data than in a likelihood fit. In addition,
the results from the method of moments can be more easily averaged or combined; e.g., in global fits.
In conclusion, we argue that the method of moments is a competitive alternative to maximumlikelihood fits if angular distributions are involved. We wish to raise the interesting prospect of extending this method to applications that feature PDFs composed from nonangular orthogonal bases.
Acknowledgements.
We thank Ulrik Egede for insightful discussions and helpful comments on the manuscript. We are grateful to Robert Fleischer, Gudrun Hiller, Martin Jung and Konstantinos Petridis for helpful discussions, as well as Claus Grupen for helpful advice in searching the literature. N.S. acknowledges the support of the Swiss National Science Foundation, PP00P2144674. The work of D.v.D has been supported by the Bundesministerium für Bildung und Forschung (BMBF).Appendix A Application to
The PDF for the decay has been calculated for the most complete basis of dimensionsix operators. It reads Bobeth et al. (2007, 2013)
(43)  
with the conventional observables through , and . We conveniently use the Legendre polynomials , ,
(44) 
as our basis of angular functions. Our basis of angular observables then translates to the conventional basis as
(45) 
In this case, the dual basis is simply given by such that
(46) 
Appendix B Application to
The PDF for the decay — in the presence of StandardModel operators and their chiralityflipped counter parts — reads Böer et al. (2015)
(47)  
where denotes the dilepton mass squared, and denote the
helicity angles in the dilepton and
systems, respectively, and denotes the azimuthal angle.
The index should be interpreted as a multiindex, ,
where and denote the total angular momentum in the
dilepton and the system, respectively, and
is the third component of either of the angular momenta.
Our choice of an orthonormal basis reads
(48) 
and its dual is
(49) 
The correspondence between out choice of angular observables in the angular momentum basis, and the angular observables as defined in reference Böer et al. (2015) reads
(50) 
and
(51)  
and
(52)  
where the decay width is
(53) 
The dual basis is chosen such that
(54) 
For the purpose of unfolding acceptance effects as laid down in section III.2, it is instrumental
to know that , and that .
The recipe’s generating PDFs are therefore , with
(55) 
and where is now also a multiindex representing .
Appendix C Application to
The PDF for the decay — up to and including Pwave contributions — has been calculated for the most general basis of dimensionsix operators. It reads, expressed in terms of the angular observables Blake et al. (2013); Bobeth et al. (2013)
(56)  
where is the dilepton helicity angle; is the helicity angle; is the azimuthal angle; and is the square of the dilepton mass. The differential decay width reads
(57) 
It is convenient to define the basis of angular functions and its dual in terms of associated Legendre polynomials . The index should thus be interpreted as a multiindex , , where and denote the total angular momentum in the dilepton and the system, respectively, and is the third component of either of the angular momenta. We use the same bases of angular functions as given in eq. (48) and eq. (49) for the decay . In that case, the angular observables correspond to the usual choice of observables via
(58) 
and
(59)  