
A part of my PhD focused on mixture models and, as always with mixture models, this created more problems than I solved.

Mixture models are complicated, and many researchers focus on nonparametric extensions rather than finite mixture models. Still, research in this field has never stopped, probably because these are extremely flexible models and, in practical problems, a parametric model is sometimes preferred to a nonparametric one.

The starting point of this chapter of my PhD was trying to see if the Jeffreys prior could be used in this setting… guess what? The answer is NO! Computing all the derivatives took me so much time, and I made so many errors for days and days; in the end, the multivariate Jeffreys prior for the complete set of parameters leads to an improper posterior!

Anyway, never get discouraged.

We started to wonder whether there was something good in the Jeffreys prior. Were there features we could still use and propose to an audience?

Indeed, the prior for the weights has an interesting form.

It is concentrated around the extremes of the simplex (the picture is for a three-component mixture model), no matter which families of distributions the components come from. How much the Jeffreys prior concentrates around one of the extremes depends on the symmetry between the components and on how much information (in terms of Fisher information) each component brings (a focus on this is in the Supplementary Material of the paper and in the arXiv version).
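This concentration can be sketched numerically. Below is a minimal illustration of my own (not the paper's code, and with arbitrary Gaussian components for concreteness): for a two-component mixture with known densities f1 and f2 and weight p, the Fisher information for p is I(p) = ∫ (f1 − f2)² / (p f1 + (1−p) f2) dx, and the Jeffreys prior is proportional to √I(p), which is largest near the edges of [0, 1].

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import trapezoid

def fisher_info_weight(p, f1, f2, grid):
    """Fisher information for the weight p of a two-component mixture:
    I(p) = integral of (f1(x) - f2(x))^2 / (p f1(x) + (1-p) f2(x)) dx,
    computed by numerical integration on a fixed grid."""
    m = p * f1 + (1 - p) * f2
    return trapezoid((f1 - f2) ** 2 / m, grid)

grid = np.linspace(-10.0, 14.0, 4001)
f1 = norm.pdf(grid, loc=0.0, scale=1.0)   # illustrative component densities
f2 = norm.pdf(grid, loc=4.0, scale=1.0)

ps = np.linspace(0.05, 0.95, 19)
jeffreys = np.sqrt([fisher_info_weight(p, f1, f2, grid) for p in ps])

# the (unnormalised) Jeffreys prior is higher near the edges of [0, 1]
# than at p = 0.5, i.e. it piles up at the extremes of the simplex
print(jeffreys[0] > jeffreys[9], jeffreys[-1] > jeffreys[9])
```

For well-separated components, I(p) behaves roughly like 1/(p(1−p)), which is exactly the Beta(1/2, 1/2) shape concentrated at the extremes.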

This is interesting because a reasonable and objective prior for the weights of a mixture model should lead to a conservative behaviour of the posterior distribution, i.e. if we fix a high number of components, the analysis should be able to discover the meaningless components. And the Jeffreys prior allows us to automatically define the level of trust in a particular component through the Fisher information matrix. This is along the lines of the work by Judith and Kerrie on overfitted mixtures.

With respect to the standard Dirichlet prior with parameters 1/2, the Jeffreys prior represents a more automatic choice (whereas the choice of the parameters of the Dirichlet prior may seem arbitrary, even when following Judith and Kerrie's paper), one which may differ depending on the type of distributions involved in the analysis.
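As a point of comparison, a quick simulation (my own illustration, not from the paper) shows the boundary-seeking behaviour of the Dirichlet(1/2, 1/2, 1/2) prior: relative to the uniform Dirichlet(1, 1, 1), it pushes mass toward the edges of the simplex, where some weights are nearly zero.

```python
import numpy as np

rng = np.random.default_rng(1)
half = rng.dirichlet([0.5, 0.5, 0.5], size=10_000)  # Dirichlet(1/2, 1/2, 1/2)
flat = rng.dirichlet([1.0, 1.0, 1.0], size=10_000)  # uniform on the simplex

# under parameters 1/2, the smallest weight is typically much closer to
# zero, i.e. the mass piles up near the boundary of the simplex
print(np.median(half.min(axis=1)), np.median(flat.min(axis=1)))
```

The difference with the Jeffreys prior is that the latter adapts this boundary behaviour to the component distributions through the Fisher information, rather than fixing it in advance.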

This is a useful feature in an applied context, where, even if people decide to use a finite mixture model, there is always the possibility to have assumed too many components.

We have shown in the paper, through simulations and on some well-known datasets from the mixture literature, that this property holds in applied settings.

An interesting part of the paper came from chats with researchers of the Department of Electronic and Electrical Engineering, University College London, UK. They told me about a recent trend in computer network systems: the deployment of network functions in software.

“So-called ‘software dataplanes’ represent an alternative to traditional hardware switches and routers, reducing costs and enhancing programmability. The monitoring of IP packets is, among all possible network functions, one of the most suitable for software deployment. However, monitoring has a huge cost in terms of CPU (processing) time consumed per packet. The main reason for this is that each incoming packet triggers the retrieval, from a large hash table, of all the information related to the packet’s flow (i.e. the packet’s family). This operation is generally called flow-entry retrieval. The time required for flow-entry retrieval (the retrieval time) mainly depends on whether such information is available in one of the processor caches (e.g. L1, L2, L3) or in memory.”

This collaboration led to work the group presented at the 13th CNSM conference in Japan in November 2017.

From the point of view of a statistician (me!), the retrieval times may be modelled with a mixture of heavy-tailed components (for instance, Gumbel distributions), to see how many clusters of times there are in the data. In particular, in the case of heavy-tailed distributions, it is important to understand whether there is an extra component or whether the data we see in the tail are already explained by a more conservative approach (with fewer components).
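To give a concrete sense of this modelling step, here is a minimal sketch of my own (not the paper's implementation): an EM algorithm for a two-component Gumbel mixture, run on synthetic "retrieval times" with hypothetical values for a fast (cache-hit-like) regime and a slower (memory-like) regime. The M-step for each Gumbel component has no closed form, so it is done by numerically minimising the responsibility-weighted negative log-likelihood.

```python
import numpy as np
from scipy.stats import gumbel_r
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# synthetic "retrieval times" with two regimes (hypothetical values,
# not the UCL data): a fast cluster and a slower, more spread-out one
x = np.concatenate([
    gumbel_r.rvs(loc=1.0, scale=0.3, size=300, random_state=rng),
    gumbel_r.rvs(loc=5.0, scale=1.0, size=200, random_state=rng),
])

def em_gumbel_mixture(x, k=2, n_iter=50):
    """EM for a k-component Gumbel mixture; returns (weights, locs, scales)."""
    locs = np.quantile(x, np.linspace(0.2, 0.8, k))   # crude initialisation
    scales = np.full(k, x.std())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation
        dens = np.array([w[j] * gumbel_r.pdf(x, locs[j], scales[j])
                         for j in range(k)])
        resp = dens / dens.sum(axis=0)
        # M-step: update weights, then weighted Gumbel MLE per component
        w = resp.mean(axis=1)
        for j in range(k):
            def nll(theta, r=resp[j]):
                loc, log_scale = theta
                return -(r * gumbel_r.logpdf(x, loc, np.exp(log_scale))).sum()
            res = minimize(nll, [locs[j], np.log(scales[j])],
                           method="Nelder-Mead")
            locs[j], scales[j] = res.x[0], np.exp(res.x[1])
    return w, locs, scales

w, locs, scales = em_gumbel_mixture(x)
print(np.round(np.sort(w), 2))  # should roughly recover the 0.4 / 0.6 split
```

The question the paper addresses is what happens when k is set larger than the number of regimes actually present; with the Jeffreys prior on the weights, the superfluous components should be emptied rather than kept alive.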

In this respect, we have seen that the Jeffreys prior maintains the property of conservativeness in the number of components (a sort of Occam’s razor) and represents an automatic choice of prior distribution for the weights of a mixture model.

So in the paper we have applied it to the galaxy dataset, an environmental dataset, a clinical dataset, and a network dataset… and now I am using it on a biological dataset, within the CRyPTIC project, against Larry Wasserman’s suggestion (from a few years ago): “I have decided that mixtures, like tequila, are inherently evil and should be avoided at all costs.”
