Nonparametric inference in multivariate mixtures


We consider mixture models in which the components of data vectors from any given subpopulation are statistically independent, or independent in blocks. We argue that if, under this condition of independence, we take a nonparametric view of the problem and allow the number of subpopulations to be quite general, the distributions and mixing proportions can often be estimated root-n consistently. Indeed, we show that, if the data are k-variate and there are p subpopulations, then for each p ⩾ 2 there is a minimal value of k, k_p say, such that the mixture problem is always nonparametrically identifiable, and all distributions and mixture proportions are nonparametrically identifiable when k ⩾ k_p. We treat the case p = 2 in detail, and there we show how to construct explicit distribution, density and mixture-proportion estimators, converging at conventional rates. Other values of p can be addressed using a similar approach, although the methodology becomes rapidly more complex as p increases.
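
For concreteness, the independence condition described above can be written as a formula. This is only a sketch in notation assumed here rather than taken from the text: π_j stands for the mixing proportion of subpopulation j, and F_{ji} for the marginal distribution of the i-th component of a data vector drawn from subpopulation j.

```latex
% Assumed notation: \pi_j = mixing proportion of subpopulation j,
% F_{ji} = marginal distribution of component i within subpopulation j.
F(x_1, \dots, x_k) \;=\; \sum_{j=1}^{p} \pi_j \prod_{i=1}^{k} F_{ji}(x_i),
\qquad \pi_j \ge 0, \quad \sum_{j=1}^{p} \pi_j = 1 .
```

Under independence in blocks, the product would instead run over blocks of components, with each factor being the joint distribution of one block.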

In Biometrika