What is Better: GMM of Two Gaussians or Two Clusters With One Gaussian?

In this report, we provide a theoretical discussion of cluster analysis for temporal data: does the data come from one source or two, and is it better to split the data into two clusters or leave it as one? We analyse only the simplest case, in which the data comes from two symmetric Gaussian probability density functions (pdfs), i.e., with the same variance and the same absolute value of the mean, and with the same prior probability for each Gaussian. The data consists of segments with an a priori known segment length. We show that if the data belongs to two different Gaussian models, the likelihood of two clusters is always greater than or equal to that of a GMM with two Gaussians, for any mean, variance, and segment length. If the data belongs to the GMM, the likelihood of two clusters may be either higher or lower than that of the GMM.
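
To make the two quantities being compared concrete, the following is a minimal numerical sketch, not the derivation in the report. It assumes that the true parameters (mean `mu`, variance `var`) are known, that the "two clusters" likelihood assigns each whole segment to whichever of the two Gaussians scores it higher and includes no prior term, and that the GMM likelihood scores every sample under the equal-weight mixture. All function names are hypothetical, and the report's exact definitions may differ.

```python
import numpy as np


def gaussian_logpdf(x, mean, var):
    """Element-wise log-density of a univariate Gaussian N(mean, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def gmm_loglik(segments, mu, var):
    """Log-likelihood under the mixture 0.5*N(+mu, var) + 0.5*N(-mu, var).

    Every sample is scored independently under the mixture.
    """
    log_plus = gaussian_logpdf(segments, mu, var)
    log_minus = gaussian_logpdf(segments, -mu, var)
    return float(np.sum(np.logaddexp(log_plus, log_minus) + np.log(0.5)))


def two_cluster_loglik(segments, mu, var):
    """Log-likelihood when each segment is hard-assigned to one Gaussian.

    Assumption of this sketch: a segment goes to the cluster that gives it
    the higher log-likelihood, and no prior term is added.
    """
    log_plus = gaussian_logpdf(segments, mu, var).sum(axis=1)
    log_minus = gaussian_logpdf(segments, -mu, var).sum(axis=1)
    return float(np.sum(np.maximum(log_plus, log_minus)))


def sample_two_sources(n_segments, seg_len, mu, var, rng):
    """Each segment comes entirely from one source, +mu or -mu (prior 0.5)."""
    signs = rng.choice([-1.0, 1.0], size=(n_segments, 1))
    noise = np.sqrt(var) * rng.standard_normal((n_segments, seg_len))
    return signs * mu + noise


def sample_gmm(n_segments, seg_len, mu, var, rng):
    """Each sample independently picks +mu or -mu, i.e. a single GMM source."""
    signs = rng.choice([-1.0, 1.0], size=(n_segments, seg_len))
    noise = np.sqrt(var) * rng.standard_normal((n_segments, seg_len))
    return signs * mu + noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, var, seg_len = 1.0, 1.0, 10  # illustrative values only
    for name, sampler in [("two sources", sample_two_sources),
                          ("single GMM", sample_gmm)]:
        data = sampler(200, seg_len, mu, var, rng)
        print(name,
              "| GMM:", gmm_loglik(data, mu, var),
              "| two clusters:", two_cluster_loglik(data, mu, var))
```

Varying `mu`, `var`, and `seg_len` in the demo loop gives a rough empirical feel for how the comparison depends on the mean, the variance, and the segment length under these assumptions.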
