Mixture models form the essential basis of data clustering within a statistical framework. Here, the estimation of the parameters of a mixture of Gaussian densities is considered. In this context, it is well known that the maximum likelihood approach is statistically ill-posed: the likelihood function is not bounded above, because of singularities at the boundary of the parameter domain. We show that this degeneracy can be avoided by penalizing the likelihood function with a suitable penalty function. Recently, the resulting penalized maximum likelihood estimator has been proved to be asymptotically well behaved. Local maximization of the penalized likelihood can be performed by means of Green's modified EM algorithm: provided that an inverse gamma density is chosen as the penalty function, the EM re-estimation equations remain explicit and automatically ensure that the estimates are not singular. Numerical examples in the finite-data case illustrate the performance of the penalized estimator compared with the standard one. Our penalized approach is also compared with a constrained approach which, to the authors' knowledge, represents the only alternative solution to likelihood degeneracy. Our contribution mainly addresses the case of an independent, identically distributed mixture of Gaussian densities, but the more general case of dependent classes is also tackled, with particular reference to the important case of hidden Markov models.
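To illustrate the mechanism described above, the following is a minimal sketch of penalized EM for a one-dimensional Gaussian mixture, assuming an inverse gamma IG(a, b) penalty on each component variance. The hyperparameter values, function name, and initialization scheme are illustrative choices, not those of the paper; the point is only that maximizing the penalized M-step objective for the variance yields a closed-form update whose value is bounded away from zero, so no estimate can collapse onto a data point.

```python
import numpy as np

def penalized_em_gmm(x, K, a=2.0, b=0.1, n_iter=100, seed=0):
    """EM for a 1-D Gaussian mixture with an inverse gamma IG(a, b)
    penalty on each variance (a, b are illustrative assumptions).
    The penalized variance update keeps every variance at least
    2b / (n + 2a + 2) > 0, avoiding the likelihood singularities."""
    rng = np.random.default_rng(seed)
    n = len(x)
    w = np.full(K, 1.0 / K)                   # mixing weights
    mu = rng.choice(x, K, replace=False)      # initial means
    var = np.full(K, np.var(x))               # initial variances
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k], computed in log space
        log_pdf = (-0.5 * np.log(2 * np.pi * var)
                   - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_r = np.log(w) + log_pdf
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weights and means are the standard EM updates
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        # Penalized variance update: the IG(a, b) penalty adds 2b to
        # the numerator and 2a + 2 to the denominator, so var > 0
        S = (r * (x[:, None] - mu) ** 2).sum(axis=0)
        var = (S + 2 * b) / (nk + 2 * a + 2)
    return w, mu, var
```

A usage example on simulated two-component data: `penalized_em_gmm(x, K=2)` returns weights summing to one and strictly positive variances, even if a component's responsibility mass concentrates on a single observation.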