This paper discusses the problems raised by the optimization of a mutual information-based objective function in the context of multimodal speaker detection. Because the objective is evaluated without approximation, it is highly nonlinear and exhibits numerous local minima. Three optimization methods are compared. The Differential Evolution algorithm is found to be the best suited to the problem at hand and is therefore used to perform the speaker detection.
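
To illustrate the kind of population-based search referred to above, the following is a minimal sketch of a common Differential Evolution variant (DE/rand/1/bin). It is not the implementation used in this work: the Rastrigin function is only a hypothetical stand-in for a multimodal objective with many local minima, and the population size, scale factor `F`, crossover rate `CR`, and all function names are illustrative assumptions. An objective that should be maximized, such as mutual information, could be handled by minimizing its negative.

```python
import math
import random


def rastrigin(x):
    """Stand-in multimodal objective (assumption); global minimum 0 at the origin."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


def differential_evolution(f, bounds, pop_size=30, F=0.7, CR=0.9,
                           generations=300, seed=0):
    """Minimize f over box bounds with a DE/rand/1/bin sketch.

    Returns the best parameter vector found and its cost.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population, uniform inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(ind) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated component
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    # Mutation: base vector plus scaled difference vector.
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][j]  # binomial crossover keeps the target value
                trial.append(v)
            # Greedy selection: the trial replaces the target only if not worse.
            trial_cost = f(trial)
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost

    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


if __name__ == "__main__":
    x_best, f_best = differential_evolution(rastrigin, [(-5.12, 5.12)] * 2)
    print(x_best, f_best)
```

Because selection is greedy, the best cost in the population never increases from one generation to the next. The search uses no gradient information and keeps many candidate solutions at once, which is why such methods are often considered for objectives with many local minima.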