In this paper, we propose a novel information-theoretic criterion for optimizing the linear combination of classifiers in multi-stream automatic speech recognition (ASR). We discuss an objective function that trades off the minimization of a bound on the Bayes probability of error against the minimization of the divergence between the individual classifier outputs and their combination. The method is compared with the conventional inverse entropy and minimum entropy combinations on both small- and large-vocabulary ASR tasks. Results show that it outperforms these other linear combination rules. Furthermore, we discuss the advantages of the proposed approach and its extension to other (non-linear) combination rules.
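
For orientation, the conventional inverse entropy baseline named above can be sketched as follows. This is a minimal illustration, assuming the usual formulation in which each stream's weight at a given frame is proportional to the inverse of the entropy of that stream's posterior distribution. The function names, the `eps` guard, and the example posteriors are hypothetical and are not taken from this paper. The sketch does not implement the proposed criterion.

```python
import math

def entropy(posterior):
    """Shannon entropy (in nats) of a discrete posterior distribution."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)

def inverse_entropy_weights(posteriors, eps=1e-12):
    """Weight each stream by the inverse of its posterior entropy.

    The weights are normalized to sum to one. `eps` is an assumed guard
    against division by zero when a posterior is fully peaked.
    """
    inv = [1.0 / (entropy(p) + eps) for p in posteriors]
    total = sum(inv)
    return [v / total for v in inv]

def linear_combination(posteriors, weights):
    """Weighted sum of the per-stream class posteriors for one frame."""
    n_classes = len(posteriors[0])
    return [
        sum(w * p[k] for w, p in zip(weights, posteriors))
        for k in range(n_classes)
    ]

# Hypothetical posteriors over three classes from two streams, one frame.
stream_a = [0.90, 0.05, 0.05]  # confident stream (low entropy)
stream_b = [0.40, 0.30, 0.30]  # uncertain stream (high entropy)

weights = inverse_entropy_weights([stream_a, stream_b])
combined = linear_combination([stream_a, stream_b], weights)
```

In this sketch the more confident stream receives the larger weight, and the combined output remains a valid probability distribution because the weights are non-negative and sum to one.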