In this paper, we propose a novel algorithm for dimensionality reduction that uses as its criterion the mutual information (MI) between the transformed data and their corresponding class labels. MI is a powerful criterion that can serve as a proxy for the Bayes error rate. Furthermore, recent quadratic nonparametric implementations of MI are computationally efficient and require no prior assumptions about the class densities. We show that the quadratic nonparametric MI can be formulated as a kernel objective in the graph embedding framework. Moreover, we propose its linear equivalent as a novel linear dimensionality reduction algorithm. The derived methods are compared against state-of-the-art dimensionality reduction algorithms using various classifiers on a range of benchmark and real-life datasets. The experimental results show that nonparametric MI, used as an optimization objective for dimensionality reduction, yields results that are comparable to, and in most cases better than, those of other dimensionality reduction methods.
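
To make the criterion concrete, the following is a minimal sketch of one common quadratic nonparametric MI estimator: the Euclidean-distance form with Gaussian Parzen windows. The choice of this particular estimator, the function name `quadratic_mi`, and the bandwidth parameter `sigma` are assumptions made for illustration and may differ from the formulation used in this paper. The sketch only evaluates the criterion for already-transformed data; it does not perform the graph embedding optimization.

```python
import numpy as np

def quadratic_mi(Y, labels, sigma=1.0):
    """Quadratic MI (Euclidean-distance form) between data Y and class labels.

    Illustrative sketch: Parzen density estimates with isotropic Gaussian
    windows of bandwidth `sigma` (an assumed, user-chosen parameter).
    """
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    n, d = Y.shape

    # Pairwise squared Euclidean distances between all samples.
    sq_dist = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)

    # Convolving two Gaussian windows of variance sigma^2 gives a Gaussian
    # of variance 2 * sigma^2, evaluated at the pairwise differences.
    var = 2.0 * sigma ** 2
    K = np.exp(-sq_dist / (2.0 * var)) / (2.0 * np.pi * var) ** (d / 2.0)

    classes, counts = np.unique(labels, return_counts=True)
    priors = counts / n

    v_in = 0.0   # interactions between samples of the same class
    v_btw = 0.0  # interactions between each class and all samples
    for c, p in zip(classes, priors):
        mask = labels == c
        v_in += K[np.ix_(mask, mask)].sum()
        v_btw += p * K[mask, :].sum()

    # Interactions between all pairs, weighted by the squared class priors.
    v_all = np.sum(priors ** 2) * K.sum()

    return (v_in + v_all - 2.0 * v_btw) / n ** 2
```

In this sketch the estimate is a weighted sum of the entries of a Gaussian kernel matrix over the transformed samples. Calling `quadratic_mi(Y, labels)` on two candidate projections of the same data gives a simple way to compare how much class information each one retains under these assumptions.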