Although linear perfect diffusion primitives, i.e. MDS matrices, are widely used in block ciphers, e.g. AES, very little systematic work has been done on how to find ``efficient'' ones. In this paper we attempt to do so by considering software implementations on various platforms. These considerations lead to interesting combinatorial problems: how to maximize the number of occurrences of 1 in those matrices, and how to minimize the number of pairwise different entries. We investigate these problems and construct efficient $4\times4$ and $8\times8$ MDS matrices to be used e.g. in block ciphers.