Sawatlon, Boodsarin; Wodrich, Matthew D.; Meyer, Benjamin; Fabrizio, Alberto; Corminboeuf, Clemence
2019-12-11; 2019-12-11; 2019-12-11; 2019-08-21
10.1002/cctc.201900597
https://infoscience.epfl.ch/handle/20.500.14299/163899
WOS:000498036500058

The speed and precision of machine-learning (ML) techniques in determining quantum chemical properties have resulted in a considerable computational speed-up in comparison to traditional quantum chemical methods, and now allow a desired property of thousands of molecules to be assessed virtually instantaneously. The large databases that result from employing ML can, in turn, be mined with the goal of uncovering relationships that may be missed through more commonly used small-scale screening procedures. Due to its prominent place in chemistry, catalysis represents a particularly fruitful playground, where drawing connections between the quantum chemical properties of catalysts and their overall catalytic performance may lead to the identification of new, highly functional species. In this spirit, we previously trained ML models to predict the performance of 18000 prospective catalysts for a Suzuki coupling reaction using molecular volcano plots. Here, we apply concepts from big data to probe a type of "C-C cross-coupling genome" that explores results from many different named cross-coupling reactions. The use of interactive dimensionality-reducing data-clustering maps facilitates the identification of relationships between the thermodynamics of different catalysts and the chemical properties of their constituent metals and ligands.
Analyzing large numbers of species in this manner not only leads to the identification of unexpected catalysts that have thermodynamically ideal profiles to catalyze C-C cross-coupling reactions, but also reveals a wealth of interesting chemical trends regarding the influence exerted by different metals and ligands, as well as their unique combinations.

Chemistry, Physical; Chemistry
big data; machine learning; dft calculations; volcano plots; linear scaling relationships; molecular-orbital methods; ligand knowledge-base; valence basis-sets; machine-learning prediction; high-throughput discovery; n-heterocyclic carbenes; split-valence; organometallic chemistry; reductive elimination; oxidative addition
Data Mining the C-C Cross-Coupling Genome
text::journal::journal article::research article