Ceriotti, Michele
Imbalzano, Giulio
2021-11-12
10.5075/epfl-thesis-8457
https://infoscience.epfl.ch/handle/20.500.14299/182996

Molecular simulations make it possible to investigate the behaviour of materials at the atomistic level, shedding light on phenomena that cannot be directly observed in experiments. Accurate results can be obtained with ab initio methods, while simulations of large-scale systems are usually possible only with coarse approximations of the molecular interactions. Machine learning interatomic potentials (MLIPs) combine the strengths of the two approaches in a framework that allows iterative refinement, opening the door to the investigation of complex systems. Currently, the training of an MLIP is still human-centered: success or failure is often dictated by the complexity of the system and by the user's experience with the software. In this thesis, we provide methods that make the training and validation of the potentials easier and more general, even for complex, heterogeneous systems. We begin by comparing the learning ability of three widely adopted frameworks developed by the community, showing that a well-constructed set of input features allows datasets of water dimers and trimers to be learned with similar accuracy. Then, we compare heuristic methods based on the intrinsic correlations of the dataset to automatically identify the "best" inputs out of a larger set of candidates, which results in an accurate description of the system at a low computational cost. This simplifies the construction of potentials that use symmetry functions or smooth overlap of atomic positions as inputs. Finally, we introduce and implement a method to cheaply compute the uncertainty of the thermodynamic properties obtained through MD simulations with MLIPs.
This method can be used either to assess the confidence of a given result obtained with an MLIP (which is necessary when making quantitative predictions of properties) or to safely explore the phase space of interest, with the aid of a fall-back potential that takes over when the MLIP cannot be trusted. We showcase these methods with a real example, in which we train a potential for the complex GaAs system. The MLIP that we have developed accurately predicts the behaviour across the whole phase diagram, spanning liquid and solid, metallic and semiconducting phases. In this endeavour we investigate a variety of methods to obtain a comprehensive dataset of structures that are fed into the MLIP. To demonstrate the transferability of the potential, we compute multiple properties, some of which (e.g. the liquid surface tension) are well beyond the limits of ab initio methods. We compare these results to our reference calculations and to experiments, finding good agreement within the limits of the selected level of theory (DFT at the GGA level). Finally, we use our GaAs MLIP to investigate the behaviour of liquid gallium in contact with the polar [111] surface of solid GaAs. Recent experimental findings assign an important role to the pre-ordering of the liquid at the interface during the growth of GaAs nanowires, pointing to the polarity as one of the main drivers of correct growth. Our simulations make it possible to investigate this pre-ordering in greater detail, supporting and complementing the experimental observations. Furthermore, we explore the free energy of As atoms in liquid Ga, to understand their behaviour during growth and to help identify the ideal growth conditions.

en
Machine learning potentials; molecular dynamics; CUR selection; Uncertainty estimation; DFT; gallium arsenide; GaAs Nanowires
Transferable machine-learning models of complex materials: the case of GaAs
thesis::doctoral thesis