Data-Efficient and Fast Machine Learning Molecular Dynamics through Integrated Active Learning and Knowledge Distillation
We develop data-efficient machine learning interatomic potentials (MLIPs) for fast molecular dynamics simulations by combining DeePMD and MACE models within an active learning and knowledge distillation framework. Using liquid water as a case study, we first train DeePMD and MACE models independently from scratch through active learning. We find that MACE requires around 3.5 times less training data than DeePMD, but its inference is 10 times slower. We also show that starting from a pretrained foundation model based on the MACE architecture reduces the required training data by a further factor of 7, so that the fine-tuned foundation model needs about 25 times less data than DeePMD. To overcome the slower inference of MACE potentials, we then develop a knowledge distillation scheme that trains a DeePMD potential from the fine-tuned foundation model through an inexpensive active learning workflow. The distilled model is generated with approximately 10 times less computer time than the DeePMD model trained from scratch, while retaining the same fast inference speed. Comparison with ab initio calculations shows that all the models reach the same level of accuracy in reproducing the structural, vibrational, and diffusive properties of liquid water. Our approach enables practical, data-efficient training of customized MLIPs with high speed and accuracy.
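The abstract does not specify how new configurations are selected during the distillation workflow. The sketch below assumes a query-by-committee criterion, similar in spirit to the ensemble model-deviation criterion commonly used with DeePMD (e.g. in DP-GEN): a cheap "student" is refined by asking an accurate "teacher" to label only the candidates on which a committee of students disagrees most. Everything here is a hypothetical 1D toy: `teacher_energy` stands in for the fine-tuned foundation model, a piecewise-linear interpolant stands in for the DeePMD student, and the leave-one-out committee, `tol`, and `max_iter` are illustrative choices, not the authors' method.

```python
import math
from bisect import bisect_left


def teacher_energy(x):
    # HYPOTHETICAL teacher: a 1D Morse-like curve standing in for an
    # expensive but accurate model (e.g. a fine-tuned MACE foundation model).
    return (1.0 - math.exp(-1.5 * (x - 1.0))) ** 2


def interp(xs, ys, x):
    # Toy "student": piecewise-linear interpolation through sorted, distinct xs,
    # clamped to the end values outside the labeled range.
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_left(xs, x)  # first index with xs[j] >= x, so 1 <= j < len(xs)
    t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])


def committee_deviation(xs, ys, x):
    # Query-by-committee (an assumption): the full student plus leave-one-out
    # students that each drop one interior labeled point; the spread of their
    # predictions at x is used as the uncertainty estimate.
    preds = [interp(xs, ys, x)]
    for k in range(1, len(xs) - 1):
        preds.append(interp(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:], x))
    return max(preds) - min(preds)


def distill(candidates, tol=1e-3, max_iter=50):
    # Active-learning loop: each round, label only the most uncertain
    # candidate with the teacher, until the committee agrees within `tol`.
    pool = sorted(set(candidates))  # needs at least 3 distinct candidates
    xs = [pool[0], pool[len(pool) // 2], pool[-1]]
    ys = [teacher_energy(x) for x in xs]
    for _ in range(max_iter):
        unlabeled = [x for x in pool if x not in xs]
        if not unlabeled:
            break
        dev, x_new = max((committee_deviation(xs, ys, x), x) for x in unlabeled)
        if dev < tol:
            break  # committee agrees everywhere: stop querying the teacher
        j = bisect_left(xs, x_new)  # keep xs sorted
        xs.insert(j, x_new)
        ys.insert(j, teacher_energy(x_new))  # one teacher call per new label
    return xs, ys


if __name__ == "__main__":
    pool = [0.5 + 0.05 * i for i in range(61)]
    xs, ys = distill(pool)
    print(f"teacher labels: {len(xs)} of {len(pool)} candidates")
```

The leave-one-out committee is only a cheap stand-in for an ensemble of independently trained potentials; the point it illustrates is that teacher calls are spent where the student is uncertain, rather than on every candidate configuration.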
ChemRxiv preprint chemrxiv.15002964, version 1 (submitted version); open access under a CC BY license.