Abstract

Irregular sampling of time series in electronic health records (EHRs) presents a challenge for the development of machine learning models. Moreover, the pattern of missing data in certain clinical variables is not random but depends on clinicians' decisions and the state of the patient. Point processes provide a mathematical framework for analyzing event sequence data that naturally accommodates irregular sampling patterns. To tackle the challenges posed by EHRs, we propose a transformer event encoder with a point process loss that encodes the pattern of laboratory tests in EHRs. We conduct experiments on two real-world EHR databases to evaluate the proposed approach. First, we learn the transformer event encoder jointly with an existing state encoder in a self-supervised learning approach, yielding superior performance in negative log-likelihood and future event prediction. We also propose an algorithm for aggregating attention weights that can reveal the interactions between events. Second, we transfer and freeze the learned transformer event encoder for downstream outcome prediction tasks (mortality and septic shock), where it outperforms state-of-the-art models for handling irregularly sampled time series. Our results demonstrate that our approach improves representation learning in EHRs and can be useful for clinical prediction tasks.

Details