SzCORE as a benchmark: report from the seizure detection challenge at the 2025 AI in Epilepsy and Neurological Disorders Conference
Reliable automatic seizure detection from long-term electroencephalogram (EEG) recordings remains an open challenge in machine learning, as current models often fail to generalize across patients or clinical settings. Manual EEG review remains the standard of care, highlighting the need for robust models and standardized evaluation. To rigorously assess current machine learning algorithms for automatic seizure detection in long-term EEG from the epilepsy monitoring unit, we organized a challenge. A private dataset of continuous EEG recordings from 65 subjects, totalling 4,360 hours of data, was used to evaluate algorithm performance. Expert neurophysiologists annotated these recordings, establishing the ground truth for seizure events. Algorithms were required to generate accurate onset and duration annotations, with performance measured using event-based metrics: sensitivity, precision, F1-score, and false positive rate per day. The SzCORE framework was employed to ensure standardized evaluation across submissions. The event-based F1-score served as the primary ranking criterion, reflecting the clinical significance of detecting seizures while minimizing false positives. The challenge attracted 30 submissions from 19 teams or individuals, with 28 algorithms successfully evaluated. Results revealed significant performance variability among state-of-the-art approaches, with the top F1-score reaching only 43% (sensitivity 37%, precision 45%), highlighting the persistent difficulty of this task for current machine learning methodologies. This independent evaluation also exposed a notable gap between self-reported performance and challenge performance, underscoring the critical need for standardized, rigorous benchmarking in developing clinically viable ML models. A comparison with previous challenges and commercial systems indicates that the best algorithm in this contest surpassed prior methods.
Critically, the challenge infrastructure transitions into a continuously open benchmarking platform, fostering reproducible research and accelerating the development of robust seizure detection algorithms by allowing ongoing submissions and integration of additional private datasets. Clinical centres can also adopt this platform to evaluate seizure detection algorithms on their EEG data using a standardized, reproducible framework.
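As an illustration of the event-based metrics named in the abstract, the following is a minimal sketch, not the SzCORE implementation. It assumes that a reference seizure counts as detected when any predicted event overlaps it in time, and that a predicted event counts as a false positive when it overlaps no reference seizure. The `Event` class and the `overlaps` and `event_metrics` functions are hypothetical names introduced for this example; the actual framework may apply additional rules (for example onset tolerances or merging of nearby events) that are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A seizure annotation (hypothetical type for this sketch)."""
    onset: float     # seconds from the start of the recording
    duration: float  # seconds

    @property
    def end(self) -> float:
        return self.onset + self.duration


def overlaps(a: Event, b: Event) -> bool:
    """True when the two events share any interval of time."""
    return a.onset < b.end and b.onset < a.end


def event_metrics(reference, hypothesis, recording_hours):
    """Event-based sensitivity, precision, F1-score and false positives per day.

    Assumption: simple any-overlap matching, without tolerances or merging.
    """
    # Reference seizures overlapped by at least one predicted event.
    tp = sum(any(overlaps(r, h) for h in hypothesis) for r in reference)
    # Predicted events that overlap no reference seizure.
    fp = sum(not any(overlaps(h, r) for r in reference) for h in hypothesis)
    fn = len(reference) - tp

    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    precision = tp / (tp + fp) if (tp + fp) else nan
    # Equivalent to the harmonic mean of sensitivity and precision.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else nan
    fp_per_day = fp / (recording_hours / 24.0)
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "fp_per_day": fp_per_day,
    }


# Toy data (invented for illustration): two annotated seizures, three detections.
reference = [Event(onset=100, duration=30), Event(onset=1000, duration=60)]
hypothesis = [Event(110, 20), Event(5000, 10), Event(7000, 10)]
result = event_metrics(reference, hypothesis, recording_hours=48)
print(result)
```

In this toy case one of the two seizures is detected and two detections are false alarms over 48 hours of recording. Note that metrics reported for a whole dataset may be aggregated per subject, in which case the reported F1-score need not equal the harmonic mean of the reported sensitivity and precision.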
SzCORE_2025_Challenge.pdf
Main Document
Submitted version (Preprint)
openaccess
CC BY