SzCORE as a benchmark: report from the seizure detection challenge at the 2025 AI in Epilepsy and Neurological Disorders Conference

Dan, Jonathan; Shahbazinia, Amirhossein; Kechris, Christodoulos; Atienza, David

preprint

May 19, 2025

Reliable automatic seizure detection from long-term electroencephalogram (EEG) recordings remains an open challenge in machine learning, as current models often fail to generalize across patients or clinical settings. Manual EEG review remains the standard of care, highlighting the need for robust models and standardized evaluation. To rigorously assess current machine learning algorithms for automatic seizure detection in long-term EEG from the epilepsy monitoring unit, we organized a challenge. A private dataset of continuous EEG recordings from 65 subjects, totalling 4,360 hours of data, was used to evaluate algorithm performance. Expert neurophysiologists annotated these recordings, establishing the ground truth for seizure events. Algorithms were required to generate accurate onset and duration annotations, with performance measured using event-based metrics, including sensitivity, precision, F1-score, and false positive rate per day. The SzCORE framework was employed to ensure standardized evaluation across submissions. The event-based F1-score served as the primary ranking criterion, reflecting the clinical significance of detecting seizures while minimizing false positives. The challenge attracted 30 submissions from 19 teams or individuals, with 28 algorithms successfully evaluated. Results revealed significant performance variability among state-of-the-art approaches, with a top F1-score of 43% (sensitivity 37%, precision 45%), highlighting the persistent difficulty of this task for current machine learning methodologies. This independent evaluation also exposed a notable gap between self-reported efficacies and challenge performance, underscoring the critical need for standardized, rigorous benchmarking in developing clinically viable machine learning models. A comparison with previous challenges and commercial systems indicates that the best algorithm in this contest surpassed prior methods.
Critically, the challenge infrastructure transitions into a continuously open benchmarking platform, fostering reproducible research and accelerating the development of robust seizure detection algorithms by allowing ongoing submissions and integration of additional private datasets. Clinical centres can also adopt this platform to evaluate seizure detection algorithms on their EEG data using a standardized, reproducible framework.
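
The event-based metrics named in the abstract (sensitivity, precision, F1-score, false positives per day) can be illustrated with a minimal sketch. This is not the SzCORE implementation: the any-overlap matching rule and the names `Event`, `overlaps`, and `event_scores` are assumptions chosen for illustration, and the framework's own matching rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A seizure event as [start, end) in seconds (hypothetical type)."""
    start: float
    end: float


def overlaps(a: Event, b: Event) -> bool:
    # Two events overlap if each starts before the other ends.
    return a.start < b.end and b.start < a.end


def ratio(num: float, den: float) -> float:
    # Undefined ratios (zero denominator) are reported as NaN.
    return num / den if den else float("nan")


def event_scores(reference, hypothesis, recording_hours):
    """Event-based scores under an assumed any-overlap matching rule."""
    # A reference event is detected if any hypothesis event overlaps it.
    tp = sum(any(overlaps(r, h) for h in hypothesis) for r in reference)
    fn = len(reference) - tp
    # A hypothesis event is a false positive if it overlaps no reference event.
    fp = sum(not any(overlaps(h, r) for r in reference) for h in hypothesis)

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    if sensitivity + precision == 0:
        f1 = 0.0
    else:
        f1 = 2 * sensitivity * precision / (sensitivity + precision)
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "fp_per_day": ratio(fp, recording_hours / 24),
    }


# Toy example: two annotated seizures, three detections, 24 h of EEG.
reference = [Event(100, 160), Event(1000, 1050)]
hypothesis = [Event(110, 150), Event(500, 520), Event(2000, 2010)]
print(event_scores(reference, hypothesis, recording_hours=24))
```

In this toy case one of two seizures is detected and two detections are spurious, so the scores count whole events rather than individual EEG samples, which is the sense of "event-based" used above.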

Name: SzCORE_2025_Challenge.pdf
Type: Main Document
Version: Submitted version (Preprint)
Access type: openaccess
License Condition: CC BY
Size: 1.53 MB
Format: Adobe PDF
Checksum (MD5): 3b0854521a42cb220c3ca16c1ffbb7b3