Effects of hardware heterogeneity on the performance of SVM Alzheimer's disease classifier
Fully automated machine learning methods based on structural magnetic resonance imaging (MRI) data can assist radiologists in the diagnosis of Alzheimer's disease (AD). These algorithms require large data sets to learn to separate subjects with AD from those without. Training and test data may come from heterogeneous hardware settings, which can potentially affect the performance of disease classification. A total of 518 MRI sessions from 226 healthy controls and 191 individuals with probable AD were used to systematically investigate the effect of different hardware (i.e., vendor, field strength, coil system) on the performance of support vector machine (SVM) classifiers. The data were taken from the multicenter Alzheimer's Disease Neuroimaging Initiative (ADNI). We compared the change in the SVM decision value caused by changes in hardware against (a) the effect of disease and (b) the change caused simply by rescanning the same subject on the same machine. A maximum accuracy of 87% was obtained with a training set comprising all 417 subjects. Classifiers trained on 95 subjects per diagnostic group, with images acquired under heterogeneous scanner settings, had an empirical detection accuracy of 83.9 ± 2.2% when tested on an independent set of the same size. These results mirror the accuracies reported in recent studies. Encouragingly, classifiers trained on images acquired with homogeneous and with heterogeneous hardware settings had equivalent cross-validation performance. Two scans of the same subject acquired on the same machine had very similar decision values and were generally classified into the same group. Higher variation was introduced when the two acquisitions of the same subject were performed on scanners with different field strengths. This variation was unbiased and similar for both diagnostic groups. 
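The comparison described above can be illustrated with a small sketch. It assumes a linear SVM reduced to an explicit weight vector `w` and bias `b`, a sign convention where a positive decision value means AD, and made-up two-dimensional feature vectors; none of these numbers or function names (`decision_value`, `paired_changes`, `count_flips`, `disease_effect`, `summarize`) come from the study, and the sketch is not the authors' pipeline. For each pair of scans of the same subject it measures the change in decision value, its mean (bias), its mean absolute size relative to the disease effect, and how many subjects switch diagnostic group.

```python
from statistics import mean

# Hypothetical toy example: a linear SVM reduced to an explicit weight vector w and
# bias b. Sign convention (assumed): f(x) > 0 -> AD, f(x) <= 0 -> control.


def decision_value(w, b, x):
    """Linear SVM decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def paired_changes(w, b, pairs):
    """Change f(x2) - f(x1) between two scans (x1, x2) of the same subject."""
    return [decision_value(w, b, x2) - decision_value(w, b, x1) for x1, x2 in pairs]


def count_flips(w, b, pairs):
    """Number of subjects whose predicted group differs between the two scans."""
    return sum(
        1
        for x1, x2 in pairs
        if (decision_value(w, b, x1) > 0) != (decision_value(w, b, x2) > 0)
    )


def disease_effect(w, b, controls, patients):
    """Difference between the mean decision values of the AD and control groups."""
    return mean(decision_value(w, b, x) for x in patients) - mean(
        decision_value(w, b, x) for x in controls
    )


def summarize(w, b, pairs, effect):
    """Bias, mean absolute change (also relative to the disease effect), and flips."""
    changes = paired_changes(w, b, pairs)
    mean_abs = mean(abs(c) for c in changes)
    return {
        "bias": mean(changes),
        "mean_abs_change": mean_abs,
        "relative_to_disease": mean_abs / effect,
        "flips": count_flips(w, b, pairs),
    }


if __name__ == "__main__":
    w, b = [1.0, -0.5], 0.0  # made-up weights, not from the study
    controls = [[-1.0, 0.0], [-0.5, 1.0]]
    patients = [[1.0, 0.0], [1.5, 1.0]]
    effect = disease_effect(w, b, controls, patients)

    # Made-up feature vectors for repeat scans on the same machine ...
    same_scanner = [([-1.0, 0.0], [-0.875, 0.0]), ([1.0, 0.0], [1.0, 0.25])]
    # ... and for scans of the same subject at two field strengths.
    field_strength = [
        ([-1.0, 0.0], [-0.5, 0.0]),
        ([1.0, 0.0], [1.0, 1.0]),
        ([0.25, 0.0], [-0.25, 0.0]),
        ([-0.25, 0.0], [0.25, 0.0]),
    ]
    print("disease effect:", effect)
    print("same scanner:", summarize(w, b, same_scanner, effect))
    print("field strength:", summarize(w, b, field_strength, effect))
```

With these made-up numbers both sets of pairs are unbiased (mean change of zero), but the field-strength pairs show four times the mean absolute change of the same-scanner pairs, and the two subjects lying close to the decision boundary switch group.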
The findings of the study encourage pooling data from different sites to increase the number of training samples and thereby improve the performance of disease classifiers. Although the effect is small, a change in hardware could alter the decision value and thus the diagnostic grouping. Based on these results, we recommend reporting the level of diagnostic confidence when these methods are applied in a clinical setting involving different sets of hardware.
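One possible way to attach a confidence level to each prediction is sketched below. This rule is a hypothetical illustration, not a procedure from the study: it assumes that decision values lying within `k` standard deviations of the hardware-induced change (estimated from paired scans on different hardware) of the boundary `f = 0` should be reported as low confidence. The function names, the default `k = 2.0`, and the example numbers are all assumptions.

```python
from statistics import pstdev

# Hypothetical reporting rule (not from the study): call a prediction "low
# confidence" when its decision value lies within k standard deviations of the
# hardware-induced decision-value changes around the SVM boundary (f = 0).


def hardware_margin(changes, k=2.0):
    """k times the population standard deviation of paired-scan decision-value changes."""
    return k * pstdev(changes)


def report(decision_value, margin):
    """Return (group, confidence) for one decision value; f > 0 -> AD (assumed convention)."""
    group = "AD" if decision_value > 0 else "control"
    confidence = "high" if abs(decision_value) >= margin else "low"
    return group, confidence


if __name__ == "__main__":
    # Made-up changes between scans of the same subjects at two field strengths.
    margin = hardware_margin([0.5, -0.5, -0.5, 0.5])
    for f in (1.4, 0.6, -0.3, -2.0):
        print(f, report(f, margin))
```

The margin is tied to the spread of hardware-induced changes rather than to a fixed threshold, so a site that mixes field strengths would, under this assumption, flag more borderline cases than one rescanning on a single machine.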