Harmonizing neuropsychological test data across prospective studies.
INTRODUCTION: Alzheimer's disease (AD) research relies on large datasets and advanced statistical models. However, individual population studies often lack sufficient sample size for conclusive results. Harmonizing cognitive test data across studies can address this gap, despite differences in testing protocols. This study harmonizes cognitive data from three major AD cohorts to support robust clinical-pathological modelling. METHODS: Data from the Alzheimer's Disease Neuroimaging Initiative (N = 1446); Australian Imaging, Biomarkers and Lifestyle (N = 1764); and Open Access Series of Imaging Studies-3 (N = 440) were integrated, including cognitive scores, demographics, genetics, and clinical and neuroimaging data. Neuropsychological tests relevant to AD were harmonized using MissForest, a machine learning-based imputation method. Validation involved assessing imputation accuracy and analyzing composite cognitive scores across clinical-pathological groups. RESULTS: Imputation showed high accuracy (mean absolute error ≤ test-retest variability in cognitively unimpaired participants). Composite scores reflected known disease patterns with significant stratification across clinical-pathological groups. DISCUSSION: The validated harmonization approach demonstrated reliable imputation, enabling more powerful AD models and supporting future diagnostic and therapeutic advances.
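The accuracy criterion reported in RESULTS (mean absolute error no larger than test-retest variability) can be illustrated with a small sketch. The abstract does not say how accuracy was assessed, so the masking procedure here is an assumption: hide a fraction of observed scores, impute them, and compare the imputed values with the hidden ones. The function names, the toy scores, and the threshold of 2.0 points are all hypothetical, and the column-mean imputer is only a stand-in so that the example runs with the standard library; it is not MissForest.

```python
import random
from statistics import mean


def column_mean_impute(rows):
    """Stand-in imputer (NOT MissForest): fill None with the column mean."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed))
    return [
        [col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def masked_mae(rows, impute, col, frac=0.2, seed=0):
    """Hide a fraction of observed values in one column, impute them,
    and return the mean absolute error on the hidden entries only."""
    rng = random.Random(seed)
    observed_idx = [i for i, r in enumerate(rows) if r[col] is not None]
    n_hide = max(1, int(len(observed_idx) * frac))
    hidden = rng.sample(observed_idx, n_hide)

    # Work on a copy so the original data stay untouched.
    masked = [list(r) for r in rows]
    for i in hidden:
        masked[i][col] = None

    filled = impute(masked)
    return mean(abs(filled[i][col] - rows[i][col]) for i in hidden)


# Hypothetical toy data: two test scores per participant, None = not administered.
scores = [
    [28, 10], [29, 12], [27, None], [30, 11], [26, 9],
    [None, 10], [28, 11], [29, 10], [27, 12], [30, 11],
]

# Hypothetical test-retest variability for the first test, in score points.
TEST_RETEST_THRESHOLD = 2.0

mae = masked_mae(scores, column_mean_impute, col=0, frac=0.3, seed=0)
print(f"masked MAE = {mae:.2f}, within threshold: {mae <= TEST_RETEST_THRESHOLD}")
```

Because `masked_mae` takes the imputer as a callable, any function that maps a table with missing entries to a completed table, including a MissForest-style imputer, could be passed in place of `column_mean_impute`.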