Test-retest reliability of FreeSurfer measures of neurodegeneration.
Reliable structural brain measurements are essential for studying neurodegeneration and for designing adequately powered studies of aging and Alzheimer's disease (AD). We evaluated the test-retest reliability of FreeSurfer 7.1 morphometric measures in 100 older adults (mean age 73.5 years) ranging from cognitively unimpaired to dementia. Each participant underwent two T1-weighted 3T MRI scans on the same scanner within a short interval (mean 5.5 weeks), minimizing biological change. Segmentation was performed in both standard cross-sectional and longitudinal FreeSurfer modes, focusing on AD-relevant measures: volumes of the entorhinal cortex, hippocampus, lateral ventricles, and choroid plexus, and the AD cortical thickness signature. Reliability was quantified using absolute and root-mean-square test-retest differences, the standard deviation of differences, and intraclass correlation coefficients. Compared with cross-sectional processing, longitudinal processing improved precision by 15-50% across most measures, with the largest gain observed for entorhinal thickness. Larger, anatomically well-defined regions (e.g., hippocampus, AD signature) demonstrated higher reliability than small structures or those with complex geometry (e.g., entorhinal cortex, choroid plexus). Image quality, indexed by the Euler characteristic, was the only factor significantly associated with measurement variability; reliability was unrelated to age, sex, cognitive status, inter-scan interval, or amyloid/tau PET burden. Power analyses indicated that detecting a 1% within-individual change requires sample sizes ranging from 36 (AD signature) to >300 (entorhinal cortex). Choroid plexus volumetry with FreeSurfer 7.1 showed low reliability. These results provide practical benchmarks for expected FreeSurfer measurement variability in older adults and highlight the advantages of longitudinal processing and rigorous quality control for research on brain aging and AD.