Deep learning-based unlearning of dataset bias for MRI harmonisation and confound removal

NeuroImage 2021
Nicola Dinsdale1
Mark Jenkinson1,2,3
Ana Namburete4

1Wellcome Centre for Integrative Neuroimaging, University of Oxford
2Australian Institute for Machine Learning (AIML), University of Adelaide
3South Australian Health and Medical Research Institute (SAHMRI)
4Ultrasound NeuroImage Analysis Group, University of Oxford

[Paper]
[GitHub]

Abstract

Increasingly large MRI neuroimaging datasets are becoming available, including many highly multi-site, multi-scanner datasets. Combining the data from the different scanners is vital for increased statistical power; however, this leads to an increase in variance due to non-biological factors such as differences in acquisition protocols and hardware, which can mask signals of interest. We propose a deep learning-based training scheme, inspired by domain adaptation techniques, which uses an iterative update approach to create scanner-invariant features while simultaneously maintaining performance on the main task of interest, thus reducing the influence of scanner on network predictions. We demonstrate the framework for regression, classification and segmentation tasks with two different network architectures. We show that not only can the framework harmonise many-site datasets, but it can also adapt to many data scenarios, including biased datasets and limited training labels. Finally, we show that the framework can be extended to remove other known confounds in addition to scanner. The overall framework is therefore flexible and should be applicable to a wide range of neuroimaging studies.

Pipeline

We present an iterative framework which can simply be applied to any feedforward architecture and any regression, classification or segmentation task. By alternating between optimising for the main task and removing the scanner information, the network is able to perform the task of interest while being invariant to the scanner. The framework can be trivially extended to the removal of additional confounds.
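As a hedged illustration of this alternating scheme, the sketch below implements the three update steps on a toy linear model in NumPy: (1) a main-task step updating the shared feature extractor and task head, (2) a scanner-classifier step updating only the domain head, and (3) an unlearning step updating only the feature extractor with a confusion loss that pushes scanner predictions towards the uniform distribution. The data, dimensions and learning rate are all hypothetical stand-ins, not the paper's networks or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-"scanner" dataset (hypothetical stand-in for MRI features):
# the scanner adds a pure confound offset that does not affect the target.
n, d, k = 200, 10, 8
x_clean = rng.normal(size=(n, d))
domain = rng.integers(0, 2, size=n)            # scanner label, 2 sites
x = x_clean.copy()
x[:, 0] += 2.0 * domain                        # scanner-specific shift
w_true = rng.normal(size=d)
y = x_clean @ w_true + 0.1 * rng.normal(size=n)  # main-task target (regression)

# Parameters: shared feature extractor W, task head a, scanner head B.
W = 0.1 * rng.normal(size=(d, k))
a = 0.1 * rng.normal(size=k)
B = 0.1 * rng.normal(size=(k, 2))
onehot = np.eye(2)[domain]
uniform = np.full((n, 2), 0.5)                 # confusion-loss target
lr = 0.05

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def main_loss():
    return np.mean((x @ W @ a - y) ** 2)

def scanner_acc():
    return np.mean((x @ W @ B).argmax(axis=1) == domain)

loss0 = main_loss()
for step in range(300):
    # (1) main-task step: update W and a on the regression loss.
    z = x @ W
    err = z @ a - y
    a -= lr * (2 / n) * z.T @ err
    W -= lr * (2 / n) * x.T @ (err[:, None] * a[None, :])
    # (2) scanner-classifier step: update B only, features frozen.
    z = x @ W
    p = softmax(z @ B)
    B -= lr * z.T @ ((p - onehot) / n)
    # (3) unlearning step: update W only, via cross-entropy between the
    #     scanner predictions and the uniform distribution (confusion loss).
    p = softmax(z @ B)
    dlogits = (p - uniform) / n
    W -= lr * x.T @ (dlogits @ B.T)

print(f"main-task MSE: {loss0:.3f} -> {main_loss():.3f}")
print(f"scanner accuracy after unlearning: {scanner_acc():.2f}")
```

At convergence the main-task loss should fall while the scanner classifier is driven towards chance accuracy; in practice the paper applies the same alternation to deep feedforward networks rather than this linear toy.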


Results

Age Prediction Task - Harmonise across datasets
Harmonisation: comparing lines 7 and 15 (blue) shows improved performance (lower MAE) across the datasets, while scanner classification accuracy is reduced from almost perfect to the desired chance level.
Generalisation: comparing lines 5 and 13 (orange) shows a large increase in performance on the unseen site (OASIS).

Even in the presence of significant dataset bias



Segmentation Task - Harmonise across datasets

Even with limited training labels for one site
When no labelled examples are available for one site, the unlearning process still allows us to achieve a large increase in performance while removing the scanner information.


Acknowledgements

ND is supported by the Engineering and Physical Sciences Research Council (EPSRC) and Medical Research Council (MRC) [grant number EP/L016052/1]. MJ is supported by the National Institute for Health Research (NIHR), Oxford Biomedical Research Centre (BRC), and this research was funded by the Wellcome Trust [215573/Z/19/Z]. The Wellcome Centre for Integrative Neuroimaging is supported by core funding from the Wellcome Trust [203139/Z/16/Z]. AN is grateful for support from the UK Royal Academy of Engineering under the Engineering for Development Research Fellowships scheme. This research has been conducted in part using the UK Biobank Resource under Application Number 8107. We are grateful to UK Biobank for making the data available, and to all UK Biobank study participants, who generously donated their time to make this resource possible. Analysis was carried out on the clusters at the Oxford Biomedical Research Computing (BMRC) facility and FMRIB (part of the Wellcome Centre for Integrative Neuroimaging). BMRC is a joint development between the Wellcome Centre for Human Genetics and the Big Data Institute, supported by Health Data Research UK and the NIHR Oxford Biomedical Research Centre. The computational aspects of this research were supported by the Wellcome Trust Core Award [Grant Number 203141/Z/16/Z] and the NIHR Oxford BRC. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. The primary support for the ABIDE dataset by Adriana Di Martino was provided by NIMH (K23MH087770) and the Leon Levy Foundation. Primary support for the work by Michael P. Milham and the INDI team was provided by gifts from Joseph P. Healy and the Stavros Niarchos Foundation to the Child Mind Institute, as well as by an NIMH award to MPM (NIMH R03MH096321). Data were provided in part by OASIS: Principal Investigators: T. Benzinger, D. Marcus, J. Morris; NIH P50AG00561, P30NS09857781, P01AG026276, P01AG003991, R01AG043434, UL1TR000448, R01EB009352. AV-45 doses were provided by Avid Radiopharmaceuticals, a wholly owned subsidiary of Eli Lilly. This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.