neuroCombat can be installed in R by typing the following commands:
ComBat estimates scanner-specific location and scale parameters, for each feature separately, and pools information across features using empirical Bayes to improve the estimation of those parameters for small sample size studies.
neuroCombat function is the main function. It requires two mandatory arguments: - a data matrix (p x n)
dat for which the p rows are features, and the n columns are participants. - a numeric or character vector
batch of length n indicating the site/scanner/study id.
For illustration purpose, let’s simulate an imaging dataset with n=10 participants, acquired on 2 scanners, with 5 participants each, with p=10000 voxels per scan.
We use the function
neuroCombat to harmonize the data across the 2 scanners:
data.harmonized <- neuroCombat(dat=dat, batch=batch)
By default, this uses parametric adjustments. To following command must be used for non-parametric adjustments:
data.harmonized <- neuroCombat(dat=dat, batch=batch, parametric=FALSE)
The harmonized matrix is stored in
data.harmonized object also contains the different parameters estimated by ComBat: -
delta.hat: Estimated location and shift (L/S) parameters before empirical Bayes. -
delta.star: Empirical Bayes estimated L/S parameters. -
b.prior: esimated prior distributions parameters.
neuroCombat also accepts an optional argument,
mod, which is a matrix containing biological covariates, including the outcome of interest. This is recommended to ensure that biological variability is preserved in the harmonization process. For instance, for a study with age and disease covariates,
we first create a model matrix for these two biological covariates using the
mod is a n x 3 matrix, containing an intercept, age and a dummy variable for the second level of the disease variable (the first level is taken as the baseline group). Note that including an intercept in the model matrix will not change the results of the algorithm; ComBat automatically removes the intercept from the model matrix when fitting the models. We now harmonize the data:
combat.harmonized <- neuroCombat(dat=dat, batch=batch, mod=mod)
Sometimes, it is preferable not to pool information across features, for instance if: - (1) The number of features is substantially smaller than the number of participants (p << n) or - (2) The prior distributions used in ComBat do not fit well the data - (3) The site effects are only present for a small subset of features
An example of (2) is studies with site/scanner effects that are highly heteregenous across features, for instance differential scanner effects between white matter (WM) or grey matter (GM) voxels exist. To run the ComBat model without empirical Bayes, which boils down to fitting a location/shift (L/S) model for each feature separately, the option
eb=FALSE can be used:
data.harmonized <- neuroCombat(dat=dat, batch=batch, eb=FALSE)