simlr.Rd
simlr minimizes reconstruction error across related modalities. That is, simlr will reconstruct each modality matrix from a basis set derived from the other modalities. The basis set can be derived from SVD, ICA or a simple sum of basis representations. This function produces dataset-wide multivariate beta maps for each of the related matrices. The multivariate beta maps are regularized by user input matrices that encode relationships between variables. The idea is overall similar to canonical correlation analysis but generalizes the basis construction and to arbitrary numbers of modalities.
simlr( voxmats, smoothingMatrices, iterations = 10, sparsenessQuantiles, positivities, initialUMatrix, mixAlg = c("ica", "svd", "avg", "rrpca-l", "rrpca-s", "pca", "stochastic"), orthogonalize = FALSE, repeatedMeasures = NA, lineSearchRange = c(-10000000000, 10000000000), lineSearchTolerance = 0.00000001, randomSeed, constraint = c("none", "Grassmann", "Stiefel"), energyType = c("regression", "normalized", "cca", "ucca", "lowRank"), vmats, connectors = NULL, optimizationStyle = c("lineSearch", "mixed", "greedy"), scale = c("sqrtnp", "np", "centerAndScale", "center", "norm", "none", "impute", "eigenvalue", "robust"), expBeta = 0.5, verbose = FALSE )
voxmats | A list that contains the named matrices. Note: the optimization will likely perform much more smoothly if the input matrices are each scaled to zero mean unit variance e.g. by the |
---|---|
smoothingMatrices | list of (sparse) matrices that allow parameter smoothing/regularization. These should be square and same order and size of input matrices. |
iterations | number of gradient descent iterations |
sparsenessQuantiles | vector of quantiles to control sparseness - higher is sparser |
positivities | vector that sets for each matrix if we restrict to positive or negative solution (beta) weights. choices are positive, negative or either as expressed as a string. |
initialUMatrix | list of initialization matrix size |
mixAlg | 'svd', 'ica', 'rrpca-l', 'rrpca-s', 'stochastic', 'pca' or 'avg' denotes the algorithm employed when estimating the mixed modality bases |
orthogonalize | boolean to control whether we orthogonalize the solutions explicitly |
repeatedMeasures | list of repeated measurement identifiers. this will allow estimates of per identifier intercept. |
lineSearchRange | lower and upper limit used in |
lineSearchTolerance | tolerance used in |
randomSeed | controls repeatability of ica-based decomposition |
constraint | one of none, Grassmann or Stiefel |
energyType | one of regression, normalized, lowRank, cca or ucca |
vmats | optional initial |
connectors | a list ( length of projections or number of modalities ) that indicates which modalities should be paired with current modality |
optimizationStyle | |
scale | options to standardize each matrix. e.g. divide by the square root of its number of variables (Westerhuis, Kourti, and MacGregor 1998), divide by the number of variables or center or center and scale or ... (see code). can be a vector which will apply each strategy in order. |
expBeta | if greater than zero, use exponential moving average on gradient. |
verbose | boolean to control verbosity of output - set to level |
A list of u, x, y, z etc related matrices.
BB Avants.
set.seed(1500) nsub = 25 npix = c(100,200,133) nk = 5 outcome = matrix(rnorm( nsub * nk ),ncol=nk) outcome1 = matrix(rnorm( nsub * nk ),ncol=nk) outcome2 = matrix(rnorm( nsub * nk ),ncol=nk) outcome3 = matrix(rnorm( nsub * nk ),ncol=nk) view1tx = matrix( rnorm( npix[1] * nk ), nrow=nk ) view2tx = matrix( rnorm( npix[2] * nk ), nrow=nk ) view3tx = matrix( rnorm( npix[3] * nk ), nrow=nk ) mat1 = (outcome %*% t(outcome1) %*% (outcome1)) %*% view1tx mat2 = (outcome %*% t(outcome2) %*% (outcome2)) %*% view2tx mat3 = (outcome %*% t(outcome3) %*% (outcome3)) %*% view3tx matlist = list( vox = mat1, vox2 = mat2, vox3 = mat3 ) result = simlr( matlist )#>p1 = mat1 %*% (result$v[[1]]) p2 = mat2 %*% (result$v[[2]]) p3 = mat3 %*% (result$v[[3]]) regs = regularizeSimlr( matlist ) result2 = simlr( matlist )#>pred1 = predictSimlr( matlist, result ) pred2 = predictSimlr( matlist, result2 ) # compare to permuted data s1 = sample( 1:nsub) s2 = sample( 1:nsub) resultp = simlr( list( vox = mat1, vox2 = mat2[s1,], vox3 = mat3[s2,] ) )#>p1p = mat1 %*% (resultp$v[[1]]) p2p = mat2[s1,] %*% (resultp$v[[2]]) p3p = mat3[s2,] %*% (resultp$v[[3]]) # compare to SVD svd1 = svd( mat1, nu=nk, nv=0 )$u svd2 = svd( mat2, nu=nk, nv=0 )$u svd3 = svd( mat3, nu=nk, nv=0 )$u # real range(cor(p1,p2))#> [1] -0.6589783 0.5133050#> [1] -0.6737101 0.7049057#> [1] -0.4720501 0.7041769#> [1] -0.1900885 0.2029698#> [1] -0.2164204 0.2130339#> [1] -0.4718212 0.4590747#> [1] -0.6848291 0.6203092resultp = simlr(list( vox = mat1, vox2 = mat2[s1,], vox3 = mat3[s2,] ), initialUMatrix = nk , verbose=TRUE, iterations=5, energyType = "normalized" )#>#> [1] " <0> BUILD-V <0> BUILD-V <0> BUILD-V <0> BUILD-V <0> " #> [1] "initialDataTerm: 1.41238902910432 <o> mixer: ica <o> E: normalized" #> [1] "Iteration: 1 bestEv: 1.38286025921981 bestIt: 1 CE: 1.38286025921981" #> [1] "Iteration: 2 bestEv: 1.38286025921981 bestIt: 1 CE: 1.39833267218968" #> [1] "Iteration: 3 bestEv: 1.38286025921981 bestIt: 1 CE: 1.44431740041708" #> [1] "Iteration: 4 bestEv: 1.38286025921981 bestIt: 1 CE: 1.40084763843311" #> [1] "Iteration: 5 bestEv: 1.38286025921981 bestIt: 1 CE: 1.45149701654717"