Maggioni's multi-scale SVD algorithm explores the dimensionality of a dataset by examining how the eigenvalues change with respect to a scale parameter. The scale parameter is the radius of a ball centered at each point in the data. At each scale, the ball is moved across the dataset and an SVD is computed on the data that fall inside the ball at each point. The shape of the resulting eigenvalue curves, as a function of scale, enables us to estimate both the dimensionality and the scale of signal and noise. The estimate can be computed efficiently on large datasets if the sampling is chosen appropriately. The challenge in this algorithm is classifying the dimensions as noise, curvature or data. This classification currently uses variations on heuristics suggested in work by Maggioni et al.
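
The idea of computing a local SVD inside a ball of growing radius can be illustrated in base R. The following is a minimal sketch under stated assumptions, not the implementation used by multiscaleSVD: the helper local_sv, the toy circle data, the choice of radii and the normalization by the square root of the neighbor count are all hypothetical choices made for illustration.

```r
# Illustrative sketch only; this is not the ANTsR implementation.
set.seed(1)
n <- 500
# Toy data: a unit circle (intrinsic dimension 1) embedded in 5 dimensions,
# with a small amount of isotropic Gaussian noise added
theta <- runif(n, 0, 2 * pi)
x <- cbind(cos(theta), sin(theta), matrix(0, n, 3))
x <- x + matrix(rnorm(n * 5, sd = 0.01), n, 5)

# local_sv (hypothetical helper): singular values of the centered points
# that lie within `radius` of the point with row index `center`
local_sv <- function(x, center, radius, nev) {
  d <- sqrt(rowSums(sweep(x, 2, x[center, ])^2))
  nbrs <- x[d <= radius, , drop = FALSE]
  if (nrow(nbrs) < 2) return(rep(NA_real_, nev))
  centered <- scale(nbrs, center = TRUE, scale = FALSE)
  # Normalization by sqrt(number of neighbors) is an assumption of this sketch
  sv <- svd(centered, nu = 0, nv = 0)$d / sqrt(nrow(nbrs))
  out <- rep(0, nev)
  k <- min(nev, length(sv))
  out[seq_len(k)] <- sv[seq_len(k)]
  out
}

radii <- seq(0.1, 1.0, by = 0.1)  # scales to explore
centers <- sample(n, 10)          # local samples taken at each scale
nev <- 5                          # number of singular values to keep

# Average the local spectra over the sampled centers at each radius;
# rows correspond to radii, columns to singular values
evals_vs_scale <- t(sapply(radii, function(r) {
  rowMeans(sapply(centers, function(i) local_sv(x, i, r, nev)), na.rm = TRUE)
}))
rownames(evals_vs_scale) <- radii
print(round(evals_vs_scale, 3))
```

In this toy setting the leading singular value grows roughly in proportion to the radius, the noise directions stay nearly flat, and the curvature direction grows more slowly than the tangent direction. This is the kind of pattern the classification heuristics attempt to separate.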

multiscaleSVD(x, r, locn, nev, knn = 0, verbose = FALSE, plot = 0)

Arguments

x

input matrix, should be n (samples) by p (measurements)

r

radii to explore

locn

number of local samples to take at each scale

nev

maximum number of eigenvalues to compute

knn

randomly sample neighbors to assist with large datasets; this value sets k.

verbose

boolean to control verbosity of output

plot

controls whether results are plotted. A nonzero value enables plotting and determines which eigenvector the y-axis scale is based on.
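
One plausible reading of the knn argument is that, when a ball contains more than k points, only k randomly chosen neighbors are kept before the local SVD is computed. The sketch below illustrates that reading in base R. It is an assumption about the behavior, not the package's code, and subsample_neighbors is a hypothetical helper.

```r
# Illustrative sketch only; subsample_neighbors is a hypothetical helper
# and is not part of ANTsR.
subsample_neighbors <- function(idx, k) {
  # Assumption: k = 0 means "no subsampling", mirroring the default knn = 0
  if (k > 0 && length(idx) > k) {
    idx[sample.int(length(idx), k)]
  } else {
    idx
  }
}

set.seed(42)
ball_members <- 1:100                       # row indices inside one ball
kept <- subsample_neighbors(ball_members, 10)
print(length(kept))
```

Capping the neighborhood size bounds the cost of each local SVD, which is why this kind of subsampling helps on large datasets.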

Value

list with a vector of tangent, curvature and noise dimensionality and a data frame containing eigenvalues across scale, in correspondence with r:

  • dim: The tangent, curvature and noise dimensionality vector. The first entry is the data dimensionality; the curvature dimensionality runs from the second entry up to the first entry of the noise vector.

  • noiseCutoffs: Dimensionalities where the noise may begin. These are candidate cutoffs but may contain some curvature information.

  • evalsVsScale: eigenvalues across scale

  • evalClustering: data-driven clustering of the eigenvalues

References

http://www.math.jhu.edu/~mauro/multiscaledatageometry.html

Author

Avants BB

Examples

sphereDim = 9
embeddDim = 100
n = 1000
if ( usePkg( "pracma" ) ) {
  set.seed(20190919)
  sphereData = pracma::rands( n, sphereDim, 1. )
  mysig = 0.1
  spherEmbed = matrix( rnorm( n * embeddDim, 0, mysig ), nrow = n, ncol = embeddDim )
  spherEmbed[ , 1:ncol( sphereData ) ] = spherEmbed[ , 1:ncol( sphereData ) ] + sphereData
  myr = seq( 1.0, 2.2, 0.05 ) # scales at which to sample
  mymssvd = multiscaleSVD( spherEmbed, myr, locn=5, nev=20, plot=1 )
  if (getRversion() < "3.6.0") {
    testthat::expect_equal(mymssvd$noiseCutoffs, c(10, 11))
    cm = unname(colMeans(mymssvd$evalsVsScale[11:25,]))
    testthat::expect_equal(cm, c(
      0.133651668406975, 0.0985695151401464, 0.0914110478052329,
      0.086272017653314, 0.081188302173622, 0.0766100356616153,
      0.0719736252996842, 0.067588745051721, 0.0622331185687704,
      0.0415236318358749, 0.0192976885668337, 0.0183063537558787,
      0.0174990088862745, 0.0170012938275551, 0.0163859378707545,
      0.0158265354487181, 0.0153357773252783, 0.0147933538908736,
      0.0143510807701235, 0.0140473978346935))
  } else {
    testthat::expect_equal(mymssvd$noiseCutoffs, c(11, 15))
    cm = unname(colMeans(mymssvd$evalsVsScale[13:25,]))
    testthat::expect_equal(cm, c(
      0.138511257441516, 0.106071822485487, 0.0989441114152412,
      0.092910922851038, 0.0877970523897918, 0.0832570763653118,
      0.0782599820599334, 0.0734433988152632, 0.0678992413676906,
      0.0432283615430504, 0.0202481578919003, 0.0191747572787057,
      0.0185718929604774, 0.0178301092823977, 0.0172423799670431,
      0.0166981650233669, 0.0162072551503541, 0.015784555784915,
      0.0153600119986575, 0.0149084240854556))
  }
}