NiftiArray: Fast Random Access of NIfTI Objects

This page is still under construction! Check back later for updates!

Overview

By default, R holds objects in memory, which makes it poorly suited to large data sets. NIfTI images, depending on the image dimension and voxel size, can be quite large once loaded, so memory becomes a concern. As the sample size or number of scans increases, it becomes difficult to perform even simple voxel-wise operations across subjects. For example, calculating the mean image across 800 subjects is difficult because not all 800 images can be loaded into memory at once. Therefore, there is a need for alternative approaches.

The NiftiArray package allows for fast random access of imaging data in NIfTI format and supports DelayedArray operations. The package establishes the NiftiArray class, a convenient and memory-efficient array-like container for on-disk representation of NIfTI image(s). The NiftiArray class is an extension of the HDF5Array class and converts NIfTI objects on disk to HDF5 files which allow for block processing and memory-efficient representations in R.

NiftiArray is compatible with the DelayedArray and DelayedMatrixStats packages.

DelayedArray is an R package currently hosted on Bioconductor. DelayedArray allows common array operations on an object without loading it into memory. In order to reduce memory usage and optimize performance, operations on the object are either delayed or executed using a block processing mechanism. DelayedArray works with the NiftiArray class to delay computation and allow fast and efficient data access.
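
To make the idea of block processing concrete, here is a minimal base R sketch. It is not how DelayedArray is implemented; the toy matrix `mat`, the block size of 250 rows, and all variable names are illustrative assumptions. It only shows that a row-wise statistic can be computed one block of rows at a time, so the full data never has to be processed in a single step.

```r
# Toy voxel-by-subject matrix standing in for on-disk data (illustrative only).
set.seed(1)
mat <- matrix(rnorm(1000 * 5), nrow = 1000, ncol = 5)

# Walk over the rows in blocks of 250 and compute row means per block.
block_size <- 250
starts <- seq(1, nrow(mat), by = block_size)
block_means <- unlist(lapply(starts, function(s) {
  rows <- s:min(s + block_size - 1, nrow(mat))
  rowMeans(mat[rows, , drop = FALSE])
}))

# The block-wise result matches the all-at-once computation.
all.equal(block_means, rowMeans(mat))
```

In a real on-disk workflow each block would be read from the file just before it is used, which is what keeps memory usage bounded.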

DelayedMatrixStats is an R package currently hosted on Bioconductor. DelayedMatrixStats contains functions for statistical calculations (e.g. row or column medians) that use DelayedArray's efficient block processing on large matrices while keeping local memory usage low. DelayedMatrixStats builds on both the DelayedArray and matrixStats packages to provide high-performing functions that operate on rows and columns of DelayedMatrix objects. The functions are optimized by data type and for subsetted calculations so that both memory usage and processing time are minimized.

Installation

You can install the development version of NiftiArray from GitHub using the following:

# install.packages('remotes')
remotes::install_github("muschellij2/NiftiArray")

We are working to get a stable version on Neuroconductor.

Tutorial

Packages

The packages you will need to load for use with this tutorial are below:
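
As a hedged starting point, the sketch below checks which packages are installed before you load them. The package list is an assumption inferred from the namespaces used later in this tutorial, not a definitive list from the authors, and `pkgs` and `not_installed` are illustrative names.

```r
# Packages referenced by namespace later in this tutorial (assumed list).
pkgs <- c("NiftiArray", "DelayedArray", "DelayedMatrixStats",
          "RNifti", "neurobase", "tibble", "dplyr")

# requireNamespace() returns FALSE instead of raising an error when a package is missing.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!installed]

if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
}
```

Any package reported as missing can then be installed and loaded with `library()` as usual.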

Data

This tutorial will use data found here. A description of the data is available here. In this tutorial we will use the FLAIR images for subjects 1-5. We will download the data to a temporary folder. If you prefer to load these to a specific directory feel free to change the nii_destination in fileinfo to the file path where you’d like to save each image.

# URLs to download and the local destinations where the images will be saved
urls = file.path("https://raw.githubusercontent.com", 
                     "muschellij2", "open_ms_data", "master", 
                     "cross_sectional", "MNI", paste0("patient0", 1:5),
                     "FLAIR_N4_noneck_reduced_winsor_regtoFLAIR_brain_N4_regtoMNI.nii.gz")
nii_destinations = sapply(urls, function(x) tempfile(fileext = ".nii.gz"))
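# fixed = TRUE matches ".nii.gz" literally instead of treating "." as a regex wildcard
hdf5_destinations = sub(".nii.gz", ".h5", nii_destinations, fixed = TRUE)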

fileinfo = tibble::tibble(url = urls, 
                          nii_destination = nii_destinations,
                          hdf5_destination = hdf5_destinations)
# Download all the files to the nii_destination
mapply(function(x,y) {
    download.file(url = x, destfile = y)
}, fileinfo$url, fileinfo$nii_destination)
#> https://raw.githubusercontent.com/muschellij2/open_ms_data/master/cross_sectional/MNI/patient01/FLAIR_N4_noneck_reduced_winsor_regtoFLAIR_brain_N4_regtoMNI.nii.gz 
#>                                                                                                                                                                  0 
#> https://raw.githubusercontent.com/muschellij2/open_ms_data/master/cross_sectional/MNI/patient02/FLAIR_N4_noneck_reduced_winsor_regtoFLAIR_brain_N4_regtoMNI.nii.gz 
#>                                                                                                                                                                  0 
#> https://raw.githubusercontent.com/muschellij2/open_ms_data/master/cross_sectional/MNI/patient03/FLAIR_N4_noneck_reduced_winsor_regtoFLAIR_brain_N4_regtoMNI.nii.gz 
#>                                                                                                                                                                  0 
#> https://raw.githubusercontent.com/muschellij2/open_ms_data/master/cross_sectional/MNI/patient04/FLAIR_N4_noneck_reduced_winsor_regtoFLAIR_brain_N4_regtoMNI.nii.gz 
#>                                                                                                                                                                  0 
#> https://raw.githubusercontent.com/muschellij2/open_ms_data/master/cross_sectional/MNI/patient05/FLAIR_N4_noneck_reduced_winsor_regtoFLAIR_brain_N4_regtoMNI.nii.gz 
#>                                                                                                                                                                  0

# Notice the files were saved to a temporary directory on your machine
# Change tempdir() to the directory where you saved the files if you adapted nii_destination
list.files(tempdir(), full.names = TRUE)
#>  [1] "/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg80000gn/T//RtmpskOjFc/fileece5362dfdaa.nii.gz"                  
#>  [2] "/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg80000gn/T//RtmpskOjFc/fileece54dd1a4d.nii.gz"                   
#>  [3] "/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg80000gn/T//RtmpskOjFc/fileece54fe9b7b9.nii.gz"                  
#>  [4] "/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg80000gn/T//RtmpskOjFc/fileece55dced5ab.nii.gz"                  
#>  [5] "/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg80000gn/T//RtmpskOjFc/fileece5791cfbab.nii.gz"                  
#>  [6] "/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg80000gn/T//RtmpskOjFc/HDF5Array_dataset_creation_global_counter"
#>  [7] "/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg80000gn/T//RtmpskOjFc/HDF5Array_dump"                           
#>  [8] "/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg80000gn/T//RtmpskOjFc/HDF5Array_dump_files_global_counter"      
#>  [9] "/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg80000gn/T//RtmpskOjFc/HDF5Array_dump_log"                       
#> [10] "/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg80000gn/T//RtmpskOjFc/HDF5Array_dump_names_global_counter"

These images are \(182 \times 218 \times 182\) voxels with pixel dimensions of \(1 \text{ mm} \times 1 \text{ mm} \times 1 \text{ mm}\). On disk each image takes up approximately 5 MB as a compressed nii.gz file. When loaded into R, each image consumes approximately 55 MB of memory.
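
The in-memory figure can be checked with a little arithmetic. The sketch below assumes each voxel is stored as an 8-byte double once loaded into R; that storage type is an assumption, and the variable names are illustrative.

```r
# Image dimensions from the text above.
dims <- c(182, 218, 182)

# Total number of voxels in one image.
n_voxels <- prod(dims)

# Assumed storage: one 8-byte double per voxel.
bytes_double <- n_voxels * 8

# Convert bytes to mebibytes (1 MiB = 1024^2 bytes).
mib <- bytes_double / 1024^2
round(mib, 2)
```

Under that assumption one image holds 7,221,032 voxels and needs about 55 MiB for the voxel data alone, before any header or attribute overhead.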

Imaging data can be smaller or larger in memory depending on dimension and pixel size, but with enough subjects any imaging study will run into memory restrictions. NiftiArray attempts to overcome these problems by converting NIfTI objects to HDF5 files, an on-disk format that supports efficient random access. NiftiArray conserves the array structure and the NIfTI header, so interacting with NiftiArray objects in R is similar to interacting with regular NIfTI objects. The benefit is that the images remain on disk and are never fully loaded, so memory is conserved. See the Comparison of NiftiArray to Traditional Methods section for memory comparisons.

Single NIfTI Image to a NiftiArray

In the Data section we downloaded data for this tutorial into a temporary directory and recorded the file paths where they were saved. These files are of type .nii.gz. Unfortunately, compressed NIfTI images do not support fast random access. To use the NiftiArray functionality, these data must be converted to HDF5 files on disk. This does not affect how you interact with the object in R, only how it is stored on disk. Saving these objects as HDF5 files requires about the same amount of storage on disk, so if you keep both the NIfTI images and the HDF5 files you will double your disk usage. In R, though, these objects use almost no memory. This is where NiftiArray shines.

NiftiArray::writeNiftiArray

To convert a NIfTI object to an HDF5 file and eventually a local NiftiArray object you can use the NiftiArray::writeNiftiArray function. By default this function will store the data in a temporary folder, similar to where we saved the data for this tutorial. You can over-ride this though by specifying the filepath option.

Note: When calling NiftiArray::writeNiftiArray you are converting the NIfTI object on disk to an HDF5 file with a NIfTI-specific grouping or hierarchical format. See this link for more information on groups within HDF5 files. Because users may have stored other information in groups inside the HDF5 file output by NiftiArray::writeNiftiArray, we do not support over-writing the whole file, which would destroy that additional information. Instead, we simply over-write the groups that are associated with a NiftiArray object inside the HDF5 file. To over-write these groups, set overwrite = TRUE in the NiftiArray::writeNiftiArray function.

Let’s convert and write the first subject’s data as a temporary file to the temporary directory on-disk. This temporary file will have the pattern NiftiArraypatient01 in the file name. Again, if you’d like to save this object somewhere other than the temporary directory simply change the filepath option in NiftiArray::writeNiftiArray.
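
A minimal sketch of this step is shown below. It wraps the call in a hypothetical helper, `write_patient`, which is not part of the package. The sketch assumes that NiftiArray::writeNiftiArray accepts a NIfTI file path as its first argument together with the `filepath` option described above, and that `fileinfo` from the Data section is available when the helper is called.

```r
# Hypothetical helper: convert one NIfTI file to an HDF5-backed NiftiArray.
# The default h5_path is a temporary file whose name contains "NiftiArraypatient01".
write_patient <- function(nii_path,
                          h5_path = tempfile(pattern = "NiftiArraypatient01",
                                             fileext = ".h5")) {
  stopifnot(file.exists(nii_path))
  if (!requireNamespace("NiftiArray", quietly = TRUE)) {
    stop("The NiftiArray package is required for this sketch.")
  }
  # `filepath` controls where the HDF5 file is written (option named in the text).
  NiftiArray::writeNiftiArray(nii_path, filepath = h5_path)
}

# Usage (not run here):
# patient01 <- write_patient(fileinfo$nii_destination[1])
# class(patient01)
```

Passing a different `h5_path` saves the HDF5 file somewhere other than the temporary directory.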

The NIfTI was converted to the NIfTI-HDF5 file format and saved on disk. It was also loaded as a NiftiArray object and returned in R as patient01. patient01 is of class NiftiArray.

The NIfTI header is conserved in the NiftiArray class in case you ever need to quality control or convert a NiftiArray back to a NIfTI image. You can extract the header information from a NiftiArray object using the NiftiArray::nifti_header function.

In the next section we will re-load patient01 using the NiftiArray::NiftiArray function. Let’s remove the patient01 object from R memory so we can re-load it later.

The NiftiArray::writeNiftiArray function takes a single file path and converts the NIfTI image on disk to an HDF5 file on disk while also loading the NiftiArray into R. By default, the images are saved to a temporary directory. This is useful for quick one-time calculations or conversions and keeps on-disk storage minimal, since files in temporary directories are deleted over time.

When you work frequently with NiftiArray objects for many subjects, though, repeating the conversion takes valuable compute time and memory. In these instances, if on-disk storage is not an issue, we suggest saving the HDF5 NiftiArray files to their own folder and loading them into R using NiftiArray::NiftiArray. For only a few images, compute times for both approaches (NiftiArray::writeNiftiArray and NiftiArray::NiftiArray) are fast and comparable, but for larger data sets of reasonable dimension and pixel size it can take a few minutes to convert and save the HDF5 files and a few minutes to load them into R. See the Compute Time section for more details about differences in compute time between NiftiArray::writeNiftiArray and NiftiArray::NiftiArray.

NiftiArray::NiftiArray

The NiftiArray function can be used to load an on-disk NIfTI object into R as a NiftiArray.

It can also load a NiftiArray from an HDF5 file on disk that was previously written with NiftiArray::writeNiftiArray.

Notice this is exactly the same patient01 object as before.

Multiple NIfTI Images to NiftiArray Objects

In practice we have a list of subjects we want to convert from NIfTIs to NiftiArray objects and load into R.

NiftiArray::NiftiArrayList

The NiftiArray::NiftiArrayList function converts and writes NiftiArray objects if the x input is of class NIfTI, and then loads all the NiftiArray objects into R in a list of the new class NiftiArrayList. That is, every element in the list is a NiftiArray.

If the x input is instead a set of file paths to HDF5 NiftiArray files on disk, NiftiArray::NiftiArrayList simply loads the NiftiArray objects as a list.

At this point, we have all 5 patients loaded into R as a NiftiArrayList object. We can now convert the NiftiArray object to a NiftiMatrix object in order to run voxel-wise calculations.

Single NiftiArray to a NiftiMatrix

The NiftiArray object is a 3-dimensional array structure that allows for memory-efficient delayed random access of NIfTI objects. The NiftiMatrix is the result of flattening the NiftiArray: rather than keeping the array structure, we string out the image into a vector stored as a single-column matrix. Similar to NiftiArray, NiftiMatrix is a new class. In the code below, we convert a NiftiArray to a NiftiMatrix for one patient. We then verify that the class of this object is in fact NiftiMatrix, check that it has only a single column, index the vector to print some values, and validate that the object is as memory efficient as the NiftiArray.
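
The flattening itself can be illustrated with an ordinary in-memory array. This base R sketch uses a toy array rather than a NiftiArray, and it assumes the image is strung out in R's usual column-major order; whether NiftiMatrix uses exactly that order is an assumption here.

```r
# Toy 3D "image" with 2 x 3 x 4 voxels.
arr <- array(1:24, dim = c(2, 3, 4))

# String the array out into a single-column matrix.
mat <- matrix(as.vector(arr), ncol = 1)
dim(mat)

# In column-major order, voxel (i, j, k) lands in this row of the matrix.
d <- dim(arr)
i <- 1; j <- 2; k <- 3
row_index <- i + (j - 1) * d[1] + (k - 1) * d[1] * d[2]

# The flattened value equals the original voxel value.
mat[row_index, 1] == arr[i, j, k]
```

The same index arithmetic is what allows a single voxel to be looked up in the flattened form without rebuilding the 3D array.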

In this example, we showed the result of converting a single patient's NiftiArray to a NiftiMatrix, but it is more useful to create a NiftiMatrix with multiple subjects.

NiftiArrayList to a Big NiftiMatrix

In order to use tools like DelayedArray and DelayedMatrixStats to calculate voxel-level statistics across multiple subjects we need to create a big NiftiMatrix. That is, each row will represent a voxel and each column a new subject. To do this, we can take advantage of the NiftiArrayList class.
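
The layout of such a matrix can be sketched in base R with simulated images. The toy dimensions, the five simulated subjects, and the names `subjects` and `big_mat` are illustrative assumptions; the real workflow would keep the data on disk rather than in memory.

```r
# Simulate five small 3D "images" of identical dimension.
set.seed(42)
dims <- c(4, 5, 3)
subjects <- lapply(1:5, function(s) array(rnorm(prod(dims)), dim = dims))

# Flatten each image into one column: rows are voxels, columns are subjects.
big_mat <- vapply(subjects, as.vector, numeric(prod(dims)))
dim(big_mat)

# Column 2 holds exactly the voxels of subject 2.
identical(big_mat[, 2], as.vector(subjects[[2]]))
```

Because every image shares the same dimensions, row r of the matrix refers to the same voxel location in every subject.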

DelayedMatrixStats

The DelayedMatrixStats package allows for row-wise or column-wise statistical operations using delayed block processing to keep both memory and speed optimized. For more information, the package and documentation are available through Bioconductor here.

Below we show a simple example obtaining the voxel level mean and median across subjects.
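
As an in-memory analogue, the sketch below computes the same two statistics with base R on a simulated voxel-by-subject matrix. It is only an illustration of what is being calculated: on a DelayedMatrix, the block-processed counterparts in DelayedMatrixStats would be functions such as `rowMeans2` and `rowMedians`. The toy matrix and variable names are assumptions.

```r
# Simulated matrix: 60 voxels (rows) by 5 subjects (columns).
set.seed(42)
big_mat <- matrix(rnorm(60 * 5), nrow = 60, ncol = 5)

# Voxel-level mean across subjects.
voxel_means <- rowMeans(big_mat)

# Voxel-level median across subjects.
voxel_medians <- apply(big_mat, 1, median)

# Both results are plain numeric vectors with one value per voxel.
length(voxel_means)
is.numeric(voxel_medians)
```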

Notice the resulting vector voxel_medians is not a DelayedArray or NiftiArray object but rather a normal vector. We can convert it to a NiftiMatrix so that it returns to a memory efficient object using as.

Notice the NIfTI header is no longer accurate when we coerce the normal vector to a NiftiMatrix.

DelayedMatrixStats is a very powerful package so you should spend some time looking through the functions and documentation to see what is available and useful for you.

DelayedMatrixStats and Vectors to NiftiArray

Converting between the objects returned from DelayedMatrixStats functions and NiftiArray objects is not as easy, because the NIfTI header is lost in the calculations performed by DelayedMatrixStats functions. We must extract the header from a previous object and then re-initialize the NiftiArray.

Once we have a NiftiArray, it is easy to create the niftiImage object.

You could write these objects out as NIfTIs using RNifti::writeNifti.
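
The array part of this round trip can be sketched in base R. The example only restores the 3D shape from a flattened vector; re-attaching the NIfTI header is specific to the package and is not shown. The toy dimensions and names are illustrative assumptions.

```r
# Toy 3D image and its flattened form.
dims <- c(4, 5, 3)
arr <- array(seq_len(prod(dims)), dim = dims)
v <- as.vector(arr)

# Restore the 3D structure using the known image dimensions.
restored <- array(v, dim = dims)

# The round trip recovers the original array exactly.
identical(restored, arr)
```

This works because the image dimensions are known from the original object, which is one reason the header needs to be carried along separately.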

Comparison of NiftiArray to Traditional Methods

Memory

Local Object Size

Size in bytes of the R object returned when reading a single patient's image. The image dimension is 182 by 218 by 182 with pixel dimension 1 mm by 1 mm by 1 mm. On disk the image is 4.9 MB.

Read Function                  Byte Size    log_10(Byte Size)
NiftiArray::NiftiArray         9056         3.96
NiftiArray::writeNiftiArray    9056         3.96
RNifti::readNifti              57769432     7.76
neurobase::readnii             57777136     7.76

Speed

Speed is very important. We want code to be fast to save overall computation time, but as users we also don't want to be distracted by time lags. About 0.1 second is the limit for a user to feel that the system is reacting instantaneously. About 1 second is the limit for the user's flow of thought to stay uninterrupted, even though they notice the delay. About 10 seconds is the limit for keeping a user's attention focused on the task at hand. Beyond 10 seconds the user may lose track of the task altogether; that is, they have probably opened Twitter or Instagram and completely lost track of what they were doing [Miller 1968; Card et al. 1991].

Therefore, read times at or below the 0.1 and 1 second limits are ideal to minimize lag and keep the user's attention.

Card, S. K., Robertson, G. G., and Mackinlay, J. D. (1991). The information visualizer: An information workspace. Proc. ACM CHI’91 Conf. (New Orleans, LA, 28 April-2 May), 181-188.

Miller, R. B. (1968). Response time in man-computer conversational transactions. Proc. AFIPS Fall Joint Computer Conference Vol. 33, 267-277.

One Patient Read

library(magrittr)  # provides the %>% pipe used below

speed = tibble::as_tibble(
  microbenchmark::microbenchmark(
    NiftiArray::NiftiArray(fileinfo$hdf5_destination[1]),
    NiftiArray::NiftiArray(fileinfo$nii_destination[1]),
    NiftiArray::writeNiftiArray(fileinfo$nii_destination[1]),
    RNifti::readNifti(fileinfo$nii_destination[1]),
    neurobase::readnii(fileinfo$nii_destination[1]),
    times = 5)
  ) %>% 
  dplyr::mutate(time = time / 1e9) %>% # convert nanoseconds to seconds
  dplyr::rename(read_type = expr) %>% 
  dplyr::mutate(read_type = 
                  dplyr::case_when(
                    stringr::str_detect(read_type, 'RNifti') ~ "RNifti::readNifti",
                    stringr::str_detect(read_type, 'neurobase') ~ "neurobase::readnii",
                    stringr::str_detect(read_type, 'hdf5') ~ "NiftiArray::NiftiArray - HDF5 File",                    
                    stringr::str_detect(read_type, 'NiftiArray::NiftiArray\\(fileinfo\\$nii') ~ "NiftiArray::NiftiArray - NIfTI File",
                    stringr::str_detect(read_type, 'NiftiArray::writeNiftiArray') ~ "NiftiArray::writeNiftiArray"),
                read_type = as.factor(read_type),
                read_type = forcats::fct_reorder(read_type, time, .desc = TRUE))

# Table of read times (in seconds) by read function
speed_summary = speed %>% 
  dplyr::group_by(read_type) %>% 
  dplyr::summarise(
    mean = mean(time, na.rm = TRUE),
    median = median(time, na.rm = TRUE),
    sd = sd(time, na.rm = TRUE),
    min = min(time, na.rm = TRUE),
    max = max(time, na.rm = TRUE)
  )

knitr::kable(
  speed_summary,
  col.names = c('Read Function', 'Mean', 'Median', 'Std. Dev.', 'Min.', 'Max.'),
  format = 'html',
  digits = 2,
  caption = 'Time in seconds to read the image of a single patient.
  The image dimension is 182 by 218 by 182 with pixel dimension 1 mm by 1 mm by 1 mm.
  On disk the image is 4.9 MB.',
  booktabs = TRUE
) %>%
  kableExtra::kable_styling("striped", full_width = FALSE)
Time in seconds to read a single patient's image (5 runs per read function). The image dimension is 182 by 218 by 182 with pixel dimension 1 mm by 1 mm by 1 mm. On disk the image is 4.9 MB.

Read Function                          Mean    Median   Std. Dev.   Min.    Max.
NiftiArray::NiftiArray - NIfTI File    2.76    2.76     0.38        2.40    3.31
NiftiArray::writeNiftiArray            2.37    2.37     0.25        2.21    2.81
neurobase::readnii                     1.36    1.36     0.17        1.14    1.60
RNifti::readNifti                      0.17    0.17     0.02        0.14    0.18
NiftiArray::NiftiArray - HDF5 File     0.16    0.16     0.02        0.14    0.19

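Reading the HDF5 file with NiftiArray::NiftiArray and reading the NIfTI file with RNifti::readNifti are both close to the 0.1 second limit for seamless user flow. The remaining functions take roughly 1 to 3 seconds on average, which is above the 1 second limit, so a user will notice the lag, but well below the 10 second limit where they lose their train of thought.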