Adding external data to ggseg plotting

Introduction

Once you have covered the main functionality of ggseg, you will want to use it to plot your own results. To do this, your data must adhere to certain specifications so that ggseg can merge it with the atlas you are using. This means you need to be able to inspect and locate how the regions you are working with are named in the internal atlas files. This vignette provides the tools you need to find these names and to manipulate your data to fit these requirements.

Inspecting the atlas labels

There are several ways you can inspect what the data in the atlas looks like. While each atlas has some small differences, they all share five main columns:
1. long - x-axis
2. lat - y-axis
3. area - name of area/network
4. hemi - hemisphere (left or right)
5. side - side of view (medial, lateral, sagittal or axial)

Most atlases also have a label column, which holds the raw names assigned by the program that was run to segment/extract the data.

This information is stored in a list of data.frames called atlas.info, which is loaded when ggseg is loaded, just like the atlases and palettes.

library(ggseg)
## Loading required package: ggplot2
library(magrittr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
atlas.info$yeo7
##                   area  hemi    side
## 1          somatomotor  left lateral
## 322            default  left lateral
## 605             limbic  left  medial
## 685             limbic  left lateral
## 1253       somatomotor right lateral
## 1532            visual  left  medial
## 1672            visual right  medial
## 1809    frontoparietal right lateral
## 2043 ventral attention right lateral
## 2077           default right  medial
## 2260 ventral attention right  medial
## 2543           default  left  medial
## 2715  dorsal attention  left lateral
## 2985  dorsal attention right lateral
## 3228 ventral attention  left  medial
## 3449       somatomotor right  medial
## 3588       somatomotor  left  medial
## 3691            visual right lateral
## 3783            visual  left lateral
## 3985            limbic right  medial
## 4069           default right lateral
## 4231  dorsal attention right  medial
## 4398            limbic right lateral
## 4541  dorsal attention  left  medial
## 4808 ventral attention  left lateral
## 5425    frontoparietal  left lateral
## 6181    frontoparietal  left  medial
## 6582    frontoparietal right  medial

Here you can see information about the yeo7 atlas and its main attributes. If you want to use external data with your ggseg plot, your data must have at least one column whose name and content correspond to a column in the atlas you are using.
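A quick way to check this requirement before plotting is to compare column names and values directly. The sketch below is only an illustration: toyAtlas is a made-up stand-in for a real atlas such as yeo7, and myData, sharedCols and unmatched are hypothetical names.

```r
# Toy stand-in for an atlas; a real atlas such as yeo7 has many more rows and columns
toyAtlas <- data.frame(
  long = c(0.1, 0.2, 0.3, 0.4),
  lat  = c(0.5, 0.6, 0.7, 0.8),
  area = c("default", "default", "visual", "limbic"),
  hemi = c("left", "right", "left", "left"),
  side = "lateral",
  stringsAsFactors = FALSE
)

myData <- data.frame(
  area = c("default", "Visual"),  # "Visual" is deliberately misspelled (capital V)
  p    = c(0.03, 0.6),
  stringsAsFactors = FALSE
)

# Columns present in both data frames: these are the columns a join would match on
sharedCols <- intersect(names(myData), names(toyAtlas))

# Values in your data that do not occur in the corresponding atlas column
unmatched <- setdiff(unique(myData$area), unique(toyAtlas$area))

sharedCols
unmatched
```

If sharedCols is empty, or unmatched lists names you expected to find, the data needs renaming before it can be merged with the atlas.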

Structuring data for merging

For instance, here we make some data for the “default” and “visual” networks in the yeo7 atlas, with one p value for each of the two networks.

someData = data.frame(area=c("default","visual"),
                      p=c(.03,.6), 
                      stringsAsFactors = FALSE)
someData
##      area    p
## 1 default 0.03
## 2  visual 0.60

Notice that we have spelled both the column name and the area names exactly as they appear in the atlas. This is necessary for the merging within the ggseg function to work properly. You can attempt this merge yourself before supplying the data to ggseg, to see if there are any errors.

yeo7 %>% 
  left_join(someData) %>% 
  head(10) #only added to truncate output
## Joining, by = "area"
##        long     lat        area hemi    side network          label group
## 1  0.518175 0.17936 somatomotor left lateral       2 lh_7Networks_2   0.1
## 2  0.517253 0.17964 somatomotor left lateral       2 lh_7Networks_2   0.1
## 3  0.512315 0.17985 somatomotor left lateral       2 lh_7Networks_2   0.1
## 4  0.508193 0.18240 somatomotor left lateral       2 lh_7Networks_2   0.1
## 5  0.507867 0.18354 somatomotor left lateral       2 lh_7Networks_2   0.1
## 6  0.507650 0.18500 somatomotor left lateral       2 lh_7Networks_2   0.1
## 7  0.511665 0.19775 somatomotor left lateral       2 lh_7Networks_2   0.1
## 8  0.511990 0.19906 somatomotor left lateral       2 lh_7Networks_2   0.1
## 9  0.516819 0.20529 somatomotor left lateral       2 lh_7Networks_2   0.1
## 10 0.541885 0.23611 somatomotor left lateral       2 lh_7Networks_2   0.1
##    id order  p
## 1   0     1 NA
## 2   0     2 NA
## 3   0     3 NA
## 4   0     4 NA
## 5   0     5 NA
## 6   0     6 NA
## 7   0     7 NA
## 8   0     8 NA
## 9   0     9 NA
## 10  0    10 NA

No errors! Yes, the p column is seemingly full of NAs, but that is just because the top of the data is the somatomotor network, which we did not supply any p values for, so it has been populated with NAs. We can sort the data differently to see that p has been added correctly.

yeo7 %>% 
  left_join(someData) %>% 
  arrange(p) %>% 
  head(10) #only added to truncate output
## Joining, by = "area"
##        long     lat    area hemi    side network          label group id
## 1  0.037056 0.26139 default left lateral       7 lh_7Networks_7   1.1  1
## 2  0.032444 0.26546 default left lateral       7 lh_7Networks_7   1.1  1
## 3  0.016711 0.28956 default left lateral       7 lh_7Networks_7   1.1  1
## 4  0.007542 0.31685 default left lateral       7 lh_7Networks_7   1.1  1
## 5  0.006239 0.32254 default left lateral       7 lh_7Networks_7   1.1  1
## 6  0.005317 0.32449 default left lateral       7 lh_7Networks_7   1.1  1
## 7  0.004667 0.33524 default left lateral       7 lh_7Networks_7   1.1  1
## 8  0.001954 0.34543 default left lateral       7 lh_7Networks_7   1.1  1
## 9  0.000000 0.34711 default left lateral       7 lh_7Networks_7   1.1  1
## 10 0.000000 0.36040 default left lateral       7 lh_7Networks_7   1.1  1
##    order    p
## 1      1 0.03
## 2      2 0.03
## 3      3 0.03
## 4      4 0.03
## 5      5 0.03
## 6      6 0.03
## 7      7 0.03
## 8      8 0.03
## 9      9 0.03
## 10    10 0.03
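Sorting works well for small examples. For larger data, a complementary check is to ask which rows of your own data found no partner in the atlas. The sketch below uses dplyr's anti_join for this; toyAtlas is a made-up stand-in for a real atlas, and "salience" is a hypothetical region name chosen because it is absent from the toy atlas.

```r
library(dplyr)

# Toy stand-in for an atlas (assumption: only the columns needed for the join)
toyAtlas <- data.frame(
  area = c("somatomotor", "default", "default", "visual"),
  hemi = c("left", "left", "right", "left"),
  stringsAsFactors = FALSE
)

myData <- data.frame(
  area = c("default", "visual", "salience"),  # "salience" is not in toyAtlas
  p    = c(0.03, 0.6, 0.2),
  stringsAsFactors = FALSE
)

# Rows of myData that have no matching area in the atlas
notInAtlas <- anti_join(myData, toyAtlas, by = "area")

# Number of atlas rows that received a p value after the merge
merged   <- left_join(toyAtlas, myData, by = "area")
nMatched <- sum(!is.na(merged$p))

notInAtlas
nMatched
```

An empty notInAtlas means every row of your data matched at least one atlas row. Stating by = explicitly also makes the join columns visible in the code rather than only in the message.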

If you need your data to be matched on several columns, the approach is the same. Add the column you want to match on, with the exact same name, and make sure its content matches the content of the same column in the atlas.

someData$hemi = c("left","left")
someData
##      area    p hemi
## 1 default 0.03 left
## 2  visual 0.60 left
yeo7 %>% 
  left_join(someData) %>% 
  arrange(p) %>% 
  head(10)
## Joining, by = c("area", "hemi")
##        long     lat    area hemi    side network          label group id
## 1  0.037056 0.26139 default left lateral       7 lh_7Networks_7   1.1  1
## 2  0.032444 0.26546 default left lateral       7 lh_7Networks_7   1.1  1
## 3  0.016711 0.28956 default left lateral       7 lh_7Networks_7   1.1  1
## 4  0.007542 0.31685 default left lateral       7 lh_7Networks_7   1.1  1
## 5  0.006239 0.32254 default left lateral       7 lh_7Networks_7   1.1  1
## 6  0.005317 0.32449 default left lateral       7 lh_7Networks_7   1.1  1
## 7  0.004667 0.33524 default left lateral       7 lh_7Networks_7   1.1  1
## 8  0.001954 0.34543 default left lateral       7 lh_7Networks_7   1.1  1
## 9  0.000000 0.34711 default left lateral       7 lh_7Networks_7   1.1  1
## 10 0.000000 0.36040 default left lateral       7 lh_7Networks_7   1.1  1
##    order    p
## 1      1 0.03
## 2      2 0.03
## 3      3 0.03
## 4      4 0.03
## 5      5 0.03
## 6      6 0.03
## 7      7 0.03
## 8      8 0.03
## 9      9 0.03
## 10    10 0.03

Notice how the message now states that it is joining by = c("area", "hemi"). The left_join function has recognized that the two data frames have two columns with identical names, and assumes (in this case correctly) that these are equivalent.
Note also that matching is case-sensitive, so writing Area or Left will not result in a match.
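If your own data uses different capitalisation, it can be normalised before merging. This is a minimal sketch, assuming the atlas stores its column names and values in lower case, as the yeo7 output above does; myData and its contents are made up for illustration.

```r
# Hypothetical data with inconsistent capitalisation and a stray trailing space
myData <- data.frame(
  Area = c("Default ", "VISUAL"),
  Hemi = c("Left", "left"),
  p    = c(0.03, 0.6),
  stringsAsFactors = FALSE
)

# Lower-case the column names so they match the atlas columns "area" and "hemi"
names(myData) <- tolower(names(myData))

# Lower-case the values and strip leading/trailing whitespace
myData$area <- trimws(tolower(myData$area))
myData$hemi <- trimws(tolower(myData$hemi))

myData
```

This only helps when the atlas itself is entirely lower case. For an atlas with mixed-case names, the values have to be recoded to the exact spelling in the atlas.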

Providing data to ggseg

When you have managed to create data that merges nicely with the atlas, you can go ahead and supply it to the function.

library(ggplot2)
ggseg(someData, atlas="yeo7", mapping=aes(fill=p))

You can actually also supply it directly as an atlas. For instance, if you had saved the merged data from the previous steps, you can supply this directly to the atlas option.

newAtlas = yeo7 %>% 
  left_join(someData)
## Joining, by = c("area", "hemi")
ggseg(atlas=newAtlas, mapping=aes(fill=p), position="stacked")

It is this possibility of supplying a custom atlas that gives you particular flexibility, though it is a little tricky to begin with. As mentioned in the introductory vignette, if you plan on using faceting, merging the data and supplying it as an atlas is the way to go. If you do not, you will get unwanted results. Let’s recap the unwanted results:

someData = data.frame(
  area = rep(c("transverse temporal", "insula",
               "pre central","superior parietal"),2), 
  p = sample(seq(0,.5,.001), 8),
  AgeG = c(rep("Young",4), rep("Old",4)),
  stringsAsFactors = FALSE)
  
ggseg(data=someData, colour="white", mapping=aes(fill=p)) +
  facet_wrap(~AgeG, ncol=1) +
  theme(legend.position = "bottom")

Notice that you get three facets even though you only have two groups, and that the “background” brain is not drawn in the facets for your two groups. This is because, to ggplot, that is what the data looks like. To plot it as we wish, we must completely duplicate the atlas for each group/facet. I like using lists and lapply to do this.
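To see where a third facet can come from, it helps to look at what a merge produces when only some regions have data. The sketch below assumes that the data is merged onto the atlas in a way comparable to a left join, which is an assumption about the internals and not something confirmed here; toyAtlas and groupData are made-up stand-ins.

```r
# Toy atlas with three regions (stand-in for a real atlas such as dkt)
toyAtlas <- data.frame(
  area = c("insula", "pre central", "cuneus"),
  long = c(1, 2, 3),
  lat  = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Data for two groups, covering only two of the three regions
groupData <- data.frame(
  area = rep(c("insula", "pre central"), 2),
  p    = c(0.1, 0.2, 0.3, 0.4),
  AgeG = rep(c("Young", "Old"), each = 2),
  stringsAsFactors = FALSE
)

# Keep every atlas row; regions without data get NA in p and in AgeG
merged <- merge(toyAtlas, groupData, by = "area", all.x = TRUE)

# The facetting variable now has three distinct values: "Young", "Old" and NA
facetValues <- unique(merged$AgeG)

merged
facetValues
```

Regions without data end up in their own NA group, separate from both “Young” and “Old”. The steps below avoid this by giving each group its own complete copy of the atlas.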

# Initiate your list. Creating list newAtlas, that contains two data frames, one for each group
newAtlas = list(Young = someData %>% filter(AgeG %in% "Young"),
                Old = someData %>% filter(AgeG %in% "Old")) 
newAtlas 
## $Young
##                  area     p  AgeG
## 1 transverse temporal 0.409 Young
## 2              insula 0.256 Young
## 3         pre central 0.382 Young
## 4   superior parietal 0.096 Young
## 
## $Old
##                  area     p AgeG
## 1 transverse temporal 0.478  Old
## 2              insula 0.186  Old
## 3         pre central 0.301  Old
## 4   superior parietal 0.497  Old
# Use list apply (lapply) to do the same operation on each element of the list.
# Here we join each data.frame with the atlas of choice ("dkt"), and make sure the
# group column "AgeG" has its group name populated in every row
newAtlas = lapply(newAtlas, function(x) x %>% full_join(dkt) %>% mutate(AgeG = unique(x$AgeG)))
## Joining, by = "area"
## Joining, by = "area"
newAtlas %>% lapply(function(x)  head(x,5))
## $Young
##                  area     p  AgeG    long     lat id hemi    side acronym
## 1 transverse temporal 0.409 Young 2.73519 2.27969 20 left lateral    trnt
## 2 transverse temporal 0.409 Young 2.76025 2.30942 20 left lateral    trnt
## 3 transverse temporal 0.409 Young 2.83864 2.39596 20 left lateral    trnt
## 4 transverse temporal 0.409 Young 2.95078 2.50072 20 left lateral    trnt
## 5 transverse temporal 0.409 Young 3.07709 2.58665 20 left lateral    trnt
##       lobe                 label group order
## 1 temporal lh_transversetemporal  20.1     1
## 2 temporal lh_transversetemporal  20.1     2
## 3 temporal lh_transversetemporal  20.1     3
## 4 temporal lh_transversetemporal  20.1     4
## 5 temporal lh_transversetemporal  20.1     5
## 
## $Old
##                  area     p AgeG    long     lat id hemi    side acronym
## 1 transverse temporal 0.478  Old 2.73519 2.27969 20 left lateral    trnt
## 2 transverse temporal 0.478  Old 2.76025 2.30942 20 left lateral    trnt
## 3 transverse temporal 0.478  Old 2.83864 2.39596 20 left lateral    trnt
## 4 transverse temporal 0.478  Old 2.95078 2.50072 20 left lateral    trnt
## 5 transverse temporal 0.478  Old 3.07709 2.58665 20 left lateral    trnt
##       lobe                 label group order
## 1 temporal lh_transversetemporal  20.1     1
## 2 temporal lh_transversetemporal  20.1     2
## 3 temporal lh_transversetemporal  20.1     3
## 4 temporal lh_transversetemporal  20.1     4
## 5 temporal lh_transversetemporal  20.1     5
# Now each data.frame in our list has the exact same columns,
# so we can easily append the dataframes by row.
newAtlas = newAtlas %>% bind_rows()
newAtlas %>% head(5)
##                  area     p  AgeG    long     lat id hemi    side acronym
## 1 transverse temporal 0.409 Young 2.73519 2.27969 20 left lateral    trnt
## 2 transverse temporal 0.409 Young 2.76025 2.30942 20 left lateral    trnt
## 3 transverse temporal 0.409 Young 2.83864 2.39596 20 left lateral    trnt
## 4 transverse temporal 0.409 Young 2.95078 2.50072 20 left lateral    trnt
## 5 transverse temporal 0.409 Young 3.07709 2.58665 20 left lateral    trnt
##       lobe                 label group order
## 1 temporal lh_transversetemporal  20.1     1
## 2 temporal lh_transversetemporal  20.1     2
## 3 temporal lh_transversetemporal  20.1     3
## 4 temporal lh_transversetemporal  20.1     4
## 5 temporal lh_transversetemporal  20.1     5
# We can now supply the newAtlas as an atlas to ggseg
ggseg(atlas=newAtlas, colour="white", mapping=aes(fill=p)) +
  facet_wrap(~AgeG, ncol=1) +
  theme(legend.position = "bottom")

This whole procedure can be piped together, so you don’t have to save all the intermediate steps.

newAtlas = list(Young = someData %>% filter(AgeG %in% "Young"),
                Old = someData %>% filter(AgeG %in% "Old")) %>% 
  lapply(function(x) x %>% full_join(dkt) %>% mutate(AgeG = unique(x$AgeG))) %>% 
  bind_rows()
## Joining, by = "area"
## Joining, by = "area"
ggseg(atlas=newAtlas, colour="white", mapping=aes(fill=p)) +
  facet_wrap(~AgeG, ncol=1) +
  scale_fill_gradientn(colours = c("royalblue","firebrick","goldenrod"),na.value="grey")
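With more than two groups, writing one filter call per group becomes tedious. One possible variation, sketched here under the assumption that the grouping column has no missing values, is to let base R's split build the list. toyAtlas and groupData are made-up stand-ins so the example runs without the real atlas.

```r
library(dplyr)

# Toy atlas with three regions (stand-in for a real atlas such as dkt)
toyAtlas <- data.frame(
  area = c("insula", "pre central", "cuneus"),
  long = c(1, 2, 3),
  lat  = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Data for two groups, covering only two of the three regions
groupData <- data.frame(
  area = rep(c("insula", "pre central"), 2),
  p    = c(0.1, 0.2, 0.3, 0.4),
  AgeG = rep(c("Young", "Old"), each = 2),
  stringsAsFactors = FALSE
)

# split() returns one data.frame per value of AgeG, however many groups there are
newAtlas <- split(groupData, groupData$AgeG) %>%
  lapply(function(x) x %>%
           full_join(toyAtlas, by = "area") %>%   # add all atlas rows to this group
           mutate(AgeG = unique(x$AgeG))) %>%     # fill the group name into every row
  bind_rows()

newAtlas
```

Each group now contains every atlas region, with NA in p where no value was supplied, which is the shape the facetted plot needs.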