vignettes/visualization.Rmd
In this vignette we will demonstrate how to visualize single-cell data in genome-browser-track style plots with Signac.
To demonstrate, we’ll use the human PBMC dataset processed in this vignette.
There are several different genome-browser-style plot types available in Signac, including accessibility tracks, gene annotations, peak coordinates, genomic links, and fragment positions.
The main plotting function in Signac is CoveragePlot(), which computes the averaged frequency of sequenced DNA fragments for different groups of cells within a given genomic region.
cov_plot <- CoveragePlot(
  object = pbmc,
  region = "chr2-87011729-87035519",
  annotation = FALSE,
  peaks = FALSE
)
cov_plot
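The region argument above is a plain string of the form chromosome-start-end. As a rough illustration only, the base R sketch below splits such a string into its parts. Note that parse_region is a hypothetical helper written for this example, not a Signac function, and it assumes a "-" separator and a chromosome name that itself contains no "-".

```r
# Hypothetical helper (not a Signac function): split "chr-start-end" into parts.
# Assumes "-" separates the fields and the chromosome name contains no "-".
parse_region <- function(region, sep = "-") {
  parts <- strsplit(region, split = sep, fixed = TRUE)[[1]]
  stopifnot(length(parts) == 3)
  start <- as.numeric(parts[2])
  end <- as.numeric(parts[3])
  stopifnot(!is.na(start), !is.na(end), start <= end)
  list(chrom = parts[1], start = start, end = end, width = end - start + 1)
}

region_info <- parse_region("chr2-87011729-87035519")
print(region_info$width)
```

For the region used throughout this vignette, the width works out to 23791 bp, so each track spans roughly 24 kb of chromosome 2.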
We can also request regions of the genome by gene name. This will use the gene coordinates stored in the Seurat object to determine which genomic region to plot.
CoveragePlot(
  object = pbmc,
  region = "CD8A",
  annotation = FALSE,
  peaks = FALSE
)
Gene annotations within a given genomic region can be plotted using the AnnotationPlot() function.
gene_plot <- AnnotationPlot(
  object = pbmc,
  region = "chr2-87011729-87035519"
)
gene_plot
Peak coordinates within a genomic region can be plotted using the PeakPlot() function.
peak_plot <- PeakPlot(
  object = pbmc,
  region = "chr2-87011729-87035519"
)
peak_plot
Relationships between genomic positions can be plotted using the LinkPlot() function. This will display an arc connecting two linked positions, with the transparency of the arc line proportional to a score associated with the link. These links could be used to encode different things, including regulatory relationships (for example, linking enhancers to the genes that they regulate), or experimental data such as Hi-C.
Just to demonstrate how the function works, we’ve created a fake link here and added it to the PBMC dataset.
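The code that creates the link is not shown here. A minimal sketch of one way to do it follows, assuming that links are stored as a GRanges object whose start and end are the two linked positions and whose score metadata column drives the arc transparency. The coordinates and the score are invented for illustration. The sketch only builds the GRanges if GenomicRanges is installed, and only attaches it if Signac and an object named pbmc are available.

```r
# Invented coordinates and score, chosen only to fall inside the plotted region
fake_link_df <- data.frame(
  seqnames = "chr2",
  start = 87015000,
  end = 87030000,
  score = 0.8
)

# Assumption: links are stored as a GRanges object with a `score` column
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  fake_link <- GenomicRanges::makeGRangesFromDataFrame(
    fake_link_df,
    keep.extra.columns = TRUE
  )
  # Attach to the object only when Signac and `pbmc` are available
  if (exists("pbmc") && requireNamespace("Signac", quietly = TRUE)) {
    Signac::Links(pbmc) <- fake_link
  }
}
```

Both anchors lie inside chr2-87011729-87035519, so the arc should fall within the region plotted next.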
link_plot <- LinkPlot(
  object = pbmc,
  region = "chr2-87011729-87035519"
)
link_plot
While the CoveragePlot() function computes an aggregated signal within a genomic region for different groups of cells, sometimes it’s also useful to inspect the frequency of sequenced fragments within a genomic region for individual cells, without aggregation. This can be done using the TilePlot() function.
tile_plot <- TilePlot(
  object = pbmc,
  region = "chr2-87011729-87035519",
  idents = c("CD4 Memory", "CD8 Effector")
)
tile_plot
By default, this selects the top 100 cells for each group based on the total number of fragments in the genomic region. The genomic region is then tiled, the fragments in each tile are counted for each cell, and the resulting counts for each position are displayed as a heatmap.
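To make the tiling step concrete, the base R sketch below mimics it on simulated data: it keeps the cells with the most fragments in the region, splits the region into fixed-width tiles, and counts fragments per tile for each cell. This only illustrates the idea and is not Signac's implementation. The simulated fragments, the 1000 bp tile width, counting by fragment midpoint, and keeping the top 3 cells (rather than 100) are all assumptions made for the example.

```r
set.seed(1)

region_start <- 87011729
region_end <- 87035519
tile_size <- 1000  # assumed tile width in bp, for illustration
top_n <- 3         # the text describes 100 cells; 3 keeps the toy example small

# Simulated fragment midpoints for 5 toy cells with differing depth
n_frag <- c(cell_a = 40, cell_b = 5, cell_c = 25, cell_d = 60, cell_e = 10)
fragments <- data.frame(
  cell = rep(names(n_frag), times = n_frag),
  midpoint = sample(region_start:region_end, size = sum(n_frag), replace = TRUE)
)

# Keep the cells with the most fragments in the region
totals <- sort(table(fragments$cell), decreasing = TRUE)
keep_cells <- names(totals)[seq_len(top_n)]
kept <- fragments[fragments$cell %in% keep_cells, ]

# Assign each fragment to a tile and count fragments per cell and tile
n_tiles <- ceiling((region_end - region_start + 1) / tile_size)
tile_index <- factor(
  (kept$midpoint - region_start) %/% tile_size + 1,
  levels = seq_len(n_tiles)
)
tile_counts <- table(
  cell = factor(kept$cell, levels = keep_cells),
  tile = tile_index
)

dim(tile_counts)  # cells x tiles matrix that a heatmap would display
```

Each row of tile_counts corresponds to one retained cell and each column to one tile, which is the shape of data a heatmap of this kind would draw.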
Multimodal single-cell datasets generate multiple experimental measurements for each cell. Several methods now exist that are capable of measuring single-cell chromatin data (such as chromatin accessibility) alongside other measurements from the same cell, such as gene expression or mitochondrial genotype. In these cases it’s often informative to visualize the multimodal data together in a single plot. This can be achieved using the ExpressionPlot() function. This is similar to the VlnPlot() function in Seurat, but is designed to be combined with genomic track plots generated by CoveragePlot().
expr_plot <- ExpressionPlot(
  object = pbmc,
  features = "CD8A",
  assay = "RNA"
)
expr_plot
We can create similar plots for multiple genes at once simply by passing a vector of gene names:
ExpressionPlot(
  object = pbmc,
  features = c("CD8A", "CD4"),
  assay = "RNA"
)
Above we’ve demonstrated how to generate individual tracks and panels that can be combined into a single plot for a single genomic region. These panels can be easily combined using the CombineTracks() function.
CombineTracks(
  plotlist = list(cov_plot, tile_plot, peak_plot, gene_plot, link_plot),
  expression.plot = expr_plot,
  heights = c(10, 6, 1, 2, 3),
  widths = c(10, 1)
)
The heights and widths parameters control the relative heights and widths of the individual tracks, according to the order that the tracks appear in the plotlist. The CombineTracks() function ensures that the tracks are aligned vertically and horizontally, and moves the x-axis labels (describing the genomic position) to the bottom of the combined tracks.
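Because heights and widths are relative, only their ratios matter. As a quick worked example in base R, the values passed to CombineTracks() above can be converted into the approximate share of the panel area given to each track. The track labels are added here purely for readability and are not part of the function call.

```r
# Relative heights passed to CombineTracks() above, labelled by track
heights <- c(coverage = 10, tile = 6, peaks = 1, genes = 2, links = 3)
widths <- c(tracks = 10, expression = 1)

# Convert relative sizes to fractions of the total height or width
height_fraction <- heights / sum(heights)
width_fraction <- widths / sum(widths)

print(round(height_fraction, 2))
print(round(width_fraction, 2))
```

With these settings the coverage track receives 10/22 of the vertical space (about 45%), and the expression panel receives 1/11 of the width. Doubling every value would leave the layout unchanged.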
Above we’ve shown how to create genomic plot panels individually and how to combine them. This allows more control over how each panel is constructed and how they’re combined, but involves multiple steps. For convenience, we’ve included the ability to generate and combine different panels automatically in the CoveragePlot() function, through the annotation, peaks, tile, links, and features arguments. We can generate a similar plot to that shown above in a single function call:
CoveragePlot(
  object = pbmc,
  region = "chr2-87011729-87035519",
  features = "CD8A",
  annotation = TRUE,
  peaks = TRUE,
  tile = TRUE,
  links = TRUE
)
Notice that in this example we create the tile plot for every group of cells that is shown in the coverage track, whereas above we were able to create a plot that showed the aggregated coverage for all groups of cells and the tile plot for only the CD4 memory cells and the CD8 effector cells. A higher degree of customization is possible when creating each track separately.
Above we demonstrated the different types of plots that can be constructed using Signac. Often when exploring genomic data it’s useful to be able to interactively browse through different regions of the genome and adjust tracks on the fly. This can be done in Signac using the CoverageBrowser() function. This provides all the same functionality as the CoveragePlot() function, except that we can scroll upstream/downstream, zoom in/out of regions, navigate to a new region, and adjust which tracks are shown or how the cells are grouped. When exploring the data interactively, you will often find interesting plots that you’d like to save for viewing later. We’ve included a “Save plot” button that will add the current plot to a list of plots that is returned when the interactive session is ended. Here’s a recorded demonstration of the CoverageBrowser() function:
## R version 4.0.1 (2020-06-06)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] GenomicRanges_1.42.0 GenomeInfoDb_1.26.5 IRanges_2.24.1
## [4] S4Vectors_0.28.1 BiocGenerics_0.36.0 Signac_1.2.0
##
## loaded via a namespace (and not attached):
## [1] fastmatch_1.1-0 systemfonts_1.0.1 plyr_1.8.6
## [4] igraph_1.2.6 lazyeval_0.2.2 splines_4.0.1
## [7] BiocParallel_1.24.1 listenv_0.8.0 SnowballC_0.7.0
## [10] scattermore_0.7 ggplot2_3.3.3 digest_0.6.27
## [13] htmltools_0.5.1.1 fansi_0.4.2 magrittr_2.0.1
## [16] memoise_2.0.0 tensor_1.5 cluster_2.1.2
## [19] ROCR_1.0-11 globals_0.14.0 Biostrings_2.58.0
## [22] matrixStats_0.58.0 docopt_0.7.1 pkgdown_1.6.1
## [25] spatstat.sparse_2.0-0 colorspace_2.0-0 ggrepel_0.9.1
## [28] textshaping_0.3.3 xfun_0.22 dplyr_1.0.5
## [31] sparsesvd_0.2 crayon_1.4.1 RCurl_1.98-1.3
## [34] jsonlite_1.7.2 spatstat.data_2.1-0 survival_3.2-11
## [37] zoo_1.8-9 glue_1.4.2 polyclip_1.10-0
## [40] gtable_0.3.0 zlibbioc_1.36.0 XVector_0.30.0
## [43] leiden_0.3.7 future.apply_1.7.0 abind_1.4-5
## [46] scales_1.1.1 DBI_1.1.1 miniUI_0.1.1.1
## [49] Rcpp_1.0.6 viridisLite_0.4.0 xtable_1.8-4
## [52] reticulate_1.19 spatstat.core_2.1-2 htmlwidgets_1.5.3
## [55] httr_1.4.2 RColorBrewer_1.1-2 ellipsis_0.3.1
## [58] Seurat_4.0.1.9005 ica_1.0-2 pkgconfig_2.0.3
## [61] farver_2.1.0 ggseqlogo_0.1 sass_0.3.1
## [64] uwot_0.1.10 deldir_0.2-10 utf8_1.2.1
## [67] labeling_0.4.2 tidyselect_1.1.0 rlang_0.4.10
## [70] reshape2_1.4.4 later_1.2.0 munsell_0.5.0
## [73] tools_4.0.1 cachem_1.0.4 generics_0.1.0
## [76] ggridges_0.5.3 evaluate_0.14 stringr_1.4.0
## [79] fastmap_1.1.0 yaml_2.2.1 ragg_1.1.2
## [82] goftest_1.2-2 knitr_1.33 fs_1.5.0
## [85] fitdistrplus_1.1-3 purrr_0.3.4 RANN_2.6.1
## [88] pbapply_1.4-3 future_1.21.0 nlme_3.1-152
## [91] mime_0.10 slam_0.1-48 RcppRoll_0.3.0
## [94] compiler_4.0.1 plotly_4.9.3 png_0.1-7
## [97] spatstat.utils_2.1-0 tibble_3.1.1 tweenr_1.0.2
## [100] bslib_0.2.4 stringi_1.5.3 highr_0.9
## [103] desc_1.3.0 lattice_0.20-41 Matrix_1.3-2
## [106] vctrs_0.3.7 pillar_1.6.0 lifecycle_1.0.0
## [109] spatstat.geom_2.1-0 lmtest_0.9-38 jquerylib_0.1.4
## [112] RcppAnnoy_0.0.18 data.table_1.14.0 cowplot_1.1.1
## [115] bitops_1.0-7 irlba_2.3.3 httpuv_1.6.0
## [118] patchwork_1.1.1 R6_2.5.0 promises_1.2.0.1
## [121] lsa_0.73.2 KernSmooth_2.23-18 gridExtra_2.3
## [124] parallelly_1.24.0 codetools_0.2-18 MASS_7.3-53.1
## [127] assertthat_0.2.1 rprojroot_2.0.2 SeuratObject_4.0.0
## [130] qlcMatrix_0.9.7 sctransform_0.3.2 Rsamtools_2.6.0
## [133] GenomeInfoDbData_1.2.4 mgcv_1.8-33 grid_4.0.1
## [136] rpart_4.1-15 tidyr_1.1.3 rmarkdown_2.7
## [139] Rtsne_0.15 ggforce_0.3.3 shiny_1.6.0