vignettes/snareseq.Rmd
snareseq.Rmd
In this vignette we will analyse a single-cell co-assay dataset measuring gene expression and DNA accessibility in the same cells. This vignette is similar to the PBMC multiomic vignette, but demonstrates a similar joint analysis in a different species and with data gathered using a different technology.
This dataset was published by Chen, Lake, and Zhang (2019) and uses a technology called SNARE-seq. We will look at a dataset from the adult mouse brain, and the raw data can be downloaded from NCBI GEO here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE126074
As the fragment files for this dataset are not publicly available we have re-mapped the raw data to the mm10 genome and created a fragment file using Sinto.
The fragment file can be downloaded here: https://www.dropbox.com/s/se3uag0cvld4xqg/fragments.sort.bed.gz?dl=0
The fragment file index can be downloaded here: https://www.dropbox.com/s/kvb8fiewkjz3kzf/fragments.sort.bed.gz.tbi?dl=0
Code used to create the fragment file from raw data is available here: https://github.com/timoast/SNARE-seq
First we create a Seurat object containing two different assays, one containing the gene expression data and one containing the DNA accessibility data.
To load the count data, we can use the Read10X()
function from Seurat by first placing the barcodes.tsv.gz
, matrix.mtx.gz
, and features.tsv.gz
files into a separate folder.
library(Signac)
library(Seurat)
library(ggplot2)
library(EnsDb.Mmusculus.v79)
set.seed(1234)
# load processed data matrices for each assay
rna <- Read10X("/home/stuartt/data/snare-seq/GSE126074_AdBrainCortex_rna/", gene.column = 1)
atac <- Read10X("/home/stuartt/data/snare-seq/GSE126074_AdBrainCortex_atac/", gene.column = 1)
fragments <- "/home/stuartt/data/snare-seq/fragments.sort.bed.gz"
# create a Seurat object and add the assays
snare <- CreateSeuratObject(counts = rna)
snare[['ATAC']] <- CreateChromatinAssay(
counts = atac,
sep = c(":", "-"),
genome = "mm10",
fragments = fragments
)
# extract gene annotations from EnsDb
annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
# change to UCSC style since the data was mapped to hg19
seqlevelsStyle(annotations) <- 'UCSC'
genome(annotations) <- "mm10"
# add the gene information to the object
Annotation(snare[["ATAC"]]) <- annotations
DefaultAssay(snare) <- "ATAC"
snare <- TSSEnrichment(snare)
snare <- NucleosomeSignal(snare)
snare$blacklist_fraction <- FractionCountsInRegion(
object = snare,
assay = 'ATAC',
regions = blacklist_mm10
)
Idents(snare) <- "all" # group all cells together, rather than by replicate
VlnPlot(
snare,
features = c("nCount_RNA", "nCount_ATAC", "TSS.enrichment",
"nucleosome_signal", "blacklist_fraction"),
pt.size = 0.1,
ncol = 5
)
snare <- subset(
x = snare,
subset = blacklist_fraction < 0.03 &
TSS.enrichment < 20 &
nCount_RNA > 800 &
nCount_ATAC > 500
)
snare
## An object of class Seurat
## 277704 features across 8055 samples within 2 assays
## Active assay: ATAC (244544 features, 0 variable features)
## 1 other assay present: RNA
Process gene expression data using Seurat
DefaultAssay(snare) <- "RNA"
snare <- FindVariableFeatures(snare, nfeatures = 3000)
snare <- NormalizeData(snare)
snare <- ScaleData(snare)
snare <- RunPCA(snare, npcs = 30)
snare <- RunUMAP(snare, dims = 1:30, reduction.name = "umap.rna")
snare <- FindNeighbors(snare, dims = 1:30)
snare <- FindClusters(snare, resolution = 0.5, algorithm = 3)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8055
## Number of edges: 324904
##
## Running smart local moving algorithm...
## Maximum modularity in 10 random starts: 0.8902
## Number of communities: 14
## Elapsed time: 4 seconds
Process the DNA accessibility data using Signac
DefaultAssay(snare) <- 'ATAC'
snare <- FindTopFeatures(snare, min.cutoff = 10)
snare <- RunTFIDF(snare)
snare <- RunSVD(snare)
snare <- RunUMAP(snare, reduction = 'lsi', dims = 2:30, reduction.name = 'umap.atac')
p2 <- DimPlot(snare, reduction = 'umap.atac', label = TRUE) + NoLegend() + ggtitle("ATAC UMAP")
p1 + p2
Next we can annotate cell types in the dataset by transferring labels from an existing scRNA-seq dataset for the adult mouse brain, produced by the Allen Institute.
# label transfer from Allen brain
allen <- readRDS("/home/stuartt/github/chrom/vignette_data/allen_brain.rds")
# use the RNA assay in the SNARE-seq data for integration with scRNA-seq
DefaultAssay(snare) <- 'RNA'
transfer.anchors <- FindTransferAnchors(
reference = allen,
query = snare,
dims = 1:30,
reduction = 'cca'
)
predicted.labels <- TransferData(
anchorset = transfer.anchors,
refdata = allen$subclass,
weight.reduction = snare[['pca']],
dims = 1:30
)
snare <- AddMetaData(object = snare, metadata = predicted.labels)
# label clusters based on predicted ID
new.cluster.ids <- c(
"L2/3 IT",
"L4",
"L6 IT",
"L5 CT",
"L4",
"L5 PT",
"Pvalb",
"Sst",
"Astro",
"Oligo",
"Vip/Lamp5",
"L6 IT.2",
"L6b",
"NP"
)
names(x = new.cluster.ids) <- levels(x = snare)
snare <- RenameIdents(object = snare, new.cluster.ids)
snare$celltype <- Idents(snare)
DimPlot(snare, group.by = 'celltype', label = TRUE, reduction = 'umap.rna')
We can visualize both the gene expression and DNA accessibility information at the same time using the CoveragePlot()
function. This makes it easy to compare DNA accessibility in a given region for different cell types and overlay gene expression information for different genes.
DefaultAssay(snare) <- "ATAC"
CoveragePlot(snare, region = "chr2-22620000-22660000", features = "Gad2")
## R version 4.0.1 (2020-06-06)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.14.0
## [3] AnnotationFilter_1.14.0 GenomicFeatures_1.42.3
## [5] AnnotationDbi_1.52.0 Biobase_2.50.0
## [7] GenomicRanges_1.42.0 GenomeInfoDb_1.26.5
## [9] IRanges_2.24.1 S4Vectors_0.28.1
## [11] BiocGenerics_0.36.0 ggplot2_3.3.3
## [13] SeuratObject_4.0.0 Seurat_4.0.1.9005
## [15] Signac_1.2.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.1 reticulate_1.19
## [3] tidyselect_1.1.0 RSQLite_2.2.7
## [5] htmlwidgets_1.5.3 grid_4.0.1
## [7] docopt_0.7.1 BiocParallel_1.24.1
## [9] Rtsne_0.15 munsell_0.5.0
## [11] codetools_0.2-18 ragg_1.1.2
## [13] ica_1.0-2 future_1.21.0
## [15] miniUI_0.1.1.1 withr_2.4.2
## [17] colorspace_2.0-0 highr_0.9
## [19] knitr_1.33 rstudioapi_0.13
## [21] ROCR_1.0-11 tensor_1.5
## [23] listenv_0.8.0 labeling_0.4.2
## [25] MatrixGenerics_1.2.1 slam_0.1-48
## [27] GenomeInfoDbData_1.2.4 polyclip_1.10-0
## [29] bit64_4.0.5 farver_2.1.0
## [31] rprojroot_2.0.2 parallelly_1.24.0
## [33] vctrs_0.3.7 generics_0.1.0
## [35] xfun_0.22 biovizBase_1.38.0
## [37] BiocFileCache_1.14.0 lsa_0.73.2
## [39] ggseqlogo_0.1 R6_2.5.0
## [41] DelayedArray_0.16.3 bitops_1.0-7
## [43] spatstat.utils_2.1-0 cachem_1.0.4
## [45] assertthat_0.2.1 promises_1.2.0.1
## [47] scales_1.1.1 nnet_7.3-15
## [49] gtable_0.3.0 globals_0.14.0
## [51] goftest_1.2-2 rlang_0.4.10
## [53] systemfonts_1.0.1 RcppRoll_0.3.0
## [55] splines_4.0.1 rtracklayer_1.50.0
## [57] lazyeval_0.2.2 dichromat_2.0-0
## [59] checkmate_2.0.0 spatstat.geom_2.1-0
## [61] yaml_2.2.1 reshape2_1.4.4
## [63] abind_1.4-5 backports_1.2.1
## [65] httpuv_1.6.0 Hmisc_4.5-0
## [67] tools_4.0.1 ellipsis_0.3.1
## [69] spatstat.core_2.1-2 jquerylib_0.1.4
## [71] RColorBrewer_1.1-2 ggridges_0.5.3
## [73] Rcpp_1.0.6 plyr_1.8.6
## [75] base64enc_0.1-3 progress_1.2.2
## [77] zlibbioc_1.36.0 purrr_0.3.4
## [79] RCurl_1.98-1.3 prettyunits_1.1.1
## [81] rpart_4.1-15 openssl_1.4.3
## [83] deldir_0.2-10 pbapply_1.4-3
## [85] cowplot_1.1.1 zoo_1.8-9
## [87] SummarizedExperiment_1.20.0 ggrepel_0.9.1
## [89] cluster_2.1.2 fs_1.5.0
## [91] magrittr_2.0.1 RSpectra_0.16-0
## [93] data.table_1.14.0 scattermore_0.7
## [95] lmtest_0.9-38 RANN_2.6.1
## [97] SnowballC_0.7.0 ProtGenerics_1.22.0
## [99] fitdistrplus_1.1-3 matrixStats_0.58.0
## [101] hms_1.0.0 patchwork_1.1.1
## [103] mime_0.10 evaluate_0.14
## [105] xtable_1.8-4 XML_3.99-0.6
## [107] jpeg_0.1-8.1 sparsesvd_0.2
## [109] gridExtra_2.3 compiler_4.0.1
## [111] biomaRt_2.46.3 tibble_3.1.1
## [113] KernSmooth_2.23-18 crayon_1.4.1
## [115] htmltools_0.5.1.1 mgcv_1.8-33
## [117] later_1.2.0 Formula_1.2-4
## [119] tidyr_1.1.3 DBI_1.1.1
## [121] tweenr_1.0.2 dbplyr_2.1.1
## [123] MASS_7.3-53.1 rappdirs_0.3.3
## [125] Matrix_1.3-2 igraph_1.2.6
## [127] pkgconfig_2.0.3 pkgdown_1.6.1
## [129] GenomicAlignments_1.26.0 foreign_0.8-81
## [131] plotly_4.9.3 spatstat.sparse_2.0-0
## [133] xml2_1.3.2 bslib_0.2.4
## [135] XVector_0.30.0 VariantAnnotation_1.36.0
## [137] stringr_1.4.0 digest_0.6.27
## [139] sctransform_0.3.2 RcppAnnoy_0.0.18
## [141] spatstat.data_2.1-0 Biostrings_2.58.0
## [143] rmarkdown_2.7 leiden_0.3.7
## [145] fastmatch_1.1-0 htmlTable_2.1.0
## [147] uwot_0.1.10 curl_4.3
## [149] shiny_1.6.0 Rsamtools_2.6.0
## [151] lifecycle_1.0.0 nlme_3.1-152
## [153] jsonlite_1.7.2 BSgenome_1.58.0
## [155] desc_1.3.0 viridisLite_0.4.0
## [157] askpass_1.1 fansi_0.4.2
## [159] pillar_1.6.0 lattice_0.20-41
## [161] fastmap_1.1.0 httr_1.4.2
## [163] survival_3.2-11 glue_1.4.2
## [165] qlcMatrix_0.9.7 png_0.1-7
## [167] bit_4.0.4 ggforce_0.3.3
## [169] stringi_1.5.3 sass_0.3.1
## [171] blob_1.2.1 textshaping_0.3.3
## [173] latticeExtra_0.6-29 memoise_2.0.0
## [175] dplyr_1.0.5 irlba_2.3.3
## [177] future.apply_1.7.0