vignettes/monocle.Rmd
monocle.Rmd
In this vignette we will demonstrate how to construct cell trajectories with Monocle 3 using single-cell ATAC-seq data. Please see the Monocle 3 website for information about installing Monocle 3.
To facilitate conversion between the Seurat (used by Signac) and CellDataSet (used by Monocle 3) formats, we will use a conversion function in the SeuratWrappers package available on GitHub.
We will use a single-cell ATAC-seq dataset containing human CD34+ hematopoietic stem and progenitor cells published by Satpathy and Granja et al. (2019, Nature Biotechnology).
The processed dataset is available on NCBI GEO here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE129785
Note that the fragment file is present inside the
GSE129785_RAW.tar
archive, and the index for the fragment
file is not supplied. You can index the file yourself using tabix, for example:
tabix -p bed <fragment_file>
.
First we will load the dataset and perform some standard preprocessing using Signac.
library(Signac)
library(Seurat)
library(SeuratWrappers)
library(monocle3)
library(Matrix)
library(ggplot2)
library(patchwork)
filepath <- "../vignette_data/GSE129785/GSE129785_scATAC-Hematopoiesis-CD34"
peaks <- read.table(paste0(filepath, ".peaks.txt.gz"), header = TRUE)
cells <- read.table(paste0(filepath, ".cell_barcodes.txt.gz"), header = TRUE, stringsAsFactors = FALSE)
rownames(cells) <- make.unique(cells$Barcodes)
mtx <- readMM(file = paste0(filepath, ".mtx"))
mtx <- as(object = mtx, Class = "dgCMatrix")
## 'as(<dgTMatrix>, "dgCMatrix")' is deprecated.
## Use 'as(., "CsparseMatrix")' instead.
## See help("Deprecated") and help("Matrix-deprecated").
bone_assay <- CreateChromatinAssay(
counts = mtx,
min.cells = 5,
fragments = "../vignette_data/GSE129785/GSM3722029_CD34_Progenitors_Rep1_fragments.tsv.gz",
sep = c("_", "_"),
genome = "hg19"
)
bone <- CreateSeuratObject(
counts = bone_assay,
meta.data = cells,
assay = "ATAC"
)
# The dataset contains multiple cell types
# We can subset to include just one replicate of CD34+ progenitor cells
bone <- bone[, bone$Group_Barcode == "CD34_Progenitors_Rep1"]
# add cell type annotations from the original paper
cluster_names <- c("HSC", "MEP", "CMP-BMP", "LMPP", "CLP", "Pro-B", "Pre-B", "GMP",
"MDP", "pDC", "cDC", "Monocyte-1", "Monocyte-2", "Naive-B", "Memory-B",
"Plasma-cell", "Basophil", "Immature-NK", "Mature-NK1", "Mature-NK2", "Naive-CD4-T1",
"Naive-CD4-T2", "Naive-Treg", "Memory-CD4-T", "Treg", "Naive-CD8-T1", "Naive-CD8-T2",
"Naive-CD8-T3", "Central-memory-CD8-T", "Effector-memory-CD8-T", "Gamma delta T")
num.labels <- length(cluster_names)
names(cluster_names) <- paste0( rep("Cluster", num.labels), seq(num.labels) )
bone$celltype <- cluster_names[as.character(bone$Clusters)]
bone[["ATAC"]]
## ChromatinAssay data with 570650 features for 9913 cells
## Variable features: 0
## Genome: hg19
## Annotation present: FALSE
## Motifs present: FALSE
## Fragment files: 1
Next we can add gene annotations for the hg19 genome to the object. This will be useful for computing quality control metrics (TSS enrichment score) and plotting.
library(EnsDb.Hsapiens.v75)
# extract gene annotations from EnsDb
annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v75)
# change to UCSC style since the data was mapped to hg19
seqlevels(annotations) <- paste0('chr', seqlevels(annotations))
genome(annotations) <- "hg19"
# add the gene information to the object
Annotation(bone) <- annotations
We’ll compute TSS enrichment, nucleosome signal score, and the percentage of counts in genomic blacklist regions for each cell, and use these metrics to help remove low quality cells from the datasets.
bone <- TSSEnrichment(bone)
bone <- NucleosomeSignal(bone)
bone$blacklist_fraction <- FractionCountsInRegion(bone, regions = blacklist_hg19)
VlnPlot(
object = bone,
features = c("nCount_ATAC", "TSS.enrichment", "nucleosome_signal", "blacklist_fraction"),
pt.size = 0.1,
ncol = 4
)
bone <- bone[, (bone$nCount_ATAC < 50000) &
(bone$TSS.enrichment > 2) &
(bone$nucleosome_signal < 5)]
Next we can run a standard scATAC-seq analysis pipeline using Signac to perform dimension reduction, clustering, and cell type annotation.
bone <- FindTopFeatures(bone, min.cells = 10)
bone <- RunTFIDF(bone)
bone <- RunSVD(bone, n = 100)
DepthCor(bone)
bone <- RunUMAP(
bone,
reduction = "lsi",
dims = 2:50,
reduction.name = "UMAP"
)
bone <- FindNeighbors(bone, dims = 2:50, reduction = "lsi")
bone <- FindClusters(bone, resolution = 0.8, algorithm = 3)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 9765
## Number of edges: 433498
##
## Running smart local moving algorithm...
## Maximum modularity in 10 random starts: 0.8474
## Number of communities: 18
## Elapsed time: 4 seconds
Assign each cluster to the most common cell type based on the original annotations from the paper.
for(i in levels(bone)) {
cells_to_reid <- WhichCells(bone, idents = i)
newid <- names(sort(table(bone$celltype[cells_to_reid]),decreasing=TRUE))[1]
Idents(bone, cells = cells_to_reid) <- newid
}
bone$assigned_celltype <- Idents(bone)
DimPlot(bone, label = TRUE)
Next we can subset the different lineages and create a trajectory for each lineage. Another way to build the trajectories is to use the whole dataset and build separate pseudotime trajectories for the different cell partitions found by Monocle 3.
DefaultAssay(bone) <- "ATAC"
erythroid <- bone[, bone$assigned_celltype %in% c("HSC", "MEP", "CMP-BMP")]
lymphoid <- bone[, bone$assigned_celltype %in% c("HSC", "LMPP", "GMP", "CLP", "Pro-B", "pDC", "MDP", "GMP")]
We can convert the Seurat object to a CellDataSet object using the
as.cell_data_set()
function from SeuratWrappers
and build the trajectories using Monocle 3. We’ll do this separately for
erythroid and lymphoid lineages, but you could explore other strategies
building a trajectory for all lineages together.
erythroid.cds <- as.cell_data_set(erythroid)
erythroid.cds <- cluster_cells(cds = erythroid.cds, reduction_method = "UMAP")
erythroid.cds <- learn_graph(erythroid.cds, use_partition = TRUE)
lymphoid.cds <- as.cell_data_set(lymphoid)
lymphoid.cds <- cluster_cells(cds = lymphoid.cds, reduction_method = "UMAP")
lymphoid.cds <- learn_graph(lymphoid.cds, use_partition = TRUE)
To compute pseudotime estimates for each trajectory we need to decide
what the start of each trajectory is. In our case, we know that the
hematopoietic stem cells are the progenitors of other cell types in the
trajectory, so we can set these cells as the root of the trajectory.
Monocle 3 includes an interactive function to select cells as the root
nodes in the graph. This function will be launched if calling
order_cells()
without specifying the
root_cells
parameter. Here we’ve pre-selected some cells as
the root, and saved these to a file for reproducibility. This file can
be downloaded here.
# load the pre-selected HSCs
hsc <- readLines("../vignette_data/hsc_cells.txt")
# order cells
erythroid.cds <- order_cells(erythroid.cds, reduction_method = "UMAP", root_cells = hsc)
lymphoid.cds <- order_cells(lymphoid.cds, reduction_method = "UMAP", root_cells = hsc)
# plot trajectories colored by pseudotime
plot_cells(
cds = erythroid.cds,
color_cells_by = "pseudotime",
show_trajectory_graph = TRUE
)
plot_cells(
cds = lymphoid.cds,
color_cells_by = "pseudotime",
show_trajectory_graph = TRUE
)
Extract the pseudotime values and add to the Seurat object
bone <- AddMetaData(
object = bone,
metadata = erythroid.cds@principal_graph_aux@listData$UMAP$pseudotime,
col.name = "Erythroid"
)
bone <- AddMetaData(
object = bone,
metadata = lymphoid.cds@principal_graph_aux@listData$UMAP$pseudotime,
col.name = "Lymphoid"
)
FeaturePlot(bone, c("Erythroid", "Lymphoid"), pt.size = 0.1) & scale_color_viridis_c()
Thanks to the developers of Monocle 3, especially Cole Trapnell, Hannah Pliner, and members of the Trapnell lab. If you use Monocle please cite the Monocle papers.
## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.0
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Asia/Singapore
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] EnsDb.Hsapiens.v75_2.99.0 ensembldb_2.24.1
## [3] AnnotationFilter_1.24.0 GenomicFeatures_1.52.2
## [5] AnnotationDbi_1.62.2 patchwork_1.1.3
## [7] ggplot2_3.4.4 Matrix_1.6-1.1
## [9] monocle3_1.3.4 SingleCellExperiment_1.22.0
## [11] SummarizedExperiment_1.30.2 GenomicRanges_1.52.0
## [13] GenomeInfoDb_1.36.3 IRanges_2.34.1
## [15] S4Vectors_0.38.2 MatrixGenerics_1.12.3
## [17] matrixStats_1.0.0 Biobase_2.60.0
## [19] BiocGenerics_0.46.0 SeuratWrappers_0.3.1
## [21] SeuratObject_4.1.4 Seurat_4.4.0
## [23] Signac_1.11.0
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.3 ProtGenerics_1.32.0 spatstat.sparse_3.0-2
## [4] bitops_1.0-7 httr_1.4.7 RColorBrewer_1.1-3
## [7] backports_1.4.1 tools_4.3.1 sctransform_0.4.0
## [10] utf8_1.2.3 R6_2.5.1 lazyeval_0.2.2
## [13] uwot_0.1.16 withr_2.5.1 sp_2.1-0
## [16] prettyunits_1.2.0 gridExtra_2.3 progressr_0.14.0
## [19] cli_3.6.1 textshaping_0.3.7 spatstat.explore_3.2-3
## [22] labeling_0.4.3 sass_0.4.7 spatstat.data_3.0-1
## [25] proxy_0.4-27 ggridges_0.5.4 pbapply_1.7-2
## [28] pkgdown_2.0.7 Rsamtools_2.16.0 systemfonts_1.0.5
## [31] foreign_0.8-84 R.utils_2.12.2 dichromat_2.0-0.1
## [34] parallelly_1.36.0 BSgenome_1.68.0 rstudioapi_0.15.0
## [37] RSQLite_2.3.1 generics_0.1.3 BiocIO_1.10.0
## [40] ica_1.0-3 spatstat.random_3.1-6 dplyr_1.1.3
## [43] ggbeeswarm_0.7.2 fansi_1.0.5 abind_1.4-5
## [46] R.methodsS3_1.8.2 terra_1.7-46 lifecycle_1.0.3
## [49] yaml_2.3.7 BiocFileCache_2.8.0 Rtsne_0.16
## [52] grid_4.3.1 blob_1.2.4 promises_1.2.1
## [55] crayon_1.5.2 miniUI_0.1.1.1 lattice_0.21-8
## [58] cowplot_1.1.1 KEGGREST_1.40.1 pillar_1.9.0
## [61] knitr_1.44 rjson_0.2.21 boot_1.3-28.1
## [64] future.apply_1.11.0 codetools_0.2-19 fastmatch_1.1-4
## [67] leiden_0.4.3 glue_1.6.2 leidenbase_0.1.25
## [70] data.table_1.14.8 remotes_2.4.2.1 vctrs_0.6.3
## [73] png_0.1-8 gtable_0.3.4 assertthat_0.2.1
## [76] cachem_1.0.8 xfun_0.40 S4Arrays_1.0.6
## [79] mime_0.12 survival_3.5-5 RcppRoll_0.3.0
## [82] ellipsis_0.3.2 fitdistrplus_1.1-11 ROCR_1.0-11
## [85] nlme_3.1-162 bit64_4.0.5 progress_1.2.2
## [88] filelock_1.0.2 RcppAnnoy_0.0.21 rprojroot_2.0.3
## [91] bslib_0.5.1 irlba_2.3.5.1 vipor_0.4.5
## [94] rpart_4.1.19 KernSmooth_2.23-21 colorspace_2.1-0
## [97] DBI_1.1.3 Hmisc_5.1-1 nnet_7.3-19
## [100] ggrastr_1.0.2 tidyselect_1.2.0 bit_4.0.5
## [103] compiler_4.3.1 curl_5.1.0 htmlTable_2.4.1
## [106] xml2_1.3.5 desc_1.4.2 DelayedArray_0.26.7
## [109] plotly_4.10.2 rtracklayer_1.60.1 checkmate_2.2.0
## [112] scales_1.2.1 lmtest_0.9-40 rappdirs_0.3.3
## [115] stringr_1.5.0 digest_0.6.33 goftest_1.2-3
## [118] spatstat.utils_3.0-3 minqa_1.2.6 rmarkdown_2.25
## [121] XVector_0.40.0 htmltools_0.5.6.1 pkgconfig_2.0.3
## [124] base64enc_0.1-3 lme4_1.1-34 dbplyr_2.3.4
## [127] fastmap_1.1.1 rlang_1.1.1 htmlwidgets_1.6.2
## [130] shiny_1.7.5 farver_2.1.1 jquerylib_0.1.4
## [133] zoo_1.8-12 jsonlite_1.8.7 BiocParallel_1.34.2
## [136] R.oo_1.25.0 VariantAnnotation_1.46.0 RCurl_1.98-1.12
## [139] magrittr_2.0.3 Formula_1.2-5 GenomeInfoDbData_1.2.10
## [142] munsell_0.5.0 Rcpp_1.0.11 viridis_0.6.4
## [145] reticulate_1.32.0 stringi_1.7.12 zlibbioc_1.46.0
## [148] MASS_7.3-60 plyr_1.8.9 parallel_4.3.1
## [151] listenv_0.9.0 ggrepel_0.9.3 deldir_1.0-9
## [154] Biostrings_2.68.1 splines_4.3.1 tensor_1.5
## [157] hms_1.1.3 igraph_1.5.1 spatstat.geom_3.2-5
## [160] reshape2_1.4.4 biomaRt_2.56.1 XML_3.99-0.14
## [163] evaluate_0.22 biovizBase_1.48.0 BiocManager_1.30.22
## [166] nloptr_2.0.3 httpuv_1.6.11 RANN_2.6.1
## [169] tidyr_1.3.0 purrr_1.0.2 polyclip_1.10-6
## [172] future_1.33.0 scattermore_1.2 rsvd_1.0.5
## [175] xtable_1.8-4 restfulr_0.0.15 later_1.3.1
## [178] viridisLite_0.4.2 ragg_1.2.6 tibble_3.2.1
## [181] beeswarm_0.4.0 memoise_2.0.1 GenomicAlignments_1.36.0
## [184] cluster_2.1.4 globals_0.16.2