vignettes/cicero.Rmd
cicero.Rmd
In this vignette we will demonstrate how to find cis-co-accessible networks with Cicero using single-cell ATAC-seq data. Please see the Cicero website for information about Cicero.
To facilitate conversion between the Seurat (used by Signac) and CellDataSet (used by Cicero) formats, we will use a conversion function in the SeuratWrappers package available on GitHub.
We will use a single-cell ATAC-seq dataset containing human CD34+ hematopoietic stem and progenitor cells published by Satpathy and Granja et al. (2019, Nature Biotechnology). The processed datasets are available on NCBI GEO here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE129785
This is the same dataset we used in the trajectory vignette, and we’ll start by loading the dataset that was created in that vignette. See the trajectory vignette for the code used to create the object from raw data.
First we will load their dataset and perform some standard preprocessing using Signac.
# load the object created in the Monocle 3 vignette
bone <- readRDS("cd34.rds")
We can find cis-co-accessible networks (CCANs) using Cicero.
The Cicero developers have developed a separate branch of the package
that works with a Monocle 3 CellDataSet
object. We will
first make sure this branch is installed, then convert our
Seurat
object for the whole bone marrow dataset to
CellDataSet
format.
# Install Cicero
if (!requireNamespace("remotes", quietly = TRUE))
install.packages("remotes")
remotes::install_github("cole-trapnell-lab/cicero-release", ref = "monocle3")
## monocle3 (e658515b2... -> 98402ed0c...) [GitHub]
## openssl (2.2.0 -> 2.2.1 ) [CRAN]
## tinytex (0.51 -> 0.52 ) [CRAN]
## bslib (0.7.0 -> 0.8.0 ) [CRAN]
## yaml (2.3.8 -> 2.3.10 ) [CRAN]
## xfun (0.44 -> 0.47 ) [CRAN]
## evaluate (0.23 -> 0.24.0 ) [CRAN]
## checkmate (2.3.1 -> 2.3.2 ) [CRAN]
## XML (3.99-0.16.1 -> 3.99-0.17 ) [CRAN]
## knitr (1.45 -> 1.48 ) [CRAN]
## rmarkdown (2.27 -> 2.28 ) [CRAN]
## htmlTable (2.4.2 -> 2.4.3 ) [CRAN]
## DBI (1.2.2 -> 1.2.3 ) [CRAN]
## Hmisc (5.1-2 -> 5.1-3 ) [CRAN]
##
## The downloaded binary packages are in
## /var/folders/n5/1l8rdp0951gcwznhr_sxwj0m0000gp/T//Rtmp2ZNmZ2/downloaded_packages
## wk (0.9.1 -> 0.9.2 ) [CRAN]
## s2 (1.1.6 -> 1.1.7 ) [CRAN]
## dqrng (0.4.0 -> 0.4.1 ) [CRAN]
## RSpectra (0.16-1 -> 0.16-2 ) [CRAN]
## biglm (0.9-2.1 -> 0.9-3 ) [CRAN]
## spData (2.3.0 -> 2.3.1 ) [CRAN]
## nloptr (2.0.3 -> 2.1.1 ) [CRAN]
## minqa (1.2.7 -> 1.2.8 ) [CRAN]
## uwot (0.1.16 -> 0.2.2 ) [CRAN]
## spdep (1.3-3 -> 1.3-5 ) [CRAN]
## slam (0.1-50 -> 0.1-52 ) [CRAN]
## shiny (1.8.1.1 -> 1.9.1 ) [CRAN]
## lme4 (1.1-35.3 -> 1.1-35.5) [CRAN]
## leidenbase (0.1.27 -> 0.1.30 ) [CRAN]
##
## The downloaded binary packages are in
## /var/folders/n5/1l8rdp0951gcwznhr_sxwj0m0000gp/T//Rtmp2ZNmZ2/downloaded_packages
## ── R CMD build ─────────────────────────────────────────────────────────────────
## * checking for file ‘/private/var/folders/n5/1l8rdp0951gcwznhr_sxwj0m0000gp/T/Rtmp2ZNmZ2/remotes480c3f5e994e/cole-trapnell-lab-monocle3-98402ed/DESCRIPTION’ ... OK
## * preparing ‘monocle3’:
## * checking DESCRIPTION meta-information ... OK
## * cleaning src
## * checking for LF line-endings in source and make files and shell scripts
## * checking for empty or unneeded directories
## Omitted ‘LazyData’ from DESCRIPTION
## * building ‘monocle3_1.3.7.tar.gz’
##
## ── R CMD build ─────────────────────────────────────────────────────────────────
## * checking for file ‘/private/var/folders/n5/1l8rdp0951gcwznhr_sxwj0m0000gp/T/Rtmp2ZNmZ2/remotes480c22118ef8/cole-trapnell-lab-cicero-release-2b0ee07/DESCRIPTION’ ... OK
## * preparing ‘cicero’:
## * checking DESCRIPTION meta-information ... OK
## * checking for LF line-endings in source and make files and shell scripts
## * checking for empty or unneeded directories
## * looking to see if a ‘data/datalist’ file should be added
## * building ‘cicero_1.3.9.tar.gz’
library(cicero)
# convert to CellDataSet format and make the cicero object
bone.cds <- as.cell_data_set(x = bone)
bone.cicero <- make_cicero_cds(bone.cds, reduced_coordinates = reducedDims(bone.cds)$UMAP)
We’ll demonstrate running Cicero here using just one chromosome to save some time, but the same workflow can be applied to find CCANs for the whole genome.
Here we demonstrate the most basic workflow for running Cicero. This workflow can be broken down into several steps, each with parameters that can be changed from their defaults to fine-tune the Cicero algorithm depending on your data. We highly recommend that you explore the Cicero website, paper, and documentation for more information.
# get the chromosome sizes from the Seurat object
genome <- seqlengths(bone)
# use chromosome 1 to save some time
# omit this step to run on the whole genome
genome <- genome[1]
# convert chromosome sizes to a dataframe
genome.df <- data.frame("chr" = names(genome), "length" = genome)
# run cicero
conns <- run_cicero(bone.cicero, genomic_coords = genome.df, sample_num = 100)
## [1] "Starting Cicero"
## [1] "Calculating distance_parameter value"
## [1] "Running models"
## [1] "Assembling connections"
## [1] "Successful cicero models: 755"
## [1] "Other models: "
##
## Too many elements in range Zero or one element in range
## 157 86
## [1] "Models with errors: 0"
## [1] "Done"
head(conns)
## Peak1 Peak2 coaccess
## 1 chr1-100003337-100003837 chr1-99791719-99792219 0
## 2 chr1-100003337-100003837 chr1-99828699-99829199 0
## 3 chr1-100003337-100003837 chr1-99835542-99836042 0
## 4 chr1-100003337-100003837 chr1-99836217-99836717 0
## 5 chr1-100003337-100003837 chr1-99839576-99840076 0
## 6 chr1-100003337-100003837 chr1-99840640-99841140 0
Now that we’ve found pairwise co-accessibility scores for each peak,
we can now group these pairwise connections into larger co-accessible
networks using the generate_ccans()
function from
Cicero.
ccans <- generate_ccans(conns)
## [1] "Coaccessibility cutoff used: 0.19"
head(ccans)
## Peak CCAN
## chr1-10009702-10010202 chr1-10009702-10010202 1
## chr1-100151188-100151688 chr1-100151188-100151688 2
## chr1-100164787-100165287 chr1-100164787-100165287 2
## chr1-100165566-100166066 chr1-100165566-100166066 2
## chr1-100202505-100203005 chr1-100202505-100203005 3
## chr1-100215491-100215991 chr1-100215491-100215991 3
We can add the co-accessible links found by Cicero to the
ChromatinAssay
object in Seurat. Using the
ConnectionsToLinks()
function in Signac we can convert the
outputs of Cicero to the format needed to store in the
links
slot in the ChromatinAssay
, and add this
to the object using the Links<-
assignment function.
links <- ConnectionsToLinks(conns = conns, ccans = ccans)
Links(bone) <- links
We can now visualize these links along with DNA accessibility
information by running CoveragePlot()
for a region:
CoveragePlot(bone, region = "chr1-40189344-40252549")
Thanks to the developers of Cicero, especially Cole Trapnell, Hannah Pliner, and members of the Trapnell lab. If you use Cicero please cite the Cicero paper.
## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Asia/Singapore
## tzcode source: internal
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] cicero_1.3.9 Gviz_1.46.1
## [3] monocle3_1.3.7 SingleCellExperiment_1.24.0
## [5] SummarizedExperiment_1.32.0 GenomicRanges_1.54.1
## [7] GenomeInfoDb_1.38.8 IRanges_2.36.0
## [9] S4Vectors_0.40.2 MatrixGenerics_1.14.0
## [11] matrixStats_1.3.0 Biobase_2.62.0
## [13] BiocGenerics_0.48.1 patchwork_1.2.0
## [15] ggplot2_3.5.1 SeuratWrappers_0.3.1
## [17] Seurat_5.1.0 SeuratObject_5.0.2
## [19] sp_2.1-4 Signac_1.14.0
##
## loaded via a namespace (and not attached):
## [1] R.methodsS3_1.8.2 dichromat_2.0-0.1 progress_1.2.3
## [4] nnet_7.3-19 goftest_1.2-3 Biostrings_2.70.3
## [7] vctrs_0.6.5 spatstat.random_3.2-3 digest_0.6.37
## [10] png_0.1-8 ggrepel_0.9.5 deldir_2.0-4
## [13] parallelly_1.38.0 combinat_0.0-8 MASS_7.3-60.0.1
## [16] pkgdown_2.0.9 reshape2_1.4.4 httpuv_1.6.15
## [19] withr_3.0.1 xfun_0.44 survival_3.6-4
## [22] memoise_2.0.1 systemfonts_1.1.0 ragg_1.3.2
## [25] zoo_1.8-12 pbapply_1.7-2 R.oo_1.26.0
## [28] Formula_1.2-5 prettyunits_1.2.0 KEGGREST_1.42.0
## [31] promises_1.3.0 httr_1.4.7 restfulr_0.0.15
## [34] globals_0.16.3 fitdistrplus_1.1-11 ps_1.7.6
## [37] rstudioapi_0.16.0 miniUI_0.1.1.1 generics_0.1.3
## [40] base64enc_0.1-3 processx_3.8.4 curl_5.2.1
## [43] zlibbioc_1.48.2 polyclip_1.10-6 GenomeInfoDbData_1.2.11
## [46] SparseArray_1.2.4 xtable_1.8-4 stringr_1.5.1
## [49] desc_1.4.3 evaluate_0.23 S4Arrays_1.2.1
## [52] BiocFileCache_2.10.2 hms_1.1.3 irlba_2.3.5.1
## [55] colorspace_2.1-1 filelock_1.0.3 ROCR_1.0-11
## [58] reticulate_1.37.0 spatstat.data_3.0-4 magrittr_2.0.3
## [61] lmtest_0.9-40 later_1.3.2 viridis_0.6.5
## [64] lattice_0.22-6 spatstat.geom_3.2-9 future.apply_1.11.2
## [67] scattermore_1.2 XML_3.99-0.16.1 cowplot_1.1.3
## [70] RcppAnnoy_0.0.22 Hmisc_5.1-2 pillar_1.9.0
## [73] nlme_3.1-164 compiler_4.3.1 RSpectra_0.16-1
## [76] stringi_1.8.4 tensor_1.5 minqa_1.2.7
## [79] GenomicAlignments_1.38.2 plyr_1.8.9 crayon_1.5.3
## [82] abind_1.4-5 BiocIO_1.12.0 bit_4.0.5
## [85] dplyr_1.1.4 fastmatch_1.1-4 codetools_0.2-19
## [88] textshaping_0.4.0 bslib_0.7.0 slam_0.1-50
## [91] biovizBase_1.50.0 plotly_4.10.4 monocle_2.30.1
## [94] mime_0.12 leidenbase_0.1.27 splines_4.3.1
## [97] Rcpp_1.0.13 fastDummies_1.7.3 dbplyr_2.5.0
## [100] interp_1.1-6 knitr_1.45 blob_1.2.4
## [103] utf8_1.2.4 AnnotationFilter_1.26.0 lme4_1.1-35.3
## [106] fs_1.6.4 listenv_0.9.1 checkmate_2.3.1
## [109] pkgbuild_1.4.4 tibble_3.2.1 Matrix_1.6-5
## [112] callr_3.7.6 statmod_1.5.0 tweenr_2.0.3
## [115] pkgconfig_2.0.3 pheatmap_1.0.12 tools_4.3.1
## [118] cachem_1.1.0 RSQLite_2.3.7 viridisLite_0.4.2
## [121] DBI_1.2.2 fastmap_1.2.0 rmarkdown_2.27
## [124] scales_1.3.0 ica_1.0-3 Rsamtools_2.18.0
## [127] sass_0.4.9 FNN_1.1.4 BiocManager_1.30.23
## [130] dotCall64_1.1-1 VariantAnnotation_1.48.1 RANN_2.6.1
## [133] rpart_4.1.23 farver_2.1.2 yaml_2.3.8
## [136] VGAM_1.1-11 latticeExtra_0.6-30 foreign_0.8-86
## [139] rtracklayer_1.62.0 cli_3.6.3 purrr_1.0.2
## [142] leiden_0.4.3.1 lifecycle_1.0.4 uwot_0.1.16
## [145] backports_1.5.0 BiocParallel_1.36.0 gtable_0.3.5
## [148] rjson_0.2.21 ggridges_0.5.6 progressr_0.14.0
## [151] parallel_4.3.1 limma_3.58.1 jsonlite_1.8.8
## [154] RcppHNSW_0.6.0 bitops_1.0-8 bit64_4.0.5
## [157] assertthat_0.2.1 Rtsne_0.17 spatstat.utils_3.0-4
## [160] jquerylib_0.1.4 highr_0.11 R.utils_2.12.3
## [163] lazyeval_0.2.2 shiny_1.8.1.1 htmltools_0.5.8.1
## [166] DDRTree_0.1.5 sctransform_0.4.1 rappdirs_0.3.3
## [169] ensembldb_2.26.0 glue_1.7.0 spam_2.10-0
## [172] XVector_0.42.0 RCurl_1.98-1.16 BSgenome_1.70.2
## [175] jpeg_0.1-10 gridExtra_2.3 boot_1.3-30
## [178] igraph_2.0.3 R6_2.5.1 tidyr_1.3.1
## [181] RcppRoll_0.3.1 labeling_0.4.3 HSMMSingleCell_1.22.0
## [184] GenomicFeatures_1.54.4 cluster_2.1.6 nloptr_2.0.3
## [187] DelayedArray_0.28.0 tidyselect_1.2.1 ProtGenerics_1.34.0
## [190] htmlTable_2.4.2 ggforce_0.4.2 xml2_1.3.6
## [193] AnnotationDbi_1.64.1 future_1.34.0 fastICA_1.2-4
## [196] rsvd_1.0.5 munsell_0.5.1 KernSmooth_2.23-24
## [199] data.table_1.15.4 htmlwidgets_1.6.4 RColorBrewer_1.1-3
## [202] biomaRt_2.58.2 rlang_1.1.4 spatstat.sparse_3.0-3
## [205] spatstat.explore_3.2-7 remotes_2.5.0 fansi_1.0.6