Parallel computing is supported in Signac through the future package,
making it easy to specify different parallelization options. Here we
demonstrate parallelization of the FeatureMatrix function and show some
benchmark results to give a sense of the speedup you might expect.
The Seurat package also uses future for parallelization, and you can see
the Seurat vignette for more information.
The following functions currently enable parallelization in Signac:
Parallelization can be enabled simply by loading the future package and
setting the plan.
## sequential:
## - args: function (..., envir = parent.frame())
## - tweaked: FALSE
## - call: NULL
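The output above is what printing the current plan looks like. As a minimal sketch, assuming only that the future package is installed, the plan can be inspected without changing it by calling `plan()` with no arguments:

```r
library(future)

# With no arguments, plan() returns the current strategy without changing it
current <- plan()
print(current)

# Number of workers available under the current plan
n_current <- nbrOfWorkers()
n_current
```

In a fresh session with no plan configured elsewhere, this should report sequential processing.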
By default the plan is set to sequential processing (no
parallelization). We can change this to multicore or multisession to get
asynchronous processing, and set the number of workers to change the
number of cores used.
## multicore:
## - args: function (..., workers = 10, envir = parent.frame())
## - tweaked: TRUE
## - call: plan("multicore", workers = 10)
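A plan like the one printed above can be set and later undone in a few lines. The sketch below is illustrative only: it uses multisession with 2 workers (an arbitrary choice so that it runs on most machines) rather than the 10 multicore workers shown above.

```r
library(future)

# plan() invisibly returns the previous plan, so it can be restored later
old_plan <- plan(multisession, workers = 2)

# Confirm how many workers the new plan provides
n_workers <- nbrOfWorkers()

# A future is evaluated on one of the workers; value() collects the result.
# Here the worker simply reports its own process ID.
f <- future(Sys.getpid())
worker_pid <- value(f)

# Restore the previous plan (this also shuts down the multisession workers)
plan(old_plan)
```

Restoring the previous plan when finished avoids leaving background R sessions running longer than needed.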
You might also need to increase the maximum allowed size of the global
objects that future exports to each worker:
options(future.globals.maxSize = 50 * 1024 ^ 3) # 50 GiB
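The value passed to `future.globals.maxSize` is in bytes, so `50 * 1024^3` corresponds to 50 GiB. The sketch below makes that arithmetic explicit; `gib_to_bytes` is a hypothetical helper written for this example, not part of future or Signac.

```r
# Hypothetical helper (not part of future or Signac): convert GiB to bytes
gib_to_bytes <- function(gib) gib * 1024^3

# options() returns the previous values, so the setting can be restored
old_opts <- options(future.globals.maxSize = gib_to_bytes(50))

# Inspect the limit now in effect (in bytes)
limit_bytes <- getOption("future.globals.maxSize")

# Restore whatever was set before
options(old_opts)
```

The limit only needs to cover the objects that are actually shipped to the workers, so a smaller value is often enough.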
Note that as of future version 1.14.0, forked processing is disabled
when running in RStudio. To enable parallel computing in RStudio, you
will need to select the “multisession” option.
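One way to handle this automatically is to ask whether forking is supported in the current environment and fall back to multisession otherwise. This is a sketch under stated assumptions: it relies on `supportsMulticore()` and `availableCores()` from the parallelly package (installed as a dependency of recent future versions), and the cap of 4 workers is an arbitrary choice for illustration.

```r
library(future)

# supportsMulticore() is FALSE where forking is unavailable or considered
# unstable, for example on Windows or inside RStudio
backend <- if (parallelly::supportsMulticore()) "multicore" else "multisession"

# Use at most 4 workers (arbitrary cap), and never more than the cores available
n_workers <- min(4, parallelly::availableCores())

old_plan <- plan(backend, workers = n_workers)
message("Using ", backend, " with ", nbrOfWorkers(), " worker(s)")

# ... run the parallelized Signac functions here ...

# Restore the previous plan when done
plan(old_plan)
```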
Here we demonstrate the runtime of FeatureMatrix run on 144,023 peaks
for 9,688 human PBMCs under different parallelization options:
The following code was run on RHEL with an Intel Platinum 8268 CPU @ 2.00GHz:
# download data
wget https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_10k_nextgem/atac_pbmc_10k_nextgem_fragments.tsv.gz
wget https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_10k_nextgem/atac_pbmc_10k_nextgem_fragments.tsv.gz.tbi
wget https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_10k_nextgem/atac_pbmc_10k_nextgem_peaks.bed
wget https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_10k_nextgem/atac_pbmc_10k_nextgem_singlecell.csv
library(Signac)
# load data
fragments <- "../vignette_data/atac_pbmc_10k_nextgem_fragments.tsv.gz"
peaks.10k <- read.table(
  file = "../vignette_data/atac_pbmc_10k_nextgem_peaks.bed",
  col.names = c("chr", "start", "end")
)
peaks <- GenomicRanges::makeGRangesFromDataFrame(peaks.10k)
# read per-barcode metadata, dropping the first row, which is not a cell
# barcode (NO_BARCODE in 10x output)
md <- read.csv(
  "../vignette_data/atac_pbmc_10k_nextgem_singlecell.csv",
  row.names = 1,
  header = TRUE
)[-1, ]
# keep only barcodes called as cells
cells <- rownames(md[md[['is__cell_barcode']] == 1, ])
fragments <- CreateFragmentObject(
  path = fragments,
  cells = cells,
  validate.fragments = FALSE
)
# set number of replicates
nrep <- 5
results <- data.frame()
process_n <- 2000
# run sequentially
timing.sequential <- c()
for (i in seq_len(nrep)) {
  start <- Sys.time()
  fmat <- FeatureMatrix(fragments = fragments, features = peaks, cells = cells, process_n = process_n)
  timing.sequential <- c(timing.sequential, as.numeric(Sys.time() - start, units = "secs"))
}
res <- data.frame(
  "setting" = rep("Sequential", nrep),
  "cores" = rep(1, nrep),
  "replicate" = seq_len(nrep),
  "time" = timing.sequential
)
results <- rbind(results, res)
# 4 core
library(future)
plan("multicore", workers = 4)
options(future.globals.maxSize = 100000 * 1024^2)
timing.4core <- c()
for (i in seq_len(nrep)) {
  start <- Sys.time()
  fmat <- FeatureMatrix(fragments = fragments, features = peaks, cells = cells, process_n = process_n)
  timing.4core <- c(timing.4core, as.numeric(Sys.time() - start, units = "secs"))
}
res <- data.frame(
  "setting" = rep("Parallel", nrep),
  "cores" = rep(4, nrep),
  "replicate" = seq_len(nrep),
  "time" = timing.4core
)
results <- rbind(results, res)
# 10 core
plan("multicore", workers = 10)
timing.10core <- c()
for (i in seq_len(nrep)) {
  start <- Sys.time()
  fmat <- FeatureMatrix(fragments = fragments, features = peaks, cells = cells, process_n = process_n)
  timing.10core <- c(timing.10core, as.numeric(Sys.time() - start, units = "secs"))
}
res <- data.frame(
  "setting" = rep("Parallel", nrep),
  "cores" = rep(10, nrep),
  "replicate" = seq_len(nrep),
  "time" = timing.10core
)
results <- rbind(results, res)
# save results
write.table(
  x = results,
  file = paste0("../vignette_data/pbmc10k/timings_", Sys.Date(), ".tsv"),
  quote = FALSE,
  row.names = FALSE
)
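The three timing loops above repeat the same bookkeeping, and the resulting table still has to be summarized to read off a speedup. The sketch below shows one way to do both in base R. It is not the code used for the benchmark: `time_replicates` is a hypothetical helper, and the workloads are `Sys.sleep` stand-ins rather than FeatureMatrix calls, so the numbers it produces say nothing about Signac performance.

```r
# Hypothetical helper (not part of Signac): run f() nrep times and record
# the elapsed wall-clock time of each run in seconds
time_replicates <- function(f, nrep, setting, cores) {
  times <- vapply(seq_len(nrep), function(i) {
    start <- Sys.time()
    f()
    as.numeric(difftime(Sys.time(), start, units = "secs"))
  }, numeric(1))
  data.frame(
    setting = setting,
    cores = cores,
    replicate = seq_len(nrep),
    time = times
  )
}

# Stand-in workloads for illustration only. In a real benchmark, set the plan
# first and pass a function that calls FeatureMatrix instead.
slow_task <- function() Sys.sleep(0.02)
fast_task <- function() Sys.sleep(0.01)

# "cores" is only a label in this sketch; both stand-ins run in this session
results <- rbind(
  time_replicates(slow_task, nrep = 3, setting = "baseline", cores = 1),
  time_replicates(fast_task, nrep = 3, setting = "candidate", cores = 1)
)

# Mean runtime per configuration
summary_df <- aggregate(time ~ setting + cores, data = results, FUN = mean)

# Speedup relative to the baseline configuration
baseline_time <- summary_df$time[summary_df$setting == "baseline"]
summary_df$speedup <- baseline_time / summary_df$time
summary_df
```

Keeping one row per replicate, as the benchmark above does, makes it possible to report the spread of the timings as well as the mean.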
## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.0
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Asia/Singapore
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.4.4 future_1.33.0
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.7 utf8_1.2.3 generics_0.1.3 stringi_1.7.12
## [5] listenv_0.9.0 digest_0.6.33 magrittr_2.0.3 evaluate_0.22
## [9] grid_4.3.1 fastmap_1.1.1 rprojroot_2.0.3 jsonlite_1.8.7
## [13] purrr_1.0.2 fansi_1.0.5 scales_1.2.1 codetools_0.2-19
## [17] textshaping_0.3.7 jquerylib_0.1.4 cli_3.6.1 rlang_1.1.1
## [21] parallelly_1.36.0 munsell_0.5.0 withr_2.5.1 cachem_1.0.8
## [25] yaml_2.3.7 tools_4.3.1 parallel_4.3.1 memoise_2.0.1
## [29] dplyr_1.1.3 colorspace_2.1-0 globals_0.16.2 vctrs_0.6.3
## [33] R6_2.5.1 lifecycle_1.0.3 stringr_1.5.0 fs_1.6.3
## [37] ragg_1.2.6 pkgconfig_2.0.3 desc_1.4.2 pkgdown_2.0.7
## [41] bslib_0.5.1 pillar_1.9.0 gtable_0.3.4 glue_1.6.2
## [45] systemfonts_1.0.5 xfun_0.40 tibble_3.2.1 tidyselect_1.2.0
## [49] rstudioapi_0.15.0 knitr_1.44 farver_2.1.1 htmltools_0.5.6.1
## [53] rmarkdown_2.25 labeling_0.4.3 compiler_4.3.1