data_structures.Rmd
The Signac package is an extension of Seurat designed for the analysis of genomic single-cell assays. This includes any assay that generates signal mapped to genomic coordinates, such as scATAC-seq, scCUT&Tag, scACT-seq, and other methods.
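As the analysis of these single-cell chromatin datasets presents some unique challenges in comparison to the analysis of scRNA-seq data, we have created an extended Assay class to store the additional information needed, including genomic ranges for each feature, motif information, fragment files, genome information, gene annotations, Tn5 bias and positional enrichment data, and links between genomic positions.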
A major advantage of the Signac design is its interoperability with existing functions in the Seurat package, and other packages that are able to use the Seurat object. This enables straightforward analysis of multimodal single-cell data through the addition of different assays to the Seurat object.
Here we outline the design of each class defined in the Signac package, and demonstrate methods that can be run on each class.
The ChromatinAssay Class
The ChromatinAssay class extends the standard Seurat Assay class and adds several slots for data useful in the analysis of single-cell chromatin datasets. The class includes all the slots present in a standard Seurat Assay, with the following additional slots:
ranges: A GRanges object containing the genomic coordinates of each feature in the data matrix.
motifs: A Motif object.
fragments: A list of Fragment objects.
seqinfo: A Seqinfo object containing information about the genome that the data was mapped to.
annotation: A GRanges object containing gene annotations.
bias: A vector containing Tn5 integration bias information (the frequency of Tn5 integration at different hexamers).
positionEnrichment: A named list of matrices containing positional enrichment scores for Tn5 integration (for example, enrichment at the TSS or at different TF motifs).
links: A GRanges object describing linked genomic positions, such as co-accessible sites or enhancer-gene regulatory relationships.
Constructing a ChromatinAssay
A ChromatinAssay object can be constructed using the CreateChromatinAssay() function.
# load the packages used in the following examples
library(Seurat)
library(Signac)
# get some data to use in the following examples
counts <- GetAssayData(atac_small, slot = "counts")
# create a standalone ChromatinAssay object
chromatinassay <- CreateChromatinAssay(counts = counts, genome = "hg19")
Here the genome parameter can be used to set the seqinfo slot. We can pass the name of a genome present in UCSC (e.g., "hg19" or "mm10"), or we can pass a Seqinfo-class object.
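As a rough sketch of the second option, a Seqinfo object can be built by hand with the Seqinfo() constructor from GenomeInfoDb and passed through the genome parameter. The chr1 length below is copied from the hg19 seqinfo output shown later in this document. Restricting the object to chr1 is an assumption made for illustration; it suits atac_small, whose features all lie on chr1. The names chr1.info and assay.chr1 are hypothetical.

```r
library(Seurat)
library(Signac)
library(GenomeInfoDb)

# describe a single chromosome by hand (illustrative, chr1 only)
chr1.info <- Seqinfo(
  seqnames = "chr1",
  seqlengths = 249250621,
  isCircular = FALSE,
  genome = "hg19"
)

# same count matrix as in the example above
counts <- GetAssayData(atac_small, slot = "counts")

# pass the Seqinfo object instead of a UCSC genome name
assay.chr1 <- CreateChromatinAssay(counts = counts, genome = chr1.info)

# inspect the genome information stored in the assay
seqinfo(assay.chr1)
```

Passing a Seqinfo object can be convenient when the genome is not available from UCSC or when only a subset of sequences is relevant.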
To create a Seurat object that contains a ChromatinAssay rather than a standard Assay, we can initialize the object using the ChromatinAssay rather than a count matrix. Note that this feature was added in Seurat 3.2.
# create a Seurat object containing a ChromatinAssay
object <- CreateSeuratObject(counts = chromatinassay)
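When no assay name is given, CreateSeuratObject() falls back to its default assay name. As a small sketch, the assay argument of CreateSeuratObject() can be used to give the assay a more descriptive name. The name "peaks" and the variable object.peaks are illustrative choices, and the genome argument is left out here only to keep the sketch short.

```r
library(Seurat)
library(Signac)

# build a ChromatinAssay from the example counts (no genome information set)
counts <- GetAssayData(atac_small, slot = "counts")
chromatinassay <- CreateChromatinAssay(counts = counts)

# name the assay "peaks" instead of relying on the default assay name
object.peaks <- CreateSeuratObject(counts = chromatinassay, assay = "peaks")

# check the assay name and the class of the stored assay
DefaultAssay(object.peaks)
class(object.peaks[["peaks"]])
```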
Adding a ChromatinAssay to a Seurat object
To add a new ChromatinAssay object to an existing Seurat object, we can use the standard assignment operation used for adding standard Assay objects and other data types to the Seurat object.
# create a chromatin assay and add it to an existing Seurat object
object[["peaks"]] <- CreateChromatinAssay(counts = counts, genome = "hg19")
Getting and setting ChromatinAssay data
We can get and set data for the ChromatinAssay in much the same way as for a standard Assay object: using the GetAssayData and SetAssayData functions defined in Seurat. For example:
## Getting
# access the data slot, found in standard Assays and ChromatinAssays
data <- GetAssayData(atac_small, slot = "data")
# access the bias slot, unique to the ChromatinAssay
bias <- GetAssayData(atac_small, slot = "bias")
## Setting
# set the data slot
atac_small <- SetAssayData(atac_small, slot = "data", new.data = data)
# set the bias slot
bias <- rep(1, 100) # create a dummy bias vector
atac_small <- SetAssayData(atac_small, slot = "bias", new.data = bias)
We also have a variety of convenience functions defined for getting and setting data in specific slots. These include the Fragments(), Motifs(), Links(), and Annotation() functions. For example, to get or set gene annotation data we can use the Annotation() getter and Annotation<- setter functions:
# first get some gene annotations for hg19
library(EnsDb.Hsapiens.v75)
# convert EnsDb to GRanges
gene.ranges <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v75)
# convert to UCSC style
seqlevelsStyle(gene.ranges) <- "UCSC"
genome(gene.ranges) <- "hg19"
# set gene annotations
Annotation(atac_small) <- gene.ranges
# get gene annotation information
Annotation(atac_small)
## GRanges object with 3072120 ranges and 5 metadata columns:
## seqnames ranges strand | tx_id gene_name
## <Rle> <IRanges> <Rle> | <character> <character>
## ENSE00001489430 chrX 192989-193061 + | ENST00000399012 PLCXD1
## ENSE00001536003 chrX 192991-193061 + | ENST00000484611 PLCXD1
## ENSE00002160563 chrX 193020-193061 + | ENST00000430923 PLCXD1
## ENSE00001750899 chrX 197722-197788 + | ENST00000445062 PLCXD1
## ENSE00001489388 chrX 197859-198351 + | ENST00000381657 PLCXD1
## ... ... ... ... . ... ...
## ENST00000361739 chrMT 7586-8269 + | ENST00000361739 MT-CO2
## ENST00000361789 chrMT 14747-15887 + | ENST00000361789 MT-CYB
## ENST00000361851 chrMT 8366-8572 + | ENST00000361851 MT-ATP8
## ENST00000361899 chrMT 8527-9207 + | ENST00000361899 MT-ATP6
## ENST00000362079 chrMT 9207-9990 + | ENST00000362079 MT-CO3
## gene_id gene_biotype type
## <character> <character> <factor>
## ENSE00001489430 ENSG00000182378 protein_coding exon
## ENSE00001536003 ENSG00000182378 protein_coding exon
## ENSE00002160563 ENSG00000182378 protein_coding exon
## ENSE00001750899 ENSG00000182378 protein_coding exon
## ENSE00001489388 ENSG00000182378 protein_coding exon
## ... ... ... ...
## ENST00000361739 ENSG00000198712 protein_coding cds
## ENST00000361789 ENSG00000198727 protein_coding cds
## ENST00000361851 ENSG00000228253 protein_coding cds
## ENST00000361899 ENSG00000198899 protein_coding cds
## ENST00000362079 ENSG00000198938 protein_coding cds
## -------
## seqinfo: 25 sequences from hg19 genome
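The style conversion step used for gene.ranges above is easy to check on a toy object. The sketch below builds a two-range GRanges with Ensembl-style sequence names and converts it to UCSC style. The object toy.ranges, its coordinates, and its gene names are made up for illustration.

```r
library(GenomicRanges)
library(GenomeInfoDb)

# toy annotation with Ensembl-style sequence names ("1", "X")
toy.ranges <- GRanges(
  seqnames = c("1", "X"),
  ranges = IRanges(start = c(1000, 5000), end = c(2000, 6000)),
  gene_name = c("geneA", "geneB")
)

# sequence names before conversion
seqlevels(toy.ranges)

# convert to UCSC style, which adds the "chr" prefix
seqlevelsStyle(toy.ranges) <- "UCSC"

# sequence names after conversion
seqlevels(toy.ranges)
```

Matching the sequence naming style of the annotation to that of the assay matters because range-based comparisons only match ranges whose sequence names agree.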
The Fragments(), Motifs(), and Links() functions are demonstrated in other sections below.
ChromatinAssay methods
As the ChromatinAssay object uses Bioconductor objects like GRanges and Seqinfo, we can also call standard Bioconductor functions defined in the IRanges, GenomicRanges, and GenomeInfoDb packages on the ChromatinAssay object (or a Seurat object with a ChromatinAssay as the default assay).
The following methods use the genomic ranges stored in a ChromatinAssay object.
# extract the genomic ranges associated with each feature in the data matrix
granges(atac_small)
## GRanges object with 323 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 713460-714823 *
## [2] chr1 752422-753038 *
## [3] chr1 762106-763359 *
## [4] chr1 779589-780271 *
## [5] chr1 804872-805761 *
## ... ... ... ...
## [319] chr1 9299648-9300348 *
## [320] chr1 9327071-9327557 *
## [321] chr1 9335457-9336176 *
## [322] chr1 9349019-9350779 *
## [323] chr1 9352328-9354391 *
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
# find the nearest range
nearest(atac_small, subject = Annotation(atac_small))
## [1] 353132 313545 353180 353181 316856 352756 352756 158070 443435 429683
## [11] 384995 416914 424674 158278 158279 433593 158289 158292 416882 416846
## [21] 330101 416851 158359 158360 368091 367827 332919 370483 158535 308473
## [31] 158541 158542 363998 355291 433871 416767 323350 416760 158690 372319
## [41] 364664 431403 439123 416719 432709 427719 355667 416689 443324 386709
## [51] 434670 303555 432660 416683 381736 159871 416657 423129 392261 365826
## [61] 422594 159971 436548 343096 436566 355859 416566 332077 332077 363936
## [71] 370347 351939 160801 416517 416520 338322 363920 326578 326578 327323
## [81] 327323 327323 327323 327323 327323 327323 327323 327323 327323 367095
## [91] 160980 160980 434820 434837 443759 324844 161135 431471 340026 340026
## [101] 340026 340026 369496 340577 340577 340577 341621 435222 435223 432912
## [111] 443427 426548 416453 335191 442153 161420 416439 342162 342162 342162
## [121] 342162 303306 363897 352608 416438 436339 428684 317981 428699 428703
## [131] 342827 435474 416332 161829 428988 339920 161934 161934 440061 306261
## [141] 306263 351669 162155 162156 363861 342747 363853 434544 380558 334073
## [151] 334074 433697 162510 162510 416213 363843 422863 352374 352374 352399
## [161] 352402 162725 375321 442859 416141 303012 416142 416109 434615 430550
## [171] 391633 391635 315910 162940 162949 162963 162981 162981 307713 371219
## [181] 338832 338832 330394 330394 375499 353291 352159 352159 352159 375501
## [191] 416050 434442 435105 376297 415948 415966 434227 415947 363703 352789
## [201] 352789 415912 352790 444057 163718 363689 326449 415881 302688 421541
## [211] 432843 330143 430627 436030 442123 439413 363667 440058 415745 434148
## [221] 363657 435144 415713 434010 349513 423368 164491 309751 164496 442737
## [231] 314928 314928 314934 314934 314936 314936 314936 339456 376041 363644
## [241] 415671 363635 363635 328360 363637 164717 371613 371633 338776 338777
## [251] 433987 371077 316670 316670 316670 164797 164797 164798 164798 164798
## [261] 164802 310712 164803 164804 415606 355641 338270 338270 338270 338270
## [271] 338270 338270 431114 322988 322988 322988 431117 342023 342023 415605
## [281] 342786 331593 331593 331594 331594 369220 331594 331594 364635 353318
## [291] 435250 340755 371936 165021 165021 165023 165023 350877 415587 363619
## [301] 363619 418872 305795 434018 335496 335496 165154 363612 323389 323542
## [311] 420824 165164 165164 165167 165167 314147 314148 363609 443585 375482
## [321] 363611 165185 355274
# distance to the nearest range
distanceToNearest(atac_small, subject = Annotation(atac_small))
## Hits object with 323 hits and 1 metadata column:
## queryHits subjectHits | distance
## <integer> <integer> | <integer>
## [1] 1 353132 | 0
## [2] 2 313545 | 0
## [3] 3 353180 | 0
## [4] 4 353181 | 0
## [5] 5 316856 | 0
## ... ... ... . ...
## [319] 319 443585 | 0
## [320] 320 375482 | 0
## [321] 321 363611 | 4060
## [322] 322 165185 | 2159
## [323] 323 355274 | 0
## -------
## queryLength: 323 / subjectLength: 3072120
# find overlaps with another set of genomic ranges
findOverlaps(atac_small, subject = Annotation(atac_small))
## Hits object with 4615 hits and 0 metadata columns:
## queryHits subjectHits
## <integer> <integer>
## [1] 1 157933
## [2] 1 157934
## [3] 1 157935
## [4] 1 157936
## [5] 1 157937
## ... ... ...
## [4611] 320 363611
## [4612] 320 375482
## [4613] 323 165185
## [4614] 323 270557
## [4615] 323 355274
## -------
## queryLength: 323 / subjectLength: 3072120
Many other methods are defined; see the documentation for nearest-methods, findOverlaps-methods, inter-range-methods, and coverage in Signac for a full list.
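To make the return values of these methods easier to read, here is a small sketch that runs the same GenomicRanges functions on plain GRanges objects. The peak and gene coordinates, and the names peaks and genes, are invented for illustration. With a ChromatinAssay, the query ranges would instead come from the ranges slot.

```r
library(GenomicRanges)

# three toy "peaks" and two toy "genes" on the same chromosome
peaks <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(100, 500, 2000), end = c(200, 600, 2100))
)
genes <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(150, 1000), end = c(400, 1500)),
  gene_name = c("geneA", "geneB")
)

# index into 'genes' of the nearest gene for each peak
nearest.idx <- nearest(peaks, subject = genes)
genes$gene_name[nearest.idx]

# Hits object that also records the distance to the nearest gene
nearest.hits <- distanceToNearest(peaks, subject = genes)
mcols(nearest.hits)$distance

# only peaks that actually overlap a gene are reported
overlap.hits <- findOverlaps(peaks, subject = genes)
queryHits(overlap.hits)
```

The integers returned by nearest() are positions in the subject, which is why the long vector printed above can be used to index directly into Annotation(atac_small). A distance of 0 in the distanceToNearest() result means that the peak overlaps or directly abuts its nearest annotated range.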
The following methods use the seqinfo data stored in a ChromatinAssay object.
# get the full seqinfo information
seqinfo(atac_small)
## Seqinfo object with 298 sequences (2 circular) from hg19 genome:
## seqnames seqlengths isCircular genome
## chr1 249250621 FALSE hg19
## chr2 243199373 FALSE hg19
## chr3 198022430 FALSE hg19
## chr4 191154276 FALSE hg19
## chr5 180915260 FALSE hg19
## ... ... ... ...
## chr21_gl383580_alt 74652 FALSE hg19
## chr21_gl383581_alt 116690 FALSE hg19
## chr22_gl383582_alt 162811 FALSE hg19
## chr22_gl383583_alt 96924 FALSE hg19
## chr22_kb663609_alt 74013 FALSE hg19
# get the genome information
genome(atac_small)
## chr1 chr2 chr3
## "hg19" "hg19" "hg19"
## chr4 chr5 chr6
## "hg19" "hg19" "hg19"
## chr7 chr8 chr9
## "hg19" "hg19" "hg19"
## chr10 chr11 chr12
## "hg19" "hg19" "hg19"
## chr13 chr14 chr15
## "hg19" "hg19" "hg19"
## chr16 chr17 chr18
## "hg19" "hg19" "hg19"
## chr19 chr20 chr21
## "hg19" "hg19" "hg19"
## chr22 chrX chrY
## "hg19" "hg19" "hg19"
## chrM chrMT chr4_ctg9_hap1
## "hg19" "hg19" "hg19"
## chr6_apd_hap1 chr6_cox_hap2 chr6_dbb_hap3
## "hg19" "hg19" "hg19"
## chr6_mann_hap4 chr6_mcf_hap5 chr6_qbl_hap6
## "hg19" "hg19" "hg19"
## chr6_ssto_hap7 chr17_ctg5_hap1 chr1_gl000191_random
## "hg19" "hg19" "hg19"
## chr1_gl000192_random chr4_gl000193_random chr4_gl000194_random
## "hg19" "hg19" "hg19"
## chr7_gl000195_random chr8_gl000196_random chr8_gl000197_random
## "hg19" "hg19" "hg19"
## chr9_gl000198_random chr9_gl000199_random chr9_gl000200_random
## "hg19" "hg19" "hg19"
## chr9_gl000201_random chr11_gl000202_random chr17_gl000203_random
## "hg19" "hg19" "hg19"
## chr17_gl000204_random chr17_gl000205_random chr17_gl000206_random
## "hg19" "hg19" "hg19"
## chr18_gl000207_random chr19_gl000208_random chr19_gl000209_random
## "hg19" "hg19" "hg19"
## chr21_gl000210_random chrUn_gl000211 chrUn_gl000212
## "hg19" "hg19" "hg19"
## chrUn_gl000213 chrUn_gl000214 chrUn_gl000215
## "hg19" "hg19" "hg19"
## chrUn_gl000216 chrUn_gl000217 chrUn_gl000218
## "hg19" "hg19" "hg19"
## chrUn_gl000219 chrUn_gl000220 chrUn_gl000221
## "hg19" "hg19" "hg19"
## chrUn_gl000222 chrUn_gl000223 chrUn_gl000224
## "hg19" "hg19" "hg19"
## chrUn_gl000225 chrUn_gl000226 chrUn_gl000227
## "hg19" "hg19" "hg19"
## chrUn_gl000228 chrUn_gl000229 chrUn_gl000230
## "hg19" "hg19" "hg19"
## chrUn_gl000231 chrUn_gl000232 chrUn_gl000233
## "hg19" "hg19" "hg19"
## chrUn_gl000234 chrUn_gl000235 chrUn_gl000236
## "hg19" "hg19" "hg19"
## chrUn_gl000237 chrUn_gl000238 chrUn_gl000239
## "hg19" "hg19" "hg19"
## chrUn_gl000240 chrUn_gl000241 chrUn_gl000242
## "hg19" "hg19" "hg19"
## chrUn_gl000243 chrUn_gl000244 chrUn_gl000245
## "hg19" "hg19" "hg19"
## chrUn_gl000246 chrUn_gl000247 chrUn_gl000248
## "hg19" "hg19" "hg19"
## chrUn_gl000249 chr1_gl383516_fix chr1_gl383517_fix
## "hg19" "hg19" "hg19"
## chr1_gl949741_fix chr1_jh636052_fix chr1_jh636053_fix
## "hg19" "hg19" "hg19"
## chr1_jh636054_fix chr1_jh806573_fix chr1_jh806574_fix
## "hg19" "hg19" "hg19"
## chr1_jh806575_fix chr2_gl877870_fix chr2_gl877871_fix
## "hg19" "hg19" "hg19"
## chr2_kb663603_fix chr3_gl383523_fix chr3_gl383524_fix
## "hg19" "hg19" "hg19"
## chr3_gl383525_fix chr3_jh159131_fix chr3_jh159132_fix
## "hg19" "hg19" "hg19"
## chr3_ke332495_fix chr4_gl582967_fix chr4_gl877872_fix
## "hg19" "hg19" "hg19"
## chr4_ke332496_fix chr5_jh159133_fix chr5_ke332497_fix
## "hg19" "hg19" "hg19"
## chr6_jh636056_fix chr6_jh636057_fix chr6_jh806576_fix
## "hg19" "hg19" "hg19"
## chr6_kb663604_fix chr6_ke332498_fix chr7_gl582968_fix
## "hg19" "hg19" "hg19"
## chr7_gl582969_fix chr7_gl582970_fix chr7_gl582971_fix
## "hg19" "hg19" "hg19"
## chr7_gl582972_fix chr7_jh159134_fix chr7_jh636058_fix
## "hg19" "hg19" "hg19"
## chr7_ke332499_fix chr8_gl383535_fix chr8_gl383536_fix
## "hg19" "hg19" "hg19"
## chr8_gl949743_fix chr8_jh159135_fix chr8_ke332500_fix
## "hg19" "hg19" "hg19"
## chr9_gl339450_fix chr9_gl383537_fix chr9_gl383538_fix
## "hg19" "hg19" "hg19"
## chr9_jh636059_fix chr9_jh806577_fix chr9_jh806578_fix
## "hg19" "hg19" "hg19"
## chr9_jh806579_fix chr9_kb663605_fix chr10_gl383543_fix
## "hg19" "hg19" "hg19"
## chr10_gl383544_fix chr10_gl877873_fix chr10_jh591181_fix
## "hg19" "hg19" "hg19"
## chr10_jh591182_fix chr10_jh591183_fix chr10_jh636060_fix
## "hg19" "hg19" "hg19"
## chr10_jh806580_fix chr10_kb663606_fix chr10_ke332501_fix
## "hg19" "hg19" "hg19"
## chr11_gl582973_fix chr11_gl949744_fix chr11_jh159138_fix
## "hg19" "hg19" "hg19"
## chr11_jh159139_fix chr11_jh159140_fix chr11_jh159141_fix
## "hg19" "hg19" "hg19"
## chr11_jh159142_fix chr11_jh159143_fix chr11_jh591184_fix
## "hg19" "hg19" "hg19"
## chr11_jh591185_fix chr11_jh720443_fix chr11_jh806581_fix
## "hg19" "hg19" "hg19"
## chr12_gl383548_fix chr12_gl582974_fix chr12_jh720444_fix
## "hg19" "hg19" "hg19"
## chr12_kb663607_fix chr13_gl582975_fix chr14_kb021645_fix
## "hg19" "hg19" "hg19"
## chr15_jh720445_fix chr16_jh720446_fix chr17_gl383558_fix
## "hg19" "hg19" "hg19"
## chr17_gl383559_fix chr17_gl383560_fix chr17_gl383561_fix
## "hg19" "hg19" "hg19"
## chr17_gl383562_fix chr17_gl582976_fix chr17_jh159144_fix
## "hg19" "hg19" "hg19"
## chr17_jh159145_fix chr17_jh591186_fix chr17_jh636061_fix
## "hg19" "hg19" "hg19"
## chr17_jh720447_fix chr17_jh806582_fix chr17_kb021646_fix
## "hg19" "hg19" "hg19"
## chr17_ke332502_fix chr19_gl582977_fix chr19_jh159149_fix
## "hg19" "hg19" "hg19"
## chr19_kb021647_fix chr19_ke332505_fix chr20_gl582979_fix
## "hg19" "hg19" "hg19"
## chr20_jh720448_fix chr20_kb663608_fix chr21_ke332506_fix
## "hg19" "hg19" "hg19"
## chr22_jh720449_fix chr22_jh806583_fix chr22_jh806584_fix
## "hg19" "hg19" "hg19"
## chr22_jh806585_fix chr22_jh806586_fix chrX_gl877877_fix
## "hg19" "hg19" "hg19"
## chrX_jh159150_fix chrX_jh720451_fix chrX_jh720452_fix
## "hg19" "hg19" "hg19"
## chrX_jh720453_fix chrX_jh720454_fix chrX_jh720455_fix
## "hg19" "hg19" "hg19"
## chrX_jh806587_fix chrX_jh806588_fix chrX_jh806589_fix
## "hg19" "hg19" "hg19"
## chrX_jh806590_fix chrX_jh806591_fix chrX_jh806592_fix
## "hg19" "hg19" "hg19"
## chrX_jh806593_fix chrX_jh806594_fix chrX_jh806595_fix
## "hg19" "hg19" "hg19"
## chrX_jh806596_fix chrX_jh806597_fix chrX_jh806598_fix
## "hg19" "hg19" "hg19"
## chrX_jh806599_fix chrX_jh806600_fix chrX_jh806601_fix
## "hg19" "hg19" "hg19"
## chrX_jh806602_fix chrX_jh806603_fix chrX_kb021648_fix
## "hg19" "hg19" "hg19"
## chr1_gl383518_alt chr1_gl383519_alt chr1_gl383520_alt
## "hg19" "hg19" "hg19"
## chr2_gl383521_alt chr2_gl383522_alt chr2_gl582966_alt
## "hg19" "hg19" "hg19"
## chr3_gl383526_alt chr3_jh636055_alt chr4_gl383527_alt
## "hg19" "hg19" "hg19"
## chr4_gl383528_alt chr4_gl383529_alt chr5_gl339449_alt
## "hg19" "hg19" "hg19"
## chr5_gl383530_alt chr5_gl383531_alt chr5_gl383532_alt
## "hg19" "hg19" "hg19"
## chr5_gl949742_alt chr6_gl383533_alt chr6_kb021644_alt
## "hg19" "hg19" "hg19"
## chr7_gl383534_alt chr9_gl383539_alt chr9_gl383540_alt
## "hg19" "hg19" "hg19"
## chr9_gl383541_alt chr9_gl383542_alt chr10_gl383545_alt
## "hg19" "hg19" "hg19"
## chr10_gl383546_alt chr11_gl383547_alt chr11_jh159136_alt
## "hg19" "hg19" "hg19"
## chr11_jh159137_alt chr12_gl383549_alt chr12_gl383550_alt
## "hg19" "hg19" "hg19"
## chr12_gl383551_alt chr12_gl383552_alt chr12_gl383553_alt
## "hg19" "hg19" "hg19"
## chr12_gl877875_alt chr12_gl877876_alt chr12_gl949745_alt
## "hg19" "hg19" "hg19"
## chr15_gl383554_alt chr15_gl383555_alt chr16_gl383556_alt
## "hg19" "hg19" "hg19"
## chr16_gl383557_alt chr17_gl383563_alt chr17_gl383564_alt
## "hg19" "hg19" "hg19"
## chr17_gl383565_alt chr17_gl383566_alt chr17_jh159146_alt
## "hg19" "hg19" "hg19"
## chr17_jh159147_alt chr17_jh159148_alt chr18_gl383567_alt
## "hg19" "hg19" "hg19"
## chr18_gl383568_alt chr18_gl383569_alt chr18_gl383570_alt
## "hg19" "hg19" "hg19"
## chr18_gl383571_alt chr18_gl383572_alt chr19_gl383573_alt
## "hg19" "hg19" "hg19"
## chr19_gl383574_alt chr19_gl383575_alt chr19_gl383576_alt
## "hg19" "hg19" "hg19"
## chr19_gl949746_alt chr19_gl949747_alt chr19_gl949748_alt
## "hg19" "hg19" "hg19"
## chr19_gl949749_alt chr19_gl949750_alt chr19_gl949751_alt
## "hg19" "hg19" "hg19"
## chr19_gl949752_alt chr19_gl949753_alt chr20_gl383577_alt
## "hg19" "hg19" "hg19"
## chr21_gl383578_alt chr21_gl383579_alt chr21_gl383580_alt
## "hg19" "hg19" "hg19"
## chr21_gl383581_alt chr22_gl383582_alt chr22_gl383583_alt
## "hg19" "hg19" "hg19"
## chr22_kb663609_alt
## "hg19"
# find length of each chromosome
seqlengths(atac_small)
## chr1 chr2 chr3
## 249250621 243199373 198022430
## chr4 chr5 chr6
## 191154276 180915260 171115067
## chr7 chr8 chr9
## 159138663 146364022 141213431
## chr10 chr11 chr12
## 135534747 135006516 133851895
## chr13 chr14 chr15
## 115169878 107349540 102531392
## chr16 chr17 chr18
## 90354753 81195210 78077248
## chr19 chr20 chr21
## 59128983 63025520 48129895
## chr22 chrX chrY
## 51304566 155270560 59373566
## chrM chrMT chr4_ctg9_hap1
## 16571 16569 590426
## chr6_apd_hap1 chr6_cox_hap2 chr6_dbb_hap3
## 4622290 4795371 4610396
## chr6_mann_hap4 chr6_mcf_hap5 chr6_qbl_hap6
## 4683263 4833398 4611984
## chr6_ssto_hap7 chr17_ctg5_hap1 chr1_gl000191_random
## 4928567 1680828 106433
## chr1_gl000192_random chr4_gl000193_random chr4_gl000194_random
## 547496 189789 191469
## chr7_gl000195_random chr8_gl000196_random chr8_gl000197_random
## 182896 38914 37175
## chr9_gl000198_random chr9_gl000199_random chr9_gl000200_random
## 90085 169874 187035
## chr9_gl000201_random chr11_gl000202_random chr17_gl000203_random
## 36148 40103 37498
## chr17_gl000204_random chr17_gl000205_random chr17_gl000206_random
## 81310 174588 41001
## chr18_gl000207_random chr19_gl000208_random chr19_gl000209_random
## 4262 92689 159169
## chr21_gl000210_random chrUn_gl000211 chrUn_gl000212
## 27682 166566 186858
## chrUn_gl000213 chrUn_gl000214 chrUn_gl000215
## 164239 137718 172545
## chrUn_gl000216 chrUn_gl000217 chrUn_gl000218
## 172294 172149 161147
## chrUn_gl000219 chrUn_gl000220 chrUn_gl000221
## 179198 161802 155397
## chrUn_gl000222 chrUn_gl000223 chrUn_gl000224
## 186861 180455 179693
## chrUn_gl000225 chrUn_gl000226 chrUn_gl000227
## 211173 15008 128374
## chrUn_gl000228 chrUn_gl000229 chrUn_gl000230
## 129120 19913 43691
## chrUn_gl000231 chrUn_gl000232 chrUn_gl000233
## 27386 40652 45941
## chrUn_gl000234 chrUn_gl000235 chrUn_gl000236
## 40531 34474 41934
## chrUn_gl000237 chrUn_gl000238 chrUn_gl000239
## 45867 39939 33824
## chrUn_gl000240 chrUn_gl000241 chrUn_gl000242
## 41933 42152 43523
## chrUn_gl000243 chrUn_gl000244 chrUn_gl000245
## 43341 39929 36651
## chrUn_gl000246 chrUn_gl000247 chrUn_gl000248
## 38154 36422 39786
## chrUn_gl000249 chr1_gl383516_fix chr1_gl383517_fix
## 38502 49316 49352
## chr1_gl949741_fix chr1_jh636052_fix chr1_jh636053_fix
## 151551 7283150 1676126
## chr1_jh636054_fix chr1_jh806573_fix chr1_jh806574_fix
## 758378 24680 22982
## chr1_jh806575_fix chr2_gl877870_fix chr2_gl877871_fix
## 47409 66021 389939
## chr2_kb663603_fix chr3_gl383523_fix chr3_gl383524_fix
## 599580 171362 78793
## chr3_gl383525_fix chr3_jh159131_fix chr3_jh159132_fix
## 65063 393769 100694
## chr3_ke332495_fix chr4_gl582967_fix chr4_gl877872_fix
## 263861 248177 297485
## chr4_ke332496_fix chr5_jh159133_fix chr5_ke332497_fix
## 503215 266316 543325
## chr6_jh636056_fix chr6_jh636057_fix chr6_jh806576_fix
## 262912 200195 273386
## chr6_kb663604_fix chr6_ke332498_fix chr7_gl582968_fix
## 478993 149443 356330
## chr7_gl582969_fix chr7_gl582970_fix chr7_gl582971_fix
## 251823 354970 1284284
## chr7_gl582972_fix chr7_jh159134_fix chr7_jh636058_fix
## 327774 3821770 716227
## chr7_ke332499_fix chr8_gl383535_fix chr8_gl383536_fix
## 274521 429806 203777
## chr8_gl949743_fix chr8_jh159135_fix chr8_ke332500_fix
## 608579 102251 228602
## chr9_gl339450_fix chr9_gl383537_fix chr9_gl383538_fix
## 330164 62435 49281
## chr9_jh636059_fix chr9_jh806577_fix chr9_jh806578_fix
## 295379 22394 169437
## chr9_jh806579_fix chr9_kb663605_fix chr10_gl383543_fix
## 211307 155926 392792
## chr10_gl383544_fix chr10_gl877873_fix chr10_jh591181_fix
## 128378 168465 2281126
## chr10_jh591182_fix chr10_jh591183_fix chr10_jh636060_fix
## 196262 177920 437946
## chr10_jh806580_fix chr10_kb663606_fix chr10_ke332501_fix
## 93149 305900 1020827
## chr11_gl582973_fix chr11_gl949744_fix chr11_jh159138_fix
## 321004 276448 108875
## chr11_jh159139_fix chr11_jh159140_fix chr11_jh159141_fix
## 120441 546435 240775
## chr11_jh159142_fix chr11_jh159143_fix chr11_jh591184_fix
## 326647 191402 462282
## chr11_jh591185_fix chr11_jh720443_fix chr11_jh806581_fix
## 167437 408430 872115
## chr12_gl383548_fix chr12_gl582974_fix chr12_jh720444_fix
## 165247 163298 273128
## chr12_kb663607_fix chr13_gl582975_fix chr14_kb021645_fix
## 334922 34662 1523386
## chr15_jh720445_fix chr16_jh720446_fix chr17_gl383558_fix
## 170033 97345 457041
## chr17_gl383559_fix chr17_gl383560_fix chr17_gl383561_fix
## 338640 534288 644425
## chr17_gl383562_fix chr17_gl582976_fix chr17_jh159144_fix
## 45551 412535 388340
## chr17_jh159145_fix chr17_jh591186_fix chr17_jh636061_fix
## 194862 376223 186059
## chr17_jh720447_fix chr17_jh806582_fix chr17_kb021646_fix
## 454385 342635 211416
## chr17_ke332502_fix chr19_gl582977_fix chr19_jh159149_fix
## 341712 580393 245473
## chr19_kb021647_fix chr19_ke332505_fix chr20_gl582979_fix
## 1058686 579598 179899
## chr20_jh720448_fix chr20_kb663608_fix chr21_ke332506_fix
## 70483 283551 307252
## chr22_jh720449_fix chr22_jh806583_fix chr22_jh806584_fix
## 212298 167183 70876
## chr22_jh806585_fix chr22_jh806586_fix chrX_gl877877_fix
## 73505 43543 284527
## chrX_jh159150_fix chrX_jh720451_fix chrX_jh720452_fix
## 3110903 898979 522319
## chrX_jh720453_fix chrX_jh720454_fix chrX_jh720455_fix
## 1461188 752267 65034
## chrX_jh806587_fix chrX_jh806588_fix chrX_jh806589_fix
## 4110759 862483 270630
## chrX_jh806590_fix chrX_jh806591_fix chrX_jh806592_fix
## 2418393 882083 835911
## chrX_jh806593_fix chrX_jh806594_fix chrX_jh806595_fix
## 389631 390496 444074
## chrX_jh806596_fix chrX_jh806597_fix chrX_jh806598_fix
## 413927 1045622 899320
## chrX_jh806599_fix chrX_jh806600_fix chrX_jh806601_fix
## 1214327 6530008 1389764
## chrX_jh806602_fix chrX_jh806603_fix chrX_kb021648_fix
## 713266 182949 469972
## chr1_gl383518_alt chr1_gl383519_alt chr1_gl383520_alt
## 182439 110268 366579
## chr2_gl383521_alt chr2_gl383522_alt chr2_gl582966_alt
## 143390 123821 96131
## chr3_gl383526_alt chr3_jh636055_alt chr4_gl383527_alt
## 180671 173151 164536
## chr4_gl383528_alt chr4_gl383529_alt chr5_gl339449_alt
## 376187 121345 1612928
## chr5_gl383530_alt chr5_gl383531_alt chr5_gl383532_alt
## 101241 173459 82728
## chr5_gl949742_alt chr6_gl383533_alt chr6_kb021644_alt
## 226852 124736 187824
## chr7_gl383534_alt chr9_gl383539_alt chr9_gl383540_alt
## 119183 162988 71551
## chr9_gl383541_alt chr9_gl383542_alt chr10_gl383545_alt
## 171286 60032 179254
## chr10_gl383546_alt chr11_gl383547_alt chr11_jh159136_alt
## 309802 154407 200998
## chr11_jh159137_alt chr12_gl383549_alt chr12_gl383550_alt
## 191409 120804 169178
## chr12_gl383551_alt chr12_gl383552_alt chr12_gl383553_alt
## 184319 138655 152874
## chr12_gl877875_alt chr12_gl877876_alt chr12_gl949745_alt
## 167313 408271 372609
## chr15_gl383554_alt chr15_gl383555_alt chr16_gl383556_alt
## 296527 388773 192462
## chr16_gl383557_alt chr17_gl383563_alt chr17_gl383564_alt
## 89672 270261 133151
## chr17_gl383565_alt chr17_gl383566_alt chr17_jh159146_alt
## 223995 90219 278131
## chr17_jh159147_alt chr17_jh159148_alt chr18_gl383567_alt
## 70345 88070 289831
## chr18_gl383568_alt chr18_gl383569_alt chr18_gl383570_alt
## 104552 167950 164789
## chr18_gl383571_alt chr18_gl383572_alt chr19_gl383573_alt
## 198278 159547 385657
## chr19_gl383574_alt chr19_gl383575_alt chr19_gl383576_alt
## 155864 170222 188024
## chr19_gl949746_alt chr19_gl949747_alt chr19_gl949748_alt
## 987716 729519 1064303
## chr19_gl949749_alt chr19_gl949750_alt chr19_gl949751_alt
## 1091840 1066389 1002682
## chr19_gl949752_alt chr19_gl949753_alt chr20_gl383577_alt
## 987100 796478 128385
## chr21_gl383578_alt chr21_gl383579_alt chr21_gl383580_alt
## 63917 201198 74652
## chr21_gl383581_alt chr22_gl383582_alt chr22_gl383583_alt
## 116690 162811 96924
## chr22_kb663609_alt
## 74013
# find name of each chromosome
seqnames(atac_small)
## [1] "chr1" "chr2" "chr3"
## [4] "chr4" "chr5" "chr6"
## [7] "chr7" "chr8" "chr9"
## [10] "chr10" "chr11" "chr12"
## [13] "chr13" "chr14" "chr15"
## [16] "chr16" "chr17" "chr18"
## [19] "chr19" "chr20" "chr21"
## [22] "chr22" "chrX" "chrY"
## [25] "chrM" "chrMT" "chr4_ctg9_hap1"
## [28] "chr6_apd_hap1" "chr6_cox_hap2" "chr6_dbb_hap3"
## [31] "chr6_mann_hap4" "chr6_mcf_hap5" "chr6_qbl_hap6"
## [34] "chr6_ssto_hap7" "chr17_ctg5_hap1" "chr1_gl000191_random"
## [37] "chr1_gl000192_random" "chr4_gl000193_random" "chr4_gl000194_random"
## [40] "chr7_gl000195_random" "chr8_gl000196_random" "chr8_gl000197_random"
## [43] "chr9_gl000198_random" "chr9_gl000199_random" "chr9_gl000200_random"
## [46] "chr9_gl000201_random" "chr11_gl000202_random" "chr17_gl000203_random"
## [49] "chr17_gl000204_random" "chr17_gl000205_random" "chr17_gl000206_random"
## [52] "chr18_gl000207_random" "chr19_gl000208_random" "chr19_gl000209_random"
## [55] "chr21_gl000210_random" "chrUn_gl000211" "chrUn_gl000212"
## [58] "chrUn_gl000213" "chrUn_gl000214" "chrUn_gl000215"
## [61] "chrUn_gl000216" "chrUn_gl000217" "chrUn_gl000218"
## [64] "chrUn_gl000219" "chrUn_gl000220" "chrUn_gl000221"
## [67] "chrUn_gl000222" "chrUn_gl000223" "chrUn_gl000224"
## [70] "chrUn_gl000225" "chrUn_gl000226" "chrUn_gl000227"
## [73] "chrUn_gl000228" "chrUn_gl000229" "chrUn_gl000230"
## [76] "chrUn_gl000231" "chrUn_gl000232" "chrUn_gl000233"
## [79] "chrUn_gl000234" "chrUn_gl000235" "chrUn_gl000236"
## [82] "chrUn_gl000237" "chrUn_gl000238" "chrUn_gl000239"
## [85] "chrUn_gl000240" "chrUn_gl000241" "chrUn_gl000242"
## [88] "chrUn_gl000243" "chrUn_gl000244" "chrUn_gl000245"
## [91] "chrUn_gl000246" "chrUn_gl000247" "chrUn_gl000248"
## [94] "chrUn_gl000249" "chr1_gl383516_fix" "chr1_gl383517_fix"
## [97] "chr1_gl949741_fix" "chr1_jh636052_fix" "chr1_jh636053_fix"
## [100] "chr1_jh636054_fix" "chr1_jh806573_fix" "chr1_jh806574_fix"
## [103] "chr1_jh806575_fix" "chr2_gl877870_fix" "chr2_gl877871_fix"
## [106] "chr2_kb663603_fix" "chr3_gl383523_fix" "chr3_gl383524_fix"
## [109] "chr3_gl383525_fix" "chr3_jh159131_fix" "chr3_jh159132_fix"
## [112] "chr3_ke332495_fix" "chr4_gl582967_fix" "chr4_gl877872_fix"
## [115] "chr4_ke332496_fix" "chr5_jh159133_fix" "chr5_ke332497_fix"
## [118] "chr6_jh636056_fix" "chr6_jh636057_fix" "chr6_jh806576_fix"
## [121] "chr6_kb663604_fix" "chr6_ke332498_fix" "chr7_gl582968_fix"
## [124] "chr7_gl582969_fix" "chr7_gl582970_fix" "chr7_gl582971_fix"
## [127] "chr7_gl582972_fix" "chr7_jh159134_fix" "chr7_jh636058_fix"
## [130] "chr7_ke332499_fix" "chr8_gl383535_fix" "chr8_gl383536_fix"
## [133] "chr8_gl949743_fix" "chr8_jh159135_fix" "chr8_ke332500_fix"
## [136] "chr9_gl339450_fix" "chr9_gl383537_fix" "chr9_gl383538_fix"
## [139] "chr9_jh636059_fix" "chr9_jh806577_fix" "chr9_jh806578_fix"
## [142] "chr9_jh806579_fix" "chr9_kb663605_fix" "chr10_gl383543_fix"
## [145] "chr10_gl383544_fix" "chr10_gl877873_fix" "chr10_jh591181_fix"
## [148] "chr10_jh591182_fix" "chr10_jh591183_fix" "chr10_jh636060_fix"
## [151] "chr10_jh806580_fix" "chr10_kb663606_fix" "chr10_ke332501_fix"
## [154] "chr11_gl582973_fix" "chr11_gl949744_fix" "chr11_jh159138_fix"
## [157] "chr11_jh159139_fix" "chr11_jh159140_fix" "chr11_jh159141_fix"
## [160] "chr11_jh159142_fix" "chr11_jh159143_fix" "chr11_jh591184_fix"
## [163] "chr11_jh591185_fix" "chr11_jh720443_fix" "chr11_jh806581_fix"
## [166] "chr12_gl383548_fix" "chr12_gl582974_fix" "chr12_jh720444_fix"
## [169] "chr12_kb663607_fix" "chr13_gl582975_fix" "chr14_kb021645_fix"
## [172] "chr15_jh720445_fix" "chr16_jh720446_fix" "chr17_gl383558_fix"
## [175] "chr17_gl383559_fix" "chr17_gl383560_fix" "chr17_gl383561_fix"
## [178] "chr17_gl383562_fix" "chr17_gl582976_fix" "chr17_jh159144_fix"
## [181] "chr17_jh159145_fix" "chr17_jh591186_fix" "chr17_jh636061_fix"
## [184] "chr17_jh720447_fix" "chr17_jh806582_fix" "chr17_kb021646_fix"
## [187] "chr17_ke332502_fix" "chr19_gl582977_fix" "chr19_jh159149_fix"
## [190] "chr19_kb021647_fix" "chr19_ke332505_fix" "chr20_gl582979_fix"
## [193] "chr20_jh720448_fix" "chr20_kb663608_fix" "chr21_ke332506_fix"
## [196] "chr22_jh720449_fix" "chr22_jh806583_fix" "chr22_jh806584_fix"
## [199] "chr22_jh806585_fix" "chr22_jh806586_fix" "chrX_gl877877_fix"
## [202] "chrX_jh159150_fix" "chrX_jh720451_fix" "chrX_jh720452_fix"
## [205] "chrX_jh720453_fix" "chrX_jh720454_fix" "chrX_jh720455_fix"
## [208] "chrX_jh806587_fix" "chrX_jh806588_fix" "chrX_jh806589_fix"
## [211] "chrX_jh806590_fix" "chrX_jh806591_fix" "chrX_jh806592_fix"
## [214] "chrX_jh806593_fix" "chrX_jh806594_fix" "chrX_jh806595_fix"
## [217] "chrX_jh806596_fix" "chrX_jh806597_fix" "chrX_jh806598_fix"
## [220] "chrX_jh806599_fix" "chrX_jh806600_fix" "chrX_jh806601_fix"
## [223] "chrX_jh806602_fix" "chrX_jh806603_fix" "chrX_kb021648_fix"
## [226] "chr1_gl383518_alt" "chr1_gl383519_alt" "chr1_gl383520_alt"
## [229] "chr2_gl383521_alt" "chr2_gl383522_alt" "chr2_gl582966_alt"
## [232] "chr3_gl383526_alt" "chr3_jh636055_alt" "chr4_gl383527_alt"
## [235] "chr4_gl383528_alt" "chr4_gl383529_alt" "chr5_gl339449_alt"
## [238] "chr5_gl383530_alt" "chr5_gl383531_alt" "chr5_gl383532_alt"
## [241] "chr5_gl949742_alt" "chr6_gl383533_alt" "chr6_kb021644_alt"
## [244] "chr7_gl383534_alt" "chr9_gl383539_alt" "chr9_gl383540_alt"
## [247] "chr9_gl383541_alt" "chr9_gl383542_alt" "chr10_gl383545_alt"
## [250] "chr10_gl383546_alt" "chr11_gl383547_alt" "chr11_jh159136_alt"
## [253] "chr11_jh159137_alt" "chr12_gl383549_alt" "chr12_gl383550_alt"
## [256] "chr12_gl383551_alt" "chr12_gl383552_alt" "chr12_gl383553_alt"
## [259] "chr12_gl877875_alt" "chr12_gl877876_alt" "chr12_gl949745_alt"
## [262] "chr15_gl383554_alt" "chr15_gl383555_alt" "chr16_gl383556_alt"
## [265] "chr16_gl383557_alt" "chr17_gl383563_alt" "chr17_gl383564_alt"
## [268] "chr17_gl383565_alt" "chr17_gl383566_alt" "chr17_jh159146_alt"
## [271] "chr17_jh159147_alt" "chr17_jh159148_alt" "chr18_gl383567_alt"
## [274] "chr18_gl383568_alt" "chr18_gl383569_alt" "chr18_gl383570_alt"
## [277] "chr18_gl383571_alt" "chr18_gl383572_alt" "chr19_gl383573_alt"
## [280] "chr19_gl383574_alt" "chr19_gl383575_alt" "chr19_gl383576_alt"
## [283] "chr19_gl949746_alt" "chr19_gl949747_alt" "chr19_gl949748_alt"
## [286] "chr19_gl949749_alt" "chr19_gl949750_alt" "chr19_gl949751_alt"
## [289] "chr19_gl949752_alt" "chr19_gl949753_alt" "chr20_gl383577_alt"
## [292] "chr21_gl383578_alt" "chr21_gl383579_alt" "chr21_gl383580_alt"
## [295] "chr21_gl383581_alt" "chr22_gl383582_alt" "chr22_gl383583_alt"
## [298] "chr22_kb663609_alt"
# assign a new genome
genome(atac_small) <- "hg19"
Again, several other methods are available that are not listed here. See the documentation for seqinfo-methods in Signac for a full list.
For a full list of methods for the ChromatinAssay class run:
methods(class = 'ChromatinAssay')
## [1] [[<- AddMotifs AggregateTiles Annotation
## [5] Annotation<- CallPeaks coerce colMeans
## [9] colSums ConvertMotifID countOverlaps coverage
## [13] disjoin disjointBins distance distanceToNearest
## [17] findOverlaps follow Footprint Fragments
## [21] Fragments<- gaps genome genome<-
## [25] GetAssayData GetMotifData granges InsertionBias
## [29] isCircular isCircular<- isDisjoint Links
## [33] Links<- merge Motifs Motifs<-
## [37] nearest precede range reduce
## [41] RegionStats RenameCells rowMeans rowSums
## [45] RunChromVAR seqinfo seqinfo<- seqlengths
## [49] seqlengths<- seqlevels seqlevels<- seqnames
## [53] seqnames<- SetAssayData SetMotifData show
## [57] subset
## see '?methods' for accessing help and source code
Subsetting a ChromatinAssay
We can use the standard subset() function or the [ operator to subset a Seurat object containing ChromatinAssays. This works the same way as for standard Assay objects.
# subset using the subset() function
# this is meant for interactive use
subset.obj <- subset(atac_small, subset = nCount_peaks > 100)
# subset using the [ extract operator
# this can be used programmatically
subset.obj <- atac_small[, atac_small$nCount_peaks > 100]
Converting between Assay and ChromatinAssay
To convert from a ChromatinAssay to a standard Assay, use the as() function:
# convert a ChromatinAssay to an Assay
assay <- as(object = atac_small[["peaks"]], Class = "Assay")
assay
## Assay data with 323 features for 100 cells
## Top 10 variable features:
## chr1-2157847-2188813, chr1-2471903-2481288, chr1-6843960-6846894,
## chr1-3815928-3820356, chr1-8935313-8940649, chr1-2515241-2519350,
## chr1-6051145-6055407, chr1-1708510-1715065, chr1-6659264-6664388,
## chr1-2227715-2234197
To convert from a standard Assay to a ChromatinAssay, we use the as.ChromatinAssay() function. This takes a standard Assay object, as well as the information needed to fill the additional slots in the ChromatinAssay class.
# convert an Assay to a ChromatinAssay
chromatinassay <- as.ChromatinAssay(assay, seqinfo = "hg19")
chromatinassay
## ChromatinAssay data with 323 features for 100 cells
## Variable features: 323
## Genome: hg19
## Annotation present: FALSE
## Motifs present: FALSE
## Fragment files: 0
The Fragment Class
The Fragment class is designed for storing and interacting with a fragment file commonly used for single-cell chromatin data. It contains the path to an indexed fragment file on disk, an MD5 hash for the fragment file and for the fragment file index, and a vector of cell names contained in the fragment file. Importantly, this is a named vector: the elements of the vector are the cell names as they appear in the fragment file, and the name of each element is the cell name as it appears in the ChromatinAssay object storing the Fragment object. This maps cell names on disk to cell names in R, and avoids the need to alter fragment files on disk. The path can also point to a remote file accessible by http or ftp.
Constructing the Fragment class
A Fragment object can be constructed using the CreateFragmentObject() function.
frag.path <- system.file("extdata", "fragments.tsv.gz", package="Signac")
fragments <- CreateFragmentObject(
path = frag.path,
cells = colnames(atac_small),
validate.fragments = TRUE
)
## Computing hash
The validate.fragments parameter controls whether the file is inspected to check that the expected cell names are present. This can help avoid assigning the wrong fragment file to the object. If you’re sure that the file is correct, you can set this value to FALSE to skip this step and save some time. This check is typically run only once, when the Fragment object is created, and is not normally run on existing Fragment files.
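To make the idea of this check concrete, here is a minimal base R sketch that reads the first lines of a fragment file and tests whether the expected cell barcodes appear in the fourth column. It is only an illustration under stated assumptions: the helper cells_found(), the toy barcodes, and the choice of scanning the first n lines are all hypothetical, and this is not the code that CreateFragmentObject() runs internally.

```r
# Write a tiny, made-up fragment file (chrom, start, end, barcode, count)
frag.lines <- c(
  "chr1\t10\t50\tAAAC-1\t1",
  "chr1\t60\t90\tAAAG-1\t2",
  "chr1\t95\t140\tAAAC-1\t1"
)
tmp <- tempfile(fileext = ".tsv.gz")
con <- gzfile(tmp, open = "wt")
writeLines(frag.lines, con)
close(con)

# Hypothetical helper: are all expected barcodes seen in the first n lines?
cells_found <- function(path, cells, n = 1000) {
  frags <- read.table(
    file = gzfile(path),
    sep = "\t",
    nrows = n,
    stringsAsFactors = FALSE
  )
  all(cells %in% frags[[4]])
}

# Both barcodes are present in the file
ok <- cells_found(tmp, cells = c("AAAC-1", "AAAG-1"))
# One barcode is absent, suggesting the wrong file was supplied
missing.cell <- cells_found(tmp, cells = c("AAAC-1", "TTTT-1"))
```

A failed check of this kind is a hint that the fragment file and the cell names do not belong together.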
Adding a Fragment object to the ChromatinAssay
A ChromatinAssay object can contain a list of Fragment objects. This avoids the need to merge fragment files on disk and simplifies merging or integrating different Seurat objects containing ChromatinAssays. To add a new Fragment object to a ChromatinAssay, or to a Seurat object containing a ChromatinAssay, we can use the Fragments<- assignment function. This will do a few things:
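1. Re-compute the MD5 hash for the fragment file and index, and verify that it matches the hash computed when the Fragment object was created.
2. Check that none of the cells contained in the Fragment object being added are already contained in another Fragment object stored in the ChromatinAssay. All fragments from a cell must be present in only one fragment file.
3. Append the Fragment object to the list of Fragment objects stored in the ChromatinAssay.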
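The kind of bookkeeping involved when a Fragment object is added can be sketched in base R: compare a stored MD5 hash with a freshly computed one, check that no cell is already claimed by another fragment file, and only then append to the list. This is a simplified illustration, not Signac's implementation; the file contents, barcodes, and variable names are invented, and plain character vectors stand in for Fragment objects.

```r
# A small stand-in file; a real fragment file would be bgzipped and indexed
f1 <- tempfile()
writeLines("chr1\t10\t50\tAAAC-1\t1", f1)

# Hash recorded when the (hypothetical) object was created
stored.hash <- unname(tools::md5sum(f1))

# Recompute the hash and compare it with the stored value
hash.ok <- identical(unname(tools::md5sum(f1)), stored.hash)

# If the file changes on disk, the recomputed hash no longer matches
writeLines("chr1\t10\t50\tAAAG-1\t1", f1)
hash.changed <- !identical(unname(tools::md5sum(f1)), stored.hash)

# Cells in the new object must not already belong to an existing one
existing.cells <- c("AAAC-1", "AAAG-1")
new.cells <- c("AAAT-1", "AAAG-1")
overlap <- intersect(new.cells, existing.cells)

# Append only when there is no overlap
frag.list <- list(existing.cells)
if (length(overlap) == 0) {
  frag.list <- c(frag.list, list(new.cells))
}
```

Here the barcode AAAG-1 appears in both sets, so nothing is appended and frag.list keeps a single element.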
Fragments(atac_small) <- fragments
The show() method for Fragment-class objects prints the number of cells that the Fragment object contains data for.
fragments
## A Fragment object for 100 cells
Alternatively, we can initialize the ChromatinAssay with a Fragment object in a couple of ways. We can either pass a list of Fragment objects to the fragments parameter in CreateChromatinAssay(), or pass the path to a single fragment file. If we pass the path to a fragment file, we assume that the file contains fragments for all cells in the ChromatinAssay and that the cell names are the same in the fragment file on disk and in the ChromatinAssay. For example:
chrom_assay <- CreateChromatinAssay(
counts = counts,
genome = "hg19",
fragments = frag.path
)
## Computing hash
object <- CreateSeuratObject(
counts = chrom_assay,
assay = "peaks"
)
This will create a Seurat object containing a ChromatinAssay, with a single Fragment object.
Removing a Fragment object from the ChromatinAssay
All the Fragment objects associated with a ChromatinAssay can be removed by assigning NULL using the Fragments<- assignment function. For example:
Fragments(chrom_assay) <- NULL
Fragments(chrom_assay)
## list()
To remove a subset of the Fragment objects stored in the ChromatinAssay, you will need to extract the list of Fragment objects using the Fragments() function, subset the list, then assign the subsetted list to the assay using the Seurat::SetAssayData() function. For example:
chrom_assay <- SetAssayData(chrom_assay, slot = "fragments", new.data = fragments)
Fragments(chrom_assay)
## [[1]]
## A Fragment object for 100 cells
Changing the fragment file path in an existing Fragment object
The path to the fragment file can be updated using the UpdatePath() function. This can be useful if you move the fragment file to a new directory, or if you copy a stored Seurat object containing a ChromatinAssay to a different server.
fragments <- UpdatePath(fragments, new.path = "/home/stuartt/github/chrom/vignette_data/fragments.tsv.gz")
Fragment files hosted on remote servers accessible via http or ftp can also be added to the ChromatinAssay in the same way as locally hosted fragment files. This can enable the exploration of large single-cell datasets without the need to download large files. For example, we can create a Fragment object using a file hosted on the 10x Genomics website:
fragments <- CreateFragmentObject(
path = "http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_v1_pbmc_10k/atac_v1_pbmc_10k_fragments.tsv.gz"
)
## Computing hash
fragments
## A Fragment object for 0 cells
When files are hosted remotely, the checks described in the section above (MD5 hash and expected cells) are not performed.
Getting and setting Fragment data
To access the cell names stored in a Fragment object, we can use the Cells() function. Importantly, this returns the cell names as they appear in the ChromatinAssay, rather than as they appear in the fragment file itself.
fragments <- CreateFragmentObject(
path = frag.path,
cells = colnames(atac_small),
validate.fragments = TRUE
)
## Computing hash
head(Cells(fragments))
## [1] "AAACGAAAGAGCGAAA-1" "AAACGAAAGAGTTTGA-1" "AAACGAAAGCGAGCTA-1"
## [4] "AAACGAAAGGCTTCGC-1" "AAACGAAAGTGCTGAG-1" "AAACGAACAAGGGTAC-1"
Similarly, we can set the cell name information in a Fragment object using the Cells<- assignment function. This will set the named vector of cells stored in the Fragment object. Here we must supply a named vector.
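As a rough sketch of what such a named vector could look like, the base R code below builds one in which the values are the barcodes as written in the fragment file and the names are the cell names used in the object. The "pbmc_" prefix and the two barcodes are invented for illustration, and the final assignment is left as a comment because it assumes an existing Fragment object called fragments.

```r
# Barcodes as they appear in the fragment file on disk
file.barcodes <- c("AAACGAAAGAGCGAAA-1", "AAACGAAAGAGTTTGA-1")

# Cell names as they would appear in the object (hypothetical prefix)
object.names <- paste0("pbmc_", file.barcodes)

# Values: names on disk; names: names in the object
new.cells <- setNames(file.barcodes, object.names)

# Basic sanity checks before assigning
stopifnot(!is.null(names(new.cells)))
stopifnot(anyDuplicated(names(new.cells)) == 0)

# Look up the on-disk barcode for a cell name used in the object
disk.name <- unname(new.cells["pbmc_AAACGAAAGAGTTTGA-1"])

# With a Fragment object called `fragments`, the assignment would be:
# Cells(fragments) <- new.cells
```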
To extract any of the data stored in a Fragment object we can also use the GetFragmentData() function. For example, we can find the path to the fragment file on disk:
GetFragmentData(object = fragments, slot = "path")
## [1] "/tmp/Rtmpoz554S/temp_libpath334a48a7f4fe/Signac/extdata/fragments.tsv.gz"
For a full list of methods for the Fragment class run:
methods(class = 'Fragment')
## [1] CallPeaks Cells Cells<- RenameCells show
## see '?methods' for accessing help and source code
The Motif Class
The Motif class stores information needed for DNA sequence motif analysis, and has the following slots:
- data: A sparse feature-by-motif matrix, where entries are 1 if the feature contains the motif, and 0 otherwise
- pwm: A named list of position weight or position frequency matrices
- motif.names: A list of motif IDs and their common names
- positions: A GRangesList object containing the exact positions of each motif
- meta.data: Additional information about the motifs
Many of these slots are optional and only need to be filled when running certain functions. For example, the positions slot is needed when running TF footprinting.
Constructing the Motif class
A Motif object can be constructed using the CreateMotifObject() function. Much of the data needed for constructing a Motif object can be generated using functions from the TFBSTools and motifmatchr packages. Position frequency matrices for motifs can be loaded using the JASPAR packages on Bioconductor or the chromVARmotifs package. For example:
library(JASPAR2018)
library(TFBSTools)
library(motifmatchr)
# Get a list of motif position frequency matrices from the JASPAR database
pfm <- getMatrixSet(
x = JASPAR2018,
opts = list(species = 9606) # 9606 is the species code for human
)
# Scan the DNA sequence of each peak for the presence of each motif
motif.matrix <- CreateMotifMatrix(
features = granges(atac_small),
pwm = pfm,
genome = 'hg19'
)
# Create a new Motif object to store the results
motif <- CreateMotifObject(
data = motif.matrix,
pwm = pfm
)
The show() method for the Motif class prints the total number of motifs and regions included in the object:
motif
## A Motif object containing 452 motifs in 323 regions
Adding a Motif object to the ChromatinAssay
We can add a Motif object to the ChromatinAssay, or to a Seurat object containing a ChromatinAssay, using the Motifs<- assignment operator.
Motifs(atac_small) <- motif
Getting and setting Motif data
Data stored in a Motif object can be accessed using the GetMotifData() and SetMotifData() functions.
# extract data from the Motif object
pfm <- GetMotifData(object = motif, slot = "pwm")
# set data in the Motif object
motif <- SetMotifData(object = motif, slot = "pwm", new.data = pfm)
We can access the set of motifs and the set of features used in the Motif object using the colnames() and rownames() functions:
head(colnames(motif))
## [1] "MA0025.1" "MA0030.1" "MA0031.1" "MA0051.1" "MA0056.1" "MA0057.1"
head(rownames(motif))
## [1] "chr1-713460-714823" "chr1-752422-753038" "chr1-762106-763359"
## [4] "chr1-779589-780271" "chr1-804872-805761" "chr1-839520-841123"
To quickly convert between motif IDs (like MA0497.1) and motif common names (like MEF2C), we can use the ConvertMotifID() function. For example:
# convert ID to common name
ids <- c("MA0025.1","MA0030.1","MA0031.1","MA0051.1","MA0056.1","MA0057.1")
names <- ConvertMotifID(object = motif, id = ids)
names
## [1] "NFIL3" "FOXF2" "FOXD1" "IRF2" "MZF1"
## [6] "MZF1(var.2)"
# convert names to IDs
ConvertMotifID(object = motif, name = names)
## [1] "MA0025.1" "MA0030.1" "MA0031.1" "MA0051.1" "MA0056.1" "MA0057.1"
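Conceptually, this conversion is a two-way lookup between IDs and names. The sketch below mimics it in base R with a named character vector; the helpers id_to_name() and name_to_id() are hypothetical stand-ins for illustration, not part of Signac, and the three ID and name pairs are taken from the output above.

```r
# Simplified stand-in for the motif.names slot: names are IDs, values are
# common names
motif.names <- c(MA0025.1 = "NFIL3", MA0030.1 = "FOXF2", MA0031.1 = "FOXD1")

# Hypothetical helper: look up common names by ID
id_to_name <- function(lookup, id) {
  unname(lookup[id])
}

# Hypothetical helper: look up IDs by common name
name_to_id <- function(lookup, name) {
  names(lookup)[match(name, lookup)]
}

common <- id_to_name(motif.names, c("MA0030.1", "MA0025.1"))
ids.back <- name_to_id(motif.names, c("FOXD1", "NFIL3"))
```

Because the lookup is by name or by match(), the order of the result follows the order of the query rather than the order of the stored motifs.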
For a full list of methods for the Motif class run:
methods(class = 'Motif')
## [1] [ ConvertMotifID dim dimnames GetMotifData
## [6] SetMotifData show subset
## see '?methods' for accessing help and source code
## R version 4.0.1 (2020-06-06)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] BSgenome.Hsapiens.UCSC.hg19_1.4.3 BSgenome_1.58.0
## [3] rtracklayer_1.50.0 Biostrings_2.58.0
## [5] XVector_0.30.0 motifmatchr_1.12.0
## [7] TFBSTools_1.28.0 JASPAR2018_1.1.1
## [9] EnsDb.Hsapiens.v75_2.99.0 ensembldb_2.14.0
## [11] AnnotationFilter_1.14.0 GenomicFeatures_1.42.3
## [13] AnnotationDbi_1.52.0 Biobase_2.50.0
## [15] GenomicRanges_1.42.0 GenomeInfoDb_1.26.5
## [17] IRanges_2.24.1 S4Vectors_0.28.1
## [19] BiocGenerics_0.36.0 Signac_1.2.0
## [21] SeuratObject_4.0.0 Seurat_4.0.1.9005
##
## loaded via a namespace (and not attached):
## [1] rappdirs_0.3.3 SnowballC_0.7.0
## [3] scattermore_0.7 R.methodsS3_1.8.1
## [5] ragg_1.1.2 tidyr_1.1.3
## [7] ggplot2_3.3.3 bit64_4.0.5
## [9] knitr_1.33 R.utils_2.10.1
## [11] irlba_2.3.3 DelayedArray_0.16.3
## [13] data.table_1.14.0 rpart_4.1-15
## [15] KEGGREST_1.30.1 RCurl_1.98-1.3
## [17] generics_0.1.0 cowplot_1.1.1
## [19] RSQLite_2.2.7 RANN_2.6.1
## [21] future_1.21.0 bit_4.0.4
## [23] spatstat.data_2.1-0 xml2_1.3.2
## [25] httpuv_1.6.0 SummarizedExperiment_1.20.0
## [27] assertthat_0.2.1 DirichletMultinomial_1.32.0
## [29] xfun_0.22 hms_1.0.0
## [31] jquerylib_0.1.4 evaluate_0.14
## [33] promises_1.2.0.1 fansi_0.4.2
## [35] progress_1.2.2 caTools_1.18.2
## [37] dbplyr_2.1.1 igraph_1.2.6
## [39] DBI_1.1.1 htmlwidgets_1.5.3
## [41] sparsesvd_0.2 spatstat.geom_2.1-0
## [43] purrr_0.3.4 ellipsis_0.3.1
## [45] dplyr_1.0.5 backports_1.2.1
## [47] annotate_1.68.0 biomaRt_2.46.3
## [49] deldir_0.2-10 MatrixGenerics_1.2.1
## [51] vctrs_0.3.7 ROCR_1.0-11
## [53] abind_1.4-5 cachem_1.0.4
## [55] ggforce_0.3.3 checkmate_2.0.0
## [57] sctransform_0.3.2 GenomicAlignments_1.26.0
## [59] prettyunits_1.1.1 goftest_1.2-2
## [61] cluster_2.1.2 lazyeval_0.2.2
## [63] seqLogo_1.56.0 crayon_1.4.1
## [65] pkgconfig_2.0.3 slam_0.1-48
## [67] tweenr_1.0.2 nlme_3.1-152
## [69] ProtGenerics_1.22.0 nnet_7.3-15
## [71] rlang_0.4.10 globals_0.14.0
## [73] lifecycle_1.0.0 miniUI_0.1.1.1
## [75] BiocFileCache_1.14.0 dichromat_2.0-0
## [77] rprojroot_2.0.2 polyclip_1.10-0
## [79] matrixStats_0.58.0 lmtest_0.9-38
## [81] Matrix_1.3-2 ggseqlogo_0.1
## [83] zoo_1.8-9 base64enc_0.1-3
## [85] ggridges_0.5.3 png_0.1-7
## [87] viridisLite_0.4.0 bitops_1.0-7
## [89] R.oo_1.24.0 KernSmooth_2.23-18
## [91] blob_1.2.1 stringr_1.4.0
## [93] parallelly_1.24.0 readr_1.4.0
## [95] jpeg_0.1-8.1 CNEr_1.26.0
## [97] scales_1.1.1 memoise_2.0.0
## [99] magrittr_2.0.1 plyr_1.8.6
## [101] ica_1.0-2 zlibbioc_1.36.0
## [103] compiler_4.0.1 RColorBrewer_1.1-2
## [105] fitdistrplus_1.1-3 Rsamtools_2.6.0
## [107] listenv_0.8.0 patchwork_1.1.1
## [109] pbapply_1.4-3 htmlTable_2.1.0
## [111] Formula_1.2-4 MASS_7.3-53.1
## [113] mgcv_1.8-33 tidyselect_1.1.0
## [115] stringi_1.5.3 textshaping_0.3.3
## [117] yaml_2.2.1 askpass_1.1
## [119] latticeExtra_0.6-29 ggrepel_0.9.1
## [121] grid_4.0.1 sass_0.3.1
## [123] VariantAnnotation_1.36.0 fastmatch_1.1-0
## [125] tools_4.0.1 future.apply_1.7.0
## [127] rstudioapi_0.13 TFMPvalue_0.0.8
## [129] foreign_0.8-81 lsa_0.73.2
## [131] gridExtra_2.3 farver_2.1.0
## [133] Rtsne_0.15 digest_0.6.27
## [135] shiny_1.6.0 pracma_2.3.3
## [137] qlcMatrix_0.9.7 Rcpp_1.0.6
## [139] later_1.2.0 RcppAnnoy_0.0.18
## [141] httr_1.4.2 biovizBase_1.38.0
## [143] colorspace_2.0-0 XML_3.99-0.6
## [145] fs_1.5.0 tensor_1.5
## [147] reticulate_1.19 splines_4.0.1
## [149] uwot_0.1.10 RcppRoll_0.3.0
## [151] spatstat.utils_2.1-0 pkgdown_1.6.1
## [153] plotly_4.9.3 systemfonts_1.0.1
## [155] xtable_1.8-4 jsonlite_1.7.2
## [157] poweRlaw_0.70.6 R6_2.5.0
## [159] Hmisc_4.5-0 pillar_1.6.0
## [161] htmltools_0.5.1.1 mime_0.10
## [163] glue_1.4.2 fastmap_1.1.0
## [165] BiocParallel_1.24.1 codetools_0.2-18
## [167] utf8_1.2.1 lattice_0.20-41
## [169] bslib_0.2.4 spatstat.sparse_2.0-0
## [171] tibble_3.1.1 curl_4.3
## [173] leiden_0.3.7 gtools_3.8.2
## [175] GO.db_3.12.1 openssl_1.4.3
## [177] survival_3.2-11 rmarkdown_2.7
## [179] docopt_0.7.1 desc_1.3.0
## [181] munsell_0.5.0 GenomeInfoDbData_1.2.4
## [183] reshape2_1.4.4 gtable_0.3.0
## [185] spatstat.core_2.1-2