What is Signac?

Signac is an extension of Seurat for the analysis of single-cell chromatin data (DNA-based single-cell assays). We have extended the Seurat object to include information about the genome sequence and genomic coordinates of sequenced fragments per cell, and include functions needed for the analysis of single-cell chromatin data.

How do I interact with the object?

Signac uses the Seurat object structure, and so all the Seurat commands can be used when analysing data with Signac. See the Data Structures and Object Interaction vignette for an explanation of the classes defined in Signac and how to use them. See the Seurat documentation for more information about the Seurat object: https://satijalab.org/seurat/

How do I merge objects with Signac?

See the merge and integration vignettes for information on combining multiple single-cell chromatin datasets.

How can I create a fragment file for my dataset?

The fragment file is provided in the output of popular single-cell data data processing tools such as chromap, cellranger-atac, and cellranger-arc.

If you are using another method that does not provide a fragment file as output, you can use the sinto package to generate a fragment file from the BAM file. See here for more information on using Sinto to generate a fragment file: https://timoast.github.io/sinto/basic_usage.html#create-scatac-seq-fragments-file

How should I decide on the number of dimensions to use?

Choosing the dimensionality is a general problem in single-cell analysis for which there is no simple solution. There has been discussion about this for scRNA-seq, and you can read our recommendations for scRNA-seq in the Seurat vignettes: https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html (see “Determine the ‘dimensionality’ of the dataset”).

Here are some general tips/suggestions that might help guide you in the choice for number of dimensions:

  • the number of dimensions needed will generally scale with the size and complexity of the dataset
  • you can try varying the number of dimensions used and observing how the resulting clusters or UMAP changes
  • it is usually better to choose values that are higher rather than too low
  • having a good understanding of the biology will help a lot in knowing whether the clusters make sense, or if the dimensionality might be too high/low

An annotation or genome sequence for my organism is not available on Bioconductor, what do I do?

If you are studying an organism that does not have a BSgenome genome package or EnsDB annotation package available on BioConductor, you can still use your own GTF file or FASTA files with Signac.

Gene annotation

To use a GTF file, you can import it using rtracklayer, for example:

gtf <- rtracklayer::import('genes.gtf')

Alternatively, gene annotations for your species may be available on AnnotationHub.

Genome sequence

You can use a FASTA file in place of a BSgenome package for functions that require access to the genome sequence. To do so, index the FASTA file using samtools faidx, and then create a FaFile object using the Rsamtools package:

fa <- Rsamtools::FaFile("path/to/fasta")

Alternatively, you can create your own BSgenome data package, see this vignette.

How should I cite Signac?

If you use Signac, please cite Stuart et al., 2021:

@ARTICLE{signac,
  title     = "Single-cell chromatin state analysis with Signac",
  author    = "Stuart, Tim and Srivastava, Avi and Madad, Shaista and Lareau,
               Caleb A and Satija, Rahul",
  journal   = "Nat. Methods",
  publisher = "Nature Publishing Group",
  pages     = "1--9",
  month     =  nov,
  year      =  2021,
  url       = "https://www.nature.com/articles/s41592-021-01282-5",
  language  = "en"
}

Signac is an extension of Seurat, and uses the Seurat object structure, so you should consider citing the Seurat paper if you have used Signac.