Find peaks that are correlated with the expression of nearby genes. For each gene, this function computes the correlation coefficient between the gene expression and accessibility of each peak within a given distance from the gene TSS, and computes an expected correlation coefficient for each peak given the GC content, accessibility, and length of the peak. The expected coefficient values for the peak are then used to compute a z-score and p-value.

LinkPeaks(
  object,
  peak.assay,
  expression.assay,
  peak.slot = "counts",
  expression.slot = "data",
  method = "pearson",
  gene.coords = NULL,
  distance = 5e+05,
  min.distance = NULL,
  min.cells = 10,
  genes.use = NULL,
  n_sample = 200,
  pvalue_cutoff = 0.05,
  score_cutoff = 0.05,
  gene.id = FALSE,
  verbose = TRUE
)

Arguments

object

A Seurat object

peak.assay

Name of assay containing peak information

expression.assay

Name of assay containing gene expression information

peak.slot

Name of slot to pull chromatin data from

expression.slot

Name of slot to pull expression data from

method

Correlation method to use. One of "pearson" or "spearman"

gene.coords

GRanges object containing coordinates of genes in the expression assay. If NULL, extract from gene annotations stored in the assay.

distance

Distance threshold, in base pairs from the gene TSS, for peaks to include in the regression model

min.distance

Minimum distance between peak and TSS to include in regression model. If NULL (default), no minimum distance is used.

min.cells

Minimum number of cells positive for the peak and gene needed for them to be included in the results.

genes.use

Genes to test. If NULL, determine from expression assay.

n_sample

Number of peaks to sample at random when computing the null distribution.

pvalue_cutoff

P-value threshold for retaining a link. Links with a p-value equal to or greater than this value will be removed from the output.

score_cutoff

Minimum absolute value of the correlation coefficient for a link to be retained

gene.id

Set to TRUE if genes in the expression assay are named using gene IDs rather than gene names.

verbose

Display messages
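
As a rough illustration of how pvalue_cutoff and score_cutoff interact, the sketch below applies both thresholds to a small, made-up table of links. The data frame, its values, and the peak names are hypothetical, and the handling of values that fall exactly on the score threshold is an assumption of this sketch rather than documented behavior.

```r
# Hypothetical link table; columns are named after the fields LinkPeaks reports.
links <- data.frame(
  score  = c(0.30, 0.02, -0.12, 0.08),
  zscore = c(4.1, 0.3, -2.6, 1.0),
  pvalue = c(0.00004, 0.76, 0.009, 0.32),
  gene   = c("LYZ", "LYZ", "MS4A1", "MS4A1"),
  peak   = c("peak-1", "peak-2", "peak-3", "peak-4"),  # placeholder names
  stringsAsFactors = FALSE
)

pvalue_cutoff <- 0.05
score_cutoff <- 0.05

# Drop links whose p-value is equal to or greater than the cutoff, and
# keep only links whose absolute correlation reaches the score cutoff
# (treating the score boundary as inclusive is an assumption here).
keep <- links$pvalue < pvalue_cutoff & abs(links$score) >= score_cutoff
filtered <- links[keep, ]
```

Note that the score filter uses the absolute value, so a negatively correlated peak such as the third row is retained as long as it passes both thresholds.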

Value

Returns a Seurat object with the Links information set. This is a GRanges object accessible via the Links function, with the following information:

  • score: the correlation coefficient between the accessibility of the peak and expression of the gene

  • zscore: the z-score of the correlation coefficient, computed based on the distribution of correlation coefficients from a set of background peaks

  • pvalue: the p-value associated with the z-score for the link

  • gene: name of the linked gene

  • peak: name of the linked peak

Details

This function was inspired by the method originally described in the SHARE-seq paper (Sai Ma et al. 2020, Cell). Please consider citing the original SHARE-seq work if using this function: doi:10.1016/j.cell.2020.09.056
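
The scoring idea summarized in the description (an observed correlation compared against correlations from a set of background peaks) can be sketched in base R. This is a minimal illustration on simulated data, not the package's implementation: the background peaks here are purely random rather than matched on GC content, accessibility, and length, and the two-sided normal p-value is an assumption made for the sketch.

```r
# Illustrative sketch only: simulated data, not the LinkPeaks implementation.
set.seed(1)
n_cells <- 300
n_background <- 200  # mirrors the n_sample default

# Simulated gene expression and one candidate peak that tracks it
expression <- rnorm(n_cells)
peak <- 0.5 * expression + rnorm(n_cells)

# Hypothetical background peaks, unrelated to the gene. LinkPeaks is described
# as accounting for GC content, accessibility, and peak length; here the
# background is simply random noise.
background <- matrix(rnorm(n_cells * n_background), nrow = n_cells)

# Observed correlation for the candidate peak
score <- cor(peak, expression, method = "pearson")

# Null distribution: correlation of each background peak with the gene
bg_scores <- as.vector(cor(background, expression, method = "pearson"))

# z-score of the observed coefficient relative to the background
zscore <- (score - mean(bg_scores)) / sd(bg_scores)

# Two-sided normal p-value (an assumption of this sketch)
pvalue <- 2 * pnorm(-abs(zscore))
```

Setting method = "spearman" in the two cor() calls would give the rank-based variant of the same comparison.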