Run term frequency-inverse document frequency (TF-IDF) normalization on a matrix.

RunTFIDF(object, ...)

# S3 method for default
RunTFIDF(
  object,
  assay = NULL,
  method = 1,
  scale.factor = 10000,
  verbose = TRUE,
  ...
)

# S3 method for Assay
RunTFIDF(
  object,
  assay = NULL,
  method = 1,
  scale.factor = 10000,
  verbose = TRUE,
  ...
)

# S3 method for Seurat
RunTFIDF(
  object,
  assay = NULL,
  method = 1,
  scale.factor = 10000,
  verbose = TRUE,
  ...
)

Arguments

object

A Seurat object, an Assay, or a count matrix (features as rows, cells as columns)

...

Arguments passed to other methods

assay

Name of assay to use

method

Which TF-IDF implementation to use. Choice of:

  • 1: The TF-IDF implementation used by Stuart & Butler et al. 2019 (doi: 10.1101/460147). This computes \(\log(TF \times IDF)\).

  • 2: The TF-IDF implementation used by Cusanovich & Hill et al. 2018 (doi: 10.1016/j.cell.2018.06.052). This computes \(TF \times \log(IDF)\).

  • 3: The log-TF method used by Andrew Hill. This computes \(\log(TF) \times \log(IDF)\).

  • 4: The 10x Genomics method (no TF normalization). This computes \(IDF\).
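The four formulas above can be sketched in base R on a small dense count matrix. This is a minimal illustration, not Signac's implementation: the helper `tfidf_sketch()` is hypothetical, and several details are assumptions rather than documented behaviour. TF is taken as each count divided by its cell's total; IDF is taken as the number of cells divided by the feature's total count (equal to a document-frequency IDF only for binarized data); the logarithm is written as `log1p()` to keep zeros at zero; and `scale.factor` is assumed to multiply values before the log in methods 1 and 3.

```r
# Hypothetical helper, not part of Signac: a base-R sketch of the four
# TF-IDF variants listed above. Assumes `counts` is a dense numeric
# feature-by-cell matrix in which every cell and every feature has a
# nonzero total count (otherwise divisions produce NaN/Inf).
tfidf_sketch <- function(counts, method = 1, scale.factor = 1e4) {
  # Term frequency: divide each column (cell) by its total count
  tf <- sweep(counts, 2, colSums(counts), "/")
  # Inverse document frequency (assumed form): number of cells / feature total
  idf <- ncol(counts) / rowSums(counts)
  # Multiplying a matrix by a length-nrow vector scales row i by idf[i]
  switch(as.character(method),
    "1" = log1p(tf * idf * scale.factor),        # log(TF x IDF)
    "2" = tf * log1p(idf),                       # TF x log(IDF)
    "3" = log1p(tf * scale.factor) * log1p(idf), # log(TF) x log(IDF)
    "4" = counts * idf,                          # IDF weighting, no TF normalization
    stop("method must be 1, 2, 3, or 4")
  )
}

# 2 features x 2 cells: feature 1 is in both cells, feature 2 only in cell 2
counts <- matrix(c(1, 0, 1, 1), nrow = 2)
tfidf_sketch(counts, method = 1)
```

Features seen in fewer cells get a larger IDF, so in this toy matrix the second feature's single count in cell 2 is weighted up relative to the first feature's count in the same cell.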

scale.factor

Scale factor to use. Default is 10000.
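A little arithmetic shows why a scale factor matters when a log transform follows. For sparse data, per-cell TF × IDF products are typically far below 1, and `log1p(x)` is almost equal to `x` in that range, so the log would have little effect. The snippet assumes, for illustration only, that `scale.factor` multiplies the values before `log1p()`; the input values are made up.

```r
# Made-up TF x IDF products for one cell
x <- c(1e-4, 5e-4, 1e-3)
scale.factor <- 1e4

# Without scaling, log1p(x) is approximately x, so the log barely changes anything
unscaled <- log1p(x)
# With scaling, the values become 1, 5 and 10 before the log: log(2), log(6), log(11)
scaled <- log1p(x * scale.factor)

round(unscaled, 6)
round(scaled, 4)
```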

verbose

Print progress

Value

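Returns an object of the same class as the input: a sparse matrix (dgCMatrix) for matrix input, an Assay for Assay input, or a Seurat object for Seurat input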

Details

Four different TF-IDF methods are implemented. We recommend using method 1 (the default).

References

https://en.wikipedia.org/wiki/Latent_semantic_analysis#Latent_semantic_indexing

Examples

mat <- matrix(data = rbinom(n = 25, size = 5, prob = 0.2), nrow = 5)
RunTFIDF(object = mat)
#> Performing TF-IDF normalization
#> 5 x 5 sparse Matrix of class "dgCMatrix"
#>
#> [1,] 7.354682 .        7.488134 8.740497 .
#> [2,] 7.131699 7.419181 7.265130 .        8.517393
#> [3,] 7.759934 7.642204 7.893412 .        .
#> [4,] 7.131699 8.112028 .        7.824446 7.824446
#> [5,] 7.488134 7.082948 7.621595 7.488134 7.488134
RunTFIDF(atac_small[['peaks']])
#> Performing TF-IDF normalization
#> ChromatinAssay data with 323 features for 100 cells
#> Variable features: 323
#> Genome: hg19
#> Annotation present: TRUE
#> Motifs present: TRUE
#> Fragment files: 0
RunTFIDF(object = atac_small)
#> Performing TF-IDF normalization
#> An object of class Seurat
#> 1323 features across 100 samples within 3 assays
#> Active assay: peaks (323 features, 323 variable features)
#> 2 other assays present: bins, RNA
#> 2 dimensional reductions calculated: lsi, umap