Compute the term-frequency inverse-document-frequency

Run term frequency inverse document frequency (TF-IDF) normalization on a matrix.

Usage

RunTFIDF(object, ...)

# Default S3 method
RunTFIDF(
  object,
  assay = NULL,
  method = 1,
  scale.factor = 10000,
  idf = NULL,
  verbose = TRUE,
  ...
)

# S3 method for class 'Assay5'
RunTFIDF(
  object,
  assay = NULL,
  method = 1,
  scale.factor = 10000,
  idf = NULL,
  layer = "counts",
  save = "data",
  verbose = TRUE,
  ...
)

# S3 method for class 'StdAssay'
RunTFIDF(
  object,
  assay = NULL,
  method = 1,
  scale.factor = 10000,
  idf = NULL,
  layer = "counts",
  save = "data",
  verbose = TRUE,
  ...
)

# S3 method for class 'Seurat'
RunTFIDF(
  object,
  assay = NULL,
  method = 1,
  scale.factor = 10000,
  idf = NULL,
  layer = "counts",
  save = "data",
  verbose = TRUE,
  ...
)

Arguments

object

A Seurat object, an assay object (Assay5 or StdAssay), or a count matrix

...

Arguments passed to other methods

assay

Name of assay to use.

method

Which TF-IDF implementation to use. Choice of:

1: The TF-IDF implementation used by Stuart & Butler et al. 2019 (doi:10.1101/460147). This computes \(\log(TF \times IDF)\).
2: The TF-IDF implementation used by Cusanovich & Hill et al. 2018 (doi:10.1016/j.cell.2018.06.052). This computes \(TF \times (\log(IDF))\).
3: The log-TF method used by Andrew Hill. This computes \(\log(TF) \times \log(IDF)\).
4: The 10x Genomics method (no TF normalization). This computes \(IDF\).

scale.factor

Which scale factor to use. Default is 10000.

idf

A precomputed IDF vector to use. If NULL, compute based on the input data matrix.

verbose

Whether to print progress messages.

layer

Name of layer to use.

save

Name of layer to save results in.

Value

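Returns an object of the same kind as the input: a normalized matrix for a matrix, an assay for an assay, or a SeuratObject::Seurat() object for a Seurat object.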

Details

Four different TF-IDF methods are implemented. We recommend using method 1 (the default).
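
To make the formulas listed under the method argument concrete, here is a minimal base-R sketch of methods 1 to 3 on a toy matrix. It is an illustration under stated assumptions, not the package's implementation: TF is assumed to be each cell's counts divided by that cell's total, IDF is assumed to be the number of cells divided by each feature's total count, and the use of log1p, log(1 + IDF), and the placement of scale.factor are likewise assumptions.

```r
# Toy count matrix: rows are features (peaks), columns are cells.
counts <- matrix(
  data = c(1, 0, 2,
           0, 3, 1,
           2, 1, 1),
  nrow = 3,
  byrow = TRUE
)

# Assumed definitions (not taken from the package source):
# TF divides each cell (column) by its total counts;
# IDF is the number of cells divided by each feature's (row's) total counts.
tf <- sweep(counts, MARGIN = 2, STATS = colSums(counts), FUN = "/")
idf <- ncol(counts) / rowSums(counts)

# Assumed scaling constant, mirroring the scale.factor argument.
scale.factor <- 10000

# Method 1 as described: log(TF x IDF).
# log1p keeps zero entries at zero (assumption).
# Multiplying a matrix by a vector recycles down columns, so idf[i] scales row i.
tfidf1 <- log1p(tf * idf * scale.factor)

# Method 2 as described: TF x log(IDF), using log(1 + IDF) (assumption).
tfidf2 <- tf * log(1 + idf)

# Method 3 as described: log(TF) x log(IDF), again with log1p-style transforms.
tfidf3 <- log1p(tf * scale.factor) * log(1 + idf)
```

Under these assumptions, all three results keep the dimensions of the input, and entries that were zero in counts remain zero.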

References

https://en.wikipedia.org/wiki/Latent_semantic_analysis#Latent_semantic_indexing

Examples

mat <- matrix(data = rbinom(n = 25, size = 5, prob = 0.2), nrow = 5)
RunTFIDF(object = mat)
#> Performing TF-IDF normalization
#> 5 x 5 Matrix of class "dgeMatrix"
#>          [,1]     [,2]     [,3]     [,4]     [,5]
#> [1,] 6.949537 7.759934 7.488134 7.354682 0.000000
#> [2,] 7.929766 0.000000 8.181001 0.000000 7.236979
#> [3,] 6.831874 7.236979 7.370421 7.236979 7.524481
#> [4,] 7.419181 7.131699 0.000000 8.229778 0.000000
#> [5,] 7.082948 7.488134 0.000000 6.795546 8.181001
RunTFIDF(atac_small[["peaks"]])
#> Processing layer: counts
#> Performing TF-IDF normalization
#> Warning: Some cells contain 0 total counts
#> Warning: Some features contain 0 total counts
#> GRangesAssay data with 100 features for 100 cells
#> Variable features: 0 
#> Annotation present: TRUE 
#> Fragment files: 0 
#> Motifs present: TRUE 
#> Links present: 0 
#> Region aggregation matrices: 0
RunTFIDF(object = atac_small)
#> Processing layer: counts
#> Performing TF-IDF normalization
#> Warning: Some cells contain 0 total counts
#> Warning: Some features contain 0 total counts
#> An object of class Seurat 
#> 150 features across 100 samples within 2 assays 
#> Active assay: peaks (100 features, 0 variable features)
#>  2 layers present: counts, data
#>  1 other assay present: RNA
#>  2 dimensional reductions calculated: lsi, umap