Return a vector of genomic regions that match the distribution of a set of
query regions for any given set of characteristics, specified in the input
meta.feature dataframe.
MatchRegionStats(
  meta.feature,
  query.feature,
  features.match = c("GC.percent"),
  n = 10000,
  verbose = TRUE,
  ...
)
meta.feature: A dataframe containing DNA sequence information for features to choose from.
query.feature: A dataframe containing DNA sequence information for features to match.
features.match: Which features of the query to match when selecting a set of regions. A vector of column names present in the feature metadata can be supplied to match multiple characteristics at once. Default is GC content.
n: Number of regions to select, with characteristics matching the query.
verbose: Display messages.
...: Arguments passed to other functions.
Returns a character vector of the selected region names.
For each requested feature to match, a density distribution is estimated from
the query regions using the density function, and a weight for each feature
in the dataset is estimated from that density distribution. If multiple
features are to be matched (for example, GC content and overall
accessibility), a joint density distribution is then computed by multiplying
the individual feature weights. A set of features with characteristics
matching the query regions is then selected using the sample function, with
the probability of randomly selecting each feature proportional to its joint
density distribution weight.
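The weighting and sampling steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's source code: the function name match_by_density, the toy data, the default density bandwidth, and the 1e-4 floor for values outside the estimated density range are all hypothetical choices made for the example.

```r
# Illustrative sketch only; match_by_density is a hypothetical helper,
# not part of the package.
match_by_density <- function(candidates, query, features, n) {
  # Start from uniform weights so per-feature weights can be multiplied in.
  weights <- rep(1, nrow(candidates))
  for (f in features) {
    # Estimate the distribution of this feature in the query regions.
    d <- density(query[[f]])
    # Evaluate the density at each candidate value by interpolation.
    # Values outside the estimated range get a small floor (assumed 1e-4).
    w <- approx(d$x, d$y, xout = candidates[[f]],
                yleft = 1e-4, yright = 1e-4)$y
    # Joint weight is the product of the individual feature weights.
    weights <- weights * w
  }
  # Draw n candidates without replacement, weighted by the joint weight.
  picked <- sample.int(nrow(candidates), size = n, prob = weights)
  rownames(candidates)[picked]
}

# Toy data (made up for the example): 500 candidate regions, 50 query regions.
set.seed(1)
candidates <- data.frame(
  GC.percent = runif(500, 20, 80),
  count = rpois(500, 50),
  row.names = paste0("region", 1:500)
)
query <- data.frame(GC.percent = rnorm(50, 60, 3), count = rpois(50, 50))

selected <- match_by_density(
  candidates, query, features = c("GC.percent", "count"), n = 20
)
```

Because sample.int normalizes the prob vector internally, the joint weights do not need to sum to one; only their relative sizes matter.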
metafeatures <- SeuratObject::GetAssayData(
  object = atac_small[['peaks']], slot = 'meta.features'
)
query.feature <- metafeatures[1:10, ]
features.choose <- metafeatures[11:nrow(metafeatures), ]
MatchRegionStats(
  meta.feature = features.choose,
  query.feature = query.feature,
  features.match = "percentile",
  n = 10
)
#> Matching percentile distribution
#> [1] "chr1-6779996-6780494" "chr1-2135022-2137621" "chr1-1690269-1690749"
#> [4] "chr1-6651141-6652061" "chr1-4066997-4068144" "chr1-1617619-1618183"
#> [7] "chr1-2058137-2059185" "chr1-6514584-6515741" "chr1-8377792-8378947"
#> [10] "chr1-2585232-2586236"