Return a vector of genomic regions that match the distribution of a set of
query regions for any given set of characteristics, specified in the input
meta.feature dataframe.
Usage
MatchRegionStats(
meta.feature,
query.feature,
features.match = c("GC.percent"),
n = 10000,
verbose = TRUE,
...
)
Arguments
- meta.feature
A dataframe containing DNA sequence information for features to choose from.
- query.feature
A dataframe containing DNA sequence information for features to match.
- features.match
Which features of the query to match when selecting a set of regions. A vector of column names present in the feature metadata can be supplied to match multiple characteristics at once. Default is GC content.
- n
Number of regions to select, with characteristics matching the query.
- verbose
Display messages
- ...
Arguments passed to other functions
Details
For each requested feature to match, a density distribution is estimated
using the stats::density() function,
and a weight for each feature in the dataset is estimated from that
density distribution. If multiple features are to be matched (for example,
GC content and overall accessibility), features are first transformed such
that they are uncorrelated with each other using a Cholesky decomposition and
a joint density distribution is then computed by multiplying the individual
feature weights. A set of features with characteristics matching the query
regions is then selected using the base::sample() function, with
the probability of randomly selecting each feature equal to the joint density
distribution weight. If the wrswoR package is
available, the wrswoR::sample_int_crank() function is used for
faster sampling.
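The procedure described above can be illustrated with a minimal sketch in base R. This is not the package's implementation: the toy data, the column name count, the helper names whiten and feature.weights, and the choice to estimate the Cholesky factor from the candidate set are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical toy data: two correlated characteristics per region
n.cand <- 500
gc <- runif(n.cand, min = 30, max = 70)
candidates <- data.frame(
  GC.percent = gc,
  count = 2 * gc + rnorm(n.cand, sd = 10),
  row.names = paste0("region", seq_len(n.cand))
)
q.gc <- rnorm(50, mean = 55, sd = 3)
query <- data.frame(
  GC.percent = q.gc,
  count = 2 * q.gc + rnorm(50, sd = 10)
)

features <- c("GC.percent", "count")
X <- as.matrix(candidates[, features])
Q <- as.matrix(query[, features])

# Decorrelate the features with a Cholesky factor of the covariance.
# Assumption: the transform is estimated from the candidate set and
# then applied to both sets.
R <- chol(cov(X))        # upper triangular, t(R) %*% R == cov(X)
center <- colMeans(X)
whiten <- function(m) sweep(m, 2, center) %*% solve(R)
Xw <- whiten(X)
Qw <- whiten(Q)

# Per-feature weight: query density evaluated at each candidate value
feature.weights <- function(q, x) {
  rng <- range(c(q, x))
  d <- stats::density(q, from = rng[1], to = rng[2])
  stats::approx(d$x, d$y, xout = x, rule = 2)$y
}

# Joint weight: product of the individual feature weights
weights <- rep(1, nrow(Xw))
for (j in seq_len(ncol(Xw))) {
  weights <- weights * feature.weights(Qw[, j], Xw[, j])
}

# Weighted sampling without replacement
picked <- sample(rownames(candidates), size = 20, prob = weights)
```

Here picked holds the names of 20 candidate regions whose GC content and count values tend to resemble those of the query. Multiplying per-feature weights only approximates a joint density when the features are uncorrelated, which is why the decorrelation step comes first.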
Examples
metafeatures <- atac_small[["peaks"]][[]]
query.feature <- metafeatures[1:10, ]
features.choose <- metafeatures[11:nrow(metafeatures), ]
MatchRegionStats(
meta.feature = features.choose,
query.feature = query.feature,
features.match = "GC.percent",
n = 10
)
#> Matching region characteristics using nearest-neighbor distance
#> [1] "chr1:1012999-1013896" "chr1:1261037-1261825" "chr1:1103886-1104761"
#> [4] "chr1:1098941-1099797" "chr1:1264764-1265656" "chr1:890356-891196"
#> [7] "chr1:1188896-1189774" "chr1:1222590-1223380" "chr1:955190-956101"
#> [10] "chr1:1259851-1260705"