prob.hits {GRIN2} | R Documentation |
Find Probability of Locus Hit
Description
Computes the probability that each genomic locus (e.g., gene or regulatory region) is affected by one or more types of genomic lesions. This function estimates statistical significance for lesion enrichment using a convolution of independent but non-identical Bernoulli distributions.
Usage
prob.hits(hit.cnt, chr.size = NULL)
Arguments
hit.cnt |
A list returned by the |
chr.size |
A |
Details
This function estimates a p-value for each locus: the probability of observing at least the observed number of lesions by chance, under a model in which lesion events are treated as independent Bernoulli trials.
For each lesion type, the model considers heterogeneity in lesion probability across loci based on their genomic context (e.g., locus size, chromosome size). These probabilities are then combined using a convolution of Bernoulli distributions to estimate the likelihood of observing the actual hit counts.
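The convolution step can be illustrated with a minimal sketch in base R. The helpers `bern.conv` and `tail.prob` and the example probabilities are hypothetical, introduced only for illustration; they are not part of GRIN2 and are not claimed to match its internal code. The sketch builds the distribution of the hit count one Bernoulli trial at a time and then sums the upper tail to obtain a p-value.

```r
# Hypothetical helper (not part of GRIN2): distribution of the number of
# hits among independent Bernoulli trials with unequal success probabilities.
bern.conv <- function(pr) {
  dist <- 1  # before any trial is added, P(0 hits) = 1
  for (p in pr) {
    # each trial either leaves the count unchanged (prob 1 - p)
    # or shifts it up by one (prob p)
    dist <- c(dist * (1 - p), 0) + c(0, dist * p)
  }
  dist  # dist[k + 1] = P(exactly k hits)
}

# Hypothetical helper: upper-tail p-value, P(at least n.obs hits).
tail.prob <- function(pr, n.obs) {
  if (n.obs > length(pr)) return(0)  # more hits than trials is impossible
  dist <- bern.conv(pr)
  sum(dist[(n.obs + 1):length(dist)])
}

# Assumed per-trial hit probabilities for a single locus (illustrative values).
pr <- c(0.1, 0.2, 0.3)
tail.prob(pr, 2)
```

Building the distribution incrementally keeps the cost quadratic in the number of trials, instead of enumerating every combination of hit and non-hit outcomes.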
In addition, the function calculates:
- FDR-adjusted q-values using the method of Pounds and Cheng (2006), which estimates the proportion of true null hypotheses.
- p- and q-values for multi-lesion constellation hits, i.e., the probability that a locus is affected by one (p1), two (p2), or more types of lesions simultaneously.
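As a rough illustration of the q-value step, the sketch below assumes the commonly cited form of the Pounds and Cheng (2006) estimator, in which the proportion of true nulls is estimated as min(1, 2 × mean p-value) and used to scale Benjamini-Hochberg adjusted p-values. The function name `pc06.q` and the example p-values are hypothetical, and the code is not claimed to reproduce the GRIN2 implementation.

```r
# Hypothetical helper (not the GRIN2 implementation): q-values that scale
# Benjamini-Hochberg adjusted p-values by an estimate of the proportion
# of true null hypotheses.
pc06.q <- function(p) {
  # assumed estimator of the true-null proportion: min(1, 2 * mean(p))
  pi0 <- min(1, 2 * mean(p, na.rm = TRUE))
  # p.adjust() is part of base R's stats package
  pi0 * p.adjust(p, method = "BH")
}

# Illustrative p-values for four loci.
p <- c(0.001, 0.01, 0.02, 0.5)
q <- pc06.q(p)
```

Because the estimated proportion never exceeds 1, the resulting q-values are never larger than the plain Benjamini-Hochberg adjusted p-values.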
Value
A list with the following components:
gene.hits |
A |
lsn.data |
Original input lesion data. |
gene.data |
Original input gene annotation data. |
gene.lsn.data |
A |
chr.size |
Chromosome size information used in the computation. |
gene.index |
A |
lsn.index |
A |
Author(s)
Abdelrahman Elsayed abdelrahman.elsayed@stjude.org and Stanley Pounds stanley.pounds@stjude.org
References
Pounds, S. et al. (2013). A genomic random interval model for statistical analysis of genomic lesion data.
Cao, X., Elsayed, A. H., & Pounds, S. B. (2023). Statistical Methods Inspired by Challenges in Pediatric Cancer Multi-omics.
See Also
prep.gene.lsn.data, find.gene.lsn.overlaps, count.hits
Examples
data(lesion_data)
data(hg38_gene_annotation)
data(hg38_chrom_size)
# 1) Prepare gene and lesion data:
prep.gene.lsn <- prep.gene.lsn.data(lesion_data, hg38_gene_annotation)
# 2) Identify overlapping gene-lesion events:
gene.lsn.overlap <- find.gene.lsn.overlaps(prep.gene.lsn)
# 3) Count number of subjects and lesions affecting each gene:
count.subj.hits <- count.hits(gene.lsn.overlap)
# 4) Compute p- and q-values for lesion enrichment per gene:
hits.prob <- prob.hits(count.subj.hits, hg38_chrom_size)