sim_bnb {depower} | R Documentation |
Simulate data from a BNB distribution
Description
Simulate data from the bivariate negative binomial (BNB) distribution. The
BNB distribution is used to simulate count data where the event counts are
jointly dependent (correlated). For independent data, see
sim_nb()
.
Usage
sim_bnb(
n,
mean1,
mean2,
ratio,
dispersion,
nsims = 1L,
return_type = "list",
max_zeros = 0.99,
ncores = 1L
)
Arguments
n |
(integer: |
mean1 |
(numeric: |
mean2 , ratio |
(numeric:
|
dispersion |
(numeric: |
nsims |
(Scalar integer: |
return_type |
(string: |
max_zeros |
(Scalar numeric: |
ncores |
(Scalar integer: |
Details
The negative binomial distribution may be defined using a gamma-Poisson
mixture distribution. In this case, the Poisson parameter \lambda
is a random variable with gamma distribution. Equivalence between different
parameterizations are demonstrated below:
# Define constants and their relationships n <- 10000 dispersion <- 8 mu <- 4 p <- dispersion / (dispersion + mu) q <- mu / (mu + dispersion) variance <- mu + (mu^2 / dispersion) rate <- p / (1 - p) scale <- (1 - p) / p # alternative formula for mu mu_alt <- (dispersion * (1 - p)) / p stopifnot(isTRUE(all.equal(mu, mu_alt))) set.seed(20240321) # Using built-in rnbinom with dispersion and mean w <- rnbinom(n = n, size = dispersion, mu = mu) # Using gamma-Poisson mixture with gamma rate parameter x <- rpois( n = n, lambda = rgamma(n = n, shape = dispersion, rate = rate) ) # Using gamma-Poisson mixture with gamma scale parameter y <- rpois( n = n, lambda = rgamma(n = n, shape = dispersion, scale = scale) ) # Using gamma-Poisson mixture with multiplicative mean and # gamma scale parameter z <- rpois( n = n, lambda = mu * rgamma(n = n, shape = dispersion, scale = 1/dispersion) ) # Compare CDFs par(mar=c(4,4,1,1)) plot( x = sort(w), y = (1:n)/n, xlim = range(c(w,x,y,z)), ylim = c(0,1), col = 'green', lwd = 4, type = 'l', main = 'CDF' ) lines(x = sort(x), y = (1:n)/n, col = 'red', lwd = 2) lines(x = sort(y), y = (1:n)/n, col = 'yellow', lwd = 1.5) lines(x = sort(z), y = (1:n)/n, col = 'black')
The BNB distribution is implemented by compounding two conditionally
independent Poisson random variables X_1 \mid G = g \sim \text{Poisson}(\mu g)
and X_2 \mid G = g \sim \text{Poisson}(r \mu g)
with a gamma random
variable G \sim \text{Gamma}(\theta, \theta^{-1})
. The probability mass
function for the joint distribution of X_1, X_2
is
P(X_1 = x_1, X_2 = x_2) = \frac{\Gamma(x_1 + x_2 + \theta)}{(\mu + r \mu + \theta)^{x_1 + x_2 + \theta}}
\frac{\mu^{x_1}}{x_1!} \frac{(r \mu)^{x_2}}{x_2!}
\frac{\theta^{\theta}}{\Gamma(\theta)}
where x_1,x_2 \in \mathbb{N}^{\geq 0}
are specific values of the count
outcomes, \theta \in \mathbb{R}^{> 0}
is the dispersion
parameter
which controls the dispersion and level of correlation between the two
samples (otherwise known as the shape parameter of the gamma distribution),
\mu \in \mathbb{R}^{> 0}
is the mean parameter, and
r = \frac{\mu_2}{\mu_1} \in \mathbb{R}^{> 0}
is the ratio
parameter
representing the multiplicative change in the mean of the second sample
relative to the first sample. G
denotes the random subject effect and
the gamma distribution scale parameter is assumed to be the inverse of the
dispersion parameter (\theta^{-1}
) for identifiability.
Correlation decreases from 1 to 0 as the dispersion parameter increases from 0 to infinity. For a given dispersion, increasing means also increases the correlation. See 'Examples' for a demonstration.
See 'Details' in sim_nb()
for additional information on the
negative binomial distribution.
Value
If nsims = 1
and the number of unique parameter combinations is
one, the following objects are returned:
If
return_type = "list"
, a list:Slot Name Description 1 Simulated counts from sample 1. 2 Simulated counts from sample 2. If
return_type = "data.frame"
, a data frame:Column Name Description 1 item
Subject/item indicator. 2 condition
Sample/condition indicator. 3 value
Simulated counts.
If nsims > 1
or the number of unique parameter combinations is greater than
one, each object described above is returned in a list-column named data
in
a depower simulation data frame:
Column | Name | Description |
1 | n1 | Sample size of sample 1. |
2 | n2 | Sample size of sample 2. |
3 | mean1 | Mean for sample 1. |
4 | mean2 | Mean for sample 2. |
5 | ratio | Ratio of means (sample 2 / sample 1). |
6 | dispersion | Gamma distribution shape parameter (dispersion). |
7 | nsims | Number of valid simulation iterations. |
8 | distribution | Distribution sampled from. |
9 | data | List-column of simulated data. |
References
Yu L, Fernandez S, Brock G (2020). “Power analysis for RNA-Seq differential expression studies using generalized linear mixed effects models.” BMC Bioinformatics, 21(1), 198. ISSN 1471-2105, doi:10.1186/s12859-020-3541-7.
Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.
Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.
See Also
sim_nb()
, stats::rpois()
, stats::rgamma()
,
stats::rnbinom()
Examples
#----------------------------------------------------------------------------
# sim_bnb() examples
#----------------------------------------------------------------------------
library(depower)
# Paired two-sample data returned in a data frame
sim_bnb(
n = 10,
mean1 = 10,
ratio = 1.6,
dispersion = 3,
nsims = 1,
return_type = "data.frame"
)
# Paired two-sample data returned in a list
sim_bnb(
n = 10,
mean1 = 10,
ratio = 1.6,
dispersion = 3,
nsims = 1,
return_type = "list"
)
# Two simulations of paired two-sample data
# returned as a list of data frames
sim_bnb(
n = 10,
mean1 = 10,
ratio = 1.6,
dispersion = 3,
nsims = 2,
return_type = "data.frame"
)
# Two simulations of Paired two-sample data
# returned as a list of lists
sim_bnb(
n = 10,
mean1 = 10,
ratio = 1.6,
dispersion = 3,
nsims = 2,
return_type = "list"
)
#----------------------------------------------------------------------------
# Visualization of the BNB distribution as dispersion varies.
#----------------------------------------------------------------------------
set.seed(1234)
data <- lapply(
X = c(1, 10, 100, 1000),
FUN = function(x) {
d <- sim_bnb(
n = 1000,
mean1 = 10,
ratio = 1.5,
dispersion = x,
nsims = 1,
return_type = "data.frame"
)
cor <- cor(
x = d[d$condition == "1", ]$value,
y = d[d$condition == "2", ]$value
)
cbind(dispersion = x, correlation = cor, d)
}
)
data <- do.call(what = "rbind", args = data)
ggplot2::ggplot(
data = data,
mapping = ggplot2::aes(x = value, fill = condition)
) +
ggplot2::facet_wrap(
facets = ggplot2::vars(.data$dispersion),
ncol = 2,
labeller = ggplot2::labeller(.rows = ggplot2::label_both)
) +
ggplot2::geom_density(alpha = 0.3) +
ggplot2::coord_cartesian(xlim = c(0, 60)) +
ggplot2::geom_text(
mapping = ggplot2::aes(
x = 30,
y = 0.12,
label = paste0("Correlation: ", round(correlation, 2))
),
check_overlap = TRUE
) +
ggplot2::labs(
x = "Value",
y = "Density",
fill = "Condition",
caption = "Mean1=10, Mean2=15, ratio=1.5"
)