sim_log_lognormal {depower} | R Documentation |
Simulate data from a normal distribution
Description
Simulate data from the log-transformed lognormal distribution (i.e., a normal distribution). This function handles all three cases:
One-sample data
Dependent two-sample data
Independent two-sample data
Usage
sim_log_lognormal(
  n1,
  n2 = NULL,
  ratio,
  cv1,
  cv2 = NULL,
  cor = 0,
  nsims = 1L,
  return_type = "list",
  ncores = 1L,
  messages = TRUE
)
Arguments
n1 |
(integer: |
n2 |
(integer: |
ratio |
(numeric:
See 'Details' for additional information. |
cv1 |
(numeric: |
cv2 |
(numeric: |
cor |
(numeric: |
nsims |
(Scalar integer: |
return_type |
(string: |
ncores |
(Scalar integer: |
messages |
(Scalar logical: |
Details
Based on assumed characteristics of the original lognormal distribution, data are simulated from the corresponding log-transformed (normal) distribution. The simulated data are suitable for assessing the power of a hypothesis test for the geometric mean or the ratio of geometric means of the original lognormal data.
This method can also be useful for other population distributions which are positive and where it makes sense to describe the ratio of geometric means. However, the lognormal distribution is theoretically correct in the sense that you can log transform to a normal distribution, compute the summary statistic, then apply the inverse transformation to summarize on the original lognormal scale.
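The idea of simulating on the log scale can be sketched in base R for the independent two-sample case. This is a minimal illustration under stated assumptions, not the package's implementation: it assumes sample 1 has a geometric mean of 1 (log-scale mean 0), that sample 2 has log-scale mean ln(ratio), and that each log-scale variance is ln(CV^2 + 1). The variable names are chosen here for illustration only.

```r
set.seed(42)

# Illustrative inputs that mirror the documented arguments.
n1 <- 1e5
n2 <- 1e5
ratio <- 1.3
cv1 <- 0.35
cv2 <- 0.35

# Log-scale standard deviations from Var(ln X) = ln(CV^2 + 1).
sd1 <- sqrt(log(cv1^2 + 1))
sd2 <- sqrt(log(cv2^2 + 1))

# Assumption: sample 1 has geometric mean 1, so its log-scale mean is 0,
# and sample 2 has log-scale mean ln(ratio).
y1 <- rnorm(n1, mean = 0, sd = sd1)
y2 <- rnorm(n2, mean = log(ratio), sd = sd2)

# Back-transform the log-scale summaries to the original lognormal scale.
est_ratio <- exp(mean(y2) - mean(y1))  # estimated ratio of geometric means
est_cv1 <- sqrt(exp(var(y1)) - 1)      # estimated CV of sample 1

est_ratio
est_cv1
```

With large samples, the back-transformed estimates should be close to the assumed ratio and CV.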
Let GM(\cdot) be the geometric mean and AM(\cdot) be the arithmetic mean. For independent lognormal samples X_1 and X_2,
\text{Fold Change} = \frac{GM(X_2)}{GM(X_1)}
For dependent lognormal samples X_1 and X_2,
\text{Fold Change} = GM\left( \frac{X_2}{X_1} \right)
Unlike ratios of arithmetic means, for equal sample sizes of X_1 and X_2 it follows that
\frac{GM(X_2)}{GM(X_1)} = GM \left( \frac{X_2}{X_1} \right) = e^{AM(\ln X_2) - AM(\ln X_1)} = e^{AM(\ln X_2 - \ln X_1)}
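The identity between the ratio of geometric means and the geometric mean of ratios can be checked numerically. The following base R sketch uses a hypothetical helper gm() and illustrative paired samples x1 and x2; none of these names come from depower.

```r
# Hypothetical helper (not part of depower): geometric mean of positive values.
gm <- function(x) exp(mean(log(x)))

set.seed(1)
# Two positive samples of equal size (illustrative lognormal data).
x1 <- rlnorm(10, meanlog = 0, sdlog = 0.3)
x2 <- rlnorm(10, meanlog = log(1.3), sdlog = 0.3)

ratio_of_gms <- gm(x2) / gm(x1)
gm_of_ratios <- gm(x2 / x1)
exp_diff_of_means <- exp(mean(log(x2)) - mean(log(x1)))
exp_mean_of_diffs <- exp(mean(log(x2) - log(x1)))

# All four quantities agree up to floating-point error.
all.equal(ratio_of_gms, gm_of_ratios)
all.equal(ratio_of_gms, exp_diff_of_means)
all.equal(ratio_of_gms, exp_mean_of_diffs)

# The arithmetic-mean analogues generally differ from each other.
mean(x2) / mean(x1)
mean(x2 / x1)
```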
The coefficient of variation (CV) for X is defined as
CV = \frac{SD(X)}{AM(X)}
The relationships between sample statistics for the original lognormal data (X) and the natural logged data (\ln{X}) are
\begin{aligned}
AM(X) &= e^{AM(\ln{X}) + \frac{Var(\ln{X})}{2}} \\
GM(X) &= e^{AM(\ln{X})} \\
Var(X) &= AM(X)^2 \left( e^{Var(\ln{X})} - 1 \right) \\
CV(X) &= \frac{\sqrt{AM(X)^2 \left( e^{Var(\ln{X})} - 1 \right)}}{AM(X)} \\
&= \sqrt{e^{Var(\ln{X})} - 1}
\end{aligned}
and
\begin{aligned}
AM(\ln{X}) &= \ln \left( \frac{AM(X)}{\sqrt{CV(X)^2 + 1}} \right) \\
Var(\ln{X}) &= \ln(CV(X)^2 + 1) \\
Cor(\ln{X_1}, \ln{X_2}) &= \frac{\ln \left( Cor(X_1, X_2)CV(X_1)CV(X_2) + 1 \right)}{SD(\ln{X_1})SD(\ln{X_2})}
\end{aligned}
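The conversions between the original scale and the log scale can be written as small base R functions. This is a sketch only: the helper names log_var(), log_mean(), and log_cor() are hypothetical and not part of depower, and the input values are illustrative.

```r
# Hypothetical helpers (not part of depower) implementing the formulas above.
log_var <- function(cv) log(cv^2 + 1)                   # Var(ln X)
log_mean <- function(am, cv) log(am / sqrt(cv^2 + 1))   # AM(ln X)
log_cor <- function(cor, cv1, cv2) {                    # Cor(ln X_1, ln X_2)
  log(cor * cv1 * cv2 + 1) / sqrt(log_var(cv1) * log_var(cv2))
}

# Illustrative original-scale inputs.
am <- 10
cv <- 0.35

v <- log_var(cv)
m <- log_mean(am, cv)

# Round trip back to the original scale.
am_back <- exp(m + v / 2)    # recovers AM(X)
cv_back <- sqrt(exp(v) - 1)  # recovers CV(X)
gm_x <- exp(m)               # GM(X), smaller than AM(X) whenever CV > 0

# Correlation on the log scale for an original-scale correlation of 0.4.
r_log <- log_cor(0.4, cv1 = 0.35, cv2 = 0.35)

c(am_back = am_back, cv_back = cv_back, gm_x = gm_x, r_log = r_log)
```

The round trip returns the original arithmetic mean and CV, which gives a quick check that the two sets of formulas are mutually consistent.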
Based on the properties of correlation and variance,
\begin{aligned}
Var(X_2 - X_1) &= Var(X_1) + Var(X_2) - 2Cov(X_1, X_2) \\
&= Var(X_1) + Var(X_2) - 2Cor(X_1, X_2)SD(X_1)SD(X_2) \\
SD(X_2 - X_1) &= \sqrt{Var(X_2 - X_1)}
\end{aligned}
The standard deviation of the differences decreases as the correlation becomes more positive and increases as it becomes more negative. For the special case where the two samples are uncorrelated and each has the same variance, it follows that
\begin{aligned}
Var(X_2 - X_1) &= \sigma^2 + \sigma^2 \\
SD(X_2 - X_1) &= \sqrt{2}\sigma
\end{aligned}
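The effect of correlation on the standard deviation of the differences can be tabulated directly from the formula. The helper sd_diff() below is hypothetical (not part of depower), and the value of sigma is illustrative.

```r
# Hypothetical helper (not part of depower): SD of X_2 - X_1 from the
# two standard deviations and their correlation.
sd_diff <- function(sd1, sd2, cor) {
  sqrt(sd1^2 + sd2^2 - 2 * cor * sd1 * sd2)
}

sigma <- 0.34
cors <- c(-0.8, -0.4, 0, 0.4, 0.8)
sds <- sd_diff(sigma, sigma, cors)

# SD of the differences shrinks as the correlation increases.
data.frame(cor = cors, sd_diff = sds)

# Special case: uncorrelated samples with equal variance give sqrt(2) * sigma.
all.equal(sd_diff(sigma, sigma, 0), sqrt(2) * sigma)
```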
Value
If nsims = 1 and the number of unique parameter combinations is one, the following objects are returned:
If one-sample data with return_type = "list", a list:
Slot | Name | Description
1 | | One sample of simulated normal values.
If one-sample data with return_type = "data.frame", a data frame:
Column | Name | Description
1 | item | Pair/subject/item indicator.
2 | value | Simulated normal values.
If two-sample data with return_type = "list", a list:
Slot | Name | Description
1 | | Simulated normal values from sample 1.
2 | | Simulated normal values from sample 2.
If two-sample data with return_type = "data.frame", a data frame:
Column | Name | Description
1 | item | Pair/subject/item indicator.
2 | condition | Time/group/condition indicator.
3 | value | Simulated normal values.
If nsims > 1 or the number of unique parameter combinations is greater than one, each object described above is returned in a data frame, located in a list-column named data.
If one-sample data, a data frame:
Column | Name | Description
1 | n1 | The sample size.
2 | ratio | Geometric mean [GM(sample 1)].
3 | cv1 | Coefficient of variation for sample 1.
4 | nsims | Number of data simulations.
5 | distribution | Distribution sampled from.
6 | data | List-column of simulated data.
If two-sample data, a data frame:
Column | Name | Description
1 | n1 | Sample size of sample 1.
2 | n2 | Sample size of sample 2.
3 | ratio | Ratio of geometric means [GM(sample 2) / GM(sample 1)] or geometric mean ratio [GM(sample 2 / sample 1)].
4 | cv1 | Coefficient of variation for sample 1.
5 | cv2 | Coefficient of variation for sample 2.
6 | cor | Correlation between samples.
7 | nsims | Number of data simulations.
8 | distribution | Distribution sampled from.
9 | data | List-column of simulated data.
References
Julious SA (2004). “Sample sizes for clinical trials with Normal data.” Statistics in Medicine, 23(12), 1921–1986. doi:10.1002/sim.1783.
Hauschke D, Steinijans VW, Diletti E, Burke M (1992). “Sample size determination for bioequivalence assessment using a multiplicative model.” Journal of Pharmacokinetics and Biopharmaceutics, 20(5), 557–561. ISSN 0090-466X, doi:10.1007/BF01061471.
Johnson NL, Kotz S, Balakrishnan N (1994). Continuous univariate distributions, Wiley series in probability and mathematical statistics, 2nd edition. Wiley, New York. ISBN 9780471584957 9780471584940.
See Also
stats::rnorm(), mvnfast::rmvn()
Examples
#----------------------------------------------------------------------------
# sim_log_lognormal() examples
#----------------------------------------------------------------------------
library(depower)

# Independent two-sample data returned in a data frame
sim_log_lognormal(
  n1 = 10,
  n2 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0,
  nsims = 1,
  return_type = "data.frame"
)

# Independent two-sample data returned in a list
sim_log_lognormal(
  n1 = 10,
  n2 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0,
  nsims = 1,
  return_type = "list"
)

# Dependent two-sample data returned in a data frame
sim_log_lognormal(
  n1 = 10,
  n2 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0.4,
  nsims = 1,
  return_type = "data.frame"
)

# Dependent two-sample data returned in a list
sim_log_lognormal(
  n1 = 10,
  n2 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0.4,
  nsims = 1,
  return_type = "list"
)

# One-sample data returned in a data frame
sim_log_lognormal(
  n1 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  nsims = 1,
  return_type = "data.frame"
)

# One-sample data returned in a list
sim_log_lognormal(
  n1 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  nsims = 1,
  return_type = "list"
)

# Independent two-sample data: two simulations for four parameter combinations.
# Returned as a list-column of lists within a data frame
sim_log_lognormal(
  n1 = c(10, 20),
  n2 = c(10, 20),
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0,
  nsims = 2,
  return_type = "list"
)

# Dependent two-sample data: two simulations for two parameter combinations.
# Returned as a list-column of lists within a data frame
sim_log_lognormal(
  n1 = c(10, 20),
  n2 = c(10, 20),
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0.4,
  nsims = 2,
  return_type = "list"
)

# One-sample data: two simulations for two parameter combinations.
# Returned as a list-column of lists within a data frame
sim_log_lognormal(
  n1 = c(10, 20),
  ratio = 1.3,
  cv1 = 0.35,
  nsims = 2,
  return_type = "list"
)