pairbeta4 {PopulateR} | R Documentation |
Pair two people, using a four-parameter beta distribution, into households
Description
Creates a data frame of paired people, based on a distribution of age differences. The function uses a four-parameter beta distribution to create the pairs. Two data frames are required, and one person from each data frame is matched according to the specified age difference distribution. If the data frames are different sizes, "smalldf" must be the smaller of the two; in that case, a random subsample of "largedf" is used. Both data frames must contain only the people who are to be paired.
Usage
pairbeta4(
smalldf,
smlid,
smlage,
largedf,
lrgid,
lrgage,
shapeA = NULL,
shapeB = NULL,
locationP = NULL,
scaleP = NULL,
HHStartNum,
HHNumVar,
userseed = NULL,
ptostop = NULL,
attempts = 10,
numiters = 1e+06,
verbose = FALSE
)
Arguments
smalldf |
The data frame containing one set of people to be paired. If the two data frames contain different numbers of people, this must be the data frame containing the smaller number. |
smlid |
The variable containing the unique ID for each person, in the smalldf data frame. |
smlage |
The age variable, in the smalldf data frame. |
largedf |
The data frame containing the second set of people to be paired. If the two data frames contain different numbers of people, this must be the data frame containing the larger number. |
lrgid |
The variable containing the unique ID for each person, in the largedf data frame. |
lrgage |
The age variable, in the largedf data frame. |
shapeA |
The first shape parameter of the four-parameter beta distribution. If this value is negative, smalldf has the oldest ages. If this value is positive, smalldf has the youngest ages. |
shapeB |
The second shape parameter of the four-parameter beta distribution. This value must be positive. |
locationP |
The location parameter of the four-parameter beta distribution. |
scaleP |
The scale parameter of the four-parameter beta distribution. |
HHStartNum |
The starting value for HHNumVar. Must be numeric. |
HHNumVar |
The column name for the household variable. |
userseed |
If specified, this will set the seed to the number provided. If not, the normal set.seed() function will be used. |
ptostop |
The critical p-value used as the stopping rule for the function. If this value is not set, a critical p-value of 0.01 is used. |
attempts |
The maximum number of times largedf will be sampled, for each observation in smalldf, to draw an age match from the correct distribution. The default number of attempts is 10. |
numiters |
The maximum number of iterations used to construct the output data frame ($Matched) containing the pairs. The default value is 1000000. This limit acts as the stopping rule if the algorithm does not converge. |
verbose |
Whether the number of iterations used, the critical chi-squared value, and the final chi-squared value are printed to the console. The default value is FALSE. |
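The role of the four distribution arguments can be illustrated with a minimal base R sketch. It assumes the common location-scale form of the four-parameter beta distribution, in which a standard beta draw is multiplied by scaleP and shifted by locationP. The helper draw_age_diffs is hypothetical and is not part of PopulateR; the package's internal sampling may differ.

```r
# Sketch only: draws age differences from a four-parameter beta distribution,
# assuming the location-scale form locationP + scaleP * Beta(shapeA, shapeB).
# draw_age_diffs is a hypothetical helper, not a PopulateR function.
draw_age_diffs <- function(n, shapeA, shapeB, locationP, scaleP) {
  locationP + scaleP * stats::rbeta(n, shape1 = shapeA, shape2 = shapeB)
}

set.seed(4)
diffs <- draw_age_diffs(1000, shapeA = 2.2, shapeB = 3.7,
                        locationP = 16.5, scaleP = 40.1)

# Every draw lies between locationP and locationP + scaleP.
range(diffs)

# Negating locationP and scaleP mirrors the distribution around zero,
# so every draw lies between locationP + scaleP and locationP.
neg_diffs <- draw_age_diffs(1000, shapeA = 2.2, shapeB = 3.7,
                            locationP = -16.5, scaleP = -40.1)
range(neg_diffs)
```

Under this assumed form, locationP sets the end of the age-difference range nearest zero and scaleP sets the width of the range, while shapeA and shapeB control how the differences are spread within it.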
Value
A list of three data frames. $Matched contains the data frame of pairs. $Smaller contains the unmatched observations from smalldf. $Larger contains the unmatched observations from largedf.
Examples
library(dplyr)
# the children data frame is smaller
set.seed(1)
# sample a combination of females and males to be parents
Parents <- Township %>%
  filter(Relationship == "Partnered", Age > 18) %>%
  slice_sample(n = 500)
Children <- Township %>%
  filter(Relationship == "NonPartnered", Age < 20) %>%
  slice_sample(n = 200)
ChildAllMatched <- pairbeta4(Children, smlid = "ID", smlage = "Age", Parents, lrgid = "ID",
                             lrgage = "Age", shapeA = 2.2, shapeB = 3.7, locationP = 16.5,
                             scaleP = 40.1, HHStartNum = 1, HHNumVar = "Household",
                             userseed = 4, ptostop = .01, attempts = 2, numiters = 8)
MatchedPairs <- ChildAllMatched$Matched
UnmatchedChildren <- ChildAllMatched$Smaller
UnmatchedAdults <- ChildAllMatched$Larger
# the children data frame is larger, so the parents are smalldf;
# the locationP and scaleP values are negative
Parents2 <- Township %>%
  filter(Relationship == "Partnered", Age > 18) %>%
  slice_sample(n = 100)
Children2 <- Township %>%
  filter(Relationship == "NonPartnered", Age < 20) %>%
  slice_sample(n = 500)
ChildMatched <- pairbeta4(Parents2, smlid = "ID", smlage = "Age", Children2, lrgid = "ID",
                          lrgage = "Age", shapeA = 2.2, shapeB = 3.7, locationP = -16.5,
                          scaleP = -40.1, HHStartNum = 1, HHNumVar = "Household",
                          userseed = 4, ptostop = .05, attempts = 2, numiters = 8)
MatchedPairs2 <- ChildMatched$Matched
# $Smaller holds the unmatched rows from smalldf (Parents2 here),
# and $Larger holds the unmatched rows from largedf (Children2 here)
UnmatchedAdults2 <- ChildMatched$Smaller
UnmatchedChildren2 <- ChildMatched$Larger
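Once pairs have been created, a natural check is whether the age difference within each household looks plausible. The sketch below uses a small hand-built data frame, toy_matched, as a stand-in for $Matched so that it runs without the package. It assumes, without confirmation from this page, that $Matched holds one row per person and that both members of a pair share the same value of the household variable.

```r
# Toy stand-in for a $Matched data frame. Assumption: one row per person,
# with both members of a pair sharing the same Household value.
toy_matched <- data.frame(
  ID        = c(101, 102, 103, 104, 105, 106),
  Age       = c(7, 35, 12, 41, 3, 29),
  Household = c(1, 1, 2, 2, 3, 3)
)

# Each household should contain exactly two people.
stopifnot(all(table(toy_matched$Household) == 2))

# Age difference within each household: older member minus younger member.
age_gaps <- aggregate(Age ~ Household, data = toy_matched,
                      FUN = function(a) max(a) - min(a))
names(age_gaps)[2] <- "AgeDiff"
age_gaps
```

If the real output follows the same layout, replacing toy_matched with MatchedPairs gives one age difference per household, which can then be compared against the range implied by locationP and scaleP.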