Estimates a confidence interval for a pairwise effect size (phi or Cramer's V) using a non-parametric percentile bootstrap. Rows are resampled with replacement from the pairwise-complete observations, and the chosen effect size estimator is recomputed on each resample. The resulting empirical distribution is used to derive the interval.
Usage
bootstrap_ci(
x,
y,
R = 1000L,
conf = 0.95,
type = c("percentile", "bca"),
corrected = FALSE,
correct = FALSE,
seed = NULL
)Arguments
- x
A factor, character, or logical vector.
- y
A factor, character, or logical vector of the same length as
x.- R
Integer. Number of bootstrap resamples. Default
1000L. Values below 500 are accepted with a warning.- conf
Numeric in (0, 1). Confidence level. Default
0.95.- type
Character. Bootstrap interval type:
"percentile"(default) or"bca"(bias-corrected and accelerated, Efron 1987). The BCa interval generally has better coverage but requires the jackknife influence values and is slower.- corrected
Logical. Whether to use the bias-corrected estimator (Bergsma, 2013). Default
FALSE.- correct
Logical. Yates' continuity correction. Default
FALSE.- seed
Integer or
NULL. Optional random seed for reproducibility. DefaultNULL(no seed set).
Value
A named list with:
estimateThe point estimate of phi or Cramer's V on the original data.
ci_lowerLower confidence bound.
ci_upperUpper confidence bound.
confThe requested confidence level.
typeThe interval type used.
RNumber of resamples actually used (may be lower than requested if degenerate resamples were removed).
metricCharacter:
"phi"or"cramers_v".correctedLogical.
nNumber of pairwise-complete observations.
boot_distNumeric vector of length
Rcontaining the full bootstrap distribution (useful for plotting).
Details
Percentile interval (Efron & Tibshirani, 1993, Ch. 13):
The \(\alpha/2\) and \(1-\alpha/2\) quantiles of the bootstrap distribution \(\hat{\theta}^*_1, \ldots, \hat{\theta}^*_R\) are used directly as the lower and upper bounds.
BCa interval (Efron, 1987):
Adjusts the quantiles used for the interval by a bias-correction term \(\hat{z}_0\) (estimated from the proportion of bootstrap resamples below the point estimate) and an acceleration term \(\hat{a}\) (estimated from the jackknife influence values). This gives second-order accurate intervals without requiring a transformation. The formulas follow DiCiccio & Efron (1996).
Degenerate resamples: Bootstrap resamples that yield a
single-level variable (all observations in one category) produce
NA effect sizes and are discarded. A warning is issued if more
than 5% of resamples are discarded.
Effect size on a boundary: Both phi and Cramer's V are bounded at 0. Bootstrap distributions for weak associations are therefore right-skewed and pile up at 0. The BCa correction partially addresses this; for near-zero associations the percentile lower bound may be 0, which is correct.
References
DiCiccio, T. J., & Efron, B. (1996). Bootstrap confidence intervals. Statistical Science, 11(3), 189–212. doi:10.1214/ss/1032280214
Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397), 171–185. doi:10.1080/01621459.1987.10478410
Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall. doi:10.1007/978-1-4899-4541-9
Examples
set.seed(42)
x <- sample(c("A", "B"), 120, replace = TRUE,
prob = c(0.4, 0.6))
y <- ifelse(x == "A",
sample(c("yes", "no"), 120, replace = TRUE, prob = c(0.7, 0.3)),
sample(c("yes", "no"), 120, replace = TRUE, prob = c(0.3, 0.7)))
# Small R for a fast example; use R >= 1000 in real work.
ci <- bootstrap_ci(x, y, R = 200, seed = 1)
#> Warning: R < 500 may yield unstable confidence intervals.
ci$estimate
#> [1] 0.4323126
ci$ci_lower
#> [1] 0.2452065
ci$ci_upper
#> [1] 0.5829789
# \donttest{
# BCa interval (slower: adds a leave-one-out jackknife)
ci_bca <- bootstrap_ci(x, y, R = 500, type = "bca", seed = 1)
# }