Prune edges from a catgraph by effect size or adjusted p-value
Source: R/prune_edges.R
Removes edges whose effect size or (adjusted) p-value does not meet a
specified threshold, returning a new catgraph object with a
sparser graph. Multiple-testing adjustment is applied across all edges
by default.
Usage
prune_edges(
  x,
  min_weight = 0,
  max_p = 1,
  p_adjust = c("BH", "holm", "bonferroni", "none"),
  remove_isolates = FALSE
)
Arguments
- x
A catgraph object.
- min_weight
Numeric, non-negative. Edges with effect size strictly below this value are removed. Since phi and Cramer's V lie in [0, 1], meaningful thresholds are in that range: a value of 1 keeps only edges with weight exactly 1, and values above 1 remove all edges. Effect-size pruning is the primary filter, in keeping with the package's focus on effect sizes; with large samples even negligible associations can reach small p-values, so a p-value cut alone may remove little. Default 0 (no filtering).
- max_p
Numeric in [0, 1]. Edges with adjusted p-value strictly above this value are removed. The adjustment method is controlled by p_adjust. Default 1 (no filtering).
- p_adjust
Character. Multiple-testing correction applied across all edges (choose(p, 2) simultaneous tests, where p is the number of variables). One of:
"BH": Benjamini-Hochberg false discovery rate control (default). Recommended for exploratory work.
"holm": Holm-Bonferroni step-down; strong family-wise error rate (FWER) control.
"bonferroni": Bonferroni; FWER control, uniformly more conservative than Holm.
"none": Raw (unadjusted) p-values. Not recommended when many variables are analysed; retained for reproducing pre-0.4.0 behaviour.
- remove_isolates
Logical. If TRUE, vertices with degree 0 after pruning are also removed. Default FALSE.
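The four p_adjust options share their names with the methods of base R's stats::p.adjust(). The sketch below applies that function to made-up raw p-values for the choose(4, 2) = 6 tests of a four-variable graph and counts how many edges would pass max_p = 0.05 under each option. Whether prune_edges() calls p.adjust() internally is an assumption of this sketch, not something stated on this page.

```r
# Assumption: the p_adjust options are treated here as the identically
# named methods of stats::p.adjust(); the p-values are made up.
p_raw <- c(0.001, 0.004, 0.012, 0.030, 0.045, 0.200)
stopifnot(length(p_raw) == choose(4, 2))  # 4 variables -> 6 pairwise tests

methods <- c("BH", "holm", "bonferroni", "none")

# One column of adjusted p-values per method (a 6 x 4 matrix).
p_adj <- sapply(methods, function(m) p.adjust(p_raw, method = m))

# Number of edges surviving max_p = 0.05 under each method.
max_p <- 0.05
n_kept <- colSums(p_adj <= max_p)
print(n_kept)
```

With these numbers Bonferroni keeps 2 edges, Holm 3, BH 4 and "none" 5, the ordering from most to least conservative described above.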
Value
A catgraph object with the filtered graph. The graph gains
two new edge attributes: p_value_adj (adjusted p-values) and
p_adjust_method (the method string).
Details
Pruning uses the edge attribute weight (the active effect size)
and the adjusted p-value (p_value_adj). Both thresholds apply
simultaneously; an edge is retained only when both conditions
are met.
Edges with NA weights or p-values (from degenerate pairs) are
always removed.
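The retention rule can be sketched on a plain data frame. The edge table, its column names (from, to, weight, p_value_adj) and all values are hypothetical and do not reflect how catgraph stores its graph internally; the sketch only mirrors the documented logic: keep an edge when weight >= min_weight and the adjusted p-value <= max_p, drop it when either value is NA, and report the degree-0 vertices that remove_isolates = TRUE would drop.

```r
# Hypothetical edge table for a 4-variable graph (all choose(4, 2) = 6
# pairs). Column names and values are illustrative only.
vertices <- c("A", "B", "C", "D")
edges <- data.frame(
  from        = c("A",   "A",   "A",   "B",   "B",   "C"),
  to          = c("B",   "C",   "D",   "C",   "D",   "D"),
  weight      = c(0.30,  0.05,  0.29,  0.08,  0.46,  NA),
  p_value_adj = c(0.001, 0.010, 0.300, 0.040, 0.001, 0.020),
  stringsAsFactors = FALSE
)

min_weight <- 0.1
max_p      <- 0.05

# Keep an edge only if both conditions hold; an NA in either column drops it.
keep <- !is.na(edges$weight) & !is.na(edges$p_value_adj) &
  edges$weight >= min_weight & edges$p_value_adj <= max_p
pruned <- edges[keep, ]

# Vertices left with degree 0 are what remove_isolates = TRUE would drop.
deg <- table(factor(c(pruned$from, pruned$to), levels = vertices))
isolates <- names(deg)[deg == 0]

print(pruned)
print(isolates)
```

Here A-C and B-C fall below the effect-size floor, A-D fails the p-value cut despite a sizeable weight, C-D is dropped for its NA weight, and C is left with degree 0.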
Chained calls and multiplicity scoping. Multiple-testing
correction is applied across the edges present in the graph at the
time of the call. When prune_edges() is called on a graph that
has already been pruned with a non-"none" p-value
adjustment, the second call re-adjusts on the surviving subset, not
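on the original choose(p, 2) tests. This is anti-conservative,
because a smaller family weakens the correction (Bonferroni, for
example, multiplies each p-value by the number of surviving edges
rather than by choose(p, 2)), and the function emits a warning in
this case. To change the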
adjustment method mid-analysis, rebuild the catgraph with
catgraph() and prune once; do not chain two adjusted
prunes. A single prune_edges() call that specifies both
min_weight and max_p is safe, because the adjustment is
computed over all edges before either threshold is applied.
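The cost of chaining can be seen with base R's stats::p.adjust() on made-up p-values for six tests: a BH prune followed by a Holm prune on the survivors, compared with a single Holm prune over all six tests. This illustrates the multiplicity argument only; it does not call prune_edges().

```r
# Made-up raw p-values for the choose(4, 2) = 6 tests of a 4-variable graph.
p_raw <- c(0.001, 0.004, 0.012, 0.030, 0.045, 0.200)
alpha <- 0.05

# Chained route: BH prune first, then a Holm prune on the survivors,
# which silently shrinks the family from 6 tests to length(surv_bh).
surv_bh   <- p_raw[p.adjust(p_raw, method = "BH") <= alpha]
chained   <- p.adjust(surv_bh, method = "holm")
n_chained <- sum(chained <= alpha)

# Single route: one Holm prune over the full family of 6 tests.
n_single <- sum(p.adjust(p_raw, method = "holm") <= alpha)

cat("chained:", n_chained, "single:", n_single, "\n")
```

With these values the chained route keeps 4 edges and the single Holm prune keeps 3: re-adjusting over 4 survivors instead of 6 tests lets an extra edge through.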
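Conventional Cohen (1988) thresholds for phi and Cramer's V: small \(\approx 0.1\), medium \(\approx 0.3\), large \(\geq 0.5\). Strictly, these are Cohen's benchmarks for the effect size \(w = \sqrt{\chi^2 / n}\), which equals phi and equals Cramer's V only when the smaller table dimension \(k\) is 2; since \(V = w / \sqrt{k - 1}\), the equivalent V thresholds are smaller for larger tables.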
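A small sketch of applying these benchmarks, with Cramer's V first converted to w through \(w = V\sqrt{k - 1}\). The helper names cohen_label() and v_to_w() are hypothetical and not part of catgraph, and the band edges (including the "negligible" band below 0.1) are a choice made here, since Cohen gives reference points rather than intervals.

```r
# Hypothetical helper: label w using Cohen's (1988) reference points.
# Bands are [0.1, 0.3) small, [0.3, 0.5) medium, >= 0.5 large; the
# "negligible" band below 0.1 is this sketch's own choice.
cohen_label <- function(w) {
  as.character(cut(w,
                   breaks = c(-Inf, 0.1, 0.3, 0.5, Inf),
                   labels = c("negligible", "small", "medium", "large"),
                   right = FALSE))
}

# Hypothetical helper: since V = w / sqrt(k - 1), with k the smaller of
# the two table dimensions, w = V * sqrt(k - 1). For k = 2, V equals w.
v_to_w <- function(v, nrow, ncol) {
  v * sqrt(min(nrow, ncol) - 1)
}

v <- 0.25
cohen_label(v_to_w(v, 4, 2))  # 4 x 2 table, w = V: "small"
cohen_label(v_to_w(v, 3, 3))  # 3 x 3 table, w = V * sqrt(2): "medium"
```

The same V of 0.25 reads as small in a 4 x 2 table but as medium in a 3 x 3 table once expressed on Cohen's w scale.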
References
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300. doi:10.1111/j.2517-6161.1995.tb02031.x
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.
Examples
df <- expand_table(Titanic)
cg <- catgraph(df)
# Default BH adjustment, with an effect-size floor of 0.1 and max_p = 0.05
cg_pruned <- prune_edges(cg, min_weight = 0.1, max_p = 0.05)
cg_pruned
#> catgraph object (pairwise association network)
#> Variables : 4
#> Edges : 5
#> Method : Cramer's V (classical)
#> Weights : min = 0.1110 median = 0.2941 max = 0.4556
#> Note : edges encode pairwise marginal association, not
#> conditional independence. All metrics lie on [0, 1].
#> NMI / AMI weights are not exchangeable with Cramer's V
#> weights across graph objects. See vignette
#> 'Methodological caveats'.
# Stricter: Holm adjustment
prune_edges(cg, min_weight = 0.1, max_p = 0.05, p_adjust = "holm")
#> catgraph object (pairwise association network)
#> Variables : 4
#> Edges : 5
#> Method : Cramer's V (classical)
#> Weights : min = 0.1110 median = 0.2941 max = 0.4556
#> Note : edges encode pairwise marginal association, not
#> conditional independence. All metrics lie on [0, 1].
#> NMI / AMI weights are not exchangeable with Cramer's V
#> weights across graph objects. See vignette
#> 'Methodological caveats'.
# Pre-0.4.0 behaviour (raw p-values)
prune_edges(cg, min_weight = 0.1, max_p = 0.05, p_adjust = "none")
#> catgraph object (pairwise association network)
#> Variables : 4
#> Edges : 5
#> Method : Cramer's V (classical)
#> Weights : min = 0.1110 median = 0.2941 max = 0.4556
#> Note : edges encode pairwise marginal association, not
#> conditional independence. All metrics lie on [0, 1].
#> NMI / AMI weights are not exchangeable with Cramer's V
#> weights across graph objects. See vignette
#> 'Methodological caveats'.