The primary user-facing constructor for catgraph. It computes a pairwise association measure for every pair of categorical variables (by default Cramer's V, reported as phi for 2x2 tables), stores the result as a weighted igraph network, and keeps the processed data and metadata for downstream analysis. Use this function for standard workflows; use build_graph only when a raw igraph object is required.
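
As a rough illustration of the kind of quantity stored on each edge, the sketch below computes the classical (uncorrected) Cramer's V for a single variable pair using base R only. The helper name cramers_v_pair is hypothetical and is not part of catgraph; the package's internal implementation may differ.

```r
# Hypothetical helper (not a catgraph function): classical Cramer's V
# for one contingency table. For a 2x2 table this equals |phi|.
cramers_v_pair <- function(tab) {
  # Pearson chi-square without Yates' continuity correction
  chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
  n <- sum(tab)                # total number of observations
  k <- min(dim(tab)) - 1       # min(rows, cols) - 1
  sqrt(chi2 / (n * k))
}

# Collapse the built-in 4-way Titanic table to Sex x Survived
sex_survived <- margin.table(Titanic, c(2, 4))
v_sex_survived <- cramers_v_pair(sex_survived)
print(round(v_sex_survived, 4))
```

If catgraph's default settings correspond to this uncorrected statistic, the value should agree with the effect size shown for the Sex and Survived pair in the Examples output.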

Usage

catgraph(
  data,
  method = "cramers_v",
  corrected = FALSE,
  correct = FALSE,
  simulate_p = FALSE,
  B = 2000L,
  alpha = 0.5
)

# S3 method for class 'catgraph'
print(x, ...)

# S3 method for class 'catgraph'
summary(object, top = 10L, ...)

Arguments

data

A data frame or tibble whose columns represent categorical variables. Factor, character, and logical columns are supported. Numeric columns are coerced to character with a message.

method

Character. Association metric for edge weights. One of "cramers_v" (default), "cramers_v_corrected", "nmi", "ami", or "bayesian_cramers_v". See build_graph for details.

corrected

Logical. Deprecated shortcut for method = "cramers_v_corrected". Kept for backward compatibility. Default FALSE.

correct

Logical. Yates' continuity correction for the chi-square test. Default FALSE.

simulate_p

Logical. Monte Carlo p-value simulation. Default FALSE.

B

Integer. Monte Carlo resamples when simulate_p = TRUE. Default 2000L.

alpha

Numeric. Dirichlet prior concentration for method = "bayesian_cramers_v". Default 0.5 (Jeffreys prior). Ignored for all other methods.

x

A catgraph object.

...

Ignored.

object

A catgraph object.

top

Integer. Number of strongest edges to display. Use Inf for all edges. Default 10L.

Value

An S3 object of class catgraph containing:

graph

An undirected weighted igraph object. Pairs whose effect size is exactly zero have no edge; they are not stored as edges with near-zero weight.

data

The processed data frame actually used for estimation (after non-categorical coercion and constant-column removal). Downstream functions such as catgraph_ci resample from this object. Changed from raw_data in v0.4.0 to fix an internal-consistency bug.

raw_data

The original input data frame, for reference.

method

Character string recording which association metric was used ("cramers_v", "cramers_v_corrected", "nmi", "ami", or "bayesian_cramers_v").

alpha

The Dirichlet prior concentration used, or NA when method is not "bayesian_cramers_v".

corrected

Logical flag, TRUE when method = "cramers_v_corrected". Kept for backward compatibility.

n_vars

Number of variables (graph vertices).

n_pairs_total

Number of variable pairs evaluated.

n_pairs

Number of retained graph edges (pairs with non-zero effect size).

call

The matched call.

Details

Scope. A catgraph is a pairwise association network, not a conditional-independence graphical model. Edges encode bivariate dependence between two variables and do not imply that the two variables remain dependent after controlling for the remaining variables. Interpret centrality, community, and bridge measures accordingly. See the package vignette for a full discussion.
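
The distinction between pairwise and conditional dependence can be made concrete with a small constructed table. In the sketch below, which uses base R only, X and Y are exactly independent within each level of a third variable Z, yet they show a clear marginal association once Z is ignored. The counts and the helper cramers_v_pair are invented for illustration and are not part of catgraph.

```r
# Hypothetical helper (not a catgraph function): classical Cramer's V
cramers_v_pair <- function(tab) {
  chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
  sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1)))
}

# Made-up counts: within each Z stratum the X-by-Y table is an exact
# product of its margins, so X and Y are independent given Z.
counts <- array(
  c(64, 16, 16, 4,     # Z = "0": X and Y are both mostly "0"
    4, 16, 16, 64),    # Z = "1": X and Y are both mostly "1"
  dim = c(2, 2, 2),
  dimnames = list(X = c("0", "1"), Y = c("0", "1"), Z = c("0", "1"))
)

# Association within each Z stratum (conditional view)
v_within <- apply(counts, 3, cramers_v_pair)

# Association after summing over Z (the pairwise, marginal view)
v_marginal <- cramers_v_pair(apply(counts, c(1, 2), sum))

print(v_within)
print(v_marginal)
```

A pairwise network built from such data would contain an X-Y edge even though the dependence is fully explained by Z, which is why centrality and community results should be read as statements about marginal association.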

All variable pairs with non-zero effect size are retained by default (no thresholding at construction time). To remove weak or non-significant edges, pass the object to prune_edges.
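
The arguments of prune_edges are not documented on this page, so the sketch below does not call it. It only illustrates the general idea of post-hoc thresholding on a plain edge table in base R, using the values printed by summary() in the Examples section. The cutoffs min_effect and max_p are hypothetical.

```r
# Edge table copied from the summary() output in the Examples section
edges <- data.frame(
  var1 = c("Sex", "Class", "Class", "Class", "Sex", "Age"),
  var2 = c("Survived", "Sex", "Survived", "Age", "Age", "Survived"),
  effect_size = c(0.45560, 0.39872, 0.29412, 0.23195, 0.11101, 0.09758),
  p_value = c(2.302e-101, 1.557e-75, 5.000e-41, 1.695e-25,
              1.907e-07, 4.701e-06)
)

# Hypothetical cutoffs, chosen only for illustration
min_effect <- 0.2
max_p <- 0.05

# Keep edges that are both strong enough and significant
kept <- subset(edges, effect_size >= min_effect & p_value <= max_p)
print(kept[, c("var1", "var2", "effect_size")])
```

With these cutoffs the two weakest pairs are dropped even though their p-values are small, which shows why an effect-size threshold and a significance threshold can lead to different graphs.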

Methods (by generic)

  • print(catgraph): Print a concise summary of a catgraph object.

  • summary(catgraph): Summarise a catgraph object, listing edges sorted by effect size.

References

Bergsma, W. (2013). A bias-correction for Cramer's V and Tschuprow's T. Journal of the Korean Statistical Society, 42(3), 323–328. doi:10.1016/j.jkss.2012.10.002

Examples

df <- expand_table(Titanic)
cg <- catgraph(df)
cg
#> catgraph object (pairwise association network)
#>   Variables : 4 
#>   Edges     : 6 
#>   Method    : Cramer's V (classical) 
#>   Weights   : min = 0.0976  median = 0.2630  max = 0.4556
#>   Note      : edges encode pairwise marginal association, not
#>               conditional independence. All metrics lie on [0, 1].
#>               NMI / AMI weights are not exchangeable with Cramer's V
#>               weights across graph objects. See vignette
#>               'Methodological caveats'.
summary(cg)
#> catgraph summary
#>   Variables       : 4 
#>   Pairs evaluated : 6 
#>   Edges retained  : 6 
#> 
#>   Method          : Cramer's V (classical) 
#> 
#>   Top 6 edges by effect size:
#> 
#>    var1     var2 effect_size    metric    p_value    n type
#> 1   Sex Survived     0.45560       phi 2.302e-101 2201  2x2
#> 2 Class      Sex     0.39872 cramers_v  1.557e-75 2201  RxC
#> 3 Class Survived     0.29412 cramers_v  5.000e-41 2201  RxC
#> 4 Class      Age     0.23195 cramers_v  1.695e-25 2201  RxC
#> 5   Sex      Age     0.11101       phi  1.907e-07 2201  2x2
#> 6   Age Survived     0.09758       phi  4.701e-06 2201  2x2

# Bias-corrected Cramer's V; equivalent to the deprecated corrected = TRUE
cg_bc <- catgraph(df, method = "cramers_v_corrected")