catgraph provides network-based exploratory analysis of categorical data at two complementary levels.
Variable-level association network. Variables are nodes. Edges are weighted by the phi coefficient (2×2 tables) or Cramér’s V (larger tables), with optional bias correction (Bergsma, 2013). The workflow supports structural exploration, edge pruning with multiple-testing adjustment, bootstrap confidence intervals, and descriptive network summaries such as centrality and community structure.
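To show what the variable-level edge weights measure, here is a minimal base-R sketch of the phi coefficient and Cramér’s V. It includes a bias correction written from the formulas in Bergsma (2013). This is an illustration under that assumption, not catgraph’s internal code, and the function names `phi_2x2()` and `cramers_v()` are hypothetical.

```r
# Minimal sketch, not catgraph's implementation: phi and Cramér's V in base R,
# with a bias correction written from Bergsma (2013). Names are hypothetical.

# Signed phi coefficient from the four cells of a 2x2 table
phi_2x2 <- function(tab) {
  tab <- matrix(as.numeric(tab), 2, 2)  # numeric to avoid integer overflow
  n11 <- tab[1, 1]; n12 <- tab[1, 2]
  n21 <- tab[2, 1]; n22 <- tab[2, 2]
  (n11 * n22 - n12 * n21) /
    sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
}

# Cramér's V for any r x k table; corrected = TRUE applies Bergsma's correction
cramers_v <- function(x, y, corrected = FALSE) {
  tab <- table(x, y)
  n <- sum(tab)
  r <- nrow(tab)
  k <- ncol(tab)
  # Pearson chi-square without the Yates continuity correction
  chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE)$statistic))
  phi2 <- chi2 / n
  if (!corrected) {
    return(sqrt(phi2 / min(r - 1, k - 1)))
  }
  # Bias-corrected phi^2 and bias-corrected table dimensions
  phi2_c <- max(0, phi2 - (k - 1) * (r - 1) / (n - 1))
  r_c <- r - (r - 1)^2 / (n - 1)
  k_c <- k - (k - 1)^2 / (n - 1)
  sqrt(phi2_c / min(r_c - 1, k_c - 1))
}

# Toy data: two binary variables, 100 observations
x <- rep(c("a", "b"), times = c(60, 40))
y <- c(rep(c("u", "v"), times = c(45, 15)),
       rep(c("u", "v"), times = c(10, 30)))

phi_2x2(table(x, y))               # signed phi
cramers_v(x, y)                    # equals abs(phi) for a 2x2 table
cramers_v(x, y, corrected = TRUE)  # smaller after the correction
```

For a 2×2 table, the uncorrected Cramér’s V equals the absolute value of phi, so both measures sit on a common 0 to 1 scale. The correction shrinks the estimate, and it has the largest effect in small samples and large tables.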
Modality-level co-association network. Modalities (factor levels) are nodes. Cross-variable edges are weighted by absolute phi coefficients. Signed standardised Pearson residuals are stored separately and indicate whether two modalities co-occur more or less often than expected under independence. The workflow supports edge pruning, signed edge visualisation, and community detection over modalities.
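The base-R sketch below shows how the quantities attached to one cross-variable modality edge can be computed. The signed phi is the Pearson correlation of the two 0/1 indicator columns, the weight is its absolute value, and the standardised (adjusted) Pearson residual is taken from the corresponding cell of the contingency table. The function name `modality_pair_stats()` and the toy data are hypothetical, and catgraph’s own computations may differ in detail.

```r
# Minimal sketch (hypothetical names, not catgraph internals): the edge
# attributes for one cross-variable modality pair.
modality_pair_stats <- function(x, y, level_x, level_y) {
  ix <- as.numeric(x == level_x)  # 0/1 indicator of modality level_x
  iy <- as.numeric(y == level_y)  # 0/1 indicator of modality level_y
  phi_signed <- cor(ix, iy)       # Pearson correlation of indicators = phi
  # Standardised (adjusted) Pearson residual of the (level_x, level_y) cell
  res <- suppressWarnings(chisq.test(table(x, y), correct = FALSE)$stdres)
  list(
    weight     = abs(phi_signed),
    phi_signed = phi_signed,
    std_resid  = unname(res[level_x, level_y])
  )
}

# Toy data: a 3-level and a 2-level variable, 100 observations
x <- rep(c("x1", "x2", "x3"), times = c(30, 30, 40))
y <- c(rep(c("y1", "y2"), times = c(25, 5)),
       rep(c("y1", "y2"), times = c(15, 15)),
       rep(c("y1", "y2"), times = c(10, 30)))

attract <- modality_pair_stats(x, y, "x1", "y1")  # co-occur above expectation
repel   <- modality_pair_stats(x, y, "x3", "y1")  # co-occur below expectation
str(attract)
str(repel)
```

Under these definitions, the residual and the signed phi are tied together by `std_resid = sqrt(n) * phi_signed`. They therefore always carry the same sign, and the residual is simply a sample-size-scaled version of the same deviation from independence.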
The modality layer sits in the tradition of Multiple Correspondence Analysis and two-mode affiliation networks. It operates on pairwise associations and does not model higher-order interactions or conditional dependencies. It is a descriptive map of category co-association, not a respondent-segmentation tool. For respondent segmentation, use `poLCA` or `FactoMineR::HCPC()`.
Modality gravity indices. A novel extension of standard graph centrality that incorporates the empirical prevalence of each modality. The Modality Gravity Index (MGI) and Orbital Score (OS) identify which modalities act as gravitational attractors (dominant modalities that pull rarer ones toward them) and which are satellites (minority modalities orbiting more prevalent ones). This addresses a fundamental limitation of standard centrality indices, which treat all nodes as exchangeable regardless of their empirical frequency.
What’s new in 0.10.0
- `cluster_modalities()` now defaults to `signed = TRUE` (breaking change from 0.9.0). Communities are defined by positive co-association only. Edges where modalities co-occur less often than expected under independence (negative standardised Pearson residual) are excluded from clustering. This produces substantively more interpretable communities. For example, `smoking_status=current` and `lung_disease=no` are no longer pulled into the same community by their large absolute phi weight, even though they are a repulsion pair. Use `signed = FALSE` to restore the previous behaviour.
- `build_modality_graph()` now stores `phi_signed` as an additional edge attribute alongside `weight` (absolute phi), `p_value`, and `std_resid`. All downstream functions are unaffected.
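The following base-R sketch makes the effect of `signed = TRUE` concrete. It builds a toy modality edge list in which one pair has a large absolute phi but a negative residual. Connected components stand in for the community-detection algorithm that `cluster_modalities()` actually uses, which is not reproduced here. All names and values are hypothetical.

```r
# Sketch only: connected components stand in for catgraph's community
# detection. Edge values are toy numbers, not from any real data set.
edges <- data.frame(
  from      = c("A=1", "A=1", "A=2", "A=1"),
  to        = c("B=1", "C=1", "B=2", "B=2"),
  weight    = c(0.40, 0.30, 0.35, 0.50),  # absolute phi
  std_resid = c(5.0, 4.0, 4.5, -6.0),     # last pair is a repulsion pair
  stringsAsFactors = FALSE
)

# Label propagation: each node ends with the smallest index in its component
components <- function(nodes, edges) {
  comp <- setNames(seq_along(nodes), nodes)
  repeat {
    changed <- FALSE
    for (k in seq_len(nrow(edges))) {
      ends <- c(edges$from[k], edges$to[k])
      m <- min(comp[ends])
      if (any(comp[ends] != m)) {
        comp[ends] <- m
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  comp
}

nodes <- unique(c(edges$from, edges$to))
# Analogue of signed = FALSE: every edge counts, whatever its sign
unsigned_groups <- components(nodes, edges)
# Analogue of signed = TRUE: drop edges with a negative residual first
signed_groups <- components(nodes, edges[edges$std_resid > 0, ])

unsigned_groups
signed_groups
```

In the unsigned version, the heavy repulsion edge between `A=1` and `B=2` merges everything into one group. Dropping it separates {`A=1`, `B=1`, `C=1`} from {`A=2`, `B=2`}.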
Quick start
```r
library(catgraph)
data(survey_health)
# Variable-level: which categorical variables show pairwise association?
cg <- catgraph(survey_health, corrected = TRUE)
cg_p <- prune_edges(cg, min_weight = 0.05, max_p = 0.05, p_adjust = "BH")
plot(cg_p)
# Modality-level: which category levels tend to co-occur across variables?
mg <- build_modality_graph(survey_health)
mg <- prune_modality_edges(mg, min_weight = 0.10, max_p = 0.05)
mg <- cluster_modalities(mg)
plot(mg, color_by = "cluster", signed = TRUE)
# Gravity indices: which modalities are attractors vs satellites?
grav <- modality_gravity(mg)
print(grav) # role-grouped: ATTRACTORS / SATELLITES
summary(grav) # role counts, Spearman rho diagnostic
plot_gravity(mg) # 6-panel: traditional centrality vs MGI
plot_gravity_scatter(grav, mg) # eigenvector vs dMGI contradiction plot
# Compare gravity profiles across subgroups
mg_f <- build_conditional_modality_graph(survey_health, given = list(sex = "female"))
mg_m <- build_conditional_modality_graph(survey_health, given = list(sex = "male"))
mg_f <- prune_modality_edges(mg_f, min_weight = 0.10, max_p = 0.05)
mg_m <- prune_modality_edges(mg_m, min_weight = 0.10, max_p = 0.05)
compare_gravity(list(female = mg_f, male = mg_m))
# Formal test: do two groups differ in overall association structure?
test_modality_graph_equality(mg_f, mg_m, n_perm = 500)
```
See `vignette("introduction", package = "catgraph")` for the full variable-level and modality-level workflow, and `vignette("comparison", package = "catgraph")` for a worked comparison of catgraph’s modality layer with MCA, bipartite affiliation networks, and naive co-occurrence projection.
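The `prune_edges()` call in the quick start keeps edges that are both strong enough and statistically supported after multiple-testing adjustment. Below is a minimal sketch of such a rule using base R’s `p.adjust()`. It assumes that an edge is kept when `weight >= min_weight` and its Benjamini–Hochberg adjusted p-value is `<= max_p`. The function `prune_sketch()` and the edge values are hypothetical, and catgraph’s exact rule may differ.

```r
# Minimal sketch of weight + adjusted-p pruning; not catgraph's code.
# Assumption: keep an edge when weight >= min_weight and adjusted p <= max_p.
prune_sketch <- function(edges, min_weight = 0.05, max_p = 0.05,
                         p_adjust = "BH") {
  # Adjust across all candidate edges before filtering
  edges$p_adj <- p.adjust(edges$p_value, method = p_adjust)
  keep <- edges$weight >= min_weight & edges$p_adj <= max_p
  edges[keep, , drop = FALSE]
}

# Hypothetical variable-level edges
edges <- data.frame(
  from    = c("v1", "v1", "v2", "v3"),
  to      = c("v2", "v3", "v4", "v4"),
  weight  = c(0.30, 0.25, 0.12, 0.08),
  p_value = c(0.001, 0.20, 0.03, 0.04),
  stringsAsFactors = FALSE
)

prune_sketch(edges)
```

With four edges, the raw p-values of 0.03 and 0.04 both adjust to about 0.053. BH therefore removes both edges, even though each raw p-value is below 0.05.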
Scope — use catgraph for
| Question | Tool |
|---|---|
| Which categorical variables co-vary pairwise? | catgraph() |
| Which category levels bundle together across variables? | build_modality_graph() |
| Which modalities are structurally dominant vs peripheral? | modality_gravity() |
| How do category-level association patterns differ across groups? | compare_*_graphs() |
| What’s the unprojected respondent-modality incidence like? | bipartite_modality_graph() |
Scope — do not use catgraph for
- Causal inference or claims about direct effects.
- Conditional-independence structure: use a graphical model instead (`bnlearn`, `gRim`).
- Respondent segmentation or latent-class analysis: use `poLCA` or `FactoMineR::HCPC()`.
- Estimating conditional relationships between variables: edges are marginal and do not control for other variables.
Interpretation
All graphs produced by catgraph represent marginal association structure. Edge weights quantify pairwise dependence and should be interpreted descriptively. Differences between graphs indicate changes in association patterns, not necessarily changes in underlying causal mechanisms.
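Because edges are marginal, a strong edge can appear between two variables that are independent once a third variable is held fixed. The toy counts below are hypothetical and were chosen so that `x` and `y` are exactly independent within each level of `z`. The base-R sketch shows this with the phi coefficient. A marginal edge weight computed from the collapsed table would be non-zero here, even though there is no association within either stratum.

```r
# Sketch: a marginal association that vanishes within strata of z.
# Hypothetical counts; rows are levels of x, columns are levels of y.
phi_from_table <- function(tab) {
  (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
    sqrt(prod(rowSums(tab)) * prod(colSums(tab)))
}

stratum_z0 <- matrix(c(64, 16, 16, 4), nrow = 2, byrow = TRUE)
stratum_z1 <- matrix(c(4, 16, 16, 64), nrow = 2, byrow = TRUE)
marginal   <- stratum_z0 + stratum_z1  # collapse over z

phi_from_table(stratum_z0)  # 0: x and y independent given z = 0
phi_from_table(stratum_z1)  # 0: x and y independent given z = 1
phi_from_table(marginal)    # 0.36: association appears only marginally
```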
Gravity indices (MGI, OS) incorporate empirical modality prevalence. A positive dMGI indicates a modality that exerts net gravitational pull over less prevalent neighbours; a negative dMGI indicates a satellite modality being pulled toward more prevalent ones.