Changelog
Source:NEWS.md
catgraph 0.11.0
New feature: multiple association paradigms
This release introduces three additional association metrics for edge weighting, expanding catgraph from a single-paradigm tool to a multi-paradigm framework covering frequentist, information-theoretic, and Bayesian traditions. All four metrics are available at both the variable level (catgraph(), build_graph(), assoc_similarity()) and the modality level (build_modality_graph()).
New method argument
catgraph(), build_graph(), assoc_similarity(), and build_modality_graph() now accept a method argument (default "cramers_v") controlling which association measure is used as edge weights:
-
"cramers_v"— classical phi / Cramér’s V (previous default, fully backward-compatible). -
"cramers_v_corrected"— bias-corrected Cramér’s V via Bergsma (2013). Previously accessed throughcorrected = TRUE; that argument is retained for backward compatibility and now resolves internally tomethod = "cramers_v_corrected". -
"nmi"— Normalised Mutual Information. Symmetric, bounded [0, 1], information-theoretic. Sensitive to marginal entropy imbalances; recommended when variables have unequal numbers of categories or skewed marginal distributions. -
"ami"— Adjusted Mutual Information. Subtracts expected MI under random permutation (Vinh et al., 2010), correcting for the upward bias of NMI on sparse contingency tables. Recommended when any variable has rare categories (expected cell counts < 5). -
"bayesian_cramers_v"— Dirichlet-smoothed Cramér’s V. Applies a symmetric Dirichlet(alpha) prior to cell counts before computing the association, stabilising edge weights when tables are sparse. Defaultalpha = 0.5(Jeffreys prior). Converges to classical Cramér’s V as n → ∞.
New alpha argument
catgraph(), build_graph(), assoc_similarity(), and build_modality_graph() accept alpha (default 0.5), the Dirichlet prior concentration for method = "bayesian_cramers_v". Ignored for all other methods. Validated at call time.
New exported functions
-
nmi_assoc(x, y, adjusted = FALSE)— computes NMI or AMI for a pair of categorical vectors. Returns the same list structure aseffect_size()for consistency. -
bayesian_cramers_v(x, y, alpha = 0.5)— computes Dirichlet- smoothed Cramér’s V for a pair of categorical vectors. p-value is taken from the unsmoothed chi-square test to avoid anti-conservative inference.
catgraph object: new fields
-
$method— character string recording the association paradigm used. -
$alpha— the Dirichlet prior used (NA_real_when method is not"bayesian_cramers_v"). -
$corrected— retained for backward compatibility; now derived from$methodrather than stored independently.
print() and summary() changes
print.catgraph() and summary.catgraph() now display a human-readable Method line and, when applicable, an Alpha line showing the Dirichlet prior. The old Estimator line and Metric mix line have been replaced.
Backward compatibility
All existing code using catgraph(df), catgraph(df, corrected = TRUE), build_graph(df), or assoc_similarity(df) continues to work without any changes. The corrected argument is retained and silently resolved to method = "cramers_v_corrected".
References
Bergsma, W. (2013). A bias-correction for Cramér’s V and Tschuprow’s T. Journal of the Korean Statistical Society, 42(3), 323–328.
Cover, T. M., & Thomas, J. A. (2006). Elements of Information Theory (2nd ed.). Wiley.
Good, I. J. (1965). The Estimation of Probabilities. MIT Press.
Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison. Journal of Machine Learning Research, 11, 2837–2854.
catgraph 0.10.0
Breaking change in default behaviour
-
cluster_modalities()now defaults tosigned = TRUE(wasFALSE). Communities are now defined by positive co-association only (edges with negative standardised Pearson residual — repulsion — are excluded from the clustering graph). This produces substantively more interpretable communities: modalities that co-occur less than expected (e.g.smoking_status=currentandlung_disease=no) are no longer pulled into the same community by their large absolute phi weight. Users who need the previous unsigned behaviour can passsigned = FALSEexplicitly.
New edge attribute in build_modality_graph()
-
build_modality_graph()now storesphi_signedas an additional edge attribute alongside the existingweight(absolute phi),p_value, andstd_resid. Theweightattribute is unchanged so all downstream functions (gravity indices, centrality, pruning, plotting) are unaffected.phi_signedis available for users who want the directed association value directly from the graph object.
Functions unaffected by this change
The following functions operate on weight (absolute phi) or std_resid directly and are unaffected by the clustering default change: modality_gravity(), node_centrality(), plot_gravity(), prune_modality_edges(), plot.catmodgraph(), compare_modality_graphs(), test_modality_graph_equality(), test_modality_edge_differences(), build_conditional_modality_graph(), joint_balance().
catgraph 0.9.0
New functions
modality_gravity()— computes Modality Gravity Index (MGI+, MGI-, dMGI) and Orbital Score (OS) for every node in a catmodgraph. Identifies attractor and satellite modalities based on prevalence-weighted edge contributions.plot_gravity()— 2x3 panel figure comparing traditional centrality (strength, betweenness, eigenvector) against gravity indices (MGI+, OS, dMGI) on a shared network layout. Setbars = TRUEfor a bar chart version.plot_gravity_scatter()— scatter plot of eigenvector centrality vs dMGI, with contradiction cases labelled. Primary diagnostic for demonstrating that MGI captures structure beyond standard centrality.compare_gravity()— compares dMGI and OS profiles across two conditional catmodgraph objects (e.g. female vs male subgroup), returning a ranked difference table and optional bar chart.
S3 methods
print.modality_gravity()— role-grouped formatted console output (ATTRACTORS / NEUTRAL / SATELLITES sections).summary.modality_gravity()— role counts, per-variable breakdown, and Spearman rho(strength, dMGI) diagnostic.node_centrality.catmodgraph()— extends node_centrality() to accept catmodgraph objects, returning traditional centrality measures augmented with all gravity indices in a single table.
Other changes
node_centrality()converted to S3 generic to support dispatch to both catgraph and catmodgraph objects.69 new tests added in
tests/testthat/test-modality-gravity.R.
catgraph 0.8.0
New features
plot_modality_difference(): new exported function. Renders the output oftest_modality_edge_differences()as a single graph with a shared layout. Edge colour encodes the sign of the differenceweight_x - weight_y, width scales with |difference|, and opacity scales with-log10(p_adjusted)(clamped at 1e-4). Complements the existingplot.catmodedgetest()bar chart: use the bar chart for a ranked-list read, this function for a network-structural read.joint_balance(): new high-level user-facing entry point for cross-group categorical-balance diagnostics. Combines (1) per-variable marginal chi-square tests (BH-adjusted across variables), (2) pairwise omnibus tests fromtest_modality_graph_equality()(Bonferroni-adjusted across group pairs), and (3) edge-wise post-hoc fromtest_modality_edge_differences()(BH-adjusted across edges, only for pairs rejecting the omnibus). Returns an S3 object of classjointbalancewithprint(),summary(), andplot()methods. The plot method is a two-panel layout: marginal bar chart + modality- difference graph for the most-rejecting pair.build_conditional_modality_graph(): new exported function. Thin wrapper overbuild_modality_graph()that subsets the data by one or more conditioning levels (e.g.,list(sex = "female")) and stores the conditioning specification on the returned object. The returned object is a plaincatmodgraphso that all existing methods (plot, prune, cluster, compare) work unchanged. Intended for exploring conditional joint categorical structure in sociodemographic strata, survey waves, or study sites. Does NOT support conditioning on derived cluster labels — the conditioning must be an observed variable.
catgraph 0.7.0
This release adds formal inference for cross-group structural differences in modality networks. Two permutation-based tests are introduced, complementing the visual comparison tools shipped in 0.6.0.
New features
test_modality_graph_equality(): new exported function. Tests the null hypothesis that two modality networks are drawn from the same joint distribution, using label permutation. Supports three test statistics (Frobenius, Jaccard, maximum edge difference), two pipeline modes (unfiltered full weight matrix or the user’s pruned analysis), and stratified permutation via astrataargument. Returns an S3 object of classcatmodtestwithprint(),summary(), andplot()methods.test_modality_edge_differences(): new exported function for edge-wise post-hoc testing after a significant global test. Computes empirical p-values per edge from the same permutation procedure, with Benjamini-Hochberg FDR correction applied across edges. Supports restricting the edge set to the union of the two input graphs to reduce testing burden. Returns an S3 object of classcatmodedgetestwithprint(),summary(), andplot()methods.
Documentation
- New vignette section planned in
comparison.Rmddemonstrating the two-step inferential workflow (omnibus test then edge-wise post-hoc) on the bundledsurvey_healthdataset. - Simulation study script added to
tests_manual/documenting Type I error and power characteristics.
Dependencies
- Adds
stats::p.adjustto Imports (via@importFrom).
catgraph 0.6.0
This release refocuses the modality layer as a category co-association network in the tradition of Multiple Correspondence Analysis and two-mode affiliation networks, removing functionality that over-claimed the package as a respondent-segmentation tool. It is a breaking change for users of the 0.5.x respondent-profile module.
Scope refocus (breaking)
-
Removed:
assign_users_to_profiles(),profile_modality_clusters(), thecatmodprofileS3 class, the$user_clusterand$user_scoreslist components, and all associated S3 methods. These implemented a post-hoc respondent-to-community assignment rule that mis-framed the modality layer as a segmentation method. The modality layer is now documented honestly as a descriptive category co-association map; respondent-level segmentation should usepoLCAorFactoMineR::HCPC()instead. -
Renamed:
profile_modality_clusters()->summarise_modality_communities(). The function now returns an object of classcatmodcommunity(wascatmodprofile), with list componentscommunity_summary,community_members, andvariable_composition. - Documentation rewrite: the modality-graph section of the vignette has been rewritten to frame the layer as a category co-association tool rather than a segmentation tool, with an explicit “when to use what” decision table and a dedicated caveats subsection for modality graphs.
New features
-
plot.catmodgraph(signed = TRUE): new argument to colour edges by sign of the stored standardised Pearson residual (green = positive / attraction, red = negative / repulsion). Edge transparency scales with |residual|. Defaultsigned = FALSEpreserves the previous unsigned colouring. -
plot.catmodgraph(remove_isolates = TRUE): new argument, defaultTRUE. Hides degree-0 vertices from the plot without modifying the input object. Singletons arising from pruning no longer clutter the visualization by default. -
bipartite_modality_graph(): new exported function constructing the two-mode (respondent × modality) incidence graph with anigraphbipartitetypeattribute. Returns a new S3 classcatbipartitewithprint(),summary(), andplot()methods. Complements the unipartite modality graph by preserving raw incidence structure. -
compare_catgraphs(): new exported function for multi-panel visual comparison of variable-level association networks across groups, populations, or time points. Supports pooled / individual / overlay / none pruning strategies, with a shared layout anchored on the union graph. -
compare_modality_graphs(): new exported function for multi-panel comparison of modality-level networks. Supportsrestrict = "common"or"union"modality-set handling and signed-edge visualisation. -
New vignette
comparison.Rmd: worked comparison ofcatgraph’s modality layer against Multiple Correspondence Analysis, bipartite affiliation networks, and naive co-occurrence projection on the bundledsurvey_healthdataset. Includes a decision-table for when to use which method and an explicit list of wherecatgraphshould not be overclaimed. -
cluster_modalities(signed = TRUE): new argument for signed-aware community detection. Drops negative-residual (repulsion) edges before running Louvain, so communities are defined by positive co-association only. Thestd_residattribute is retained on the original graph for visualization. Documented as a pragmatic adaptation; for principled signed-network community detection (Traag & Bruggeman 2009), see thesignnetpackage.
Tests
Test suite updated to reflect the removed functions. Tests referencing assign_users_to_profiles, $user_cluster, or $user_scores have been deleted; new tests cover summarise_modality_communities() and its S3 methods.
catgraph 0.5.0
This release refocuses the package on its network-native intent and ships a working modality-level association network module. It is a breaking change for users of the 0.4.x respondent-segmentation module.
Scope refocus
The package is now organised around two network modules:
- Variable-level association network — shipped and stable since 0.4.0.
- Modality-level association network — new in 0.5.0. Nodes are modalities (factor levels across all variables), edges are cross-variable modality associations (phi coefficient, with the signed standardised Pearson residual stored as an interpretive edge attribute). Louvain or Walktrap communities on that graph serve as modality clusters, and respondents are assigned to profiles via an argmax rule (count-based default, weighted-score option).
New features
-
build_modality_graph(): constructs a modality-level graph from a categorical data frame. Same-variable modality pairs are excluded by construction. Returns an object of new S3 classcatmodgraph. -
prune_modality_edges(): modality-level analogue ofprune_edges(), removing weak or non-significant edges by phi and p-value thresholds. Also supports isolate removal and keeps the modality metadata, indicator matrix, and membership aligned with the surviving vertices. -
cluster_modalities(): Louvain (default) or Walktrap community detection on acatmodgraph. Writes membership onto vertices. -
assign_users_to_profiles(): assigns each respondent to a modality cluster using either a count rule or a node-strength-weighted rule. -
profile_modality_clusters(): summarises modality communities (size, variable composition, internal cohesion, user-cluster counts). Returns an object of new S3 classcatmodprofile. -
plot.catmodgraph(): dedicated plot method for modality graphs, with colouring by originating variable or by detected cluster. - New S3 classes:
catmodgraphandcatmodprofile, withprint()andsummary()methods. - New vignette section “Modality-level network analysis” documenting the full modality workflow on the expanded Titanic dataset.
Breaking changes
-
Removed:
segment_individuals(),profile_clusters(),plot_profiles(), thecatsegmentandcatprofileS3 classes, and all their S3 methods. The Gower-distance + PAM / hclust / k-modes row clustering they implemented was distance-based, not network-based, and therefore did not match the package’s network-first intent. The modality-network module described above replaces this functionality. -
Dropped dependencies:
clusterandklaRare no longer required.
Cleanup
- Corrected the
@sourcetag in thesurvey_healthdataset documentation (previously referenced adata-raw/script that was not in the tarball). - Fixed an escaped-backtick rendering bug in the vignette’s
chained-caveatchunk. - Fixed a duplicated YAML block at the top of the vignette that rendered as visible text in the HTML output.
Scope change (breaking for documentation only)
- Package title changed from “Undirected Graphical Models for Categorical Data Using Effect Size Metrics” to “Effect-Size Weighted Association Networks for Categorical Data”.
-
DESCRIPTION and vignette now state explicitly that a
catgraphis a pairwise marginal association network, not a conditional- independence graphical model. Edges encode bivariate dependence only. - New vignette sections “Scope and interpretation” and “Methodological caveats” document the five caveats reviewers should know about: marginal-not-conditional structure, mixed phi/Cramer’s V metrics, pairwise-deletion edge-specific sample sizes, multiple testing, and unsigned weights.
Bug fixes
-
build_graph()no longer forces zero weights to.Machine$double.eps. In v0.3.0 and earlier, pairs with true zero association were stored as edges with weight2.22e-16, which structurally guaranteed a complete graph and silently inflated density, centrality, and community-detection measures. True zero weights are now absent edges. For dense similarity-matrix output (e.g. heatmaps), use the newassoc_similarity()function. -
catgraph()now stores the processed data, not the raw input. In v0.3.0,cg$dataheld the user’s original data whilebuild_graph()internally coerced columns and dropped constant columns. This meantcatgraph_ci()resampled from a dataset that differed from the one used for the original point estimate.cg$datanow holds the same processed object; the original is kept incg$raw_datafor reference. -
plot.catgraph()now honours thelayoutargument. The igraph branch hard-coded Fruchterman-Reingold;kk,circle,grid,graphopt,nicely, andrandomare now respected.
New features
-
assoc_similarity(): a new exported function returning the densep x psimilarity matrix of all pairs (including zero-weight pairs), suitable for heatmaps. Separates the similarity-matrix object from the graph-topology object. -
prune_edges(..., p_adjust = ...): multiple-testing correction across allchoose(p, 2)simultaneous tests. Default is Benjamini- Hochberg (BH);holm,bonferroni, andnoneare also available. Two new edge attributes (p_value_adj,p_adjust_method) are added. -
compute_assoc()severe-sparsity warning: a second-tier warning fires when >20% of cells have expected count < 5, or when a table of minimum dimension >= 3 has observations-to-cells ratio below 5.