Compute weighted centrality indices for all variables in a catgraph
Source:R/centrality.R
node_centrality.RdReturns a data frame of five weighted centrality measures for every node
(variable) in a catgraph object. All measures use the phi /
Cramer's V edge weights stored in the graph, so they reflect association
strength rather than mere connectivity.
Value
A data.frame with one row per variable and the following
columns:
variableVariable name (character).
strengthWeighted degree / strength centrality: the sum of all edge weights incident to a node. The primary measure of how strongly a variable is globally associated with all others (Barrat et al., 2004).
w_betweennessWeighted betweenness centrality: the number of weighted shortest paths passing through a node, where path lengths are \(1/w_{ij}\) so that stronger associations correspond to shorter distances (Brandes, 2001).
w_closenessWeighted closeness centrality: the inverse of the mean weighted distance to all other reachable nodes.
w_eigenvectorWeighted eigenvector centrality: a node scores highly if it is connected to other high-scoring nodes (Bonacich, 1987).
w_pagerankWeighted PageRank: a random-walk based centrality robust to isolated subgraphs (Brin & Page, 1998). Uses the igraph default damping factor of 0.85.
The data frame is returned sorted by strength in descending order.
Details
Edges with NA or non-positive weights are dropped from a local
copy of the graph before centrality is computed; the input
catgraph object is never modified. If the graph has no edges
after this cleanup, a zero-valued result is returned (with a uniform
1/p PageRank, the stationary distribution on an edgeless graph).
Strength centrality (Barrat et al., 2004):
$$s_i = \sum_{j \in \mathcal{N}(i)} w_{ij}$$
where \(\mathcal{N}(i)\) is the set of neighbours of node \(i\) and \(w_{ij}\) is the edge weight (phi or Cramer's V). In a categorical association graph this is the most interpretable measure: it is simply the total association strength of a variable with all other variables.
Weighted betweenness (Brandes, 2001):
Path lengths are defined as \(1/w_{ij}\) so that stronger edges are traversed preferentially. A variable with high weighted betweenness sits on many shortest paths and therefore acts as a statistical bridge between otherwise weakly associated variable groups.
Weighted closeness:
$$C_i = \frac{n-1}{\sum_{j \neq i} d(i,j)}$$
where \(d(i,j)\) is the weighted shortest-path distance. A high value indicates a variable that reaches all other variables through short (strong-association) paths.
Eigenvector centrality (Bonacich, 1987):
The leading eigenvector of the weighted adjacency matrix. A variable scores highly when it is associated with other variables that are themselves highly associated variables — a second-order hub measure.
PageRank (Brin & Page, 1998):
A random-walk measure that down-weights the contribution of nodes with many weak connections. Robust when the graph is sparse or has near-zero weight edges.
References
Barrat, A., Barthelemy, M., Pastor-Satorras, R., & Vespignani, A. (2004). The architecture of complex weighted networks. Proceedings of the National Academy of Sciences, 101(11), 3747–3752. doi:10.1073/pnas.0400087101
Bonacich, P. (1987). Power and centrality: A family of measures. American Journal of Sociology, 92(5), 1170–1182. doi:10.1086/228631
Brandes, U. (2001). A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25(2), 163–177. doi:10.1080/0022250X.2001.9990249
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1-7), 107–117. doi:10.1016/S0169-7552(98)00110-X
Examples
df <- expand_table(Titanic)
cg <- catgraph(df)
nc <- node_centrality(cg)
nc
#> variable strength w_betweenness w_closeness w_eigenvector w_pagerank
#> 1 Sex 1.0000000 0 0.8869209 1.0000000 1.0000000
#> 2 Class 0.9579945 1 1.0000000 0.9284092 0.9719876
#> 3 Survived 0.8777217 0 0.7680133 0.9201859 0.8882785
#> 4 Age 0.4563527 0 0.5423700 0.4929748 0.5186247
# Identify the most central variable
nc$variable[1]
#> [1] "Sex"
# Raw (unnormalised) values
node_centrality(cg, normalize = FALSE)
#> variable strength w_betweenness w_closeness w_eigenvector w_pagerank
#> 1 Sex 0.9653402 0 0.08678885 1.0000000 0.2959551
#> 2 Class 0.9247906 2 0.09785410 0.9284092 0.2876647
#> 3 Survived 0.8473000 0 0.07515326 0.9201859 0.2628906
#> 4 Age 0.4405356 0 0.05307313 0.4929748 0.1534896