Expand a contingency table or frequency data frame to observation-level format
Source: R/expand_table.R
Converts multi-dimensional contingency tables (table, array,
ftable), or data frames that contain a frequency/count column, into
a flat data frame where every row represents one observation. This is the
required input format for catgraph.
Arguments
- tbl
A table, array, ftable, or data.frame. All standard R contingency table formats are supported, including multi-dimensional arrays such as Titanic (4-D) and HairEyeColor (3-D).
- freq_col
Character or integer. Only used when tbl is a data.frame. The name or column index of the frequency/count column. If NULL (default), the function looks for a column named "Freq", "freq", "n", "count", or "Count", in that order. An error is raised if none is found.
- as_factor
Logical. If TRUE (default), all resulting columns are coerced to factors, preserving the level order from the original object's dimnames. Set to FALSE to return character columns.
- drop_zero
Logical. If TRUE (default), rows corresponding to zero-count cells are silently dropped. Set to FALSE to keep them: zero-count cells still appear zero times in the output and therefore do not affect results, but their factor levels remain present.
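The default lookup order described for freq_col can be sketched in base R as follows. The helper name find_freq_col is hypothetical and used only for illustration; this is not the package's actual code, just one way the documented behaviour could be expressed.

```r
# Hypothetical helper (not part of the package): returns the first candidate
# name that exists in the data frame, following the documented search order.
find_freq_col <- function(df,
                          candidates = c("Freq", "freq", "n", "count", "Count")) {
  # Subsetting `candidates` (not names(df)) keeps the documented priority order
  hit <- candidates[candidates %in% names(df)]
  if (length(hit) == 0L) {
    stop("No frequency column found; supply one explicitly.")
  }
  hit[1L]
}

survey <- data.frame(
  gender = c("M", "F"),
  count  = c(10L, 12L),
  n      = c(10L, 12L)
)
picked <- find_freq_col(survey)  # "n" comes before "count" in the search order
```

When a data frame holds more than one plausible count column, as in this toy survey, naming the column explicitly through freq_col avoids relying on the search order.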
Value
A data.frame with one row per observation and one column per
categorical variable. The column names are taken from dimnames()
of the input object, or from the non-frequency columns of the input
data frame. Row names are reset to NULL.
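To make the shape of the return value concrete, here is a minimal base-R sketch of the expansion idea for an input of class table. The name expand_sketch is hypothetical, and the code is an assumption about how such a conversion can be done, not the package's implementation; it ignores the freq_col, as_factor, and drop_zero options.

```r
# Hypothetical illustration (not the package's code): expand a `table`
# into one row per observation using only base R.
expand_sketch <- function(tbl) {
  # One row per cell, with the cell count in a column named "Freq"
  long <- as.data.frame(tbl, stringsAsFactors = TRUE)
  # Repeat each row index as many times as its count; zero-count cells vanish
  idx <- rep(seq_len(nrow(long)), times = long$Freq)
  out <- long[idx, names(long) != "Freq", drop = FALSE]
  rownames(out) <- NULL  # reset row names, as described above
  out
}

obs <- expand_sketch(HairEyeColor)
dim(obs)  # rows = total count in the table, columns = number of dimensions
```

Cross-tabulating the result with table(obs) gives back the original cell counts, which is a convenient sanity check for any expansion of this kind.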
Details
Accepted input formats:
- One row per observation
Already in the correct format. Pass directly to catgraph without calling expand_table().
- table / array
The standard output of table(), xtabs(), or built-in datasets such as Titanic, HairEyeColor, and UCBAdmissions.
- ftable
Converted to a table internally before expansion.
- data.frame with frequency column
The output of as.data.frame(some_table), which contains a Freq column by default.
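The accepted formats listed above can all be produced with base R. The sketch below uses only base and datasets functions (xtabs, ftable, as.table, as.data.frame) on built-in data; the variable names are illustrative.

```r
# A 2-D contingency table built from raw observations
tab <- xtabs(~ cyl + gear, data = mtcars)

# A flat table; as.table() restores the multi-dimensional form
flat <- ftable(Titanic, row.vars = c("Class", "Sex"))
back <- as.table(flat)

# Frequency data frame: one row per cell, counts in "Freq" by default
long <- as.data.frame(tab)
names(long)
```

Any of tab, flat, or long corresponds to one of the input formats described for tbl.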
Not accepted:
- Raw numeric matrices
A plain matrix of counts without dimnames cannot be safely converted. Assign dimnames first.
- Numeric 0/1 columns
Columns coded as integers or doubles are not treated as categorical. Coerce with as.factor() or as.character() first, or pass them through catgraph, which will coerce and warn automatically.
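Both unsupported cases can be repaired with a few lines of base R before conversion. The counts, variable names, and labels below are made up for illustration. Wrapping the matrix in as.table() is an assumption made here so that the object clearly has a supported class; it is not a documented requirement.

```r
# Raw count matrix: give it named dimnames, then mark it as a table
m <- matrix(c(12, 5, 7, 20), nrow = 2)
dimnames(m) <- list(
  treatment = c("drug", "placebo"),
  outcome   = c("improved", "unchanged")
)
tab <- as.table(m)

# Numeric 0/1 column: recode explicitly so the labels are meaningful
d <- data.frame(smoker = c(0, 1, 1, 0))
d$smoker <- factor(d$smoker, levels = c(0, 1), labels = c("no", "yes"))
```

Explicit labels also make the resulting factor levels easier to read than the bare "0" and "1" that as.factor() would produce.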
References
R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Examples
# Built-in 4-D table
df <- expand_table(Titanic)
str(df)
#> 'data.frame': 2201 obs. of 4 variables:
#> $ Class : Factor w/ 4 levels "1st","2nd","3rd",..: 3 3 3 3 3 3 3 3 3 3 ...
#> $ Sex : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
#> $ Age : Factor w/ 2 levels "Adult","Child": 2 2 2 2 2 2 2 2 2 2 ...
#> $ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
nrow(df) # 2201 passengers
#> [1] 2201
# Built-in 3-D table
df2 <- expand_table(HairEyeColor)
str(df2)
#> 'data.frame': 592 obs. of 3 variables:
#> $ Hair: Factor w/ 4 levels "Black","Blond",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ Eye : Factor w/ 4 levels "Blue","Brown",..: 2 2 2 2 2 2 2 2 2 2 ...
#> $ Sex : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
# data.frame with Freq column (output of as.data.frame on a table)
tab_df <- as.data.frame(UCBAdmissions)
df3 <- expand_table(tab_df)
nrow(df3) # 4526 applicants
#> [1] 4526
# Custom data frame with a count column
survey <- data.frame(
  gender = c("M", "F", "M", "F"),
  smokes = c("yes", "yes", "no", "no"),
  n = c(23L, 15L, 48L, 61L)
)
df4 <- expand_table(survey, freq_col = "n")
nrow(df4) # 147 observations
#> [1] 147
# Use directly with catgraph
cg <- catgraph(expand_table(Titanic))
cg
#> catgraph object (pairwise association network)
#> Variables : 4
#> Edges : 6
#> Method : Cramer's V (classical)
#> Weights : min = 0.0976 median = 0.2630 max = 0.4556
#> Note : edges encode pairwise marginal association, not
#> conditional independence. All metrics lie on [0, 1].
#> NMI / AMI weights are not exchangeable with Cramer's V
#> weights across graph objects. See vignette
#> 'Methodological caveats'.