Fix collinearity inflation for ordinal models #903
jmgirard wants to merge 11 commits into easystats:main
Code Review
This pull request updates the intercept-handling logic in check_collinearity.R to support ordinal models like clm and clmm. The changes introduce a more flexible method for identifying slope parameters and subsetting the variance-covariance matrix. Feedback suggests adding a safety check to ensure term_assign is synchronized with the matrix dimensions before subsetting to avoid NA values. Additionally, it was recommended to use insight::find_parameters() for better consistency with the easystats ecosystem.
/gemini review
Code Review
This pull request enhances the collinearity check for ordinal models (clm and clmm) by filtering for true slope parameters rather than assuming a single intercept. It also improves the robustness of matrix subsetting. A review comment recommends using the component variable instead of hardcoding $conditional to ensure consistency with other sections of the code.
/gemini review
Code Review
This pull request updates the collinearity check logic to better handle ordinal models (clm/clmm) by correctly filtering slope parameters. The reviewer noted that the current implementation is susceptible to issues with rank-deficient models because it relies on names(x$beta) and manual model matrix construction. It is recommended to use insight::find_parameters(x)$conditional and insight::get_modelmatrix(x) to ensure robust alignment of parameters and term assignments.
/gemini review
Code Review
This pull request updates the check_collinearity.zerocount function to improve the handling of ordinal models, specifically clm and clmm objects. The changes introduce a more robust method for identifying and filtering slope parameters by matching them against the model matrix and variance-covariance matrix, ensuring that term assignments are correctly synchronized. This logic replaces the previous simplified intercept-removal code. I have no feedback to provide.
@mattansb, your review would be great!
Hoping to see a low VIF like ~1.12 rather than a high one like 4.32:

library(ordinal)
library(lme4)
#> Loading required package: Matrix
library(performance) # with the changes
set.seed(999)
n <- 500
# 1. Simulate perfectly orthogonal predictors
x_continuous <- rnorm(n, mean = 0, sd = 1)
x_binary <- sample(c(-0.5, 0.5), size = n, replace = TRUE, prob = c(0.85, 0.15))
subject_id <- factor(rep(1:50, each = 10))
# 2. Generate an ordinal outcome with MANY categories
random_intercepts <- rnorm(50, 0, 1)
latent_y <- 2 * x_continuous + 3 * x_binary + random_intercepts[as.numeric(subject_id)] + rlogis(n)
# Cut into 15 categories to generate 14 distinct thresholds
y_ordinal <- cut(
  latent_y,
  breaks = 15,
  ordered_result = TRUE
)
dat <- data.frame(y_ordinal, x_continuous, x_binary, subject_id)
# 3. Fit models
mod_lmer <- lmer(as.numeric(y_ordinal) ~ x_continuous + x_binary + (1 | subject_id), data = dat)
mod_clmm <- clmm(y_ordinal ~ x_continuous + x_binary + (1 | subject_id), data = dat)
# 4. Compare Collinearity Checks
check_collinearity(mod_lmer)
#> # Check for Multicollinearity
#>
#> Low Correlation
#>
#> Term          VIF   VIF 95% CI adj. VIF Tolerance Tolerance 95% CI
#> x_continuous 1.00  [1.00, Inf]     1.00      1.00     [0.00, 1.00]
#> x_binary     1.00  [1.00, Inf]     1.00      1.00     [0.00, 1.00]
check_collinearity(mod_clmm)
#> # Check for Multicollinearity
#>
#> Low Correlation
#>
#> Term          VIF   VIF 95% CI adj. VIF Tolerance Tolerance 95% CI
#> x_continuous 1.12 [1.05, 1.29]     1.06      0.89     [0.78, 0.95]
#> x_binary     1.12 [1.05, 1.29]     1.06      0.89     [0.78, 0.95]

Created on 2026-04-23 with reprex v2.1.1
Looks good! Could you possibly a) add a news item, b) increase the version number, and c) if possible, add a test (e.g. based on your example)?
Fixes #900

What this PR does
This PR resolves an issue where check_collinearity() calculated artificially inflated VIFs for ordinal models (clm and clmm from the ordinal package).

Why the bug occurred
Previously, check_collinearity() pulled the full variance-covariance matrix. For clmm models, this matrix includes all threshold estimates (which act as multiple intercepts) and random effect variances. Leaving these structural parameters in the matrix artificially inflated the VIFs for the actual fixed predictors.

For example, using the reproducible example in #900, the calculated VIFs were artificially inflated to 4.36 instead of accurately reflecting the minor covariance induced between parameter estimates.

How it is fixed
- Adds handling for clm and clmm objects to subset the variance-covariance matrix down to only the true slope parameters, using names(x$beta).
- Keeps the term_assign tracking vector in sync with the subsetted matrix to prevent matrix indexing errors later in the function.
- Uses drop = FALSE when subsetting to prevent dimension collapse.

When tested against the reproducible example in the original post, the calculated VIFs dropped from 4.36 down to 1.12, correctly reflecting the minor covariance within the ordinal likelihood space.
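
For readers who want to see the subsetting idea in isolation, here is a minimal base R sketch. It is not the code in this PR: the matrix V, its values, and the helper vif_from_vcov() are made up for illustration, and the slopes vector only stands in for names(x$beta). It assumes every term is a single parameter, in which case the VIF is the diagonal of the inverse correlation matrix of the estimates.

```r
# Toy variance-covariance matrix: two thresholds plus two slopes.
# All names and numbers are hypothetical, chosen only for illustration.
nm <- c("1|2", "2|3", "x_continuous", "x_binary")
V <- matrix(
  c(
    0.50, 0.25, 0.05, 0.06,
    0.25, 0.60, 0.06, 0.07,
    0.05, 0.06, 0.20, 0.03,
    0.06, 0.07, 0.03, 0.25
  ),
  nrow = 4, byrow = TRUE, dimnames = list(nm, nm)
)

# Stand-in for names(x$beta): the slope parameters only.
slopes <- c("x_continuous", "x_binary")

# Keep only names that actually exist in the matrix before subsetting.
keep <- intersect(slopes, rownames(V))

# drop = FALSE keeps a matrix even if only one slope remains.
V_sub <- V[keep, keep, drop = FALSE]

# Hypothetical helper: VIF for single-parameter terms, taken as the
# diagonal of the inverse correlation matrix of the estimates.
vif_from_vcov <- function(v) {
  stats::setNames(diag(solve(stats::cov2cor(v))), rownames(v))
}

vif_full <- vif_from_vcov(V)[keep]  # thresholds left in the matrix
vif_sub <- vif_from_vcov(V_sub)     # slopes only

print(round(rbind(full = vif_full, slopes_only = vif_sub), 3))
```

In this toy matrix the slopes-only values are never larger than the ones computed with the thresholds left in, which mirrors the direction of the change described above. The actual magnitudes depend entirely on the made-up numbers.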