predict gives incorrect CIs when collinear columns have been dropped #413

nalimilan · 2021-03-20T17:28:58Z

As noted at #410, predict appears to give incorrect results when some predictors are dropped because of collinearity. We should really do something about this.

julia> x = [1.0 1.0 2.0
            1.0 2.0 3.0
            1.0 -1.0 0.0];

julia> y = [1.0, 3.0, -2.0];

julia> m1 = lm(x, y, dropcollinear=true)
LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,CholeskyPivoted{Float64,Array{Float64,2}}}}:

Coefficients:
─────────────────────────────────────────────────────────────────
        Coef.  Std. Error       t  Pr(>|t|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────
x1   0.0       NaN         NaN       NaN     NaN        NaN
x2   2.07143     0.257539    8.04    0.0787   -1.20092    5.34378
x3  -0.428571    0.174964   -2.45    0.2468   -2.65169    1.79455
─────────────────────────────────────────────────────────────────


julia> m2 = lm(x[:, 1:2], y, dropcollinear=false)
LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,Cholesky{Float64,Array{Float64,2}}}}:

Coefficients:
────────────────────────────────────────────────────────────────
        Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────
x1  -0.428571    0.174964  -2.45    0.2468  -2.65169     1.79455
x2   1.64286     0.123718  13.28    0.0479   0.070872    3.21484
────────────────────────────────────────────────────────────────


julia> p1 = predict(m1, x, interval=:confidence);

julia> p2 = predict(m2, x[:, 1:2], interval=:confidence);

julia> @test p1.prediction ≈ p2.prediction
Test Passed

julia> @test p1.upper ≈ p2.upper
Test Failed at REPL[254]:1
  Expression: p1.upper ≈ p2.upper
   Evaluated: [3.354342064529162; 5.728437323463455; 1.6152034019232366] ≈ [3.2437098232940285; 5.727181955909349; 1.200919478058661]
ERROR: There was an error during testing

julia> @test p1.lower ≈ p2.lower
Test Failed at REPL[255]:1
  Expression: p1.lower ≈ p2.lower
   Evaluated: [-0.925770635957734; -0.014151609177741165; -5.758060544780381] ≈ [-0.8151383947225999; -0.012896241623634452; -5.343776620915804]
ERROR: There was an error during testing

In the worst case we could throw an error when calling predict in such cases until we have a fix.

(BTW I'm not sure why the intercept is dropped in this example rather than the last column.)

Cc: @piever @pdeffebach @mkborregaard @dmbates @andreasnoack @Nosferican @DilumAluthge

The text was updated successfully, but these errors were encountered:

pdeffebach · 2021-03-20T18:22:21Z

Thanks for highlighting this.

I can confirm that R does not have this behavior.

The error stems from this line where we work with the pivoted cholesky instead of the original matrix from the original regression.

I haven't tracked down exactly how R is handling it. But there is an interesting warning here

    p <- object$rank
    p1 <- seq_len(p)
    piv <- if(p) qr.lm(object)$pivot[p1]
    if(p < ncol(X) && !(missing(newdata) || is.null(newdata)))
	warning("prediction from a rank-deficient fit may be misleading")

I can trigger the warning


r$> predict(m1, as.data.frame(x), interval = "confidence")                 
        fit         lwr      upr
1  1.214286 -0.81513839 3.243710
2  2.857143 -0.01289624 5.727182
3 -2.071429 -5.34377662 1.200919
Warning message:
In predict.lm(m1, as.data.frame(x), interval = "confidence") :
  prediction from a rank-deficient fit may be misleading

nalimilan · 2021-03-20T19:18:11Z

Thanks. So apparently R doesn't know how to handle it either, so we should probably throw an error (I don't like warnings that tell you we return invalid results).

andreasnoack · 2024-11-21T16:28:01Z

@nalimilan do we have to resolve this before 2.0? Will a "fix" here be breaking any other functionality?

ericqu mentioned this issue Apr 4, 2021

Cooks Distance tests #415

Merged

palday mentioned this issue Sep 29, 2022

Version 2.0 Breaking Changes #500

Open

andreasnoack added the decision label Nov 20, 2024

andreasnoack added this to the Release 2.0 milestone Nov 20, 2024

andreasnoack added the bug label Nov 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

predict gives incorrect CIs when collinear columns have been dropped #413

predict gives incorrect CIs when collinear columns have been dropped #413

nalimilan commented Mar 20, 2021

pdeffebach commented Mar 20, 2021 •

edited

Loading

nalimilan commented Mar 20, 2021

andreasnoack commented Nov 21, 2024

predict gives incorrect CIs when collinear columns have been dropped #413

predict gives incorrect CIs when collinear columns have been dropped #413

Comments

nalimilan commented Mar 20, 2021

pdeffebach commented Mar 20, 2021 • edited Loading

nalimilan commented Mar 20, 2021

andreasnoack commented Nov 21, 2024

pdeffebach commented Mar 20, 2021 •

edited

Loading