You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As noted at #410, predict appears to give incorrect results when some predictors are dropped because of collinearity. We should really do something about this.
julia> x = [1.01.02.01.02.03.01.0-1.00.0];
julia> y = [1.0, 3.0, -2.0];
julia> m1 =lm(x, y, dropcollinear=true)
LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,CholeskyPivoted{Float64,Array{Float64,2}}}}:
Coefficients:
─────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
─────────────────────────────────────────────────────────────────
x1 0.0NaNNaNNaNNaNNaN
x2 2.071430.2575398.040.0787-1.200925.34378
x3 -0.4285710.174964-2.450.2468-2.651691.79455
─────────────────────────────────────────────────────────────────
julia> m2 =lm(x[:, 1:2], y, dropcollinear=false)
LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,Cholesky{Float64,Array{Float64,2}}}}:
Coefficients:
────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────
x1 -0.4285710.174964-2.450.2468-2.651691.79455
x2 1.642860.12371813.280.04790.0708723.21484
────────────────────────────────────────────────────────────────
julia> p1 =predict(m1, x, interval=:confidence);
julia> p2 =predict(m2, x[:, 1:2], interval=:confidence);
julia>@test p1.prediction ≈ p2.prediction
Test Passed
julia>@test p1.upper ≈ p2.upper
Test Failed at REPL[254]:1
Expression: p1.upper ≈ p2.upper
Evaluated: [3.354342064529162; 5.728437323463455; 1.6152034019232366] ≈ [3.2437098232940285; 5.727181955909349; 1.200919478058661]
ERROR: There was an error during testing
julia>@test p1.lower ≈ p2.lower
Test Failed at REPL[255]:1
Expression: p1.lower ≈ p2.lower
Evaluated: [-0.925770635957734; -0.014151609177741165; -5.758060544780381] ≈ [-0.8151383947225999; -0.012896241623634452; -5.343776620915804]
ERROR: There was an error during testing
In the worst case we could throw an error when calling predict in such cases until we have a fix.
(BTW I'm not sure why the intercept is dropped in this example rather than the last column.)
The error stems from this line where we work with the pivoted cholesky instead of the original matrix from the original regression.
I haven't tracked down exactly how R is handling it. But there is an interesting warning here
p <- object$rank
p1 <- seq_len(p)
piv <- if(p) qr.lm(object)$pivot[p1]
if(p < ncol(X) && !(missing(newdata) || is.null(newdata)))
warning("prediction from a rank-deficient fit may be misleading")
I can trigger the warning
r$> predict(m1, as.data.frame(x), interval = "confidence")
fit lwr upr
1 1.214286 -0.81513839 3.243710
2 2.857143 -0.01289624 5.727182
3 -2.071429 -5.34377662 1.200919
Warning message:
In predict.lm(m1, as.data.frame(x), interval = "confidence") :
prediction from a rank-deficient fit may be misleading
Thanks. So apparently R doesn't know how to handle it either, so we should probably throw an error (I don't like warnings that tell you we return invalid results).
As noted at #410,
predict
appears to give incorrect results when some predictors are dropped because of collinearity. We should really do something about this.In the worst case we could throw an error when calling
predict
in such cases until we have a fix.(BTW I'm not sure why the intercept is dropped in this example rather than the last column.)
Cc: @piever @pdeffebach @mkborregaard @dmbates @andreasnoack @Nosferican @DilumAluthge
The text was updated successfully, but these errors were encountered: