Negative r2 for linear model with no intercept #481

mousum-github · 2022-04-25T04:16:17Z

The Julia lm produces negative r2 (-0.15702479338842856) for the following:
using DataFrames, GLM
data = DataFrame(x = 60:70, y = 130:140)
mdl = lm(@formula(y ~ 0 + x), data)
r2(mdl)

The certified value of r2 = 0.999365492298663
https://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA/NoInt1.dat

While investigating, we found that the cause is the nulldeviance calculation, which always treats the null model as the intercept-only model. If the model has an intercept term, the nulldeviance should be y'y - n * mean(y)^2; if the model does not have an intercept, the nulldeviance should be y'y.
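The two definitions can be checked by hand on the data above. The following is a minimal sketch using only Base and the Statistics standard library, not GLM.jl itself; the variable names are illustrative assumptions.

```julia
using Statistics

# Same data as in the report: y = x + 70, fitted without an intercept.
x = collect(60.0:70.0)
y = collect(130.0:140.0)

# Closed-form least-squares slope for y ~ 0 + x.
β = sum(x .* y) / sum(abs2, x)

# Residual sum of squares of the no-intercept fit.
rss = sum(abs2, y .- β .* x)

# Null deviance when the null model is intercept-only: y'y - n * mean(y)^2.
tss_centered = sum(abs2, y .- mean(y))

# Null deviance when the null model has no predictor at all: y'y.
tss_uncentered = sum(abs2, y)

r2_centered = 1 - rss / tss_centered
r2_uncentered = 1 - rss / tss_uncentered
```

With the centered total sum of squares, r2_centered should come out close to the negative value reported above; with the uncentered one, r2_uncentered should agree with the certified value.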

In this PR, we have attempted to correct the nulldeviance calculation.

The following is the summary of the changes:

Added a new Bool parameter hasintercept to the nulldeviance method that takes a LinResp:
nulldeviance(obj::LinearModel) = nulldeviance(obj.rr, hasintercept(obj))
Added a test case that checks beta, the standard error of the estimate, dispersion, and r2 against the certified values given in the above link.
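To illustrate the first change, here is a rough sketch of a null-deviance function that takes a hasintercept flag. The name nulldeviance_sketch and its signature are hypothetical; this is not the code in src/lm.jl, and it ignores weights.

```julia
using Statistics

# Hypothetical helper, not GLM.jl's implementation: unweighted null deviance
# of a linear model, with the null model chosen by `hasintercept`.
function nulldeviance_sketch(y::AbstractVector{<:Real}, hasintercept::Bool)
    # With an intercept, the null model predicts mean(y);
    # without one, it predicts zero for every observation.
    m = hasintercept ? mean(y) : zero(float(eltype(y)))
    return sum(yi -> abs2(yi - m), y)
end

centered = nulldeviance_sketch(130:140, true)
uncentered = nulldeviance_sketch(130:140, false)
```

Computing m once and reusing a single sum keeps both cases in one definition, which is the direction the review later asks for.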

codecov-commenter · 2022-04-25T04:21:02Z

Codecov Report

Merging #481 (7c52e8d) into master (42a0d04) will increase coverage by 2.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #481      +/-   ##
==========================================
+ Coverage   85.12%   87.13%   +2.00%     
==========================================
  Files           7        7              
  Lines         827      847      +20     
==========================================
+ Hits          704      738      +34     
+ Misses        123      109      -14

Impacted Files	Coverage Δ
src/lm.jl	`96.18% <100.00%> (-0.06%)`	⬇️
src/GLM.jl	`50.00% <0.00%> (ø)`
src/glmfit.jl	`79.29% <0.00%> (+0.55%)`	⬆️
src/glmtools.jl	`92.99% <0.00%> (+10.28%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 42a0d04...7c52e8d. Read the comment docs.

ayushpatnaikgit · 2022-04-27T05:49:06Z

@nalimilan @ViralBShah the PR is only failing on Julia nightly on Windows. I don't think it has anything to do with what we've changed. What shall we do in this situation? It's happening with PR 482 as well.

ViralBShah · 2022-04-27T11:31:44Z

Best would be to isolate a minimal test case and file a bug against Julia.

ViralBShah · 2022-04-27T11:33:26Z

Actually, do check the logs. This has to do with time zones, so ignore it. We still need to see what is going on, though.

nalimilan · 2022-04-27T11:39:59Z

I think the TimeZones issue is a network problem, I've seen it before.

Regarding the PR itself, I think we'll have to adjust the definition of nulldeviance and nullloglikelihood to cover models with no intercept. Currently the null model is defined as the one with only the intercept, but this doesn't make much sense when there's no intercept. This is an issue we have recently discussed too, in a comment on #479. Cc: @palday

kleinschmidt

Probably good to also test the case when the model is not fit to a formula, e.g. the lm(X::Matrix, y::Vector) method. AFAICT it should Just Work TM (there are methods for hasintercept for formula-less models), but good to make sure it doesn't break :)

nalimilan · 2022-05-20T07:31:46Z

(For some reason I can't comment in the thread above.) I think eachindex is fine here. People shouldn't play with fitting models where the response, predictors and/or weights have different axes; that would be asking for trouble. (Note that the current code is silently incorrect in that case anyway, so throwing an error would be an improvement.)
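A small sketch of the difference, using made-up arrays rather than anything from GLM.jl: looping over 1:length(y) quietly ignores a mismatched weight vector, while eachindex(y, wts) checks that the axes agree before iterating.

```julia
# Illustrative data only: the weight vector is deliberately one element too long.
y = [1.0, 2.0]
wts = [1.0, 1.0, 1.0]

# Indexing by 1:length(y) never looks at the extra weight, so no error is raised.
silent = sum(wts[i] * abs2(y[i]) for i in 1:length(y))

# eachindex with several arrays throws when their indices differ.
err = try
    sum(wts[i] * abs2(y[i]) for i in eachindex(y, wts))
catch e
    e
end
```

Here err should hold a DimensionMismatch, which is the "throwing an error would be an improvement" behavior described above.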

nalimilan · 2022-05-22T15:05:38Z

Thanks, looks good! I just wonder what release strategy we should adopt. Technically this could be considered a breaking change (because it changes the definition of nulldeviance and nullloglikelihood, cf. JuliaStats/StatsAPI.jl#14) or a bugfix (because the previous result didn't make a lot of sense, and was even misleading for the R²). We probably don't want to tag GLM 2.0 just for this. We could adopt an intermediate approach and print a warning (at least temporarily) for models without an intercept. Opinions?

palday · 2022-05-23T03:15:05Z

@nalimilan I like the idea of issuing a warning for models without intercepts and a minor version bump -- it's a good thing to call out anyway that a lot of classical summary definitions are "weird" for models without intercepts.

nalimilan · 2022-05-23T19:57:21Z

src/lm.jl

+            m = mean(y, weights(wts))
+        end
+    else
+        m = zero(eltype(y))


OK, let's try something like this then? It would also be good to adapt Project.toml to require StatsAPI 1.4, which I'll tag shortly (JuliaStats/StatsAPI.jl#14), so that the docstring is consistent with what we do here.

Suggested change

-        m = zero(eltype(y))
+        @warn("Starting from GLM.jl 1.8, null model is defined as having no predictor at all " *
+              "when a model without an intercept is passed.")
+        m = zero(eltype(y))

nalimilan · 2022-05-23T19:59:44Z

@kleinschmidt Technically, it's possible to fit a model which has an intercept but for which hasintercept returns false by passing a model matrix which applies full dummy coding without a column of ones, right? Should we try to detect this?
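One possible way to detect this, sketched here purely as an assumption and not as anything GLM.jl does: check whether the constant vector lies in the column space of the model matrix. The helper name spans_constant is hypothetical.

```julia
using LinearAlgebra

# Hypothetical helper, not part of GLM.jl: returns true when a column of ones
# can be reproduced as a linear combination of the columns of X, i.e. the
# model contains an implicit intercept.
function spans_constant(X::AbstractMatrix{<:Real})
    o = ones(size(X, 1))
    coef = X \ o                   # least-squares solution of X * coef ≈ o
    return isapprox(X * coef, o)   # exact fit means ones is in the column space
end

# Full dummy coding of a two-level factor, without a column of ones.
X_dummy = [1.0 0.0; 1.0 0.0; 0.0 1.0; 0.0 1.0]

# A single continuous predictor, as in y ~ 0 + x.
X_slope = reshape(collect(60.0:70.0), :, 1)

implicit_intercept = spans_constant(X_dummy)
no_intercept = spans_constant(X_slope)
```

The dummy-coded matrix should be flagged as having an implicit intercept and the single-slope matrix should not. A check like this costs an extra least-squares solve, so whether it is worth doing on every fit is a separate question.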

nalimilan · 2022-06-11T13:46:43Z

I don't have rights to edit your branch so I pushed the commit directly to master with the warning. Thanks!

corrected negative r2 calculation for linear model with no intercept

e4c5651

nalimilan reviewed Apr 27, 2022

single definition of nulldeviance and nullloglikelihood, updated the definition of adjusted r-squared, added test cases, added more metrics in test cases

cda3296

nalimilan mentioned this pull request May 12, 2022

Adjust definition of null model when there is no intercept JuliaStats/StatsAPI.jl#14

Merged

pbastide mentioned this pull request May 16, 2022

Update nulldeviance for models with no intercept JuliaPhylo/PhyloNetworks.jl#175

Merged

kleinschmidt reviewed May 19, 2022

ararslan reviewed May 19, 2022

mousum-github and others added 3 commits May 20, 2022 17:06

Update src/lm.jl

060c452

Thanks for your suggestion. It has been committed. Co-authored-by: Alex Arslan <[email protected]>

Update src/lm.jl

259a3d0

Removed indexing as suggested. Co-authored-by: Alex Arslan <[email protected]>

used eachindex(y,wts) instead of i = 1:length(y) in `nulldeviance` function + added one more test case without providing any formula in `lm` function

fe521c8

nalimilan reviewed May 22, 2022

mousum-github and others added 2 commits May 23, 2022 07:06

Update test/runtests.jl

48e5bb7

Co-authored-by: Milan Bouchet-Valat <[email protected]>

Update test/runtests.jl

7c52e8d

Removed `;`. The suggestion is committed. Co-authored-by: Milan Bouchet-Valat <[email protected]>

nalimilan reviewed May 23, 2022

nalimilan closed this Jun 11, 2022

nalimilan mentioned this pull request Jun 11, 2022

Implement nulldeviance and nullloglikelihood for GLMs #479

Merged
