
Elr/rel scores #69

Merged: 13 commits merged from elr/rel_scores into main on Jan 7, 2025
Conversation

elray1 (Contributor) commented Jan 3, 2025

fixes #66

github-actions bot commented Jan 3, 2025

#' `metrics` and should only include proper scores (e.g., it should not contain
#' interval coverage metrics). If `NULL` (the default), no relative metrics
#' will be computed. Relative metrics are only computed if `summarize = TRUE`,
#' and require that `"model_id"` is included in `by`.
elray1 (Contributor, Author) commented Jan 3, 2025
I think the alternative to enforcing this requirement is to add another argument along the lines of compare, as used in scoringutils::get_pairwise_comparisons, with a default of "model_id". I think that would be fine, but since essentially all use cases of this function will include "model_id" in by, I don't think it's necessarily worth introducing the extra argument here?
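
For context, a rough sketch of how compare behaves in scoringutils itself, using its bundled example data; this assumes the scoringutils 2.x API and an illustrative metric choice, and is not code from this PR:

library(scoringutils)

# Score the bundled example quantile forecasts.
scores <- score(as_forecast_quantile(example_quantile))

# `compare` names the column whose levels are compared pairwise (here, models);
# `by` lists any additional grouping columns. The question above is whether
# hubEvals should expose a similar argument defaulting to "model_id".
pairwise <- get_pairwise_comparisons(
  scores,
  compare = "model",
  by = "target_type",
  metric = "wis"
)
head(pairwise)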

Member replied:
I think the current approach is fine since it would be caught by the validation. The problem with extra arguments that affect other arguments is that it becomes difficult for users to remember the relationships between them.

@elray1 elray1 requested a review from nikosbosse January 3, 2025 19:20
zkamvar (Member) left a comment:
This is a good start. That said, I haven't yet looked at the tests because there is a lot going on there; I will take a look after I get back from lunch.

I did make some suggestions to simplify the validation function.

(Resolved review threads on R/validate.R and R/score_model_out.R)

if (length(relative_metrics) > 0 && !"model_id" %in% by) {
  cli::cli_abort(
    "Relative metrics require 'model_id' to be included in {.arg by}."
  )
}
Collaborator:

Is this strictly necessary? If we know that "model_id" always needs to be included in by, we could just put it there.
On top of that, we also filter out "model_id" on line 140 (by = by[by != "model_id"]), but it's also kind of required on line 145 (scores <- scoringutils::summarize_scores(scores = scores, by = by)).

Collaborator:

I haven't manually checked out the PR and run the code, so this is just from a cursory reading on GitHub. Let me know if you'd like me to dig deeper; I'm happy to make a suggestion.

elray1 (Contributor, Author):

I'm good with a cursory review on the level of "this is reasonable or not", given that Zhian is also reviewing.

Regarding your comment above: I don't think this is strictly necessary, but my general preference is to throw errors guiding users toward what we expect rather than modifying their inputs. I think all of this is clear to you already, but just to spell it out:

  • The hard-coded use of compare = "model_id" in the call to add_relative_skill means that we get results broken down by model, but we're not allowed to include the variable we specify for compare in the by argument to that function.
  • This also means that when we call scoringutils::summarize_scores(scores = scores, by = by), we need "model_id" in the by argument for the results to make sense.
  • So in general the by arguments to add_relative_skill and summarize_scores have to be different; we will always need to either drop the "model_id" entry from by in the call to add_relative_skill or add it to by in the call to score_model_out.
  • I think it'll be clearer to users if, for purposes of hubEvals::score_model_out, by is always expected to be the vector of names of variables by which scores are disaggregated in the result.
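
A minimal sketch of this by handling, stitched together from the snippets quoted in this thread; the surrounding variable setup and the metric value are illustrative assumptions, not the actual implementation:

# Hypothetically inside score_model_out(): `by` is the user-facing vector of
# columns the returned scores are disaggregated by, and must include "model_id".
by <- c("model_id", "location")

# add_relative_skill() must not receive the `compare` column in its `by`,
# so "model_id" is dropped here; compare = "model_id" is hard-coded.
scores <- scoringutils::add_relative_skill(
  scores,
  compare = "model_id",
  by = by[by != "model_id"],
  metric = "wis"  # illustrative; the real call may select the metric differently
)

# summarize_scores(), by contrast, needs "model_id" in `by` so that the
# summarized results are broken down per model.
scores <- scoringutils::summarize_scores(scores = scores, by = by)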

zkamvar (Member) left a comment:

I really appreciated your descriptions of the expected errors in the comments!

I went through the tests, and while they work, they are complex and will be painful to debug later on.

The bottom line is that the majority of the code in these tests should be encapsulated in test fixtures.

The most complicated expected score table here amounts to 6 rows and 10 columns, which is trivial for a human to read, even in a git diff. I would recommend storing it as a CSV file in tests/testthat/fixtures/ instead of having nearly 70 lines of code run every time you want to generate it.

Other than that, there was test noise from warnings (from wilcox.test, which cannot be helped), and I proposed a simplification of the equality tests.
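
A minimal sketch of the fixture pattern being recommended here; the file name, its contents, and the object under test are placeholders, not code from this PR:

# tests/testthat/fixtures/expected_scores.csv would hold the hand-checked
# expected table (placeholder name).
test_that("score_model_out() returns the expected relative scores", {
  expected <- read.csv(
    testthat::test_path("fixtures", "expected_scores.csv"),
    stringsAsFactors = FALSE
  )
  # `actual` stands in for the result of the call under test,
  # e.g. a score_model_out(...) call.
  expect_equal(as.data.frame(actual), expected, ignore_attr = TRUE)
})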

(Four resolved review threads on tests/testthat/test-score_model_out_rel_metrics.R)
elray1 (Contributor, Author) commented Jan 7, 2025

Thanks for the review @zkamvar! Here's a summary of my responses:

  • Used your suggested simplification of the checks for data frame equality; thanks for that.
  • After some floundering in 36a810f, which was later overwritten in 1718bcd, the expected scores are now saved as a CSV file under testthat/testdata.
  • I'm proposing not to add the expect_warning wrappers, because (a) these aren't really warnings that I would "expect" from this code; the fact that they are currently being thrown is not intended behavior; and (b) I've filed an issue with scoringutils to address the situation that produces these warnings, which I expect will be resolved soon.

zkamvar (Member) left a comment:

This looks great!

elray1 merged commit 040d000 into main on Jan 7, 2025
8 checks passed
elray1 deleted the elr/rel_scores branch on January 7, 2025 at 20:51
elray1 restored the elr/rel_scores branch on January 8, 2025 at 02:47
elray1 deleted the elr/rel_scores branch on January 8, 2025 at 02:48
Development: successfully merging this pull request may close issue #66, "add support for relative/pairwise scores".
3 participants