Elr/rel scores #69
Conversation
```r
#' `metrics` and should only include proper scores (e.g., it should not contain
#' interval coverage metrics). If `NULL` (the default), no relative metrics
#' will be computed. Relative metrics are only computed if `summarize = TRUE`,
#' and require that `"model_id"` is included in `by`.
```
I think the alternative to enforcing this requirement is to add another argument along the lines of `compare`, as used in `scoringutils::get_pairwise_comparisons()`, with a default of `"model_id"`. I think that would be fine, but since essentially all use cases of this function will include `"model_id"` in `by`, I don't think it's necessarily worth introducing the extra argument here.
I think the current approach is fine since it would be caught by the validation. The problem with extra arguments that affect other arguments is that it becomes difficult for users to remember the relationships between them.
This is a good start. That being said, I have not yet looked at the tests because there is a lot going on there; I will take a look after I get back from lunch.
I did make some suggestions to simplify the validation function.
```r
if (length(relative_metrics) > 0 && !"model_id" %in% by) {
  cli::cli_abort(
    "Relative metrics require 'model_id' to be included in {.arg by}."
  )
}
```
Is this strictly necessary? If we know that `"model_id"` always needs to be included in `by`, we can just put it there. On top of that, we also filter out `"model_id"` in line 140 (`by = by[by != "model_id"]`), but it's also kind of required in line 145 (`scores <- scoringutils::summarize_scores(scores = scores, by = by)`).
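To make the `by` bookkeeping described above concrete, here is a minimal standalone base-R sketch. It does not use scoringutils (those calls are replaced by comments), and the helper name `split_by_args` is made up for illustration:

```r
# Standalone sketch of the validation plus the two different `by` vectors
# discussed above. The real code calls scoringutils; here those calls are
# replaced by comments so the example runs on its own.
split_by_args <- function(relative_metrics, by) {
  if (length(relative_metrics) > 0 && !"model_id" %in% by) {
    stop("Relative metrics require 'model_id' to be included in `by`.")
  }
  list(
    # add_relative_skill() gets "model_id" via `compare`, so it must be
    # dropped from its `by` (the filtering noted at line 140):
    by_relative_skill = by[by != "model_id"],
    # summarize_scores() needs "model_id" in `by` to keep per-model rows
    # (the requirement noted at line 145):
    by_summarize = by
  )
}

args <- split_by_args(relative_metrics = "wis", by = c("model_id", "location"))
args$by_relative_skill  # "location"
args$by_summarize       # "model_id" "location"
```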
I haven't manually checked out the PR and run the code, so this is just from a cursory reading on GitHub. Let me know if you'd like me to dig deeper; happy to make a suggestion.
I'm good with a cursory review on the level of "this is reasonable or not", given that Zhian is also reviewing.

Regarding your comment above: I don't think this is strictly necessary, but my general preference is to throw errors guiding users toward what we're expecting rather than modify their inputs. I think all of this is clear to you already, but just to say it:

- The hard-coded use of `compare = "model_id"` in the call to `add_relative_skill` means that we are getting results that are broken down by model, but we're not allowed to include the thing we specify for `compare` in the `by` argument to that function.
- This also means that when we call `scoringutils::summarize_scores(scores = scores, by = by)`, we need to have `"model_id"` in the `by` argument for the results to make sense.
- That means there is a general situation where the `by` arguments to `add_relative_skill` and `summarize_scores` have to be different; we will always need to either drop the `"model_id"` entry from `by` in the call to `add_relative_skill` or add it to `by` in the call to `score_model_out`.
- I think it'll be clearer to users if, for purposes of `hubEvals::score_model_out`, `by` is always expected to be the vector of names of variables by which scores are disaggregated in the result.
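The last point above can be illustrated with a base-R stand-in, where scores are averaged by exactly the columns the user lists in `by`, including `"model_id"`. The data values here are invented for the demo:

```r
# `by` names the disaggregation of the *result*: one output row per
# combination of the listed columns, model_id included. Toy data only.
scores <- data.frame(
  model_id = rep(c("baseline", "ensemble"), each = 2),
  location = rep(c("A", "B"), times = 2),
  wis      = c(10, 14, 7, 9)
)
by <- c("model_id", "location")

# aggregate() accepts a data frame (a list of columns) for its `by` argument.
summarized <- aggregate(scores["wis"], by = scores[by], FUN = mean)
nrow(summarized)  # 4: one row per model_id x location combination
```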
I really appreciated your descriptions of the expected errors in the comments!
I went through the tests and while they work, they are complex and will be painful to debug later on.
The bottom line is that the majority of the code in these tests should be encapsulated in test fixtures.
The most complicated expected score table here amounts to 6 rows and 10 columns, which is trivial for a human to read, even in a git diff. I would recommend storing it as a CSV file in `tests/testthat/fixtures/` instead of having nearly 70 lines of code run every time you want to generate it.
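The fixture approach suggested above can be sketched in standalone base R: write the expected table to CSV once, then read it back in each test. The values and the temp path here are invented; in the package the file would live under `tests/testthat/fixtures/` and be read with `testthat::test_path()`:

```r
# Store the expected score table once as a CSV fixture, then read it back,
# instead of rebuilding it in ~70 lines of code per test. Toy values only;
# a tempdir() stands in for tests/testthat/fixtures/.
fixture_dir <- file.path(tempdir(), "fixtures")
dir.create(fixture_dir, showWarnings = FALSE)
fixture_csv <- file.path(fixture_dir, "expected_scores.csv")

expected <- data.frame(
  model_id = c("baseline", "ensemble"),
  wis      = c(12.3, 8.7),
  rel_wis  = c(1.00, 0.71)
)
write.csv(expected, fixture_csv, row.names = FALSE)

# In a test file this would be:
#   expected <- read.csv(test_path("fixtures", "expected_scores.csv"))
reloaded <- read.csv(fixture_csv)
isTRUE(all.equal(expected, reloaded))  # TRUE
```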
Other than that, there was test noise from warnings (from `wilcox.test`, which cannot be helped), and I proposed a simplification of the equality tests.
compare data frames directly
Co-authored-by: Zhian N. Kamvar <[email protected]>
Thanks for the review @zkamvar! Here's a summary of my responses:
This looks great!
fixes #66