Currently the mean line in a chart is calculated from the mean of the value field (give or take the `fix_after_n_points` option).
This is fine for count values, or for rates (%s or fractions etc) that have a static denominator.
However, if you pass in a rate as the value field, and that rate is based on a varying denominator, the mean generated by `ptd_spc_standard()` will just be the mean of the rates, which is likely not the overall mean rate.
This issue is probably out of scope - what else is the package supposed to do?! And it is kind of the user's responsibility to pass in the right kind of data and to validate the outputs. But still.
One potential action is to add a caveat in the documentation to remind users that the mean calculation for rate data is just the mean of the rates, and may therefore differ from the overall rate (total numerator divided by total denominator).
Another solution is that `ptd_spc` could gain an argument for a user-supplied, pre-calculated mean column in the data. This would be a breaking change to the user interface.
Here's a reprex for a situation where an event happens 18 times (numerator) out of a possible 90 (denominator). The mean rate for the year is 0.2 (or 20%). But the mean of the rates is 0.216 (which is used for the mean line in the chart below).
```r
tibble::tibble(
  date = seq.Date(as.Date("2023-01-01"), as.Date("2023-12-01"), "1 month"),
  num = rep(seq(2), 6),
  dnm = rep(c(4, 11), 6)
) |>
  dplyr::summarise(rate = num / dnm, .by = "date") |>
  NHSRplotthedots::ptd_spc("rate", "date")
#> Warning in ptd_add_short_group_warnings(.): Some groups have 'n < 12'
#> observations. These have trial limits, which will be revised with each
#> additional observation until 'n = fix_after_n_points' has been reached.
```
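The gap between the two figures can be checked in base R, without calling the package at all. This is only a rough sketch using the same numerators and denominators as the reprex; the variable names are illustrative and not part of NHSRplotthedots.

```r
# Same numerators and denominators as the reprex (12 monthly observations)
num <- rep(seq(2), 6)     # 1, 2, 1, 2, ...
dnm <- rep(c(4, 11), 6)   # 4, 11, 4, 11, ...

# Mean of the monthly rates: every month counts equally,
# regardless of how large its denominator is (about 0.216)
mean_of_rates <- mean(num / dnm)

# Overall rate: total events over total opportunities (18 / 90 = 0.2)
overall_rate <- sum(num) / sum(dnm)

# Weighting each monthly rate by its denominator recovers the overall rate
weighted_rate <- weighted.mean(num / dnm, w = dnm)

print(c(
  mean_of_rates = mean_of_rates,
  overall_rate = overall_rate,
  weighted_rate = weighted_rate
))
```

The unweighted mean is pulled towards the months with the small denominator (rate 0.25), which is why it sits above the overall rate of 0.2.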
Potentially here I could pass in my pre-calculated mean of 0.2, which PTD would then use instead.
Like this:
```r
tibble::tibble(
  date = seq.Date(as.Date("2023-01-01"), as.Date("2023-12-01"), "1 month"),
  num = rep(seq(2), 6),
  dnm = rep(c(4, 11), 6)
) |>
  dplyr::mutate(overall_mean = sum(num) / sum(dnm)) |>
  dplyr::summarise(rate = num / dnm, .by = c("date", "overall_mean")) |>
  # using a data column allows for facets and rebasing
  # but the user would have to manually allow for these
  NHSRplotthedots::ptd_spc("rate", "date", mean_field = "overall_mean")
```