
report() Cohen's d for a t-test differs from output from lsr::cohensD and effectsize::cohens_d #183

Closed
leighclark opened this issue Jun 9, 2021 · 7 comments
Labels: question ⁉️ Further information is requested

@leighclark

Question and context

I apologise if this is the wrong place to ask this and/or if I am doing something obviously stupid - I'm quite new to all this. I have two questions about the output of report() for a t-test.

This is the output from report():

> report(t.test(self_efficacy_mean ~ user_sex, paired = FALSE, alternative = "less", data = ratings2))
Effect sizes were labelled following Cohen's (1988) recommendations.

The Welch Two Sample t-test testing the difference of self_efficacy_mean by user_sex (mean in group female = 4.61, mean in group male = 5.14) suggests that the effect is positive, statistically significant, and medium (difference = 0.53, 95% CI [-Inf, -0.41], t(522.17) = -7.14, p < .001; Cohen's d = -0.63, 95% CI [-0.80, -0.45])

However, the d value differs from what I get from lsr::cohensD() and effectsize::cohens_d():

> cohensD(self_efficacy_mean ~ user_sex, method = "unequal", data = ratings2)
[1] 0.4844071

> cohens_d(self_efficacy_mean ~ user_sex, pooled_sd = FALSE, data = ratings2)
Cohen's d |         95% CI
--------------------------
-0.48     | [-0.62, -0.35]

- Estimated using un-pooled SD.

So my first question is, is this a bug or a user error?

My second question refers to the output from report(). Despite the t-test being alternative = "less", report() reports the effect as positive and swaps the user_sex groups around. Is this expected behaviour?

Thanks in advance, and also thanks for the wonderful package - it's been very useful!

@IndrajeetPatil (Member)

To answer the first question, unfortunately, this is unavoidable.

The report() function internally uses effectsize::effectsize(), and not effectsize::cohens_d().

As you can see, report() and effectsize() give matching values:

library(report)
library(effectsize)

mod <- t.test(wt ~ as.factor(am), data = mtcars)

report(mod)
#> Effect sizes were labelled following Cohen's (1988) recommendations.
#> 
#> The Welch Two Sample t-test testing the difference of wt by as.factor(am) (mean in group 0 = 3.77, mean in group 1 = 2.41) suggests that the effect is negative, statistically significant, and large (difference = -1.36, 95% CI [0.85, 1.86], t(29.23) = 5.49, p < .001; Cohen's d = 2.03, 95% CI [1.13, 2.91])

effectsize(mod)
#> Warning in effectsize.htest(mod): Unable to retrieve data from htest object.
#> Using t_to_d() approximation.
#> d    |       95% CI
#> -------------------
#> 2.03 | [1.13, 2.91]

This is because cohens_d() works with raw data, while effectsize() works with model objects and needs to rely on an approximation (via t_to_d()).

cohens_d(wt ~ as.factor(am), data = mtcars)
#> Cohen's d |       95% CI
#> ------------------------
#> 1.89      | [1.03, 2.73]
#> 
#> - Estimated using pooled SD.

lsr::cohensD(wt ~ as.factor(am), data = mtcars)
#> [1] 1.892406

Created on 2021-06-09 by the reprex package (v2.0.0)
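
For reference, the t_to_d() approximation mentioned in the warning above can be checked by hand. A minimal sketch, assuming the standard conversion d = 2t / sqrt(df) for independent-samples tests:

# Hand check of the t_to_d() approximation, assuming d = 2 * t / sqrt(df)
# for independent samples:
mod <- t.test(wt ~ as.factor(am), data = mtcars)
2 * unname(mod$statistic) / sqrt(unname(mod$parameter))  # ~2.03, matching effectsize(mod)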

@mattansb (Member) commented Jun 9, 2021

> My second question refers to the output from report(). Despite the t-test being alternative = "less", report() reports the effect as positive and swaps the user_sex groups around. Is this expected behaviour?

The "order" is consistent with the t.test() function:

effectsize::cohens_d(extra ~ group, paired = TRUE, data = sleep) # negative d value
#> Cohen's d |         95% CI
#> --------------------------
#> -1.28     | [-2.23, -0.44]

t.test(extra ~ group, paired = TRUE, data = sleep) # negative t value
#> 
#>  Paired t-test
#> 
#> data:  extra by group
#> t = -4.0621, df = 9, p-value = 0.002833
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -2.4598858 -0.7001142
#> sample estimates:
#> mean of the differences 
#>                   -1.58
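
As a quick check of which group comes first: the sign of t (and d) follows the factor level order, first level minus second:

levels(sleep$group)
#> [1] "1" "2"

So the estimate is group 1 minus group 2, regardless of the alternative hypothesis.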

> This is because cohens_d() works with raw data, while effectsize() works with model objects and needs to rely on an approximation.

Adding to @IndrajeetPatil: when the data is recoverable, report() will compute Cohen's d exactly, not just estimate it:

t1 <- t.test(extra ~ group, paired = TRUE, data = sleep)

t2 <- t.test(sleep$extra[sleep$group==1], sleep$extra[sleep$group==2], paired = TRUE)

report::report(t1)
#> Effect sizes were labelled following Cohen's (1988) recommendations.
#> 
#> The Paired t-test testing the difference of extra by group (mean of the differences = -1.58) suggests that the effect is negative, statistically significant, and large (difference = -1.58, 95% CI [-2.46, -0.70], t(9) = -4.06, p < .01; Cohen's d = -1.35, 95% CI [-2.23, -0.44])

report::report(t2)
#> Effect sizes were labelled following Cohen's (1988) recommendations.
#> 
#> The Paired t-test testing the difference between sleep$extra[sleep$group == 1] and sleep$extra[sleep$group == 2] (mean of the differences = -1.58) suggests that the effect is negative, statistically significant, and large (difference = -1.58, 95% CI [-2.46, -0.70], t(9) = -4.06, p < .01; Cohen's d = -1.28, 95% CI [-2.23, -0.44])

Created on 2021-06-09 by the reprex package (v2.0.0)
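
The same kind of approximation explains the -1.35 from report(t1). A hand check, assuming the standard paired-sample conversion d = t / sqrt(df):

-4.0621 / sqrt(9)  # t and df from t1 above
#> [1] -1.354033

versus the exact Cohen's d of -1.28 computed from the raw data.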

@IndrajeetPatil added the question ⁉️ Further information is requested label on Jun 9, 2021
@IndrajeetPatil (Member)

Thanks, Mattan. My train commute ended before I could get to the second question 😅

Closing for now. Please feel free to ask any further questions you may have about this.

@leighclark (Author)

Thank you all for your responses, I will take a moment (several) to digest them :)

Much appreciated!

L.

@leighclark (Author)

Once again, thank you @IndrajeetPatil and @mattansb for taking the time to respond. I've reproduced everything you said above.

I am not sure if this is the right place to post since the issue is closed (sorry if it's not), but my last question might be a bit more subjective:

Since there is a difference between the d values, would you consider the output from cohens_d() more "correct" than the default from report()? Would either of you have a preference if you were reporting your own results in a similar case?

Thanks again and all the best from Berlin!

@IndrajeetPatil (Member)

Created a new issue to discuss this: #184

Thanks for your feedback! 😸

Hi from Berlin! 😅

@mattansb (Member)

> Since there is a difference between the d values, would you consider the output from cohens_d() more "correct" than the default from report()? Would either of you have a preference if you were reporting your own results in a similar case?

The output from cohens_d() also depends on the input:

library(effectsize)

t1 <- t.test(extra ~ group, paired = TRUE, data = sleep)

t2 <- t.test(sleep$extra[sleep$group==1], sleep$extra[sleep$group==2], paired = TRUE)

cohens_d(t1)
#> Warning in effectsize.htest(x, type = "d", ci = ci, verbose = verbose): Unable
#> to retrieve data from htest object. Using t_to_d() approximation.
#> d     |         95% CI
#> ----------------------
#> -1.35 | [-2.23, -0.44]

cohens_d(t2)
#> Cohen's d |         95% CI
#> --------------------------
#> -1.28     | [-2.23, -0.44]

cohens_d(extra ~ group, paired = TRUE, data = sleep)
#> Cohen's d |         95% CI
#> --------------------------
#> -1.28     | [-2.23, -0.44]

And as I noted above, report() can return the exact Cohen's d if the right t.test() object is passed to it (t2 in the above example).

So this isn't an issue of effectsize vs. report; it's a question of "can the original data be recovered from the t.test() object?"
When the data can be recovered, the results from report() will not be an approximation.

Generally, we would like the true value, and not just an approximation (:

Conclusion: don't use formula notation in t.test()! It makes the original data unrecoverable!
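
For the original example from the question, that would mean passing the two vectors directly. A sketch (ratings2 and its columns come from the question above and are not shown here):

# Passing vectors instead of a formula keeps the data recoverable,
# so report() can compute the exact Cohen's d rather than the
# t_to_d() approximation:
t.test(ratings2$self_efficacy_mean[ratings2$user_sex == "female"],
       ratings2$self_efficacy_mean[ratings2$user_sex == "male"],
       alternative = "less")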
