Fixing two_sample default noise_model #140

Open
wants to merge 1 commit into
base: dev

dburkhardt

default noise_model in the two_sample helper function now matches the docstring and other testing methods

@davidsebfischer
Contributor

Thanks for the PR @dburkhardt! I set this to None because the t-test complains if it gets a noise model that is not Gaussian. This way, the t-test can always be run easily, and people have to choose a noise model explicitly if they use a Wald test (which I considered the more advanced case). Happy to get feedback on this design choice! I could also change the docstring to make this clear.

@dburkhardt
Author

Hmm okay, I think I'm seeing the issue here. So my guess is that for cases where you're comparing two samples of scRNA-seq, "nb" is the correct noise model for the Wald test.

If you're comparing two clusters, I think none of these tests will yield useful p-values because clustering introduces differences between partitions by design (https://linkinghub.elsevier.com/retrieve/pii/S2405471219302698).

What do you think here? If "nb" is the correct noise model for two_sample comparisons (i.e. comparisons of independently generated sets of cells), then why not have that set by default?

@davidsebfischer
Contributor

My take on the different tests is that they represent different assumptions about the data distribution and about whether and how confounding variables need to be included. I agree with this:

> What do you think here? If "nb" is the correct noise model for two_sample comparisons (i.e. comparisons of independently generated sets of cells), then why not have that set by default?

But I would translate it to: if one chooses a Wald test, then "nb" is set as the default noise model. I can do that internally in the two_sample function. Then the default choice for noise_model is None, which is fine for t-tests and which is changed to "nb" if a Wald test is chosen. Does that match your intuition?
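
A minimal sketch of that defaulting logic, to make the proposal concrete. The helper name `resolve_noise_model` and the test names "wald" and "t-test" are assumptions for illustration only, and the t-test check is a stand-in for the existing complaint rather than the actual code inside two_sample:

```python
from typing import Optional


def resolve_noise_model(test: str, noise_model: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: pick the effective noise model for a given test.

    noise_model=None means "the caller did not choose one".
    """
    if test == "wald":
        # Proposed behaviour: a Wald test falls back to "nb" when nothing was chosen.
        return "nb" if noise_model is None else noise_model
    if test == "t-test":
        # Assumed stand-in for the current complaint: the t-test only
        # accepts no noise model or a Gaussian one.
        if noise_model not in (None, "gaussian"):
            raise ValueError(
                f"t-test does not support noise_model={noise_model!r}"
            )
        return noise_model
    # Any other test: pass the caller's choice through unchanged.
    return noise_model
```

With this, calling two_sample with its defaults would keep working for the t-test, and a Wald test would pick up "nb" without the user having to set it, while an explicit choice is still respected.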
