Duplicate entry in results table if max_ll is the same across two model runs #36

Open
grst opened this issue Jun 23, 2023 · 0 comments
grst commented Jun 23, 2023

I have a case where a certain gene appears twice in the results table. This is annoying, because in that case I can't merge it back into an AnnData object.

| | FSV | M | g | l | max_delta | max_ll | max_mu_hat | max_s2_t_hat | model | n | s2_FSV | s2_logdelta | time | BIC | max_ll_null | LLR | pval | qval |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 311 | 2.06039e-09 | 4 | ENSG00000117090 | 54 | 4.85165e+08 | 1719.84 | 0.0110949 | 2.31245e-11 | SE | 2068 | 0.0197839 | 3.37435e+15 | 0.00366592 | -3409.13 | 1719.84 | -0.000104498 | 1 | 1 |
| 312 | 2.04339e-09 | 4 | ENSG00000117090 | 181.915 | 4.85165e+08 | 1719.84 | 0.0110949 | 2.31245e-11 | SE | 2068 | 0.0194589 | 3.37435e+15 | 0.00116491 | -3409.13 | 1719.84 | -0.000104498 | 1 | 1 |

I think I tracked it down to

model_results = model_results[model_results.groupby(['g'])['max_ll'].transform(max) == model_results['max_ll']]

where the result from the model run with the maximum value of max_ll is chosen for each gene. In this case, the max_ll value is identical across two model runs, so both rows pass the filter and the gene ends up in the table twice.
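A minimal sketch of how the tie produces the duplicate, using a made-up toy table rather than real output. The gene names and the values for `geneB` are invented for illustration; only the column names `g`, `l` and `max_ll` come from the table above.

```python
import pandas as pd

# Toy stand-in for the per-model results; values are illustrative only.
model_results = pd.DataFrame({
    "g": ["geneA", "geneA", "geneB", "geneB"],
    "l": [54.0, 181.915, 54.0, 181.915],
    "max_ll": [1719.84, 1719.84, 10.0, 12.0],
})

# Same selection as the quoted line (with "max" passed as a string).
best_ll = model_results.groupby("g")["max_ll"].transform("max")
selected = model_results[best_ll == model_results["max_ll"]]

# geneA is tied on max_ll, so both of its rows survive the filter.
n_rows_per_gene = selected["g"].value_counts().to_dict()
print(n_rows_per_gene)
```

Any gene whose best `max_ll` is reached by more than one model run would show up once per tied run.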

I'm unsure what the best solution is here. Just pick the first one?
The entries seem almost the same anyway, except for the FSV, l, s2_FSV and time values.
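One possible way to "just pick the first one", sketched on a made-up toy table. This is an assumption about what a fix could look like, not the library's actual code: `groupby(...).idxmax()` returns the index label of the first row that attains the maximum in each group, so ties resolve to the earliest row.

```python
import pandas as pd

# Toy stand-in for the per-model results; values are illustrative only.
model_results = pd.DataFrame({
    "g": ["geneA", "geneA", "geneB", "geneB"],
    "l": [54.0, 181.915, 54.0, 181.915],
    "max_ll": [1719.84, 1719.84, 10.0, 12.0],
})

# Index label of the first row with the highest max_ll for each gene.
first_best = model_results.groupby("g")["max_ll"].idxmax()

# Keep exactly one row per gene, even when max_ll is tied.
deduplicated = model_results.loc[first_best].reset_index(drop=True)
print(deduplicated)
```

With this selection the tie for `geneA` resolves to the row with the smaller `l`, simply because it comes first. Whether that is the preferred tie-break is still an open question.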

grst added a commit to grst/spatialtranscriptomics that referenced this issue Jun 23, 2023