groot-db output issues #40

Open
LeonardosMageiros opened this issue May 7, 2020 · 2 comments

Comments

@LeonardosMageiros

Hi!

I am exploring the option of using groot-db, as it combines three widely used AMR databases.
I think this is a very good idea, but I am facing some issues that I believe are worth addressing.

Here is a typical output that I have from my results:


C_RESFINDER__erm(F)3_M17808 211 801 762M39D
groot-db_CARD__gb|GQ342996|+|797-1793|ARO:3003097|CfxA6 346 997 38D948M11D
groot-db_ARGANNOT_(Bla)cfxA6:GQ342996:798-1793:966 346 996 38D948M10D
groot-db_RESFINDER__tet(Q)4_Z21523 194 1926 12D1850M64D
groot-db_ARGANNOT_(Tet)TetQ:Z21523:362-2287:1926 197 1974 1910M64D

It is clear that entries 2-3 and 4-5 are duplicates: the same gene (maybe a different allele?) is reported twice, once per source database. This makes parsing and summarizing the results quite tricky.
Can you see any way to tackle that?
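Until the database itself is curated, one possible downstream workaround is to group report rows by the accession embedded in the reference name. The following is only a minimal sketch: the accession regex is a heuristic, the function name `group_by_accession` is hypothetical, and it assumes the four whitespace-separated columns are reference name, read count, reference length and a CIGAR-like coverage string. None of this is part of groot.

```python
import re
from collections import defaultdict

# Heuristic pattern for GenBank/RefSeq-style accessions such as GQ342996 or
# NC_000913.3. This is an assumption for illustration, not something groot defines.
ACCESSION = re.compile(r"(?<![A-Za-z0-9])[A-Z]{1,2}_?\d{5,}(?:\.\d+)?(?![A-Za-z0-9])")


def group_by_accession(report_lines):
    """Group report rows by the first accession found in the reference name."""
    groups = defaultdict(list)
    for line in report_lines:
        line = line.strip()
        if not line:
            continue
        # Split from the right so a reference name containing spaces stays intact.
        name, reads, length, coverage = line.rsplit(None, 3)
        match = ACCESSION.search(name)
        # Fall back to the full name when no accession-like token is present.
        key = match.group(0) if match else name
        groups[key].append((name, int(reads), int(length), coverage))
    return dict(groups)


# The five rows from the sample output, one row per line.
REPORT = """\
C_RESFINDER__erm(F)3_M17808 211 801 762M39D
groot-db_CARD__gb|GQ342996|+|797-1793|ARO:3003097|CfxA6 346 997 38D948M11D
groot-db_ARGANNOT_(Bla)cfxA6:GQ342996:798-1793:966 346 996 38D948M10D
groot-db_RESFINDER__tet(Q)4_Z21523 194 1926 12D1850M64D
groot-db_ARGANNOT_(Tet)TetQ:Z21523:362-2287:1926 197 1974 1910M64D
"""

groups = group_by_accession(REPORT.splitlines())
for accession, rows in groups.items():
    print(accession, [row[0] for row in rows])
```

Grouping on the accession rather than the gene name sidesteps the different naming styles (CfxA6 vs. cfxA6, tet(Q)4 vs. TetQ), but it would also merge distinct genes that happen to share one accession, so the groups still need a manual look.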

Also, the format of each entry depends on the database of origin, so the first column is laid out differently for CARD, ARGANNOT and RESFINDER. This is also a bit confusing and difficult to handle.
Do you think you could homogenize that? If not, could you describe the format used for each database in the report files?
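For illustration, the three layouts visible in the sample output can be normalized with a small parser. This is a sketch under assumptions inferred only from those sample rows: CARD names are `|`-separated with the accession second and the gene last, ARGANNOT names are `:`-separated with a class prefix in parentheses, and RESFINDER names end in `_<accession>`. The function name `parse_reference` is hypothetical and is not a groot API.

```python
import re

# Matches the database token and whatever follows it; the separator is
# matched as one or more underscores.
DB_TOKEN = re.compile(r"(CARD|ARGANNOT|RESFINDER)_+(.*)$")


def parse_reference(name):
    """Return database, gene and accession for a groot-db reference name.

    Returns None when no known database token is found. The field layouts
    are inferred from a handful of report rows and may not cover every entry.
    """
    match = DB_TOKEN.search(name)
    if match is None:
        return None
    database, rest = match.groups()
    if database == "CARD":
        # e.g. gb|GQ342996|+|797-1793|ARO:3003097|CfxA6
        fields = rest.split("|")
        gene, accession = fields[-1], fields[1]
    elif database == "ARGANNOT":
        # e.g. (Bla)cfxA6:GQ342996:798-1793:966
        fields = rest.split(":")
        gene = re.sub(r"^\(.*?\)", "", fields[0])  # drop the class prefix
        accession = fields[1]
    else:
        # RESFINDER, e.g. tet(Q)4_Z21523; assumes the accession has no underscore
        gene, accession = rest.rsplit("_", 1)
    return {"database": database, "gene": gene, "accession": accession}


for example in (
    "groot-db_CARD__gb|GQ342996|+|797-1793|ARO:3003097|CfxA6",
    "groot-db_ARGANNOT_(Bla)cfxA6:GQ342996:798-1793:966",
    "groot-db_RESFINDER__tet(Q)4_Z21523",
):
    print(parse_reference(example))
```

A uniform record like this would make the report easier to summarize regardless of which source database a hit came from.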

Please let me know what you think.
Thank you in advance
Leonardos

@LeonardosMageiros
Author

LeonardosMageiros commented May 7, 2020

One more thing I noticed: one of my output files contains the following entry:
groot-db_CARD__gb|NC_000913.3|-|484425-485619|ARO:3004043|Escherichia 26 1194 460M9D719M6D

which in CARD corresponds to this gene:
https://card.mcmaster.ca/ontology/41090

Notice that at the end of the first column only the word Escherichia appears, instead of the full name Escherichia coli acrA.

Maybe that is something else that needs to be fixed?

Best
Leonardos

@will-rowe
Owner

Hi Leonardos

This is great feedback - thank you! As you have spotted, I did no curation when I merged those databases. This would definitely be something to re-visit. The databases in general could do with a bit of TLC.

I will endeavour to get around to this asap, but things are pretty busy for me at the moment, so no promises on when I'll be able to do it!

By the way, in case you didn't see it, a new version of groot is in conda as of yesterday. This version is much more efficient than previous versions, so I recommend updating to it if you haven't already.

@will-rowe will-rowe self-assigned this May 7, 2020
@will-rowe will-rowe added the enhancement New feature or request label May 7, 2020