-
Hi Stephen, thank you for the question! Kraken2 database construction requires two taxonomy files, names.dmp and nodes.dmp. For the bacterial and archaeal portions of the taxonomy, we used the ncbi_names.dmp and ncbi_nodes.dmp files provided with the HumGut paper, available here: https://arken.nmbu.no/~larssn/humgut/. According to the HumGut paper and the ncbi_tax_id column of the HumGut.tsv file on that website, all 30,691 HumGut genomes were assigned an NCBI taxonomy ID. In the taxonomy files, each genome is placed as a "direct descendant" of its assigned taxon.
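To illustrate what placing genomes as "direct descendants" of their assigned taxa could look like in NCBI-style .dmp files, here is a minimal sketch. It is not the HumGut or Phanta build code. The genome IDs, the choice of new taxids (starting above every existing taxid), and the "no rank" rank string are all assumptions. The nodes.dmp rows are also truncated to tax_id, parent tax_id, and rank, which assumes a reader that only uses those first three fields.

```python
"""Sketch: attach genome-level nodes under existing taxa in NCBI-style
names.dmp / nodes.dmp files, so each genome is a direct descendant of its
assigned taxon. Genome IDs, new taxids and the rank string are hypothetical."""

FIELD_SEP = "\t|\t"  # NCBI .dmp files separate fields with tab-pipe-tab
LINE_END = "\t|\n"   # ...and terminate each row with tab-pipe


def dmp_line(fields):
    """Format one .dmp row from a list of string fields."""
    return FIELD_SEP.join(fields) + LINE_END


def add_genome_nodes(genome_to_taxid, first_new_taxid):
    """Return (nodes_lines, names_lines, genome_taxids).

    genome_to_taxid: genome ID -> assigned parent taxid (e.g. an ncbi_tax_id value).
    first_new_taxid: first unused taxid; must exceed every taxid already in nodes.dmp.
    """
    nodes_lines, names_lines, genome_taxids = [], [], {}
    next_taxid = first_new_taxid
    for genome_id, parent_taxid in sorted(genome_to_taxid.items()):
        taxid = next_taxid
        next_taxid += 1
        genome_taxids[genome_id] = taxid
        # Truncated nodes.dmp row: tax_id, parent tax_id, rank ("no rank" is an assumption)
        nodes_lines.append(dmp_line([str(taxid), str(parent_taxid), "no rank"]))
        # names.dmp row: tax_id, name, unique name (left empty), name class
        names_lines.append(dmp_line([str(taxid), genome_id, "", "scientific name"]))
    return nodes_lines, names_lines, genome_taxids
```

For example, `add_genome_nodes({"HumGut_1": 562, "HumGut_2": 562}, 10_000_000)` would give both hypothetical genomes their own taxids (10000000 and 10000001), each with taxid 562 as its parent. The new rows would then be appended to the existing files.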
-
Do you think it would be possible to provide a version that uses the latest GTDB taxonomy? I suspect other users would find this useful as well. Thanks!
-
HumGut was also provided with GTDB taxonomy (GTDB-Tk, release 05-RS95), so it should be possible to make a Phanta db with the latest GTDB taxonomy. Will update.
-
RS95 is two releases and two years behind (July 2020), and the newer releases have much better species-level coverage of gut bacteria. But I understand that using that one is easier. You could likely obtain the latest GTDB taxonomy for all the HumGut genomes by downloading taxonomy files from the UHGG and GTDB websites. Thanks!
On Thu, Sep 8, 2022 at 3:50 PM yipinto wrote:
> HumGut was also provided with GTDB taxonomy (GTDB-Tk, release 05-RS95), so it should be possible to make a Phanta db with the latest GTDB taxonomy. Will update.
-
No urgency on a solution, but would you mind leaving the issue open, since it has not been resolved yet? Thanks!
-
Thanks Stephen! We have now made a version of the DB that uses GTDB taxonomy for the HumGut genomes. For the time being, we have used the GTDB taxonomy provided by the HumGut authors, but we can definitely revisit this and re-annotate with the latest GTDB release in the future if there is demand. Thank you for the request, and we will post an update here when we have a download link!
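For anyone who wants to try this themselves, one common approach is to turn GTDB lineage strings (e.g. "d__Bacteria;p__Firmicutes;...;s__Genus species") into NCBI-style nodes.dmp/names.dmp rows. The sketch below shows one way to do that. It is not the code used to build the Phanta DB. The taxid numbering, the genome-to-lineage input, and the mapping of the GTDB domain to the "superkingdom" rank string are assumptions. The sketch also makes one possible design choice for unannotated ranks: if a lineage ends in an empty rank such as "s__", the genome is attached to its lowest named rank.

```python
"""Sketch: build NCBI-style nodes/names rows from GTDB lineage strings.
Taxid numbering and input data are hypothetical; not Phanta's build code."""

FIELD_SEP = "\t|\t"
LINE_END = "\t|\n"
ROOT_TAXID = 1

# GTDB rank prefix -> NCBI-style rank string ("superkingdom" for d__ is an assumption)
RANKS = {"d": "superkingdom", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}


def dmp_line(fields):
    return FIELD_SEP.join(fields) + LINE_END


def gtdb_to_dmp(genome_lineages):
    """genome_lineages: genome ID -> GTDB lineage string.

    Returns (nodes_lines, names_lines, genome_parent), where genome_parent maps
    each genome to the taxid of its lowest named rank."""
    taxid_of = {}  # full lineage prefix (tuple of labels) -> taxid
    nodes = [dmp_line([str(ROOT_TAXID), str(ROOT_TAXID), "no rank"])]
    names = [dmp_line([str(ROOT_TAXID), "root", "", "scientific name"])]
    genome_parent = {}
    next_taxid = ROOT_TAXID + 1
    for genome_id, lineage in sorted(genome_lineages.items()):
        parent = ROOT_TAXID
        prefix = ()
        for label in lineage.split(";"):
            label = label.strip()
            code, _, name = label.partition("__")
            if not name:
                break  # unannotated rank (e.g. "s__"): stop at the lowest named rank
            prefix += (label,)
            if prefix not in taxid_of:
                # First time this exact lineage prefix is seen: create a new node
                taxid_of[prefix] = next_taxid
                nodes.append(dmp_line([str(next_taxid), str(parent), RANKS[code]]))
                names.append(dmp_line([str(next_taxid), name, "", "scientific name"]))
                next_taxid += 1
            parent = taxid_of[prefix]
        genome_parent[genome_id] = parent
    return nodes, names, genome_parent
```

Keying nodes on the full lineage prefix, rather than on the bare name, keeps two identically named taxa in different lineages from being merged. Each genome could then be attached under `genome_parent[genome_id]` in the same way as in the NCBI-based build.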
-
Hi Stephen,
-
Could you please clarify the taxonomy used for the Bacteria and Archaea in your database? I understand the mapping is done to the HumGut genomes, but how are those annotated: NCBI or GTDB? If GTDB, which version? If not GTDB, could a GTDB version be provided? Also, how are unannotated species clusters handled? Are they represented in the taxonomy, or are they only counted toward the lowest annotated rank? Thanks for answering all my questions.