Input query was recognized as database #113

Open
cv-1993 opened this issue Jan 24, 2025 · 6 comments

Comments

@cv-1993

cv-1993 commented Jan 24, 2025

Hi Jaebeom Kim,

Thanks for developing Metabuli.

I've just installed Metabuli version 1.0.9.2 in a Mamba environment and downloaded the pre-built RefSeq 224 database.
However, when I tried to run Metabuli, the second read file seemed to be recognized as the database. Please see the details below:

Here is the command:

metabuli classify rawReads/L017_1.fastq.gz rawReads/L017_2.fastq.gz databases/Metabuli/refseq224 metabuli --threads 90 --max-ram 700

Here is the output:

MMseqs Version:                                         1.0.9.2
Threads                                                 90
Sequencing type                                         2
Min. sequence similarity score                          0
Min. query coverage                                     0
Min. num. of cons. matches for non-euk. classification  4
Min. num. of cons. matches for euk. classification      9
Min. score for species- or lower-level classification.  0
Allowed extra Hamming distance                          0
Directory where the taxonomy dump files are stored
Mask residues                                           0
Mask residues probability                               0.9
RAM usage in GiB                                        700
Number of matches per query k-mer.                      4
Accession-level DB build/search                         0
Best * --tie-ratio is considered as a tie               0.95
Not storing k-mer's redundancy. Keep it as 1.           0
Print lineage information                               0

Input database "rawReads/L017_2.fastq.gz" has the wrong type (Generic)
Allowed input:
- Directory

Thanks for your help in solving this issue.

@jaebeom-kim
Member

jaebeom-kim commented Jan 24, 2025 via email

@cv-1993
Author

cv-1993 commented Jan 24, 2025

@jaebeom-kim Thanks for the quick response.
I've added a job ID: metabuli classify rawReads/L017_1.fastq.gz rawReads/L017_2.fastq.gz databases/Metabuli/refseq224 metabuli --threads 90 --max-ram 700

However, I still got the same error:
MMseqs Version:                                         1.0.9.2
Threads                                                 90
Sequencing type                                         2
Min. sequence similarity score                          0
Min. query coverage                                     0
Min. num. of cons. matches for non-euk. classification  4
Min. num. of cons. matches for euk. classification      9
Min. score for species- or lower-level classification.  0
Allowed extra Hamming distance                          0
Directory where the taxonomy dump files are stored
Mask residues                                           0
Mask residues probability                               0.9
RAM usage in GiB                                        700
Number of matches per query k-mer.                      4
Accession-level DB build/search                         0
Best * --tie-ratio is considered as a tie               0.95
Not storing k-mer's redundancy. Keep it as 1.           0
Print lineage information                               0

Input database "./rawReads/L017_2.fastq.gz" has the wrong type (Generic)
Allowed input:
- Directory

@jaebeom-kim
Member

jaebeom-kim commented Jan 24, 2025 via email

@cv-1993
Author

cv-1993 commented Jan 27, 2025

@jaebeom-kim
Thanks. The run started after I added the job ID.
However, it crashed after about 10 minutes without printing any error. It seems to require more computational resources.
We have 750 GB of RAM and 96 CPU cores. Isn't that sufficient?

Thanks.
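The fix here came down to positional arguments: `metabuli classify` only started once a job ID was supplied after the output directory, as in the `L017` command later in this thread. The Python sketch below is a hypothetical wrapper (`build_classify_command` is not part of Metabuli) that assembles the paired-end call in the order that worked here (read 1, read 2, database directory, output directory, job ID) and refuses to build it if any positional argument is empty. It models only the arguments used in this thread; the tool's own help output remains the authoritative reference for its usage.

```python
import shlex


def build_classify_command(read1, read2, db_dir, out_dir, job_id,
                           threads=None, max_ram_gib=None):
    """Hypothetical helper: assemble a paired-end `metabuli classify` argv.

    Positional order mirrors the command that worked in this thread:
    read 1, read 2, database directory, output directory, job ID.
    """
    positional = [read1, read2, db_dir, out_dir, job_id]
    # Reject empty positional arguments; the first attempt here lacked a job ID.
    missing = [i for i, arg in enumerate(positional) if not arg]
    if missing:
        raise ValueError(f"missing positional argument(s) at index {missing}")
    argv = ["metabuli", "classify", *positional]
    # Only the two options used in this thread are modeled.
    if threads is not None:
        argv += ["--threads", str(threads)]
    if max_ram_gib is not None:
        argv += ["--max-ram", str(max_ram_gib)]
    return argv


cmd = build_classify_command(
    "./rawReads/L017_1.fastq.gz", "./rawReads/L017_2.fastq.gz",
    "databases/Metabuli/refseq224", "metabuli", "L017",
    threads=90, max_ram_gib=700,
)
# Print a shell-safe command line (shlex.join requires Python 3.8+).
print(shlex.join(cmd))
```

Failing early on a missing argument is the point of the design: the CLI itself reported the problem as a wrong database type, which hid the real cause.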

@jaebeom-kim
Copy link
Member

Your computing resources are sufficient. Could you send me the full printed log? It will help me find where the error occurs.

@cv-1993
Author

cv-1993 commented Jan 27, 2025

@jaebeom-kim

classify ./rawReads/L017_1.fastq.gz ./rawReads/L017_2.fastq.gz databases/Metabuli/refseq224 metabuli L017 --threads 90 --max-ram 700

Metabuli Version (commit):                              1.0.9.2
Threads                                                 90
Sequencing type                                         2
Min. sequence similarity score                          0
Min. query coverage                                     0
Min. num. of cons. matches for non-euk. classification  4
Min. num. of cons. matches for euk. classification      9
Min. score for species- or lower-level classification.  0
Allowed extra Hamming distance                          0
Directory where the taxonomy dump files are stored
Mask residues                                           0
Mask residues probability                               0.9
RAM usage in GiB                                        700
Number of matches per query k-mer.                      4
Accession-level DB build/search                         0
Best * --tie-ratio is considered as a tie               0.95
Not storing k-mer's redundancy. Keep it as 1.           0
Print lineage information                               0

DB name: refseq_release_224_prokaryote_virus_human
DB creation date: 2024-9-27
Loading the list for taxonomy IDs ... Done
Indexing query file ...Done
Total number of sequences: 67123704
Total read length: 20271358608nt
Extracting query metamers ...
Time spent for metamer extraction: 28
Sorting query metamer list ...
Time spent for sorting query metamer list: 14
Comparing query and reference metamers...
--match-per-kmer was increased to 8 and searching again...
Extracting query metamers ...
Time spent for metamer extraction: 17
Sorting query metamer list ...
Time spent for sorting query metamer list: 10
Comparing query and reference metamers...
--match-per-kmer was increased to 12 and searching again...
Extracting query metamers ...
Time spent for metamer extraction: 11
Sorting query metamer list ...
Time spent for sorting query metamer list: 6
Comparing query and reference metamers...

At this point, the run crashed and the terminal shut down.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants