Error in Assembled Mitochondrial Genome Output for Some Metagenomes #226

Open
jpinus opened this issue Jan 10, 2025 · 4 comments

Comments

@jpinus

jpinus commented Jan 10, 2025

on which platform/server? (Windows? Windows Sublinux? MacOS? Ubuntu? etc.)

linux HPC

MitoZ version?

mitoz 3.6

How did you install MitoZ? (e.g. Docker, Udocker, Singularity, Conda-Pack, Conda, or source code)

conda

Did you run a test after your installation, and was the test run okay?

sure

How much data (roughly) did you use for mitogenome assembly? e.g. 5Gbp?

assembled mitochondrial genomes via NOVOPlasty

The command you used?

mitoz annotate --outprefix ${sample} --fastafiles simplified_${sample}.fasta --thread_number 20 --clade Annelida-segmented-worms

Problem description

I'm currently using MitoZ to assemble mitochondrial genomes from a set of 155 metagenomes. The run worked perfectly for 140 of them, but the remaining samples give inconsistent or incorrect results. Even where the mitochondrial genome was circularized, the output is incomplete or contains errors, such as misaligned or missing genes.

I've checked my input data for quality and found no issues with it. I'd appreciate any guidance on how to resolve this.

Log messages from MitoZ (stdout and stderr, e.g., both m.log and m.err files)

2025-01-10 10:32:22,691 - mitoz.utility.utility - INFO -
combine_annotations_and_find_control_region() chdir to /opt/extern/bremen/symbiosis/jkiefer/P6960/04_mtDNA/anno/6960_AU/tmp_6960_AU_simplified_6960_AU.fasta_mitoscaf.fa
Traceback (most recent call last):
  File "/opt/share/software/packages/mitoz-3.6/conda-env/bin/mitoz", line 10, in <module>
    sys.exit(main())
  File "/opt/share/software/packages/mitoz-3.6/conda-env/lib/python3.8/site-packages/mitoz/MitoZ.py", line 99, in main
    args.func(args)
  File "/opt/share/software/packages/mitoz-3.6/conda-env/lib/python3.8/site-packages/mitoz/annotate/annotation.py", line 680, in main
    tbl_file, errorsummary_val_file, tbl2asn_gbf, summary_file = combine_annotations_and_find_control_region(
  File "/opt/share/software/packages/mitoz-3.6/conda-env/lib/python3.8/site-packages/mitoz/annotate/annotation.py", line 412, in combine_annotations_and_find_control_region
    if file_not_empty(mt_file_cdsft):
  File "/opt/share/software/packages/mitoz-3.6/conda-env/lib/python3.8/site-packages/mitoz/utility/utility.py", line 55, in file_not_empty
    if os.stat(file).st_size > 0:
FileNotFoundError: [Errno 2] No such file or directory: '6960_AU_simplified_6960_AU.fasta_mitoscaf.fa.cds.ft'
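
For readers hitting the same traceback: the crash comes from `os.stat` being called on a file that does not exist, so the check raises instead of returning false. The following is only a minimal sketch of a tolerant check. The name `file_not_empty_safe` is hypothetical and not part of MitoZ, and treating a missing file as "empty" is an assumption about the desired behaviour, not a statement of how MitoZ works or should be patched.

```python
import os


def file_not_empty_safe(path):
    """Return True only if `path` exists and has a non-zero size.

    Hypothetical helper, not a MitoZ function: a missing file is
    reported as "empty" (False) instead of raising FileNotFoundError.
    """
    try:
        return os.stat(path).st_size > 0
    except FileNotFoundError:
        return False
```

With a check like this the annotation step would skip the missing `.cds.ft` file rather than abort, although the underlying question of why the file was never written remains.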

@linzhi2013
Owner

Hi, you are annotating mitochondrial genomes using MitoZ, instead of assembling.

I cannot tell what went wrong from the log you provided. But if some PCGs are unannotated, you can extend the database (https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ-s-database).

Best

@jpinus
Author

jpinus commented Jan 12, 2025

@linzhi2013 Yes, because the coverage is not high enough for assembly with MitoZ, which always failed. That's why I do the assembly with NOVOPlasty and annotate the assembled mitochondrial genome with MitoZ. This worked with exactly the same script for >140 mitochondrial genomes, but unfortunately failed in 6 cases. The error message for these six cases is always that this one specific file was not found.
I also used MITOS2 for annotation and it worked. So the genes are all there, and the genomes are complete.

@linzhi2013
Owner

The *_mitoscaf.fa.cds.ft file was not generated, so no PCGs were annotated. Two possible causes: (1) the sequence ID is too long; (2) these species are too divergent from the PCG database.
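
The first possible cause can be checked quickly before re-running the annotation. The sketch below lists FASTA sequence IDs above a chosen length. The function name `long_fasta_ids` is hypothetical, and the default `max_len=20` is an arbitrary placeholder: the thread does not state what length MitoZ actually tolerates.

```python
def long_fasta_ids(fasta_path, max_len=20):
    """Return the sequence IDs in a FASTA file longer than `max_len`.

    `max_len=20` is an arbitrary placeholder, not a documented MitoZ
    limit. The ID is taken as the first whitespace-delimited token
    after '>'.
    """
    too_long = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                seq_id = fields[0] if fields else ""
                if len(seq_id) > max_len:
                    too_long.append(seq_id)
    return too_long
```

If any IDs are reported for the failing samples, renaming them to something short (for example `mtDNA`) before annotation is a cheap way to rule this cause in or out.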

@jpinus
Author

jpinus commented Jan 13, 2025

@linzhi2013 but *_mitoscaf.fa.cds.position and *_mitoscaf.fa.cds.position.sorted were generated:

cat mtDNA_mitoscaf.fa.cds.position.sorted
mtDNA COX3 259 1 259 1124 1900 +
mtDNA ATP6 228 1 226 2407 3075 +
mtDNA ND3 117 5 117 3093 3419 +
mtDNA ND6 155 1 154 3422 3877 +
mtDNA ND5 569 12 453 3950 5638 +
mtDNA ND4L 98 23 88 5816 6112 +
mtDNA ND4 447 104 412 6112 7449 +
mtDNA ND2 335 5 334 8654 9625 +
mtDNA ND1 308 22 302 9971 10882 +
mtDNA COX2 228 1 228 11864 12547 +
mtDNA COX1 510 1 505 13955 15484 +
mtDNA ATP8 49 1 44 15593 15748 +
mtDNA CYTB 379 3 364 15971 17059 +

So MitoZ is annotating. These are the genes I want and need for mitoz-tools group_seq_by_gene.
Of course I could go in manually and extract the genes, but I don't get why the pipeline is crashing.
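
As a stopgap, the manual extraction mentioned above could be scripted from the `.cds.position.sorted` table. This is only a sketch under stated assumptions: the column meaning (sequence ID, gene, protein length, protein start, protein end, genome start, genome end, strand) is inferred from the rows shown, not taken from MitoZ documentation, coordinates are assumed to be 1-based and inclusive, and both function names are hypothetical.

```python
def parse_cds_positions(text):
    """Parse rows like 'mtDNA COX3 259 1 259 1124 1900 +'.

    Assumed column order (inferred, not documented): seq ID, gene,
    protein length, protein start, protein end, genome start,
    genome end, strand. Rows without exactly 8 fields are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue
        rows.append({
            "seq_id": fields[0],
            "gene": fields[1],
            "start": int(fields[5]),
            "end": int(fields[6]),
            "strand": fields[7],
        })
    return rows


def extract_gene(sequence, row):
    """Slice one gene out of `sequence`.

    Assumes 1-based inclusive coordinates; minus-strand genes are
    reverse-complemented.
    """
    start, end = sorted((row["start"], row["end"]))
    sub = sequence[start - 1:end]
    if row["strand"] == "-":
        complement = str.maketrans("ACGTacgt", "TGCAtgca")
        sub = sub.translate(complement)[::-1]
    return sub
```

The extracted spans follow the protein alignment boundaries in the table, so they may miss start or stop codons; they are a workaround for downstream grouping, not a replacement for a finished annotation.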

here are all the files which are located in the tmp_*_mitoscaf.fa dir:
*_mitoscaf.fa
*_mitoscaf.fa.cds.position
*_mitoscaf.fa.cds.position.sorted
*_mitoscaf.fa.l-rRNA.ft
*_mitoscaf.fa.l-rRNA.out
*_mitoscaf.fa.l-rRNA.tbl
*_mitoscaf.fa.most_related_species.txt
*_mitoscaf.fa.njs
*_mitoscaf.fa.s-rRNA.ft
*_mitoscaf.fa.s-rRNA.out
*_mitoscaf.fa.s-rRNA.tbl
*_mitoscaf.fa.solar.genewise.gff.cds.position.cds
*_mitoscaf.fa.solar.genewise.gff.cds.position.cds.taxa
*_mitoscaf.fa.solar.genewise.gff.pep
*_mitoscaf.fa.trna
*_mitoscaf.fa.trna.ft
