Describe the bug
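While running RepeatModeler, I consistently get an error at the point where the de novo LTR sequences found by LtrRetriever are aligned with MAFFT. The RepeatScout / Recon pipeline works and its sequences are included in the final consensus sequences, but the LTR pipeline appears to fail after LtrRetriever completes: MAFFT does not run correctly, so the LTR sequences are missing from the final consensus sequences. I have run RepeatModeler several times on the same data and received the same error message each time. I have attached a screenshot of the error message. Here is the full line with the error message:
/opt/mafft/bin/mafft: line 2718: 2108323 Killed "$prefix/disttbfast" -q $npickup -E $cycledisttbfast -V "-"$gopdist -s $unalignlevel $legacygapopt $mergearg -W $tuplesize $termgapopt $outnum $addarg $add2ndhalfarg -C $numthreads-$numthreadstb $memopt $weightopt $treeinopt $treeoutopt $distoutopt $seqtype $model -g $gexp -f "-"$gop -Q $spfactor -h $aof $param_fft $algopt $treealg $scoreoutarg $anchoropt -x $maxanchorseparation $oneiterationopt < infile > pre 2>> "$progressfile"
To Reproduce
Genome I used: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_000778455.1/
Making blast database: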
singularity run $dfam BuildDatabase -name DmelFixDfamDb GCA_000778455.1_CA_8.2_MHAP_genomic.fna
Running RepeatModeler:
nohup singularity run $dfam RepeatModeler -database DmelFixDfamDb -threads 20 -LTRStruct >& run2.out &
(or, running without nohup, I receive the same error message:)
singularity run $dfam RepeatModeler -database PpecDfamDb -threads 20 -LTRStruct
Expected behavior
The final FASTA files with consensus families should include sequences from both the Recon/RepeatScout pipeline and the LTR pipeline, but I am not getting any LTR families. Since this genome was used for benchmarking in the publication that presented RepeatModeler2, I expect ~734 families, but I only get ~430 families whenever I run it.
Host system (please complete as much of the following information as you can find out):
This was run on a computing cluster running Linux. More info:
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: Rocky
Description: Rocky Linux release 8.9 (Green Obsidian)
Release: 8.9
Codename: GreenObsidian
Singularity version: apptainer version 1.3.1-1.el8
The singularity container was downloaded on July 2, 2024
RepeatModeler was run on a computing cluster using 1 node, 4 cores, and 3G per core. Job efficiency:
CPU Utilized: 2-02:29:28
CPU Efficiency: 81.56% of 2-13:54:16 core-walltime
Job Wall-clock time: 15:28:34
Memory Utilized: 9.82 GB
Memory Efficiency: 81.83% of 12.00 GB
I don't think this is a bug. It simply looks like you didn't give RepeatModeler/MAFFT enough memory and the cluster scheduler killed it off. I would recommend no less than 32GB of memory, so in your case you would schedule with 8GB per core. Also, if you have the resources, I would increase this to 12-16 CPUs for faster runtimes.
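The per-core figure is just the total divided by the core count. A minimal sketch of that arithmetic follows; the variable names are illustrative and are not part of RepeatModeler or of any scheduler.

```shell
# Illustrative arithmetic only; the variable names are hypothetical.
total_mem_gb=32   # recommended minimum total memory
cores=4           # cores requested in the original job

# Round up so the total request never falls below the target.
mem_per_core_gb=$(( (total_mem_gb + cores - 1) / cores ))

echo "Request ${mem_per_core_gb}G per core across ${cores} cores"
```

With 12 cores the same calculation gives 3G per core, which is why raising the core count alone, at a fixed per-core allocation, also raises the total.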
After increasing memory to 128G (8 cores at 16G per core), I am now getting a slightly different error message at around the same step.
Error message from the standard output file:
LTR Structural Analysis
Running LtrHarvest... : 00:03:35 (hh:mm:ss) Elapsed Time
Running Ltr_retriever... : 00:07:12 (hh:mm:ss) Elapsed Time
Aligning instances... : 00:05:59 (hh:mm:ss) Elapsed Time
Clustering...LTRPipeline: Error - could not cluster MAFFT results.
: 00:00:00 (hh:mm:ss) Elapsed Time
LTRPipeline Time: 00:16:47 (hh:mm:ss) Elapsed Time
Error message from the standard error file:
LTRPipeline : Error - could not open /home/sjd028/AlpTeAnalysis/DmelTest_09-02-24/RM_137767.MonSep20852212024/LTR_368240.MonSep21441122024/clusters.dat! at /opt/RepeatModeler/LTRPipeline line 333.
Because I am getting a "could not open" error, I don't think this is related to memory.
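One way to narrow this down might be to check whether the file was never written or was written empty. The sketch below assumes nothing about RepeatModeler internals; `report_file` is a hypothetical helper name, not part of any tool mentioned here.

```shell
# Hypothetical helper: report whether an expected output file exists
# and whether it has any content.
report_file() {
    path=$1
    if [ ! -e "$path" ]; then
        echo "missing: $path"
        return 1
    elif [ ! -s "$path" ]; then
        echo "empty: $path"
        return 2
    fi
    echo "present: $path"
}
```

For example, `report_file /home/sjd028/AlpTeAnalysis/DmelTest_09-02-24/RM_137767.MonSep20852212024/LTR_368240.MonSep21441122024/clusters.dat`, followed by `ls -l` on the same LTR_ directory, would show which intermediate files the pipeline did produce before it stopped.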
This looks similar to the issue in bug report #241, so I tried editing all sequence identifiers in the genome to be 12 characters long. Even so, I get exactly the same error messages as with the 13+ character sequence identifiers.
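To confirm that no long identifiers remain after the edit, a quick check along these lines could be used. This is a sketch: `long_fasta_ids` is a hypothetical helper, and it assumes the identifier is the text between `>` and the first whitespace on each header line.

```shell
# Hypothetical helper: print FASTA sequence identifiers longer than a limit,
# together with their length. The limit defaults to 12 characters.
long_fasta_ids() {
    fasta=$1
    max_len=${2:-12}
    awk -v max="$max_len" '
        /^>/ {
            id = substr($1, 2)          # drop the leading ">"
            if (length(id) > max) print id, length(id)
        }
    ' "$fasta"
}
```

Running `long_fasta_ids GCA_000778455.1_CA_8.2_MHAP_genomic.fna 12` on the edited genome should print nothing if every identifier is 12 characters or shorter.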