Hi, RepeatModeler has been great since the very first version (I have been using it on genomes from 10 Mb to 5 Gb without problems). However, for some odd reason, when annotating this fungal genome (~42 Mb), rmblastn gets stuck at round 2. We have tried this with:
rmblastn precompiled 2.14.1
rmblastn compiled from source 2.14.1
rmblastn compiled from source 2.14.0
This is a 30 kb TA-rich region. There are some internal tandem repeats but no overall discernible pattern. Owing to the size (and the several copies present), this may be a satellite region. It's certainly not a TE sequence. But your original question was why it takes so long to align. This is a 30 kb sequence of mostly two bases being searched against a handful of 30 kb sequences with similar composition. At a word size of 7 this really blows up the search space: in essence, almost every position can align to almost any other position.
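To put a rough number on that blow-up, here is a minimal Python sketch that counts exact 7-mer matches (seed hits) between two random 30 kb sequences, once with a two-letter alphabet and once with four letters. It is only a toy model: the sequences are uniformly random rather than the attached families, and `count_seed_hits` and `random_seq` are illustrative helpers, not part of rmblastn, whose actual seeding logic is more involved. For uniform random sequences of length L, the expected number of hits is roughly L²/2⁷ for two bases versus L²/4⁷ for four, so about a 128-fold difference.

```python
import random
from collections import Counter


def count_seed_hits(query, subject, word_size=7):
    """Count exact word matches (seed hits) between query and subject."""
    # Index every word of the subject with its number of occurrences.
    subject_words = Counter(
        subject[i:i + word_size] for i in range(len(subject) - word_size + 1)
    )
    # Each query word contributes one hit per matching subject position.
    return sum(
        subject_words[query[i:i + word_size]]
        for i in range(len(query) - word_size + 1)
    )


def random_seq(length, alphabet, rng):
    """Uniform random sequence over the given alphabet (toy data only)."""
    return "".join(rng.choice(alphabet) for _ in range(length))


rng = random.Random(0)  # fixed seed so the run is repeatable
length = 30_000

at_hits = count_seed_hits(
    random_seq(length, "AT", rng), random_seq(length, "AT", rng)
)
acgt_hits = count_seed_hits(
    random_seq(length, "ACGT", rng), random_seq(length, "ACGT", rng)
)

print(f"two-base seed hits:  {at_hits}")
print(f"four-base seed hits: {acgt_hits}")
print(f"ratio: {at_hits / acgt_hits:.0f}x")
```

Every one of those seed hits is a candidate for extension, which is where the extra run time would come from under this simplified picture.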
This is the first time I have seen (or someone has reported) that RECON produced such a long low-complexity region as a family. I suppose we could add a low-complexity filter to the families returned by RECON (as is done for RepeatScout); I'll look into this for a future release. Another possibility would be to apply a low-complexity filter to seeding words, as is done in Phil Green's crossmatch, although that would be a larger undertaking. In this case, rmblast will finish its work, albeit after a much longer processing time than for the rest of the families. On my machine, running in a single thread, this search took 1 hr 22 min.
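For illustration only, a family-level low-complexity check could look something like the Python sketch below. It scores a sequence by the Shannon entropy of its 3-mer composition and flags anything under a cutoff. This is an assumption about what such a filter might do, not how RepeatModeler, RepeatScout, or crossmatch implement theirs; the function names, `k=3`, and the `min_entropy=4.0` cutoff are all hypothetical and would need tuning on real families. The reasoning behind the cutoff: with four bases the 3-mer entropy can reach 6 bits, while a sequence built from only two bases cannot exceed 3 bits.

```python
import math
import random
from collections import Counter


def kmer_entropy(seq, k=3):
    """Shannon entropy (in bits) of the k-mer composition of seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        # Sequence shorter than k: no k-mers, report zero entropy.
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_low_complexity(seq, k=3, min_entropy=4.0):
    """Flag a sequence whose k-mer entropy is below a hypothetical cutoff."""
    return kmer_entropy(seq, k) < min_entropy


# Toy data: one two-base sequence and one four-base sequence.
rng = random.Random(0)
at_rich = "".join(rng.choice("AT") for _ in range(3000))
mixed = "".join(rng.choice("ACGT") for _ in range(3000))

print(f"AT-only entropy: {kmer_entropy(at_rich):.2f} bits")
print(f"ACGT entropy:    {kmer_entropy(mixed):.2f} bits")
print("AT-only flagged:", is_low_complexity(at_rich))
print("ACGT flagged:   ", is_low_complexity(mixed))
```

A real family that is "mostly" two bases would land somewhere between these two extremes, so the cutoff is the part that would need the most care.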
BTW...thanks for tracking this down to those files and attaching them. That really helped in figuring this one out.
sh -c /home/ijt/bin/rmblast-2.14.1/bin//rmblastn -num_alignments 9999999 -db /mnt/nas2/ijt/fungi/Ryder_fungi_repeat_test/RM_159647.SunJul211952592024/round-2/family-22-cons-2.fa -query /mnt/nas2/ijt/fungi/Ryder_fungi_repeat_test/RM_159647.SunJul211952592024/round-2/family-22.fa -gapopen 20 -gapextend 5 -mask_level 80 -complexity_adjust -word_size 7 -xdrop_ungap 300 -xdrop_gap_final 150 -xdrop_gap 75 -min_raw_gapped_score 150 -dust no -outfmt="6 score perc_sub perc_query_gap perc_db_gap qseqid qstart qend qlen sstrand sseqid sstart send slen kdiv cpg_kdiv transi transv cpg_sites qseq sseq" -num_threads 1 -mt_mode 1 -matrix comparison.matrix 2>/mnt/nas2/ijt/fungi/Ryder_fungi_repeat_test/RM_159647.SunJul211952592024/round-2/ncResults-1721599401-186349-2050.75563326076.err
I do not know what is causing the problem, so I am attaching the two FASTA files below. Any suggestions would be appreciated.
family-22-cons-2.fa
family-22.fa