Segmentation fault on long-read correction #4

Closed
novikk opened this issue Feb 13, 2019 · 11 comments

Comments

@novikk

novikk commented Feb 13, 2019

I'm trying to correct a dataset of real ONT reads and I'm getting a segmentation fault error after the mapping with minimap2:

[irubia@kepler consent]$ /genomics/users/irubia/tools/CONSENT/CONSENT-correct --in /genomics/users/irubia/datasets/ERCC_Mix1_SRR6058582.filtered.fa --out corrected.fa --type ONT
[Wed Feb 13 12:24:42 CET 2019] Self-aligning the long reads (minimap2)
[M::mm_idx_gen::5.006*1.15] collected minimizers
[M::mm_idx_gen::5.682*1.64] sorted minimizers
[M::main::5.682*1.64] loaded/built the index for 116426 target sequence(s)
[M::mm_mapopt_update::6.006*1.60] mid_occ = 473
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 116426
[M::mm_idx_stat::6.203*1.58] distinct minimizers: 8869169 (59.58% are singletons); average occurrences: 2.436; average spacing: 3.014
[M::worker_pipeline::30.410*6.06] mapped 116426 sequences
[M::main] Version: 2.14-r894-dirty
[M::main] CMD: /genomics/users/irubia/tools/CONSENT/minimap2/minimap2 -k15 -w5 -m100 -g10000 -r2000 --max-chain-skip 25 --dual=yes -PD --no-long-join -I100G -t8 /genomics/users/irubia/datasets/ERCC_Mix1_SRR6058582.filtered.fa /genomics/users/irubia/datasets/ERCC_Mix1_SRR6058582.filtered.fa
[M::main] Real time: 30.458 sec; CPU: 184.468 sec; Peak RSS: 1.088 GB
[Wed Feb 13 12:25:14 CET 2019] Correcting the long reads
/genomics/users/irubia/tools/CONSENT/CONSENT-correct: line 173:  9156 Segmentation fault      (core dumped) $LRSCf/bin/CONSENT -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"

I've tried on two different datasets and I'm getting the same error.

OS info:

[irubia@kepler consent]$ lsb_release -a
LSB Version:	:core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID:	CentOS
Description:	CentOS Linux release 7.4.1708 (Core) 
Release:	7.4.1708
Codename:	Core

[irubia@kepler consent]$ g++ --version
g++ (GCC) 5.4.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[irubia@kepler consent]$ python --version
Python 3.6.2
@morispi
Owner

morispi commented Feb 14, 2019

Hey,

Segmentation faults seem to be pretty dataset-dependent.
I've never encountered one (that I did not fix) in any of the experiments I've run so far (incl. human).

A few questions so I can help you further:

  1. Does CONSENT still correct a few long reads and then crash, or does it fail to perform any correction at all?

  2. Have you tried CONSENT on any other dataset? Any small dataset (at least 10x coverage) from any bacterial genome would do. That would just tell us whether the error appears regardless of what you attempt to correct.

  3. Is your data public? I found this link (https://trace.ddbj.nig.ac.jp/DRASearch/run?acc=SRR6058582) by Googling the accession ID of your LR file, but there is no FASTA file available for download.
    If it is public, and if you can provide me with a link to download it, I'd be glad to run CONSENT on your dataset and track down the segfault.

Cheers,
Pierre

@godkin1211

I ran into this problem yesterday, too.
My data are full-length 16S nanopore sequencing reads. Since I already have a PAF file, I ran the CONSENT command directly:

$ CONSENT -a Alignments_32234.paf -s 4 -S 1000 -l 500 -k 9 -c 8 -A 2 -f 4 -m 50 -j 20 -r input.fa -M 150 >> output.fa
Segmentation fault (core dumped)

@morispi
Owner

morispi commented Feb 14, 2019

Hi,

I have the same three questions as just above, please.

Just knowing that there's a segfault somewhere doesn't help me so much if I can't reproduce it to see where it comes from.

Pierre

@novikk
Author

novikk commented Feb 14, 2019

@morispi

  1. It doesn't correct any read at all.
  2. I've tried on two datasets, but both are RNA (changing U->T) or cDNA, and neither works. Is this tool only meant to work with DNA data?
  3. The data is public, you can download it from https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?run=SRR6058582 using the SRA toolkit.

Hope this helps!

@morispi
Owner

morispi commented Feb 14, 2019

@novikk

Great! Thanks for the answers and for the link to the dataset.

CONSENT was indeed designed for DNA reads, but I don't see any reason for it to crash on RNA if you switch U to T?
Actually I tried it myself this morning on a tiny dataset containing Us, and all went well.

Downloading the data and investigating the issue later tonight.
I'll keep you updated.

Cheers
P

@morispi
Owner

morispi commented Feb 14, 2019

@novikk
Just checked your data.
The problem comes from the fact that your long-read headers contain spaces.
Changing the spaces to underscores does the trick for me.

@godkin1211 maybe that's the same thing for you?
If the original reads file contains spaces, that'd explain the problem.
However, as you already have the PAF file, you should simply trim everything that follows the first whitespace in the headers (sed 's/ .*//g') so that CONSENT can work.
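
For anyone who would rather script the header clean-up than use sed, here is a minimal Python sketch of the two fixes mentioned above. The names `sanitize_header` and `sanitize_fasta_headers` are made up for illustration and are not part of CONSENT. With `keep_first_token=True` it behaves much like the `sed 's/ .*//g'` approach (it cuts at any whitespace, not only spaces), which should keep read names consistent with an existing PAF, assuming the aligner recorded only the first token of each header. With `keep_first_token=False` it replaces whitespace with underscores instead.

```python
import re


def sanitize_header(line, keep_first_token=True):
    """Return a FASTA header line whose read name contains no whitespace."""
    name = line[1:].strip()  # drop the leading '>' and the trailing newline
    if keep_first_token:
        # Keep only the text before the first whitespace character
        name = name.split(None, 1)[0] if name else name
    else:
        # Alternative fix: turn every run of whitespace into one underscore
        name = re.sub(r"\s+", "_", name)
    return ">" + name


def sanitize_fasta_headers(in_path, out_path, keep_first_token=True):
    """Copy a FASTA file, rewriting header lines and leaving sequences untouched."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                dst.write(sanitize_header(line, keep_first_token) + "\n")
            else:
                dst.write(line)
```

Typical use would be something like `sanitize_fasta_headers("input.fa", "input.clean.fa")`, then passing the cleaned file to CONSENT. Note that the underscore variant changes the read names, so any PAF file would need to be regenerated from the cleaned reads.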

@novikk
However, it seems like your reads have a mean length of 152 bp?
You should check the CONSENT parameters and adapt them, as they are meant for much longer reads, which are divided into 500 bp windows.
Windows longer than the actual reads might cause further issues :p
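
A quick way to check this before launching a run is to compare the read lengths against the window size. The sketch below is only illustrative: `read_lengths` and `window_report` are hypothetical helpers, not CONSENT functions, and 500 is simply the window size mentioned above (the `-l` value in the commands in this thread).

```python
def read_lengths(fasta_path):
    """Yield the length of each sequence in a (possibly multi-line) FASTA file."""
    length = None  # None until the first header has been seen
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line)
    if length is not None:
        yield length


def window_report(lengths, window_size=500):
    """Summarise how the read lengths compare to the chosen window size."""
    lengths = list(lengths)
    if not lengths:
        return {"reads": 0, "mean_length": 0.0, "shorter_than_window": 0}
    return {
        "reads": len(lengths),
        "mean_length": sum(lengths) / len(lengths),
        # Reads that cannot fill even a single window
        "shorter_than_window": sum(1 for n in lengths if n < window_size),
    }
```

For example, `window_report(read_lengths("input.fa"), window_size=500)` gives the mean read length and the number of reads shorter than one window. If most reads fall in that last bucket, the window size is probably too large for the dataset.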

Cheers
P

@godkin1211

Thanks, @morispi! It works after replacing the spaces in the headers.

@morispi
Owner

morispi commented Feb 15, 2019

@godkin1211 Great!

@morispi
Owner

morispi commented Feb 16, 2019

@novikk

Did you manage to run CONSENT in the end?
Did you also check my comment about the parameters above?

Waiting on your answer to close the issue. :)

Cheers
Pierre

@novikk
Author

novikk commented Feb 18, 2019

Hi @morispi, will check it ASAP, probably tomorrow!

@novikk
Author

novikk commented Feb 19, 2019

Worked fine after renaming the headers of the FASTA and tuning the "windowSize" parameter.

Thanks!


3 participants