Plenty of SV candidates to consider, but no final SVs in vcf file? #9

Open
jkgrenier opened this issue Mar 3, 2023 · 4 comments

Comments

@jkgrenier

Hi, I'm trying to use LEVIATHAN for linked-read data that has a long BX tag barcode sequence (36 nt). Will this cause problems?
So far it seems I can successfully generate the LRez barcode index, and LEVIATHAN runs and generates a candidates.bedpe file with plenty of "valid SV candidates to consider", but the final output is always "Output 0 SVs".
I don't know whether the problem is the noncanonical barcode length or another parameter that I could relax to recover the best SV calls.
My dataset has some known large inversions, so I expect to find certain reference interval pairs indicating SVs (and I can see evidence of this on a case-by-case basis).
Thanks,
Jen

@clemaitre
Collaborator

Hi,

The large size of the barcode sequences is not a problem, as long as each barcode is strictly a nucleotide sequence (without 'N'; otherwise the barcode is considered invalid). If the barcodes had not been recognized, you would have gotten an error when indexing with LRez, such as "determineSequencingTechnology: Unrecognized sequencing technology. Please make sure your barcodes originate from a compatible technology or are reported as nucleotides in the BX:Z tag."
Which linked-read technology did you use to obtain your data?

You can run LRez stats on your bam file to get some statistics about your linked reads, such as the number of distinct barcodes, to check that they are all correctly recognized.
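
As a rough, independent cross-check of the barcode format (not a replacement for LRez stats), the sketch below scans SAM text records and counts distinct BX:Z barcodes that are, or are not, made only of A, C, G and T. The function name `barcode_summary` and the strict A/C/G/T rule are assumptions based on the description above; LRez's own validation may differ in detail.

```python
import re

# Captures the value of the BX:Z optional field in one SAM record.
BX_TAG = re.compile(r"\tBX:Z:([^\t\n]+)")
# Assumption: a barcode is valid only if it contains A, C, G, T and nothing else.
VALID_BARCODE = re.compile(r"^[ACGT]+$")


def barcode_summary(sam_lines):
    """Summarize BX:Z barcodes found in an iterable of SAM text lines."""
    valid, invalid = set(), set()
    records_without_bx = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        match = BX_TAG.search(line)
        if match is None:
            records_without_bx += 1
            continue
        barcode = match.group(1)
        if VALID_BARCODE.match(barcode):
            valid.add(barcode)
        else:
            invalid.add(barcode)
    return {
        "distinct_valid": len(valid),
        "distinct_invalid": len(invalid),
        "records_without_bx": records_without_bx,
        "barcode_lengths": sorted({len(b) for b in valid}),
    }
```

The function takes any iterable of SAM lines, so it can be fed the text output of `samtools view` on the bam file. A nonzero `distinct_invalid` count would point to barcodes containing 'N' or other non-nucleotide characters.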

Concerning the 0 SV issue, it can happen that none of the candidates is validated. The SV candidates output in candidates.bedpe are pairs of regions on the reference genome that share a higher-than-expected number of barcodes. They are identified solely from the barcode signal.
The next step validates these SV candidates and refines their coordinates and types, using only the split-read and paired-end mapping signals of the reads (the barcodes are no longer used in this step). Are your data paired-end reads? Did you allow split mapping in the mapping step? What is the read depth of your data? Have you changed any parameter values?
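
To get a quick idea of whether an alignment file carries those two signals at all, the sketch below counts split-read and discordant-pair evidence in SAM text records, using the standard SAM FLAG bits and the `SA:Z` tag. The function name `mapping_signal_summary` and the crude definition of "discordant" (paired, both mates mapped, not flagged as a proper pair) are assumptions for illustration; they are not LEVIATHAN's validation criteria.

```python
# Standard SAM FLAG bits used below.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_signal_summary(sam_lines):
    """Count records that carry split-read or discordant-pair evidence."""
    counts = {"records": 0, "paired": 0, "split": 0, "discordant": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY):
            continue  # only primary and supplementary alignments are counted
        counts["records"] += 1
        if flag & PAIRED:
            counts["paired"] += 1
        # A split read is either a supplementary record or carries an SA:Z tag.
        has_sa = any(f.startswith("SA:Z:") for f in fields[11:])
        if flag & SUPPLEMENTARY or has_sa:
            counts["split"] += 1
        # Crude discordance test: both mates mapped but not a proper pair.
        if flag & PAIRED and not flag & PROPER_PAIR and not flag & MATE_UNMAPPED:
            counts["discordant"] += 1
    return counts
```

If `split` comes back as zero for a whole file, the aligner most likely did not report split alignments, which would leave the validation step with paired-end evidence only.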

Best,
Claire

@jkgrenier
Author

Hi Claire,

I'm developing a new 'haplotagging bead' technology, and the barcodes are a little longer than 10x. They are reported as nucleotides in BX:Z tags in the bam file, and I had no errors with LRez. Sounds like the barcode format is not an issue.

Thanks for the reminder about the requirement for split-read mapping. I'm sure that's the problem! What aligner do you recommend? I used bowtie2, which doesn't support split reads, but I can easily rerun the mapping step. My dataset does contain PE reads, and I can see the 'evidence' for known large inversions in the candidates.bedpe output files. I can look more carefully for the discordant pairs required for your validation phase (as well as mapping with a method that includes split reads).
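
For reference, looking up a known inversion in a candidates file can be sketched as below. It assumes that the first six tab-separated columns follow the usual BEDPE layout (chrom1, start1, end1, chrom2, start2, end2); the exact columns written by LEVIATHAN are not described in this thread, and the function name `candidates_near` and the `slack` parameter are hypothetical.

```python
def candidates_near(bedpe_lines, chrom, bp1, bp2, slack=10000):
    """Return candidate region pairs close to two breakpoints on one chromosome.

    Assumes BEDPE-style columns: chrom1, start1, end1, chrom2, start2, end2.
    """

    def near(start, end, bp):
        # True if the breakpoint lies inside the region, extended by slack.
        return start - slack <= bp <= end + slack

    hits = []
    for line in bedpe_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        c1, s1, e1 = f[0], int(f[1]), int(f[2])
        c2, s2, e2 = f[3], int(f[4]), int(f[5])
        if c1 != chrom or c2 != chrom:
            continue  # an inversion has both breakpoints on the same chromosome
        # Accept the pair in either order.
        if (near(s1, e1, bp1) and near(s2, e2, bp2)) or (
            near(s1, e1, bp2) and near(s2, e2, bp1)
        ):
            hits.append((c1, s1, e1, c2, s2, e2))
    return hits
```

A candidate pair found this way only shows that the barcode signal is present; whether it ends up in the final VCF still depends on the split-read and paired-end evidence.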

Thanks,
Jen

@jkgrenier
Author

Hi again,
After using bwa mem to map reads, I can confirm that LEVIATHAN is calling the SVs that we expect to see in these samples.
Thanks!

@clemaitre
Collaborator

Hi Jen,

Thank you for the update! It is really great that Leviathan can find your inversions!

And good luck with your new 'haplotagging bead' technology! If it has any specific features that are not yet taken into account by LRez or Leviathan, do not hesitate to contact us.

Best,
Claire
