GFA output #2
Comments
Thanks for the suggestion! |
Ideally it would match what's in spoa. It records the walks made by the
sequences through the graph as P lines. Should be straightforward if you're
making the MSA.
On Sat, Sep 19, 2020, 10:25 Yan Gao wrote:
> Thanks for the suggestion!
> That should not be very hard to do, I will give it a try!
|
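As a minimal sketch of what "walks as P lines" means, the snippet below writes a toy two-sequence graph as GFA v1 text, with one `P` line per input sequence. The function name `write_gfa`, the segment IDs, and the toy sequences are hypothetical and chosen for illustration only; this is not the code used by spoa or abPOA.

```python
def write_gfa(nodes, edges, paths):
    """Return GFA v1 text for a small graph.

    nodes: dict of segment id -> sequence
    edges: iterable of (from_id, from_orient, to_id, to_orient)
    paths: dict of path name -> list of (segment id, orient) steps
    """
    lines = ["H\tVN:Z:1.0"]
    for seg_id, seq in nodes.items():
        lines.append(f"S\t{seg_id}\t{seq}")
    for a, a_orient, b, b_orient in edges:
        # "0M" states that adjacent segments do not overlap.
        lines.append(f"L\t{a}\t{a_orient}\t{b}\t{b_orient}\t0M")
    for name, steps in paths.items():
        walk = ",".join(f"{seg_id}{orient}" for seg_id, orient in steps)
        # "*" leaves the per-step overlap field unspecified.
        lines.append(f"P\t{name}\t{walk}\t*")
    return "\n".join(lines) + "\n"


# Toy graph: seq1 = ACGTA and seq2 = ACTTA differ by one base (G vs T).
nodes = {"1": "AC", "2": "G", "3": "T", "4": "TA"}
edges = [("1", "+", "2", "+"), ("1", "+", "3", "+"),
         ("2", "+", "4", "+"), ("3", "+", "4", "+")]
paths = {
    "seq1": [("1", "+"), ("2", "+"), ("4", "+")],
    "seq2": [("1", "+"), ("3", "+"), ("4", "+")],
}
gfa = write_gfa(nodes, edges, paths)
print(gfa, end="")
```

Each `P` line lists the segments a sequence visits, in order, so concatenating the segment sequences along the walk reproduces the input sequence.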
Hi Erik, I just updated the abPOA repo, please try out the GFA format output with parameter '-r4'. Yan |
Hey! This looks great. Here's one of the graphs I made with it.
In Bandage: [image]
This is the view of the graph given by odgi viz: [image]
And this will show if there are any inversions in the graph: [image]
I was surprised to not see any, because one of the sequences in the set is inverted relative to the others.
This test is not possible to complete using spoa, or I would show a direct comparison. You can see the orientation in the pggb output though: [image]
So it would seem that when you write the sequence in the GFA, you'd want to indicate that it's inverting if it aligns in the reverse orientation. I patched spoa to support this as well, so it shouldn't be hard to fix here.
By the way, the direct abPOA output is much nicer than pggb here. In pggb, the smoothxg step can't handle very large graphs due to the memory requirements of the exact alignment calculation in spoa. Swapping in abPOA should resolve that to some extent.
This is the input if you'd like to test: https://github.com/ekg/HLA-zoo/blob/master/seqs/DRB1-3123.fa. |
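To illustrate the inversion point, here is a small sketch of how a sequence that aligns in the reverse orientation could be recorded in a GFA path: the walk visits the same segments in reverse order, each with a `-` orientation. The helper names (`revcomp`, `flip_path`, `spell_path`) and the toy graph are assumptions made for this example, not part of abPOA or spoa.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def flip_path(steps):
    """Walk the same segments on the opposite strand: reverse order, flipped signs."""
    flipped = {"+": "-", "-": "+"}
    return [(seg, flipped[orient]) for seg, orient in reversed(steps)]


def spell_path(nodes, steps):
    """Concatenate segment sequences along a walk, honoring orientation."""
    return "".join(
        nodes[seg] if orient == "+" else revcomp(nodes[seg])
        for seg, orient in steps
    )


nodes = {"1": "AC", "2": "G", "3": "TA"}
forward = [("1", "+"), ("2", "+"), ("3", "+")]
inverted = flip_path(forward)

# The inverted walk would be written as "3-,2-,1-" in a P line.
print(",".join(f"{seg}{orient}" for seg, orient in inverted))
print(spell_path(nodes, forward), spell_path(nodes, inverted))
```

With this convention the inverted sequence reuses the existing segments instead of adding new ones, and tools that inspect path orientation can detect the inversion from the `-` steps.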
That sounds really useful. Limitations on the POA length
cause a lot of problems. What kind of procedure are you using for the
seeding?
On Sat, May 15, 2021, 12:29 Yan Gao wrote:
> Hi @ekg,
> I added minimizer-based seeding in the latest version of abPOA (v1.2.0, https://github.com/yangao07/abPOA/releases/tag/v1.2.0).
> It can reduce the memory usage for very long input sequences, e.g., this DRB1.
|
The seeding step is actually very simple. |
How do you choose the reference sequence for the seeding?
On Sun, May 16, 2021, 07:43 Yan Gao wrote:
> The seeding step is actually very simple.
> I collected all the minimizer hits between two sequences based on the
> input order.
> Then, I applied a chaining strategy similar to the one used in minimap2 to find
> all the co-linear chains, but not allowing large gaps.
> Lastly, all the non-overlapping co-linear chains are chained together to
> produce the final chaining result, which guides the POA in the next step.
|
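The seeding procedure described above can be sketched roughly as follows. This is a simplified illustration and not abPOA's implementation: it orders k-mers lexicographically instead of by hash, looks at the forward strand only, scores every anchor as 1, and returns a single best chain rather than joining several non-overlapping chains. The function names and the parameters `k`, `w`, and `max_gap` are hypothetical.

```python
def minimizers(seq, k=5, w=4):
    """Return (position, kmer) pairs: the smallest k-mer in each window of w k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return []
    picked = set()
    for start in range(max(len(kmers) - w + 1, 1)):
        window = range(start, min(start + w, len(kmers)))
        # Ties are broken by taking the leftmost position.
        picked.add(min(window, key=lambda i: (kmers[i], i)))
    return [(i, kmers[i]) for i in sorted(picked)]


def anchor_hits(ref, qry, k=5, w=4):
    """All (ref_pos, qry_pos) pairs whose minimizer k-mers match exactly."""
    index = {}
    for pos, kmer in minimizers(ref, k, w):
        index.setdefault(kmer, []).append(pos)
    hits = [(rpos, qpos)
            for qpos, kmer in minimizers(qry, k, w)
            for rpos in index.get(kmer, ())]
    return sorted(hits)


def chain(hits, max_gap=50):
    """Longest co-linear chain of anchors with no gap larger than max_gap."""
    hits = sorted(hits)
    if not hits:
        return []
    score = [1] * len(hits)
    prev = [-1] * len(hits)
    for j, (rj, qj) in enumerate(hits):
        for i in range(j):
            ri, qi = hits[i]
            # Anchor i may precede j only if both coordinates increase.
            if 0 < rj - ri <= max_gap and 0 < qj - qi <= max_gap:
                if score[i] + 1 > score[j]:
                    score[j], prev[j] = score[i] + 1, i
    end = max(range(len(hits)), key=score.__getitem__)
    result = []
    while end != -1:
        result.append(hits[end])
        end = prev[end]
    return result[::-1]


seq = "ACGTTGCATGACCGTAGGCTAACGTTAGC"
anchors = chain(anchor_hits(seq, seq))
print(anchors)
```

The chained anchors split the alignment into short stretches between consecutive anchors, which is what keeps the memory needed for the subsequent POA step small on long inputs.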
Based on the input order, the i-th sequence is the reference for the (i+1)-th. |
That's very pragmatic. Other strategies might be more sensitive, but could
have much higher performance costs. If we can order the input sequences as
they would sit in a neighbor joining tree, we might be able to get that to
perform as well as possible.
|
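Since each sequence serves as the seeding reference for the next one, the input order matters. The sketch below reorders sequences so that neighbors in the list are similar. Note that it is not neighbor joining: it swaps in a much cheaper greedy nearest-neighbor ordering on k-mer Jaccard distance as a stand-in, and the function names and the choice of `k` are assumptions for illustration.

```python
def kmer_set(seq, k=4):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def greedy_order(seqs, k=4):
    """Start from the first sequence, then repeatedly append the unused
    sequence closest to the most recently placed one."""
    if not seqs:
        return []
    sets = [kmer_set(s, k) for s in seqs]
    order = [0]
    remaining = set(range(1, len(seqs)))
    while remaining:
        last = sets[order[-1]]
        # Ties are broken by original input index to keep the result stable.
        nxt = min(remaining, key=lambda i: (jaccard_distance(last, sets[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


seqs = ["AAAAAAAACCCC", "GGGGGGGGTTTT", "AAAAAAAACCCG", "GGGGGGGGTTTA"]
order = greedy_order(seqs)
print(order)
```

A tree-based ordering would likely be more robust on divergent inputs, at the cost of computing all pairwise distances and building the tree first.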
What are the longest sequences you've tested this on?
|
The longest one is actually just this DRB1. |
I will share some slices of a human chromosome that range from 10 kb to 1 Mbp.
|
That will be great. |
Just updated to v1.2.1. |
Have you considered adding GFA output to abPOA?
I am likely to make a patch to do this, but I wouldn't mind if you beat me to it. 🙂