GFA output #2

ekg · 2020-09-19T05:27:56Z

Have you considered adding GFA output to abPOA?

I am likely to make a patch to do this, but I wouldn't mind if you beat me to it. 🙂

yangao07 · 2020-09-19T08:25:12Z

Thanks for the suggestion!
That should not be very hard to do, I will give it a try!

ekg · 2020-09-19T11:36:14Z

Ideally it would match what's in spoa. It records the walks made by the sequences through the graph as P lines. Should be straightforward if you're making the MSA.

…

On Sat, Sep 19, 2020, 10:25 Yan Gao ***@***.***> wrote: Thanks for the suggestion! That should not be very hard to do, I will give it a try! — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#2 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABDQEIFKW53XKERRIENQITSGRTHLANCNFSM4RS2AGKQ> .

yangao07 · 2020-09-22T03:19:28Z

Hi Erik,

I just updated the abPOA repo, please try out the GFA format output with parameter '-r4'.
Let me know if this works for you!

Yan

ekg · 2020-09-22T11:58:14Z

Hey! This looks great. Here's one of the graphs I made with it.

abpoa ~/HLA-zoo/seqs/DRB1-3123.fa -r 4 >DRB1-3123.abpoa.gfa

In Bandage:

This is the view of the graph given by odgi viz:

odgi build -g DRB1-3123.abpoa.gfa -o - | odgi unchop -i - -o - | odgi sort -i - -o - >DRB1-3123.abpoa.og
odgi viz -i DRB1-3123.abpoa.og -b -w 20 -y 500 -o DRB1-3123.abpoa.og.viz.png

And this will show if there are any inversions in the graph.

odgi viz -i DRB1-3123.abpoa.og -b -w 20 -y 500 -o DRB1-3123.abpoa.og.viz.inv.png -z

I was surprised to not see any, because one of the sequences in the set is inverted relative to the others.

edyeet -X -n 20 -N -n 10 DRB1-3123.fa DRB1-3123.fa
...
gi|345525392:5000-18402 13403   0       13403   -       gi|568815567:3779003-3792415    13413   2       13404   13391   13403   33      id:f:0.999552   ma:i:13391      mm:i:1  ni:i:0  nd:i:10ns:i:11 ed:i:22 al:i:13413      se:f:0.0016402  cg:Z:4894=10D6889=1X1608=11I

This test is not possible to complete using spoa, or I would show a direct comparison. You can see the orientation in the pggb output though:

pggb -i DRB1-3123.fa -s 3000 -p 70 -a 70 -n 10 -t 16 -v -l -k 8

So it would seem that when you write the sequence in the GFA, you'd want to indicate that it's inverting if it aligns in the reverse orientation. I patched spoa to support this as well, so it shouldn't be hard to fix here.

By the way, the direct abPOA output is much nicer than pggb here. In pggb, the smoothxg step can't handle very large graphs due to the memory requirements of the exact alignment calculation in spoa. Swapping in abPOA should resolve that to some extent.

This is the input if you'd like to test: https://github.com/ekg/HLA-zoo/blob/master/seqs/DRB1-3123.fa.

yangao07 · 2021-05-15T10:29:03Z

Hi @ekg ,

I added a minimizer-based seeding in the latest version of abPOA (v1.2.0).
It can reduce the memory usage for very long input sequence, e.g this DRB1.
Also, it should produce similar or even better alignment results.

Yan

ekg · 2021-05-15T20:43:41Z

That sounds really useful. Limitations on the length of the POA length cause a lot of problems. What kind of procedure are you using for the seeding?

…

On Sat, May 15, 2021, 12:29 Yan Gao ***@***.***> wrote: Hi @ekg <https://github.com/ekg> , I added a minimizer-based seeding in the latest version of abPOA ((v1.2.0)[ https://github.com/yangao07/abPOA/releases/tag/v1.2.0]). It can reduce the memory usage for very long input sequence, e.g this DRB1. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABDQEIA64UG5VSF6O3N5PTTNZEHZANCNFSM4RS2AGKQ> .

yangao07 · 2021-05-16T05:43:36Z

The seeding step is actually very simple.
I collected all the minimizer hits between two sequences based on the input order.
Then, I applied a similar chaining strategy as used in minimap2 to find all the co-linear chains, but not allowing large gaps.
Lastly, all the non-overlapping co-linear chains are chained together to produce the final chaining result, which guides the POA in the next step.

ekg · 2021-05-16T09:52:54Z

How do you choose the reference sequence for the seeding?

…

On Sun, May 16, 2021, 07:43 Yan Gao ***@***.***> wrote: The seeding step is actually very simple. I collected all the minimizer hits between two sequences based on the input order. Then, I applied a similar chaining strategy as used in minimap2 to find all the co-linear chains, but not allowing large gaps. Lastly, all the non-overlapping co-linear chains are chained together to produce the final chaining result, which guides the POA in the next step. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABDQEIMDUQRKTIA5YAP6V3TN5LRJANCNFSM4RS2AGKQ> .

yangao07 · 2021-05-16T12:05:10Z

Based on the input order, the i-th sequence is the reference for the (i+1)-th.

ekg · 2021-05-16T13:44:31Z

That's very pragmatic. Other strategies might be more sensitive, but could have much higher performance costs. If we can order the input sequences as they would sit in a neighbor joining tree, we might be able to get that to perform as well as possible.

…

On Sun, May 16, 2021 at 2:05 PM Yan Gao ***@***.***> wrote: Based on the input order, the i-th sequence is the reference for the (i+1)-th. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABDQEONFZV2X56LUOM45CDTN6YIDANCNFSM4RS2AGKQ> .

ekg · 2021-05-16T13:45:19Z

What are the longest sequences you've tested this on? On Sun, May 16, 2021 at 3:44 PM Erik Garrison ***@***.***> wrote:

…

That's very pragmatic. Other strategies might be more sensitive, but could have much higher performance costs. If we can order the input sequences as they would sit in a neighbor joining tree, we might be able to get that to perform as well as possible. On Sun, May 16, 2021 at 2:05 PM Yan Gao ***@***.***> wrote: > Based on the input order, the i-th sequence is the reference for the > (i+1)-th. > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <#2 (comment)>, or > unsubscribe > <https://github.com/notifications/unsubscribe-auth/AABDQEONFZV2X56LUOM45CDTN6YIDANCNFSM4RS2AGKQ> > . >

yangao07 · 2021-05-17T01:44:17Z

The longest one is actually just this DRB1.

ekg · 2021-05-17T06:14:27Z

I will share some slices of a human chromosome that are from 10kb to 1mbp.

…

On Mon, May 17, 2021, 03:44 Yan Gao ***@***.***> wrote: The longest one is actually just this DRB1. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABDQENLQABILD7Q2DPKXFTTOBYIBANCNFSM4RS2AGKQ> .

yangao07 · 2021-05-17T07:41:38Z

That will be great.
Curious what will happen if we feed abPOA with that long sequences.

yangao07 · 2021-05-18T08:42:27Z

Just updated to v1.2.1.
Removes a redundant and very time-consuming sorting step.

subwaystation mentioned this issue Sep 23, 2020

Add best alignment score to abpoa_res_t #4

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GFA output #2

GFA output #2

ekg commented Sep 19, 2020

yangao07 commented Sep 19, 2020

ekg commented Sep 19, 2020 via email

yangao07 commented Sep 22, 2020

ekg commented Sep 22, 2020 •

edited

Loading

yangao07 commented May 15, 2021 •

edited

Loading

ekg commented May 15, 2021 via email

yangao07 commented May 16, 2021

ekg commented May 16, 2021 via email

yangao07 commented May 16, 2021

ekg commented May 16, 2021 via email

ekg commented May 16, 2021 via email

yangao07 commented May 17, 2021

ekg commented May 17, 2021 via email

yangao07 commented May 17, 2021

yangao07 commented May 18, 2021

GFA output #2

GFA output #2

Comments

ekg commented Sep 19, 2020

yangao07 commented Sep 19, 2020

ekg commented Sep 19, 2020 via email

yangao07 commented Sep 22, 2020

ekg commented Sep 22, 2020 • edited Loading

yangao07 commented May 15, 2021 • edited Loading

ekg commented May 15, 2021 via email

yangao07 commented May 16, 2021

ekg commented May 16, 2021 via email

yangao07 commented May 16, 2021

ekg commented May 16, 2021 via email

ekg commented May 16, 2021 via email

yangao07 commented May 17, 2021

ekg commented May 17, 2021 via email

yangao07 commented May 17, 2021

yangao07 commented May 18, 2021

ekg commented Sep 22, 2020 •

edited

Loading

yangao07 commented May 15, 2021 •

edited

Loading