Better global alignment when aligning in other direction #78

glennhickey · 2024-10-17T17:32:32Z

@adamnovak has been picking through the HPRC graph and finding suspect alignments. Here is one from CHM13#0#chr3:164033777-164033842. If I align it with abpoa in its forward orientation (on chm13) I get

(where gaps are transparent).
But if I reverse complement I get

which seems much cleaner -- i.e., there is only 1 gap per row except in 3 cases, where the gap seems more properly placed on the right.

Are these alignments somehow scoring equivalently, even though by eye one seems much better? If not, is this expected or a bug? Do you have any suggestions on how it could be improved?

All the information to reproduce is here (see README for command lines):
https://public.gi.ucsc.edu/~hickey/debug/abpoa_direction_oct17_2024/

Thanks so much!

yangao07 · 2024-10-17T20:27:53Z

The difference has two causes:

1. You used seeding and the progressive tree to order the input sequences, which does not work well for repeat-region sequences like these. I got fewer gaps on the forward strand with seeding disabled.
2. The alignment with more than one gap per row in the first MSA is actually optimal, even though its reverse complement gets a one-gap alignment, because a gap is not penalized if it already exists in the partial order alignment graph. So input order is very important in determining the number of gaps in the alignment.

Although I don't know which is better, they are all expected results.

Forward strand without seeding:

Reverse strand without seeding:

yangao07 · 2024-10-17T20:44:33Z

But I do agree that they are not truly optimal alignment results.
It may not be easy, but I will try to improve it.

glennhickey · 2024-10-18T18:00:09Z

Thanks for the quick follow-up. By eye it still seems that the reverse with seeding is the best. I understand that the difference between the different scenarios is explainable by the order, and it's not reflected in the current scoring scheme.

I'm still not sure I understand the difference when aligning the different strands -- shouldn't the order be unaffected?

In any case, it does seem like there is room for future improvements -- we are happy to test any ideas you come up with!

yangao07 · 2024-10-18T18:07:57Z

The difference between the strands arises because abpoa always puts gaps in the left-most position.
To get the same result, gaps would have to be put on the right side for the reverse-complement strand.
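
A minimal sketch of this tie-breaking effect, for illustration only. It is not abpoa's code: the toy sequences, the match-count score, and the helper names `place_single_gap` and `revcomp_row` are all assumptions made up for this example. It places a single deletion inside a homopolymer run, where several gap positions score identically, and shows that the left-most choice on the reverse-complement strand is not the reverse complement of the left-most choice on the forward strand. The right-most choice is.

```python
# Toy illustration (hypothetical helpers, not abpoa's implementation).
COMPLEMENT = str.maketrans("ACGT-", "TGCA-")


def revcomp_row(row):
    """Reverse-complement an alignment row, keeping '-' as a gap."""
    return row.translate(COMPLEMENT)[::-1]


def place_single_gap(ref, query, side="left"):
    """Insert one gap into `query` (one base shorter than `ref`).

    Every gap position is scored by its number of matching columns.
    Among the equally best positions, return the left-most or the
    right-most one, depending on `side`.
    """
    assert len(query) + 1 == len(ref)
    best_score, best_rows = -1, []
    for i in range(len(ref)):
        row = query[:i] + "-" + query[i:]
        score = sum(a == b for a, b in zip(ref, row))
        if score > best_score:
            best_score, best_rows = score, [row]
        elif score == best_score:
            best_rows.append(row)
    return best_rows[0] if side == "left" else best_rows[-1]


# Forward strand: one A is missing from a run of four.
ref, query = "ACAAAAGT", "ACAAAGT"
fwd_left = place_single_gap(ref, query, side="left")

# Reverse-complement strand: same sequences, opposite orientation.
rc_ref, rc_query = revcomp_row(ref), revcomp_row(query)
rc_left = place_single_gap(rc_ref, rc_query, side="left")
rc_right = place_single_gap(rc_ref, rc_query, side="right")

print("forward, left-most gap :", fwd_left)
print("revcomp of that row    :", revcomp_row(fwd_left))
print("RC strand, left-most   :", rc_left)
print("RC strand, right-most  :", rc_right)
```

In this toy case all four gap positions inside the run tie on score, so the tie-break alone decides the layout, which is why the two strands can give different-looking alignments.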

yangao07 · 2024-11-18T14:56:17Z

Hi @glennhickey again,

I am adding a parameter to abpoa to handle this type of homopolymer sequence alignment and produce visually better alignment results. I also ran into this type of issue in my own project.
Do you have any ready-to-use data that can work as an evaluation dataset? Like a set of input sequences with the expected MSA result or consensus sequence.
I can use some simulation data, but maybe you have some real-scenario ones.

Cheers!

glennhickey · 2024-11-18T19:26:55Z

Apart from what I've shared in github issues here, we have a couple small simulated tests in Cactus

https://github.com/UCSantaCruzComputationalGenomicsLab/cactusTestData

where we use mafComparator to compare to the provided truth MAF.

If you end up making a significant change, I should be able to plug it into Cactus and, say, make a new pangenome graph and measure some stats on that...

yangao07 · 2024-11-18T19:45:43Z

Are there any specific scoring parameters or a scoring matrix I should use for this data?

glennhickey · 2024-11-18T22:59:07Z

Hmm, that data's probably best with the (current) default cactus scores, e.g.
https://public.gi.ucsc.edu/~hickey/debug/abpoa_fail_mar21.mat
https://public.gi.ucsc.edu/~hickey/debug/abpoa_fail_mar21.cmd

yangao07 · 2024-11-26T14:47:34Z

Hi @glennhickey, I did come up with some heuristics for improved graph alignment. It would be great if you could give me some comments on them.
Do you happen to have some time to talk via zoom?
