seqfold not finding folding at certain sequence lengths #20

tuspjo · 2024-07-09T09:26:49Z

Hello,
Thank you for making and maintaining this tool.
I'm trying to figure out the best folding for a sequence of unknown length, so I'm trying all reasonable sequence lengths 2-100 nt
the issue is that for certain lengths, seqfold is unable to find more than a single paired base, even though there are plenty of hairpins in the sequence.
in the attached image, each line is the dotbrackt view of one seqfold run, each with +1nt of the same sequence

do you know what this software behaviour could be caused by?

Best

Tue

jjti · 2024-08-03T18:53:25Z

Hey @tuspjo,

This might happen because the "winning" structures include both the reward from the binding of basepairs and the penalization from the creation of hairpins. Ie this isn't exactly what's happening:

seqfold is unable to find more than a single paired base, even though there are plenty of hairpins in the sequence.

It's finding those structures that include one or multiple hairpins but penalizing those hairpins (and the structures that include those hairpins) in a way that the basepair binding before the hairpins doesn't overcome the penalty from the hairpins themselves.

You should be able to inspect this manually using something like below:

from seqfold import dg_cache

# `dg_cache` returns a 2D array where each (i,j) combination returns the MFE from i to j inclusive
cache = dg_cache("GGGAGGTCGTTACATCTGGGTAACACCGGTACTGATCCGGTGACCTCCC")

Take the starting/ending indexes of the hairpins you expect to form. They are likely to have a negative MFE. But expanding it outwards, that MFE starts to become positive (less likely) as the dangling ends and cost of the hairpin comes into play

jjti · 2024-08-03T18:58:16Z

Follow up, what are the free energies from these two lines? If the second one has a higher (less negative) free energy, it implies that the single large hairpin was preferred to the penalty of having another basepair in the dangling end outside of the two hairpin structure. The actual penalization code for that sort of thing is here:

seqfold/seqfold/fold.py

Line 491 in 7e008d5

d_h, d_s = emap.NN[pair] if pair in emap.NN else emap.TERMINAL_MM[pair]

leshane assigned leshane and jjti and unassigned leshane Aug 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

seqfold not finding folding at certain sequence lengths #20

seqfold not finding folding at certain sequence lengths #20

tuspjo commented Jul 9, 2024 •

edited

Loading

jjti commented Aug 3, 2024 •

edited

Loading

jjti commented Aug 3, 2024 •

edited

Loading

seqfold not finding folding at certain sequence lengths #20

seqfold not finding folding at certain sequence lengths #20

Comments

tuspjo commented Jul 9, 2024 • edited Loading

jjti commented Aug 3, 2024 • edited Loading

jjti commented Aug 3, 2024 • edited Loading

tuspjo commented Jul 9, 2024 •

edited

Loading

jjti commented Aug 3, 2024 •

edited

Loading

jjti commented Aug 3, 2024 •

edited

Loading