Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

seqfold not finding folding at certain sequence lengths #20

Open
tuspjo opened this issue Jul 9, 2024 · 2 comments
Open

seqfold not finding folding at certain sequence lengths #20

tuspjo opened this issue Jul 9, 2024 · 2 comments
Assignees

Comments

@tuspjo
Copy link

tuspjo commented Jul 9, 2024

Hello,
Thank you for making and maintaining this tool.
I'm trying to figure out the best folding for a sequence of unknown length, so I'm trying all reasonable sequence lengths 2-100 nt
the issue is that for certain lengths, seqfold is unable to find more than a single paired base, even though there are plenty of hairpins in the sequence.
in the attached image, each line is the dotbrackt view of one seqfold run, each with +1nt of the same sequence
image

do you know what this software behaviour could be caused by?

Best

Tue

@leshane leshane assigned leshane and jjti and unassigned leshane Aug 1, 2024
@jjti
Copy link
Collaborator

jjti commented Aug 3, 2024

Hey @tuspjo,

This might happen because the "winning" structures include both the reward from the binding of basepairs and the penalization from the creation of hairpins. Ie this isn't exactly what's happening:

seqfold is unable to find more than a single paired base, even though there are plenty of hairpins in the sequence.

It's finding those structures that include one or multiple hairpins but penalizing those hairpins (and the structures that include those hairpins) in a way that the basepair binding before the hairpins doesn't overcome the penalty from the hairpins themselves.

You should be able to inspect this manually using something like below:

from seqfold import dg_cache

# `dg_cache` returns a 2D array where each (i,j) combination returns the MFE from i to j inclusive
cache = dg_cache("GGGAGGTCGTTACATCTGGGTAACACCGGTACTGATCCGGTGACCTCCC")

Take the starting/ending indexes of the hairpins you expect to form. They are likely to have a negative MFE. But expanding it outwards, that MFE starts to become positive (less likely) as the dangling ends and cost of the hairpin comes into play

@jjti
Copy link
Collaborator

jjti commented Aug 3, 2024

Follow up, what are the free energies from these two lines? If the second one has a higher (less negative) free energy, it implies that the single large hairpin was preferred to the penalty of having another basepair in the dangling end outside of the two hairpin structure. The actual penalization code for that sort of thing is here:

d_h, d_s = emap.NN[pair] if pair in emap.NN else emap.TERMINAL_MM[pair]

Screenshot 2024-08-03 at 2 54 47 PM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants