Hi Nicola,
I've just been trying to run your handy phastSim method on several different sequences. It seems to fail on a few -- for example on the genome AE001825:
phastSim --outpath ./ --outputFile AE001825.1.phastsim-001.temp --reference AE001825.1.fasta --treeFile AE001825.1.phastsim-001.phastsimtree --createFasta --hyperMutProbs 0.01 0.04 --hyperMutRates 100 10 --indels --insertionRate GAMMA 0.1 1.0 --deletionRate CONSTANT 0.1 --insertionLength GEOMETRIC 0.9 --deletionLength NEGBINOMIAL 2 0.95
With a simple tree file (AE001825.1.phastsim-001.phastsimtree):
((phastsim0:0.1,phastsim1:0.1,phastsim2:0.1,phastsim3:0.1,phastsim4:0.1,phastsim5:0.1,phastsim6:0.1,phastsim7:0.1,phastsim8:0.1,phastsim10:0.1):0.001);
The AE001825 sequence does have some non-ACGT characters (R & K), which I presume is the issue for phastSim? Should I randomly select A/G for R, or G/T for K?
Best wishes,
Paul.
Indeed, the non-ACGT characters in the reference are the issue. I will think about what the best default behaviour for phastSim should be in this case; we should definitely give a clearer error message.
The reason I am hesitant to let phastSim automatically pick a nucleotide at random is that it would make the interpretation of the output more complicated and inconsistent (the concise output formats need to be interpreted in terms of differences with respect to the reference).
But we could write an additional script that randomly resolves the ambiguous characters in the reference and creates a new ambiguity-free reference, if useful.
Yes -- my workaround would be to randomly sample from IUPAC ambiguity characters. These are rare in most assembled sequences, so it shouldn't throw phastSim's results off by too much (AFAIK).
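For anyone hitting the same error, the workaround discussed in this thread (randomly resolving IUPAC ambiguity codes to produce an ambiguity-free reference) could look roughly like the sketch below. This is not part of phastSim; the function names `resolve_ambiguities` and `resolve_fasta` and the output file name are made up for illustration, and it assumes a plain FASTA file where every non-header line is sequence.

```python
import random

# Standard IUPAC nucleotide ambiguity codes and the bases each can stand for.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def resolve_ambiguities(seq, rng):
    """Replace each IUPAC ambiguity code in seq with one compatible base.

    Hypothetical helper, not a phastSim function. Characters that are not
    ambiguity codes (A, C, G, T, gaps, ...) are copied through unchanged,
    and the case of the original character is preserved.
    """
    out = []
    for base in seq:
        choices = IUPAC.get(base.upper())
        if choices is None:
            out.append(base)
        else:
            pick = rng.choice(choices)
            out.append(pick.lower() if base.islower() else pick)
    return "".join(out)


def resolve_fasta(in_path, out_path, seed=None):
    """Write a copy of a FASTA file with ambiguity codes randomly resolved.

    Header lines (starting with '>') are copied as-is. Passing a seed makes
    the random choices reproducible.
    """
    rng = random.Random(seed)
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                dst.write(line)
            else:
                dst.write(resolve_ambiguities(line.rstrip("\n"), rng) + "\n")


if __name__ == "__main__":
    # Example only: the output name is an assumption, pick whatever suits.
    resolve_fasta("AE001825.1.fasta", "AE001825.1.noambig.fasta", seed=1)
```

The resulting file can then be passed to `--reference` in place of the original. Keeping the seed fixed means the same ambiguity-free reference can be regenerated later, which matters because the concise output formats are expressed as differences relative to the reference.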