We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hi,
The example code here uses an aligner with the following scoring scheme: Match:2, Mismatch:-1, all_gaps_penalties=1
import ssw_aligner query='CAGACAATCAGCATGTTTCCGGCAGCGCCGGTAG' target='TTCCACCATTTGTCCGGACCGGGC' def get_striped_ed_aligner(query_seq,match_score=2): return ssw_aligner.StripedSmithWaterman( query_seq, gap_open_penalty=1, gap_extend_penalty=1, match_score=match_score, mismatch_score=-1, suppress_sequences=True, score_only=False, score_size=2, ## score is < 255 this should be 0; >255 then 1; 2 don't know zero_index=True, mask_length=0, ## Turn off suboptimal mask_auto=False, ## Turn off suboptimal ) algn=get_striped_ed_aligner(query,match_score=2)(target) cigcnt=defaultdict(lambda:0) def add_tl(d,t,l): d[t]+=l; return [add_tl(cigcnt,t,l) for l, t in algn._tuples_from_cigar()] print(cigcnt) print(algn.__repr__())
This results with CIGAR counts and the alignment stats:
defaultdict(<function <lambda> at 0x7f35c0ec2048>, {'M': 19, 'I': 5, 'D': 1}) { 'optimal_alignment_score': 28, 'suboptimal_alignment_score': 0, 'query_begin': 7, 'query_end': 30, 'target_begin': 1, 'target_end_optimal': 21, 'target_end_suboptimal': -1, 'cigar': '7M1I2M1D5M1I1M3I4M', 'query_sequence': '', 'target_sequence': '' }
The issue here is that the CIGAR string returned in the alignment above is for a suboptimal alignment of score 26, see the computations below:
#CAGACAA |TCAGCATGTT TCCGG CAGCGCCGG| TAG # T |TCCACAT TTGTCCGG A CCGG| GC # 'cigar': '7M 1I2M1D 5M1I1M 3I 4M', # M=5+2+2+5+1+4=19 Mismatch=2 Match=17 I=5 D=1 # score=2*17-2-5-1=26 print(2*17-2-5-1) 26
The optimal alignment with the reported score has a different CIGAR string, see below.
#CAGACAA |TC AG CATGTT TCCGGCAGCGCCGG| TAG # T |TC CA CAT TTG TCCGG A CCGG| GC # 'cigar':'2M1D1M1I3M1I2M1D 5M1I1M3I 4M', # M=18 Mismatch=0 I=6 D=2 # score=2*18-6-2=28 print(2*18-6-2) 28
Is it possible to fix it such that the reported CIGAR string is always for an optimal alignment?
The text was updated successfully, but these errors were encountered:
Possibly solved in scikit-bio/scikit-bio#1679 ... though seems not quite from tests.
Sorry, something went wrong.
No branches or pull requests
Hi,
The example code here uses an aligner with the following scoring scheme:
Match:2, Mismatch:-1, all_gaps_penalties=1
This results with CIGAR counts and the alignment stats:
The issue here is that the CIGAR string returned in the alignment above is for a suboptimal alignment of score 26, see the computations below:
The optimal alignment with the reported score has a different CIGAR string, see below.
Is it possible to fix it such that the reported CIGAR string is always for an optimal alignment?
The text was updated successfully, but these errors were encountered: