You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
On line 476 of intSiteLogic.R, we've assigned the "tStart" and "tEnd" from the alignment output to the GRanges start and end positions. This is correct most of the time, but the "tStart" and "tEnd" in the psl output file correspond to the edges of the alignment. If there are bases at the beginning and end of the read that do not align, they are not considered when assigning the position.
I think this is a mistake in logic. Instead, the range assignment should be incorporate the flanking nucleotides that may not align, especially on the LTR end. This region is known to have a drop in sequence quality that can lead to base miscalling. These miscalled bases may not be aligned which will change the position of the integration site position calling. A quality metric used for filtering is that the "qStart" must be less than or equal to 5nt (line 519, adjustible with maxAlignStart), which means that the called position could be up to 5 nt away from the actual junction of the LTR.
This is fixed downstream in our reports by standardizing within a 5bp window, but could be refined in the current informatic processing. The proposed change would look something like below:
In both tables, There are higher counts at 1 than at -1 in the + orientation and lower counts on the - orientation. Similar observations apply to +2 vs -2, +3 vs -3, and so on.
This might occur if a more fragments are mapped to a position that is in the host DNA a few bases away from the junction rather than to a position in the host DNA on the opposite side of hte insertion. To say this another way, it is more likely that base pairs of host DNA are omitted rather than added. I do not show this here, but a similar observation applies to ...
So rerunning those tables with/without your proposed fix would be particularly helpful.
I do not understand the `fixed downstream' sentence. Maybe OK for rough and ready reporting, but makes me itch to think this might roll into data that are used for finer analyses.
On line 476 of intSiteLogic.R, we've assigned the "tStart" and "tEnd" from the alignment output to the GRanges start and end positions. This is correct most of the time, but the "tStart" and "tEnd" in the psl output file correspond to the edges of the alignment. If there are bases at the beginning and end of the read that do not align, they are not considered when assigning the position.
I think this is a mistake in logic. Instead, the range assignment should be incorporate the flanking nucleotides that may not align, especially on the LTR end. This region is known to have a drop in sequence quality that can lead to base miscalling. These miscalled bases may not be aligned which will change the position of the integration site position calling. A quality metric used for filtering is that the "qStart" must be less than or equal to 5nt (line 519, adjustible with maxAlignStart), which means that the called position could be up to 5 nt away from the actual junction of the LTR.
This is fixed downstream in our reports by standardizing within a 5bp window, but could be refined in the current informatic processing. The proposed change would look something like below:
intSiteLogic.R : Line 476
changed to:
The text was updated successfully, but these errors were encountered: