Quality score (QS) sequences
Felipe A. Louza edited this page Jun 26, 2020 (4 revisions)
gsufsort can also output (option --qs) the quality scores (QS) permuted according to the BWT symbols. This option is valid only for .fastq and .fq files.
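As a rough illustration of what "permuted according to the BWT symbols" means, the Python sketch below sorts the suffixes of a single read naively and, for each sorted suffix, emits the base that precedes it (the BWT symbol) together with that base's quality score. It is not gsufsort's implementation. The function name bwt_with_qs, the '$' terminator and the '!' quality placeholder for the terminator are assumptions made for this example, and gsufsort's handling of collections of reads (separators, --docs) is not modeled.

```python
def bwt_with_qs(read: str, qs: str) -> tuple[str, str]:
    """Return (bwt, permuted_qs) for a single read and its quality string.

    Illustrative sketch only: '$' (assumed to sort before A, C, G, T) ends
    the read, and '!' is a hypothetical quality placeholder for '$'.
    """
    if len(read) != len(qs):
        raise ValueError("read and quality string must have equal length")
    text = read + "$"
    qual = qs + "!"
    # Naive suffix array: suffix start positions in lexicographic order.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT symbol of suffix i is the character preceding it; for i = 0,
    # text[-1] wraps around to the terminator.
    bwt = "".join(text[i - 1] for i in sa)
    # Each quality score travels with its base: same position, same order.
    permuted_qs = "".join(qual[i - 1] for i in sa)
    return bwt, permuted_qs


bwt, pqs = bwt_with_qs("GATTACA", "ABCDEFG")
print(bwt)  # ACTGA$TA
print(pqs)  # GFDAE!CB
```

Each quality character stays paired with its base: in the output above, the BWT symbol 'G' (the first base of the read) sits at the same position as its quality 'A'.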
Given the first DNA read in dataset/reads.fastq:
head -4 dataset/reads.fastq
@HWI-ST928:79:C0GNWACXX:6:1101:1184:2104 1:N:0:TAAGGCGATATCCTCT
AGTTAGGACTATTCGAACATTATGTCACAAACGTGATGTCACAAAGCCGAATTGTCTGGAGTTAAGACTATACGAACATTATGAAACAAACGTGATGTCAC
+
@C@FDEDDHHGHHJIIGGHJJIJGIJIHGIIFGEFIIJJJGHIGGF@DHEHIIIIJIIGGIIIGE@CEEHHEE@B?AAECDDCDDCCCBB<=<?<?CCC>A
Then, run:
./gsufsort dataset/reads.fastq --docs 1 --bwt --qs
## gsufsort ##
## store_to_disk ##
dataset/reads.fastq.bwt 103 bytes (n = 103)
dataset/reads.fastq.bwt.qs 103 bytes (n = 103)
The permuted QS sequence is written to dataset/reads.fastq.bwt.qs:
tail dataset/reads.fastq.bwt.qs
ACCHHD@ICGIIHCDJJBIHBI@DGGFGEC?JFAGHE>CIGCIJ?GFEH@BICDIDEJDEEI<EGDI?JII<FG@IH@EEJHCGJHID=GJ<IIIICAHGH
gsufsort can invert the permuted QS sequence together with the BWT (options --ibwt --qs):
./gsufsort --ibwt --qs dataset/reads.fastq.bwt
See the resulting file:
less +1 dataset/reads.fastq.iqs
@C@FDEDDHHGHHJIIGGHJJIJGIJIHGIIFGEFIIJJJGHIGGF@DHEHIIIIJIIGGIIIGE@CEEHHEE@B?AAECDDCDDCCCBB<=<?<?CCC>A
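Inverting works because each quality score sits at the same position as its base in the BWT, so the walk that reconstructs the read also visits its qualities in the same order. The sketch below uses the standard LF mapping for a single read. It assumes exactly one '$' terminator that sorts before every base, and the function name inverse_bwt_with_qs is hypothetical. This is not gsufsort's code.

```python
def inverse_bwt_with_qs(bwt: str, pqs: str) -> tuple[str, str]:
    """Recover (read, qs) from a single-read BWT and its permuted QS.

    Assumes exactly one '$' terminator that sorts before every base.
    """
    n = len(bwt)
    # Stable sort of BWT rows by symbol yields the first column F;
    # order[f] is the BWT row whose symbol lands at row f of F.
    order = sorted(range(n), key=lambda i: bwt[i])
    # LF mapping: lf[l] = row of F holding the same occurrence as bwt[l].
    lf = [0] * n
    for f_row, l_row in enumerate(order):
        lf[l_row] = f_row
    # Row 0 is the rotation starting with '$', so bwt[0] is the last base.
    row = 0
    bases, quals = [], []
    for _ in range(n - 1):  # n - 1 symbols: everything except '$'
        bases.append(bwt[row])
        quals.append(pqs[row])  # the quality sits at the same row as its base
        row = lf[row]
    # Symbols were collected back to front.
    return "".join(reversed(bases)), "".join(reversed(quals))


print(inverse_bwt_with_qs("ACTGA$TA", "GFDAE!CB"))  # ('GATTACA', 'ABCDEFG')
```

The quality string is never consulted to decide where to go next. Only the BWT drives the walk, and the QS sequence is read along the same path.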
Compare the output with the original file:
head -4 dataset/reads.fastq | sed -n 4~4p - | diff -s dataset/reads.fastq.iqs -
Files dataset/reads.fastq.iqs and - are identical
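The comparison pipeline relies on the four-line FASTQ layout. sed -n 4~4p (a GNU sed extension) prints lines 4, 8, 12, and so on, which are the quality lines. The following Python sketch is equivalent and assumes unwrapped four-line records. quality_lines is a hypothetical helper name.

```python
def quality_lines(fastq_text: str) -> list[str]:
    """Return line 4 of every four-line FASTQ record (lines 4, 8, 12, ...)."""
    lines = fastq_text.splitlines()
    # Zero-based indices 3, 7, 11, ... are the quality lines.
    return lines[3::4]


example = "@r1\nACGT\n+\nIIII\n@r2\nGG\n+\n#F\n"
print(quality_lines(example))  # ['IIII', '#F']
```

Counting lines is more reliable than searching for '@'. The quality string of the example read above itself starts with '@'.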