Quality score (QS) sequences

Felipe A. Louza edited this page Jun 26, 2020 · 4 revisions

gsufsort can also output the quality scores (QS) of the reads, permuted in the same order as the BWT symbols (option --qs).

This option is valid only for .fastq or .fq files.
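To make "permuted according to the BWT symbols" concrete, here is a minimal Python sketch for a single read. It is an illustration only and not gsufsort's implementation: the naive suffix sorting, the `$` sentinel, the NUL placeholder used as the sentinel's quality, and the function name `bwt_with_qs` are all assumptions made for this example.

```python
def bwt_with_qs(read, quals, sentinel="$", pad="\0"):
    """Return (bwt, permuted_qs) for one read.

    Assumptions (illustrative, not taken from gsufsort):
    - `sentinel` sorts before every base and ends the text;
    - the sentinel has no quality score, so `pad` stands in for it.
    """
    if len(read) != len(quals):
        raise ValueError("read and quality string must have the same length")

    text = read + sentinel
    q = quals + pad

    # Naive suffix array: sort suffix start positions lexicographically.
    # Fine for a short example, far too slow for real datasets.
    sa = sorted(range(len(text)), key=lambda i: text[i:])

    # BWT[k] is the symbol preceding the k-th smallest suffix; the quality
    # score follows the same permutation, so QS[k] belongs to BWT[k].
    # For the suffix starting at 0, index -1 wraps around to the sentinel.
    bwt = "".join(text[i - 1] for i in sa)
    qs = "".join(q[i - 1] for i in sa)
    return bwt, qs
```

For example, `bwt_with_qs("GATTACA", "ABCDEFG")` pairs each BWT symbol with the quality score of the base it came from, so the two output strings can be read position by position.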

Running example

Given the first DNA read in dataset/reads.fastq:

head -4 dataset/reads.fastq 
@HWI-ST928:79:C0GNWACXX:6:1101:1184:2104 1:N:0:TAAGGCGATATCCTCT
AGTTAGGACTATTCGAACATTATGTCACAAACGTGATGTCACAAAGCCGAATTGTCTGGAGTTAAGACTATACGAACATTATGAAACAAACGTGATGTCAC
+
@C@FDEDDHHGHHJIIGGHJJIJGIJIHGIIFGEFIIJJJGHIGGF@DHEHIIIIJIIGGIIIGE@CEEHHEE@B?AAECDDCDDCCCBB<=<?<?CCC>A

Then, run:

./gsufsort dataset/reads.fastq --docs 1 --bwt --qs
## gsufsort ##
## store_to_disk ##
dataset/reads.fastq.bwt	103 bytes (n = 103)
dataset/reads.fastq.bwt.qs	103 bytes (n = 103)

The permuted QS sequence is written to dataset/reads.fastq.bwt.qs:

tail dataset/reads.fastq.bwt.qs 
ACCHHD@ICGIIHCDJJBIHBI@DGGFGEC?JFAGHE>CIGCIJ?GFEH@BICDIDEJDEEI<EGDI?JII<FG@IH@EEJHCGJHID=GJ<IIIICAHGH

gsufsort can also invert the permuted QS sequence together with the BWT (options --ibwt --qs):

./gsufsort --ibwt --qs dataset/reads.fastq.bwt

See the resulting file:

less +1 dataset/reads.fastq.iqs
@C@FDEDDHHGHHJIIGGHJJIJGIJIHGIIFGEFIIJJJGHIGGF@DHEHIIIIJIIGGIIIGE@CEEHHEE@B?AAECDDCDDCCCBB<=<?<?CCC>A
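The inversion can be sketched in Python as a standard LF-mapping walk that carries the quality scores along with the BWT symbols. This is a sketch under stated assumptions, not gsufsort's code: it handles a single read, assumes a `$` sentinel that sorts before all bases, and the function name `invert_bwt_with_qs` is hypothetical.

```python
def invert_bwt_with_qs(bwt, qs, sentinel="$"):
    """Recover (read, quals) from a single-read BWT and its permuted QS.

    Assumes `sentinel` occurs exactly once and sorts before every base,
    so row 0 of the sorted rotations is the suffix made of the sentinel.
    """
    if len(bwt) != len(qs):
        raise ValueError("bwt and qs must have the same length")

    n = len(bwt)
    # A stable sort of BWT positions by symbol yields the first column F.
    # order[row] is the BWT position whose symbol occupies F[row].
    order = sorted(range(n), key=lambda i: bwt[i])

    # LF-mapping: lf[i] is the row of F holding the same occurrence as bwt[i].
    lf = [0] * n
    for row, i in enumerate(order):
        lf[i] = row

    # Walk backwards through the text, starting from the sentinel's row.
    read, quals = [], []
    row = 0
    while bwt[row] != sentinel:
        read.append(bwt[row])
        quals.append(qs[row])  # the score travels with its base
        row = lf[row]

    # Symbols were collected last-to-first, so reverse them.
    return "".join(reversed(read)), "".join(reversed(quals))
```

Calling `invert_bwt_with_qs("ACTGA$TA", "GFDAE\0CB")` walks the BWT of `GATTACA$` backwards and returns the read together with its quality string in the original order.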

Compare dataset/reads.fastq.iqs with the quality line of the original file:

head -4 dataset/reads.fastq | sed -n 4~4p - | diff -s dataset/reads.fastq.iqs -
Files dataset/reads.fastq.iqs and - are identical
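The `4~4p` address in the sed command above is a GNU sed extension (first~step), so it may not be available on every system. As a portable alternative, the following Python sketch extracts the quality line of each record. It assumes the plain four-line FASTQ layout shown in the running example (no wrapped sequences), and the helper name `fastq_quality_lines` is hypothetical.

```python
def fastq_quality_lines(lines):
    """Return the quality line of every 4-line FASTQ record.

    `lines` is any iterable of text lines, for example an open file.
    Assumes each record spans exactly four lines:
    header, sequence, '+', quality.
    """
    return [
        line.rstrip("\n")
        for number, line in enumerate(lines, start=1)
        if number % 4 == 0  # lines 4, 8, 12, ... hold the quality scores
    ]
```

With an open file, `fastq_quality_lines(open("dataset/reads.fastq"))[:1]` gives the quality string of the first read, which can then be compared with the content of dataset/reads.fastq.iqs.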