A wrapper for Heng Li's kseq/readfq, an efficient FastQ/Fasta parser

pdishuck/nimreadfq

 
 


nimreadfq

A Nim wrapper for Heng Li's kseq/readfq, an efficient parser for FastQ and Fasta files. nimreadfq reads gzipped or flat FastQ and Fasta files, as well as standard input (use "-"), and is very fast (see the benchmark below).
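A minimal sketch of reading a flat FastQ file with readFQ(). It assumes the package is imported as `readfq`, that readFQ() takes the input path as its only argument, and that each FQRecord exposes `name` and `sequence` string fields; example.nim in the repository is the authoritative reference. The temporary file name is hypothetical and only makes the sketch self-contained. The same call should accept a gzipped path or "-" for stdin.

```nim
import std/os
import readfq  # assumed module name of the nimreadfq package (see example.nim)

# Write a tiny two-record FastQ file so the sketch runs on its own.
let path = getTempDir() / "nimreadfq_demo.fq"  # hypothetical file name
writeFile(path, "@r1 demo read\nACGT\n+\nIIII\n@r2\nGGCCA\n+\nIIIII\n")

var names: seq[string]
var totalBases = 0
# readFQ() is assumed to take a path: a flat file, a .gz file, or "-" for stdin.
for rec in readFQ(path):
  names.add rec.name               # kseq ends the name at the first whitespace
  totalBases += rec.sequence.len
removeFile(path)
echo names, " ", totalBases        # prints @["r1", "r2"] 9
```

Because each FQRecord holds ordinary strings, records can be stored or passed around freely after the loop advances.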

The main function is readFQ(), an iterator that yields FQRecord objects. An alternative is readFQPtr(), which yields FQRecordPtr objects. The difference is that FQRecordPtr uses ptr char fields instead of strings. This is potentially faster, but the underlying memory is reused between iterations, so a record's contents are overwritten when the next record is read.
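Because readFQPtr() reuses its buffers, any data that must outlive the current iteration has to be copied. A minimal sketch, assuming the module is imported as `readfq` and that `rec.sequence` is a NUL-terminated `ptr char` (kseq terminates its strings with NUL); the temporary file name is hypothetical.

```nim
import std/os
import readfq  # assumed module name of the nimreadfq package

let path = getTempDir() / "nimreadfq_ptr_demo.fq"  # hypothetical file name
writeFile(path, "@r1\nACGT\n+\nIIII\n@r2\nGGCCA\n+\nIIIII\n")

var kept: seq[string]
for rec in readFQPtr(path):
  # rec.sequence is assumed to point into a buffer that the next iteration
  # overwrites. Casting to cstring and applying `$` copies the NUL-terminated
  # bytes into a fresh Nim string that is safe to keep.
  kept.add $cast[cstring](rec.sequence)
removeFile(path)
echo kept  # prints @["ACGT", "GGCCA"]
```

Keeping the raw pointers instead of the copies would leave every stored element referring to the same reused buffer.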

See example.nim and tests/tester.nim for code examples.

The initial Nim integration (and hard work) was done by Haibao Tang as part of his bio-pipeline repo. Haibao generously granted full rights to his code base, after which I started this separate package called nimreadfq for integration into nimble.

Benchmark

In the benchmark below, nimreadfq is roughly 5 to 13 times faster than packages with similar functionality.

Below are timings for reading 500k sequences (the first 500k sequences of SRR8616947_1) on a Surface Book 2 running WSL2:

Gzipped FastQ:

  • readfq gz: 1.490s
  • bioseq gz: 18.731s

Flat file FastQ:

  • readfq: 1.250s
  • bioseq: 8.898s
  • fastx: 6.486s

To reproduce the results:

cd ./benchmark
nimble build
./benchmark
