daligner executable
Mark Lakata edited this page Jun 20, 2016 · 2 revisions
This is an example taken from a FALCON workflow. The input to this workflow is a raw_reads.db file in the current directory.
$ daligner -v -t16 -H15000 -e0.7 -s1000 raw_reads.1 raw_reads.1
The options are:
-v Verbose output
-t16 "Tuple supression [sic] frequency."
If a k-mer appears more than 16 times, it is not counted as a seed hit. This avoids seeding on homopolymers, which do not help alignment.
-H15000 "HGAP threshold (in bp.s)"
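According to the DALIGNER README, only overlaps whose a-read is longer than this many base pairs (here 15000) are reported. This suits HGAP-style pipelines, which only correct long reads.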
-e0.7 "Average error [sic]"
The average correlation of a local alignment must be at least 70% (that is, an error rate of at most 30%).
-s1000 "Trace spacing"
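According to the DALIGNER README, alignments are stored in a sparse encoding with one trace point recorded every -s base pairs of the a-read (here every 1000 bp).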
raw_reads.1 Here raw_reads is shorthand for raw_reads.db, and the .1 suffix selects partition (block) 1 of that database.
raw_reads.1 Repeating the same name compares the reads in this partition against themselves.
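The effect of -t can be illustrated with a small Python sketch. This is not how daligner implements it (daligner is written in C and works on sorted k-mer tuples). The function names and the toy reads are made up for illustration, and k=4 and t=3 stand in for the real Kmer=14 and -t16.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-length window (k-mer) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def usable_seed_kmers(reads, k, t):
    """Return the k-mers that occur at most t times across all reads.

    K-mers occurring more than t times are suppressed, mirroring the
    description of -t above (illustration only, not daligner's code).
    """
    counts = Counter(kmer for read in reads for kmer in kmers(read, k))
    return {kmer for kmer, n in counts.items() if n <= t}


# Toy reads: both contain a run of ten A's (a homopolymer).
reads = ["ACGTACGTAAAAAAAAAA", "TTACGTAAAAAAAAAAGG"]
kept = usable_seed_kmers(reads, k=4, t=3)

print("AAAA" in kept)  # the homopolymer k-mer is too frequent to seed
print("ACGT" in kept)  # this k-mer occurs 3 times, so it is still usable
```

With these toy reads, AAAA occurs 14 times and is dropped, while ACGT occurs 3 times and is kept as a possible seed.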
Here is the log output:
Building index for raw_reads.1
Kshift=28
BSHIFT=8
TooFrequent=16
(Kshift-1)/BSHIFT + (TooFrequent < INT32_MAX)=4
sizeof(KmerPos)=16
nreads=23595
Kmer=14
block->reads[nreads].boff=400033006
kmers=399702676
sizeof(KmerPos)*(kmers+1)=6395242832
Allocated 399702677 of 16 (6395242832 bytes) at 0x7f3f56627010
Kmer count = 399,702,676
Using 11.91Gb of space
Revised kmer count = 294,457,040
Index occupies 4.39Gb
Comparing raw_reads.1 to raw_reads.1
Capping mutual k-mer matches over 10000 (effectively -t100)
Hit count = 682,284,336
Highwater of 24.72Gb space
682,284,336 14-mers (4.264076e-09 of matrix)
1,051,303 seed hits (6.570335e-12 of matrix)
377,595 confirmed hits (2.359858e-12 of matrix)
Building index for c(raw_reads.1)
Kshift=28
BSHIFT=8
TooFrequent=16
(Kshift-1)/BSHIFT + (TooFrequent < INT32_MAX)=4
sizeof(KmerPos)=16
nreads=23595
Kmer=14
block->reads[nreads].boff=400033006
kmers=399702676
sizeof(KmerPos)*(kmers+1)=6395242832
Allocated 399702677 of 16 (6395242832 bytes) at 0x7f3c5a02d010
Kmer count = 399,702,676
Using 11.91Gb of space
Revised kmer count = 294,457,040
Index occupies 4.39Gb
Comparing raw_reads.1 to c(raw_reads.1)
Capping mutual k-mer matches over 10000 (effectively -t100)
Hit count = 643,810,060
Highwater of 23.57Gb space
643,810,060 14-mers (4.023624e-09 of matrix)
960,715 seed hits (6.004186e-12 of matrix)
346,001 confirmed hits (2.162404e-12 of matrix)
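The memory figures in the log can be cross-checked with a little arithmetic. The sketch below assumes that "Gb" in the log means 2^30 bytes and that "Using 11.91Gb of space" corresponds to twice the KmerPos allocation. Both are guesses that happen to match the printed numbers, and neither is confirmed by the daligner source.

```python
SIZEOF_KMERPOS = 16          # from the log: sizeof(KmerPos)=16
GIB = 2 ** 30                # assumption: "Gb" in the log means 2^30 bytes

kmers = 399_702_676          # from the log: kmers=399702676
revised_kmers = 294_457_040  # from the log: Revised kmer count

# One array of (kmers + 1) KmerPos entries, as printed in the log.
alloc_bytes = SIZEOF_KMERPOS * (kmers + 1)

# Assumption: the "Using ..." figure counts two such arrays.
working_gib = 2 * alloc_bytes / GIB

# The final index holds only the revised (filtered) k-mers.
index_gib = SIZEOF_KMERPOS * revised_kmers / GIB

print(alloc_bytes)             # compare with 6395242832 in the log
print(f"{working_gib:.2f}Gb")  # compare with "Using 11.91Gb of space"
print(f"{index_gib:.2f}Gb")    # compare with "Index occupies 4.39Gb"
```

If those assumptions hold, the index for one block of about 400 Mbp needs roughly 12 Gb while it is being built and a little over 4 Gb afterwards, which is consistent with the highwater marks of about 24 Gb reported during comparison.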
The output is a collection of *.las (local alignment) files:
$ ls -l *.las
-rw-r--r-- 1 mlakata Domain Users 4837180 Jun 20 11:18 raw_reads.1.raw_reads.1.C0.las
-rw-r--r-- 1 mlakata Domain Users 5034968 Jun 20 11:18 raw_reads.1.raw_reads.1.C1.las
-rw-r--r-- 1 mlakata Domain Users 4960928 Jun 20 11:18 raw_reads.1.raw_reads.1.C2.las
-rw-r--r-- 1 mlakata Domain Users 5061924 Jun 20 11:18 raw_reads.1.raw_reads.1.C3.las
-rw-r--r-- 1 mlakata Domain Users 5222656 Jun 20 11:17 raw_reads.1.raw_reads.1.N0.las
-rw-r--r-- 1 mlakata Domain Users 5482888 Jun 20 11:17 raw_reads.1.raw_reads.1.N1.las
-rw-r--r-- 1 mlakata Domain Users 5495064 Jun 20 11:17 raw_reads.1.raw_reads.1.N2.las
-rw-r--r-- 1 mlakata Domain Users 5596832 Jun 20 11:17 raw_reads.1.raw_reads.1.N3.las
There are multiple files because each file is the output of one thread (4 threads is baked into daligner), and each thread is run twice: once in the normal-normal direction (N) and once in the normal-reverse complement direction (C).
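The naming pattern can be sketched in Python. The pattern is inferred from the listing above rather than taken from the daligner source, and expected_las_files is a hypothetical helper, not part of any FALCON or daligner API.

```python
def expected_las_files(block_a, block_b, nthreads=4):
    """List the .las files one daligner run is expected to produce.

    The pattern <A>.<B>.<N|C><thread>.las is inferred from the directory
    listing above. nthreads=4 reflects the thread count described there.
    """
    names = []
    for orientation in ("N", "C"):  # normal, then reverse complement
        for thread in range(nthreads):
            names.append(f"{block_a}.{block_b}.{orientation}{thread}.las")
    return names


files = expected_las_files("raw_reads.1", "raw_reads.1")
for name in files:
    print(name)
```

A workflow script could compare such a list against the files actually present to detect a daligner job that ended early.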