I've been trying to polish one of our mock community datasets with racon-gpu, but am seeing slow performance during the overlap alignment phase.
I can see that many alignments are being run not on the GPU but on the CPU. Admittedly, the slow performance was exacerbated by using only four CPU cores. I've had a look around the code, and as I understand it, an alignment can be prevented from running on the GPU under two conditions.
I see there is also an error mode for `exceeded_max_alignment_difference`, but I can't find a case where it is actually raised by CUDAAligner.
I've checked the stats on the reads I am assembling and polishing with: the N50 is 28.3 Kbp (nice one @joshquick), so I'm thinking our longest reads are getting thrown off the GPU and left to run on the CPU afterwards.
Just to check I was on the right track, I filtered reads longer than 15 Kbp out of the data set and ran the polishing again; there is now very little time spent aligning overlaps on the CPU. Though I'm not entirely sure whether that's because the remaining reads are all <= 15 Kbp, or simply because there are fewer reads.
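For anyone wanting to reproduce this check, counting how many reads exceed the cap gives a quick estimate of the CPU fallback load before running a full polish. A minimal sketch (the function name is mine, this is not racon code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Count reads whose length exceeds the maximum query/target length,
// i.e. reads whose alignments would fall back to the CPU.
std::size_t count_cpu_fallbacks(const std::vector<std::size_t>& read_lengths,
                                std::size_t max_len = 15000) {
    return static_cast<std::size_t>(std::count_if(
        read_lengths.begin(), read_lengths.end(),
        [max_len](std::size_t len) { return len > max_len; }));
}
```

With an N50 of 28.3 Kbp, a large share of the total bases (not just reads) will sit above a 15 Kbp cap, which would explain the CPU time disappearing after filtering.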
I thought I would try raising the hard-coded length limit myself, but memory use seems to grow linearly with it, meaning you must run fewer batches. This ends up taking much more GPU time overall, and presumably wastes a lot of memory in cases where the overlaps assigned to a batch are much shorter than the maximum allowed length. I wonder whether it would be worth having batches of different sizes and binning the overlaps, or ordering the overlaps by size and creating/destroying increasingly larger batches.
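The binning idea could look something like this; a rough sketch with made-up types, not racon's actual API:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for racon's overlap/batch types.
struct Overlap {
    std::size_t query_len;
    std::size_t target_len;
};

struct Batch {
    std::size_t max_len;            // per-batch allocation bound
    std::vector<Overlap> overlaps;
};

// Sort overlaps by their longer sequence, then cut them into batches
// whose length bound is sized to the longest member, so short overlaps
// never pay for the worst-case allocation.
std::vector<Batch> bin_by_length(std::vector<Overlap> overlaps,
                                 std::size_t batch_size) {
    std::sort(overlaps.begin(), overlaps.end(),
              [](const Overlap& a, const Overlap& b) {
                  return std::max(a.query_len, a.target_len) <
                         std::max(b.query_len, b.target_len);
              });
    std::vector<Batch> batches;
    for (std::size_t i = 0; i < overlaps.size(); i += batch_size) {
        Batch b;
        std::size_t end = std::min(i + batch_size, overlaps.size());
        b.overlaps.assign(overlaps.begin() + i, overlaps.begin() + end);
        // Sorted ascending, so the last member is the longest.
        b.max_len = std::max(b.overlaps.back().query_len,
                             b.overlaps.back().target_len);
        batches.push_back(std::move(b));
    }
    return batches;
}
```

Because the overlaps are sorted first, each batch's allocation bound is just its last (longest) member, so only the final batches would need allocations anywhere near 25-50 Kbp.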
I've found where the `CUDABatchAligner` is initialised and see it has hard-coded limits of 15000 for both the max query and max target length. Is this limit there for performance reasons, or could users be allowed to set it themselves? Does the choice here affect the memory allocation on the GPU later? Ideally we'd want to raise it to at least 25 Kbp, if not 50 Kbp.
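On the memory question: I don't know CUDAAligner's internals, but if each alignment reserves a banded-DP-style buffer (an assumption on my part, not something I've read from the code), the cost of raising the cap would be linear in it, which would match the linear growth seen when raising the limit:

```cpp
#include <cstddef>

// Back-of-envelope sketch, not taken from CUDAAligner: a hypothetical
// per-alignment buffer of (length cap) x (band width) cells grows
// linearly with the length cap.
std::size_t banded_dp_bytes(std::size_t max_len,
                            std::size_t band_width = 256,
                            std::size_t bytes_per_cell = 1) {
    return max_len * band_width * bytes_per_cell;
}
```

Under that assumption, raising the cap from 15000 to 50000 costs roughly 3.3x per alignment, so a fixed memory budget fits proportionally fewer alignments per batch.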