Doppelmark too slow #10

Open
fgvieira opened this issue Apr 22, 2021 · 0 comments

Comments

@fgvieira

Dear all,

I recently decided to check doppelmark's speed on a BAM file, given claims that it could process a ~250Gb file in a little over 30 min using 40 CPUs (I can't find the blog post now). However, on a file with 1'709'696'127 reads it took over 10 hours, and the average CPU usage was only 3 (even though I set --parallelism to 40)!

I am using the command:

doppelmark --parallelism 40 --logtostderr -v 2 --optical-distance -1 --bam input.bam --index input.bai --clear-existing --disk-mate-shards 0 -tag-duplicates -clip-padding 200 -scratch-dir /tmp --metrics output.metrics --output output.bam

Am I doing anything wrong? Are there any options that could increase the speed?

Thanks,
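
For anyone trying to reproduce the figures in this report, here is a minimal sketch of one way to measure the average number of CPU cores a command keeps busy, computed as child CPU time divided by wall-clock time. It is not part of doppelmark; `average_cores` is a hypothetical helper name, it relies on the Unix-only `resource` module, and it assumes the "3" above means roughly 3 cores busy on average.

```python
import resource
import subprocess
import sys
import time


def average_cores(cmd):
    """Run cmd and return (wall_seconds, cpu_seconds, average_cores_busy).

    average_cores_busy is (user + system CPU time of the child) / wall time.
    Hypothetical helper for illustration; Unix only (uses `resource`).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # blocks until the command exits
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # CPU time consumed by the child process (and any children it waited for).
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu, (cpu / wall if wall > 0 else 0.0)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the command to measure as arguments to this script.
    wall, cpu, cores = average_cores(sys.argv[1:])
    print(f"wall={wall:.1f}s cpu={cpu:.1f}s average cores busy={cores:.2f}")
```

Saved under any file name and given the doppelmark command above as its arguments, this prints a single summary line. A value far below the `--parallelism` setting would suggest the run is waiting on something other than CPU (for example I/O), though this sketch cannot say what.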

@fgvieira changed the title from "Doppelmark too slow" to "Doppelmark slow" on Apr 22, 2021
@fgvieira changed the title from "Doppelmark slow" to "Doppelmark too slow" on Apr 22, 2021