Add Random Subsampling to Nanopore mNGS Pipeline #371
Merged
Switching from using `head` for subsampling to using `seqtk sample`. I timed the `head` and `seqtk sample` approaches on a 6.7 GB nanopore sample: the `seqtk sample` option took about 2.3 times longer to run than `head`, but still only took 27 seconds. Since 6.7 GB is on the larger side for a nanopore sample, I think we're safe swapping out `head` for `seqtk sample`.
The long-read-mngs Docker container already has seqtk installed, so no changes to the container are needed.
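
For reviewers less familiar with the difference, here is a minimal Python sketch of the two subsampling strategies. It is illustrative only: it is not the pipeline code and not seqtk's implementation. The function names (`parse_fastq`, `head_subsample`, `random_subsample`), the seed value, and the synthetic reads are all hypothetical, and the random variant uses single-pass reservoir sampling as one reasonable way to draw a uniform sample.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record is four lines: header, sequence, separator, qualities.
Record = Tuple[str, ...]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group a stream of FASTQ lines into 4-line records.

    A trailing partial record (fewer than 4 lines) is dropped.
    """
    it = iter(lines)
    while True:
        chunk = tuple(islice(it, 4))
        if len(chunk) < 4:
            return
        yield chunk


def head_subsample(records: Iterable[Record], n: int) -> List[Record]:
    """Old approach: keep the first n records, like head on 4*n lines."""
    return list(islice(records, n))


def random_subsample(records: Iterable[Record], n: int, seed: int = 42) -> List[Record]:
    """New approach (sketch): uniform random sample of n records in one pass."""
    rng = random.Random(seed)  # fixed seed (hypothetical value) for reproducibility
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return reservoir


# Synthetic input: 1000 reads named read0..read999 (made up for illustration).
fastq_lines = []
for k in range(1000):
    fastq_lines += [f"@read{k}", "ACGT", "+", "IIII"]

first_100 = head_subsample(parse_fastq(fastq_lines), 100)
random_100 = random_subsample(parse_fastq(fastq_lines), 100)
```

`head` always returns the reads at the start of the file, so any ordering effect in the file carries into the subsample. The random version draws from the whole file, at the cost of reading every record, which is consistent with the longer runtime measured above.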
ticket: CZID-9693
Tested on this sample on staging and cross-referenced this sample, which used the existing subsampling scheme, to verify that the new approach gives similar results.