Change check_own process to not include input files in output #113
Resolves #107
When using a self-provided reference file, you could end up with duplicate entries in the reference file. This was caused by the `check_own` process, which ran:

`seqkit seq {input} -o {output}.gz`

and then returned any gzip files, including those in the input. This is a problem when the input file has a `.gz` extension: both the input file and the seqkit output file are passed on to the next process, which combines the two files and renames any duplicates.

Resolved by disabling `includeInputs`. I also used a fixed output file name and passed that on explicitly. Feel free to tweak this change if you like, but it should resolve the problem.
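To make the failure mode concrete, here is a minimal Python sketch of how an output glob can pick up a gzipped input file. It is a toy model, not the pipeline's code: the function `collect_outputs`, the `*.gz` pattern, and the file names `reference.fasta.gz` and `checked.fasta.gz` are all hypothetical and chosen only for illustration.

```python
import fnmatch


def collect_outputs(work_dir_files, pattern, staged_inputs, include_inputs):
    """Toy model of output collection: return the files matching `pattern`.

    When `include_inputs` is False, files that were staged as inputs are
    dropped from the result even if they match the pattern.
    """
    matched = [f for f in work_dir_files if fnmatch.fnmatch(f, pattern)]
    if not include_inputs:
        matched = [f for f in matched if f not in staged_inputs]
    return sorted(matched)


# Hypothetical work directory after seqkit has run on a gzipped reference.
staged_inputs = {"reference.fasta.gz"}
work_dir_files = ["reference.fasta.gz", "checked.fasta.gz"]

# Old behaviour: a broad glob with inputs included returns both files,
# so the next step would merge the reference with a copy of itself.
old_outputs = collect_outputs(work_dir_files, "*.gz", staged_inputs, True)

# Fixed behaviour: a fixed output name with inputs excluded returns one file.
new_outputs = collect_outputs(
    work_dir_files, "checked.fasta.gz", staged_inputs, False
)

print(old_outputs)
print(new_outputs)
```

Either change on its own would be enough in this toy model. Combining a fixed output name with excluding inputs means the result does not depend on what the input file happens to be called.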