Integration test: Create new repo for integration test datasets. #294

Open
tcezard opened this issue Nov 20, 2017 · 2 comments
Comments

tcezard commented Nov 20, 2017

Looking at GitHub's policies, we should be able to upload the stripped-down dataset, and maybe the full-size one too, as it would not breach those policies.
See https://help.github.com/articles/working-with-large-files/ and https://help.github.com/articles/conditions-for-large-files/
This would potentially allow us to version the md5 checksums and QC metrics associated with the results alongside the data.

mwhamgenomics commented Nov 21, 2017

If we do this, we would need to find a way of making file checksums deterministic regardless of run location. This might mean doing something like cat-ing the file, filtering out any lines that match a blacklist, and piping the rest into md5. The blacklist could consist of, e.g.:

vcf:
    '^##GATKCommandLine.+$'
    '^##reference.+$'
bam:
    '^@PG +ID:GATK IndelRealigner.+$'
samtools_stats:
    '^# The command line was:.+$'


tcezard commented Nov 22, 2017

The md5 checksums and demultiplexing metrics for the full-size dataset can be found in logs/integration_tests/2017-11-21_11\:48\:23.log
