reference data limit number of contigs to 5000 #95

Open
tcezard opened this issue Feb 4, 2019 · 0 comments

tcezard commented Feb 4, 2019

Associated with issue EdinburghGenomics/Analysis-Driver#344
QC for genomes with many contigs can be very slow because GATK 3.4 does not perform well on such genomes.
The reference data process should check the number of contigs in the FASTA file and offer to merge contigs in order to limit QC time.
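A minimal sketch of the contig check, assuming the FASTA is read line by line and that the 5000 limit from the issue title is the trigger for offering a merge. The names `count_contigs`, `should_offer_merge` and `MAX_CONTIGS` are hypothetical, not existing reference_data.py code. If a `.fai` index from `samtools faidx` already exists, its line count gives the same number without scanning the sequence.

```python
import io
from typing import Iterable

# Hypothetical default, taken from the issue title ("limit number of contigs to 5000").
MAX_CONTIGS = 5000


def count_contigs(lines: Iterable[str]) -> int:
    """Count FASTA records: each record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


def should_offer_merge(n_contigs: int, max_contigs: int = MAX_CONTIGS) -> bool:
    """Return True when the genome has more contigs than the QC-friendly limit."""
    return n_contigs > max_contigs


if __name__ == "__main__":
    # Tiny in-memory FASTA with three records; a real run would open the genome file.
    fasta = io.StringIO(">chr1\nACGT\n>chr2\nGG\nTT\n>scaffold_3\nA\n")
    n = count_contigs(fasta)
    print(n, should_offer_merge(n))  # 3 False
```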

Add a new option in reference_data.py that merges contig entries into chunks of at least 20 Mb.
The resulting genome version can only be used for QC.
A comment will be added describing the modification.
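The merge option could work roughly as sketched below: consecutive contigs are concatenated greedily until each chunk holds at least 20 Mb of contig sequence. The function names, the `chunk_N` naming scheme and the 100-N spacer between joined contigs are assumptions for illustration; the issue does not specify them. Each chunk keeps the list of original contig names, which is the information the comment describing the modification would need.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (contig name, sequence)

MIN_CHUNK_LEN = 20_000_000  # "chunks of 20 Mb minimum", from the issue
SPACER = "N" * 100          # assumption: N padding between joined contigs


def group_contigs(lengths: List[int], min_chunk_len: int) -> List[List[int]]:
    """Greedily group consecutive contig indices until each group reaches min_chunk_len bases.

    Spacer bases are not counted. A trailing group that stays below the minimum
    is folded into the previous group.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_len = 0
    for i, length in enumerate(lengths):
        current.append(i)
        current_len += length
        if current_len >= min_chunk_len:
            groups.append(current)
            current, current_len = [], 0
    if current:
        if groups:
            groups[-1].extend(current)  # fold short leftover into the last chunk
        else:
            groups.append(current)      # whole genome is below the minimum
    return groups


def merge_contigs(
    records: List[Record],
    min_chunk_len: int = MIN_CHUNK_LEN,
    spacer: str = SPACER,
) -> List[Tuple[str, str, List[str]]]:
    """Return (chunk_name, merged_sequence, original_contig_names) for each chunk."""
    groups = group_contigs([len(seq) for _, seq in records], min_chunk_len)
    merged = []
    for n, group in enumerate(groups, start=1):
        names = [records[i][0] for i in group]
        sequence = spacer.join(records[i][1] for i in group)
        merged.append((f"chunk_{n}", sequence, names))
    return merged
```

With a small threshold for demonstration, `merge_contigs([("c1", "AAAA"), ("c2", "CC"), ("c3", "GGGGGG"), ("c4", "T")], min_chunk_len=5, spacer="NN")` yields `chunk_1` = `AAAANNCC` (c1, c2) and `chunk_2` = `GGGGGGNNT` (c3, c4); the short trailing contig is folded into the last chunk instead of being left below the minimum. The sketch holds the whole genome in memory, which may need to become streaming for large assemblies.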
