Converting Beagle imputed VCF for TRTools #209
Thank you for developing annotaTR to address this issue @gymreklab! I ran some tests this week; here is my experience (also as a future reference for others who might run into similar problems). I reinstalled TRTools and tried annotaTR on the same test file as before (a chr18 Beagle-imputed VCF, this time also with the GP and AP format fields, imputed using the Saini et al. 2018 STR reference panel: http://gymreklab.com/2018/03/05/snpstr_imputation.html) as below:
Here I also tried with
I then thought that perhaps annotaTR was not designed for the Saini et al. 2018 imputation reference panel, as the documentation (https://trtools.readthedocs.io/en/latest/source/annotaTR.html) mentions the latest TR reference panel from EnsembleTR (by the way, I was not aware of this new publication and reference panel; it looks great, congratulations!). So I switched to this reference panel, again using a chr18 VCF for testing (filtered for TRs with dosage R2 > 0.3), imputed with Beagle v5.4 (22Jul22.46e) using the same settings as before (with AP and GP). I first tried producing only VCF output to see what the output looks like:
This worked very well with either The output VCF has the TRDS format field, which is exactly what I wanted to obtain (especially with However, adding "pgen" to
I kept getting this kind of error:
Here, 21490 should be the number of TRs in my input file, and the error indicates that the pgen file contains 0 variants (even though the pgen file itself does not seem to be empty). By the way, if I use
I tried with your example file:
...which actually worked for bestguess_norm, but not for beagleap_norm, I think because the input file did not have AP format information. The main differences I can see in your example input file are:
Regarding the last difference: I also tried providing an input file of raw Beagle-imputed SNPs/TRs (without applying the DR2 > 0.3 filter first), fully matching the imputation reference panel (testing chr21), and this did not work either. It throws the following error:
...I got a similar error earlier when I was analyzing the chr18 Beagle-imputed VCF. As far as I remember, it was due to a monomorphic variant one time and a variant with no alternative allele another time, for example chr18:61752586 in the reference panel, which has only the reference allele GGAAGGAAGGAA and no alternative allele. That is why I chose to apply the dosage R2 > 0.3 filter first: it also removes such variants, and in GWAS I only want to analyze TRs with at least one allele whose dosage R2 is > 0.3 anyway. I did not try anything further from here. Can you please help with this? Thanks again for developing TRTools and thank you very much in advance! Best, |
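The pre-filtering described above (dropping records with no alternative allele and keeping TRs with at least one allele whose dosage R2 exceeds 0.3) can be sketched roughly as follows. This is only an illustration, not part of TRTools: the function names are hypothetical, the example records are made up, and it assumes that Beagle writes a per-allele, comma-separated `DR2` entry in the INFO column.

```python
def keep_record(vcf_line, min_dr2=0.3):
    """Return True if a VCF data line has an ALT allele and any DR2 value above min_dr2."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt, info = fields[4], fields[7]
    if alt in (".", ""):
        return False  # no alternative allele (e.g. a reference-only record)
    for entry in info.split(";"):
        if entry.startswith("DR2="):
            # Assumption: one DR2 value per ALT allele, comma-separated.
            values = [float(v) for v in entry[4:].split(",") if v not in (".", "")]
            return any(v > min_dr2 for v in values)
    return False  # no DR2 annotation found, so the record cannot pass the filter


def filter_vcf_lines(lines, min_dr2=0.3):
    """Yield header lines unchanged and only those data lines that pass keep_record."""
    for line in lines:
        if line.startswith("#") or keep_record(line, min_dr2):
            yield line


# Made-up records for illustration only.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
    "chr18\t100\tSTR_1\tGGAAGGAA\tGGAA,GGAAGGAAGGAA\t.\tPASS\tDR2=0.10,0.45\tGT\t0|1\n",
    "chr18\t61752586\tSTR_2\tGGAAGGAAGGAA\t.\t.\tPASS\t.\tGT\t0|0\n",
]
kept = list(filter_vcf_lines(example))
```

In practice the same filter is usually easier to apply with an existing VCF tool; the sketch is only meant to make the two conditions explicit.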
Thanks @maegsul for testing it out and for this detailed report of the issues you ran into. I am going to reopen this issue and we will work on tracking these down; will update shortly. |
Hi again, Some comments and questions:
|
Thanks a lot @gymreklab! To go through the comments and the questions:
This actually runs TRTools on The reference panel I use is https://s3.amazonaws.com/snp-str-imputation/1000genomes/1kg.snp.str.chr18.vcf.gz and the first SNP and STR variants appearing in the file are as below:
And my input imputed VCF looks like below (first SNP and first STR):
Do these files look OK and similar? (If you need more information, I can provide it.) Also, if you have a test file available for using annotaTR with the 2018 Saini et al. imputation reference panel, I can test that as well.
Thank you once again! |
Hi @maegsul, below is a test using the Saini panel to impute a couple of 1000 Genomes samples (which is a bit awkward since those are also the samples in the ref panel, but this is just to test the pipeline steps), followed by annotaTR and checking the pgen files with plink2. The first SNP/STR in the ref panel and output match what I see. However, I am still not sure why you got an error loading the reference panel with annotaTR. That step is supposed to happen before the imputed VCF is even read, so it should only depend on the reference panel itself.
AnnotaTR outputs to the terminal:
and outputs these files:
|
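As a complement to checking the pgen files with plink2, a quick sanity check for the "0 variants" situation reported earlier in the thread is to compare the number of records in the VCF with the number of variant lines in the accompanying `.pvar` file. The sketch below is hypothetical and not part of TRTools; it assumes the `.pvar` file is plain, uncompressed text whose header lines start with `#`.

```python
import gzip


def count_variant_lines(lines):
    """Count non-empty lines that are not header lines (header lines start with '#')."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def open_text(path):
    """Open a plain-text or gzip-compressed file for reading as text."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def variant_counts(vcf_path, pvar_path):
    """Return (records in the VCF, variant lines in the .pvar) for comparison."""
    with open_text(vcf_path) as vcf, open_text(pvar_path) as pvar:
        return count_variant_lines(vcf), count_variant_lines(pvar)
```

If the two numbers differ (for example, thousands of TRs in the VCF but no variant lines in the `.pvar`), the mismatch arose when the pgen fileset was written rather than when it was read.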
Hi
First of all, thanks for developing TRTools! I have a question regarding imputed STR input files.
I have a Beagle (v5.4) imputed VCF file that looks like below:
The imputation reference panel is from: http://gymreklab.com/2018/03/05/snpstr_imputation.html that was described in the Saini et al. paper: https://www.nature.com/articles/s41467-018-06694-0
I typically split these STR alleles and perform biallelic STR GWAS on binary phenotypes of interest; however, I am also interested in a length-based STR GWAS. That is how I came across associaTR in TRTools, which seems to be capable of running a length-based GWAS. I ran a test with chr18 STRs as below:
associaTR results.tsv input_chr18_fortesting.vcf.gz cases phenotypes_forTest_caseControl.npy --same-samples --beagle-dosages --vcftype hipstr
...however this gave an error as:
Of note, I set --vcftype to hipstr above, as the Saini et al. paper seems to use HipSTR.
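To make concrete what a length-based dosage derived from Beagle output could look like, here is a minimal sketch. It is not associaTR's or annotaTR's actual implementation: the function name is hypothetical, allele length is measured in base pairs, and it assumes the Beagle AP1/AP2 format fields hold one probability per alternative allele for each haplotype, with the reference allele taking the remaining probability.

```python
def length_dosage(ref, alts, ap1, ap2, normalize=False):
    """Expected summed allele length over both haplotypes.

    ref, alts: allele sequences; ap1, ap2: per-haplotype ALT allele probabilities
    (assumed to be one value per ALT allele, in the same order as alts).
    With normalize=True, lengths are rescaled to [0, 1], so the dosage lies in [0, 2].
    """
    lengths = [float(len(ref))] + [float(len(a)) for a in alts]
    if normalize:
        lo, hi = min(lengths), max(lengths)
        span = hi - lo
        lengths = [(n - lo) / span if span else 0.0 for n in lengths]
    total = 0.0
    for alt_probs in (ap1, ap2):
        # The reference allele gets whatever probability the ALT alleles leave over.
        probs = [1.0 - sum(alt_probs)] + list(alt_probs)
        total += sum(p * n for p, n in zip(probs, lengths))
    return total


# Made-up example: REF of 12 bp, ALT alleles of 8 bp and 16 bp.
raw = length_dosage("GGAAGGAAGGAA", ["GGAAGGAA", "GGAAGGAAGGAAGGAA"], [0.5, 0.0], [0.0, 1.0])
norm = length_dosage("GGAAGGAAGGAA", ["GGAAGGAA", "GGAAGGAAGGAAGGAA"], [0.5, 0.0], [0.0, 1.0], normalize=True)
```

Unlike splitting a multiallelic STR into biallelic records, this collapses all alleles at a locus into one continuous value per sample, which is the kind of quantity a length-based association test regresses on.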
Then I noticed this section of the documentation regarding Beagle: https://trtools.readthedocs.io/en/stable/CALLERS.html#beagle and used trtools_prep_beagle_vcf.sh to convert my imputed VCF into a VCF that can be used with associaTR:
trtools_prep_beagle_vcf.sh hipstr 1kg.snp.str.chr18.vcf.gz input_chr18_fortesting.vcf.gz converted_input_chr18_fortesting.vcf.gz
...where 1kg.snp.str.chr18.vcf.gz was the reference panel downloaded from http://gymreklab.com/2018/03/05/snpstr_imputation.html, as mentioned above. I got another error here:
Is this error related to the fact that there is no "START" INFO tag (I am not sure what it is) in the chr18 STR imputation reference file*, and/or to something else connected to the "unknown file type" (maybe my file is not in "HipSTR" format)?
Can you please help with this error? Thank you very much in advance!
Best,
Fahri
*PS: This is what the imputation reference file header and a randomly chosen STR look like: