extract the variant information #9
Hi, I have the same question about the steps after generating the combined VCF file. Thanks!
Hi all, apologies for the delay. I have written a quick R script to convert the VCF to DENDRO input here. Please find the following script and let me know whether it works. Best,
Thank you for your script. But I still have a question. The last step of variant calling uses this command: My question is that output.g.vcf contains the variant information of all samples, but there is no per-sample information in it; only merged information is left. So how do you know each sample's (or each cell's) variant information?
Sorry for the delay. You will get a sample-by-variant matrix, with each row a variant and each column a sample. Would you mind elaborating on "each sample's variant information"?
Thank you for your reply. After the last step I mentioned above (using the command 'java -jar pathtogatk/gatk/GenomeAnalysisTK.jar'), I got a merged variant VCF file instead of a sample-by-variant matrix. By "each sample's variant information" I mean exactly that: a sample-by-variant matrix with each row a variant and each column a sample. So I wonder whether there are further steps after merging the variant information of all samples. By the way, the dataset you used is single-cell data from the Smart-seq2 protocol; does that mean each sample is just one cell? Best wishes.

Thank you for your reply. I'm not sure whether you understood my question, so let me restate it in Chinese: 'java -jar pathtogatk/gatk/GenomeAnalysisTK.jar -T GenotypeGVCFs -R pathtostar/STAR_hg19/ucsc.hg19.fasta' Also, the data used in your paper is Smart-seq2 single-cell data; I'd like to confirm: does each sample contain data from only one cell? Thanks again, and I look forward to your reply! All the best with your work!
Got it, I understand now. For Smart-seq2 data, each cell has its own BAM/FASTQ file. I'll write the rest in English... You are correct. Every erc.g.vcf file contains the information of a single cell: SRR2973275 and SRR2973351 are two cells. The above code will return a D-by-2 table, where D is the total number of variants.
Sorry for the confusion...
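A joint-genotyped VCF from GenotypeGVCFs does keep per-sample information: after the FORMAT column there is one genotype column per input gVCF, so with one gVCF per Smart-seq2 cell each column is one cell. The minimal Python sketch below illustrates how those columns can be turned into variant-by-cell matrices. It is not the R script mentioned earlier in this thread; it assumes an uncompressed VCF whose FORMAT field includes AD (allelic depths), it sums all alternative alleles at multi-allelic sites, and the function name vcf_to_count_matrices is hypothetical. Whether DENDRO expects exactly these alternative-allele and total read-count matrices is an assumption; check its documentation for the precise input format.

```python
from typing import List, Tuple


def vcf_to_count_matrices(
    lines: List[str],
) -> Tuple[List[str], List[str], List[List[int]], List[List[int]]]:
    """Build variant-by-cell read-count matrices from a multi-sample VCF.

    Returns (variant_ids, sample_names, alt_counts, total_counts), where
    alt_counts[i][j] and total_counts[i][j] are the alternative-allele and
    total read depths of variant i in sample (cell) j.
    """
    samples: List[str] = []
    variant_ids: List[str] = []
    alt_counts: List[List[int]] = []
    total_counts: List[List[int]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Columns after FORMAT (index 8) are the sample names, one per input gVCF.
            samples = fields[9:]
            continue
        chrom, pos, _vid, ref, alt = fields[:5]
        fmt_keys = fields[8].split(":")
        variant_ids.append(f"{chrom}:{pos}:{ref}>{alt}")
        alt_row: List[int] = []
        total_row: List[int] = []
        for sample_field in fields[9:]:
            # Pair FORMAT keys with this sample's values; trailing values may be absent.
            values = dict(zip(fmt_keys, sample_field.split(":")))
            # Assumption: AD holds "ref,alt1[,alt2...]"; a missing AD counts as zero reads.
            depths = [int(x) if x != "." else 0 for x in values.get("AD", ".").split(",")]
            alt_row.append(sum(depths[1:]))  # all alternative alleles summed
            total_row.append(sum(depths))
        alt_counts.append(alt_row)
        total_counts.append(total_row)
    return variant_ids, samples, alt_counts, total_counts
```

Passing the lines of a combined VCF built from the two cells SRR2973275 and SRR2973351 would give D rows and 2 columns in each matrix, matching the D-by-2 table described above. Reading the FORMAT keys per record, rather than assuming a fixed order, keeps the sketch working when different sites carry different FORMAT fields.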
Hi Zilu,
DENDRO is a wonderful tool, and it may be very useful to me. But I have run into a big problem while building the pipeline.
problem:
I'm not sure whether I explained this clearly, so let me ask again in Chinese:
Thanks.