This repository has been archived by the owner on Sep 14, 2021. It is now read-only.

refactor 2-way PCA command line options #71

Open
deflaux opened this issue Jun 1, 2015 · 2 comments

deflaux commented Jun 1, 2015

There seems to be a minor issue with the order of command line options. In the example below, --num-reduce-partitions is shown in two different positions.

Works fine:

spark-submit   --class com.google.cloud.genomics.spark.examples.VariantsPcaDriver   \
--conf spark.shuffle.spill=true   \
--master spark://hadoop-m:7077 \
googlegenomics-spark-examples-assembly-1.0.jar \
--client-secrets client_secrets.json \
--variant-set-id 10473108253681171589  3049512673186936334 \
--references 1:1:249881990 chr1:1:250226910 \
--output-path output/two-way-chr1-pca.tsv  \
--num-reduce-partitions 15

Yields a parse error:

spark-submit   --class com.google.cloud.genomics.spark.examples.VariantsPcaDriver   \
--conf spark.shuffle.spill=true   \
--master spark://hadoop-m:7077 \
googlegenomics-spark-examples-assembly-1.0.jar \
--client-secrets client_secrets.json \
--variant-set-id 10473108253681171589  3049512673186936334 \
--references 1:1:249881990 chr1:1:250226910 \
--num-reduce-partitions 15 \
--output-path output/two-way-chr1-pca.tsv

[scallop] Error: Failed to parse the trailing argument list: 'output/two-way-chr1-pca.tsv'

pgrosu commented Jun 1, 2015

@deflaux Give it a try with underscores, like this, so that the minus sign does not get interpreted:

--output-path output/two_way_chr1_pca.tsv

Let us know if that works.

Thanks,
~p

elmer-garduno self-assigned this Jun 1, 2015
deflaux changed the title from "2-way PCA command line options parse error" to "refactor 2-way PCA command line options" Jan 22, 2016

deflaux commented Jan 22, 2016

In general, it would be better to have specific command line options for the variant set and its associated references on the case and control sides of the comparison.

The --min-allele-frequency option for the control variant set will only work on variant sets that have an AF field, such as the 1,000 Genomes phase 1 and phase 3 variants.

Perhaps something like:

--control-variant-set-id 10473108253681171589 \
--control-references 1:1:249881990 \
--case-variant-set-id 3049512673186936334 \
--case-references chr1:1:250226910 \
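To illustrate the idea, here is a minimal sketch of that option layout. It uses Python's argparse purely for illustration; it is not the project's Scallop configuration, and the `--control-*` / `--case-*` names are only the ones proposed above, not existing flags. The point is that when each side has its own single-valued options, the order of the remaining options should no longer matter.

```python
import argparse


def build_parser():
    """Hypothetical option layout with separate case/control flags."""
    parser = argparse.ArgumentParser(prog="variants-pca-sketch")
    # IDs are kept as strings: values such as 10473108253681171589
    # do not fit in a signed 64-bit integer.
    parser.add_argument("--control-variant-set-id", required=True)
    parser.add_argument("--control-references", required=True)
    parser.add_argument("--case-variant-set-id", required=True)
    parser.add_argument("--case-references", required=True)
    # Only meaningful for control variant sets that carry an AF field.
    parser.add_argument("--min-allele-frequency", type=float, default=None)
    parser.add_argument("--output-path", required=True)
    # No default is assumed here; the real driver may define one.
    parser.add_argument("--num-reduce-partitions", type=int, default=None)
    return parser


def parse(argv):
    """Parse a list of arguments and return the values as a plain dict."""
    return vars(build_parser().parse_args(argv))


if __name__ == "__main__":
    sides = [
        "--control-variant-set-id", "10473108253681171589",
        "--control-references", "1:1:249881990",
        "--case-variant-set-id", "3049512673186936334",
        "--case-references", "chr1:1:250226910",
    ]
    output = ["--output-path", "output/two-way-chr1-pca.tsv"]
    partitions = ["--num-reduce-partitions", "15"]
    # Both orderings from the original report parse to the same values.
    print(parse(sides + output + partitions) == parse(sides + partitions + output))
```

With this layout there is no multi-valued option left, so the parser never has to guess where a list of values ends.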
