
Genetic correlation calculation pipeline via summary statistics for PheWeb


statgen/pheweb-rg-pipeline


pheweb-rg-pipeline

Pipeline for calculating genetic correlations via summary statistics between >1,000 phenotypes in PheWeb. Genetic correlation is on the observed scale (i.e. not liability scale).

The pipeline lets you choose between two tools for estimating genetic correlations: LDSC and SumHer (run through LDAK).

The pipeline can be run locally or on a SLURM cluster.

Required tools

Required data:

How to run

Input

The input file must be named pheno-list.json. It has the same format as in the PheWeb data import pipeline.

[
 {
  "assoc_files": ["/home/watman/ear-length.epacts.gz"],
  "phenocode": "ear-length",
  "num_cases": 10000,
  "num_controls": 100
 },
 {
  "assoc_files": ["/home/watman/eats-kimchi.autosomal.epacts.gz"],
  "phenocode": "eats-kimchi",
  "num_cases": 14000,
  "num_controls": 100
 }
]

The key differences are:

  • Fields num_cases and num_controls are required
  • Only one file per trait is allowed in assoc_files (i.e. cannot split the summary statistics by chromosome)
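The two requirements above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: the function name validate_pheno_list and the wording of the messages are hypothetical, and the check covers only the rules stated in this section (all four fields present, exactly one file in assoc_files).

```python
import json

# Fields this README says every entry must carry.
REQUIRED_FIELDS = ("assoc_files", "phenocode", "num_cases", "num_controls")


def validate_pheno_list(entries):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    for index, entry in enumerate(entries):
        label = entry.get("phenocode", f"entry #{index}")
        for field in REQUIRED_FIELDS:
            if field not in entry:
                problems.append(f"{label}: missing required field '{field}'")
        # Summary statistics must not be split across files (e.g. by chromosome).
        if "assoc_files" in entry and len(entry["assoc_files"]) != 1:
            problems.append(
                f"{label}: expected exactly 1 file in assoc_files, "
                f"found {len(entry['assoc_files'])}"
            )
    return problems


# In-memory example; in practice the entries would come from pheno-list.json.
example = json.loads("""
[
 {"assoc_files": ["/home/watman/ear-length.epacts.gz"],
  "phenocode": "ear-length", "num_cases": 10000, "num_controls": 100},
 {"assoc_files": ["/home/watman/chr1.epacts.gz", "/home/watman/chr2.epacts.gz"],
  "phenocode": "eats-kimchi", "num_cases": 14000}
]
""")
for problem in validate_pheno_list(example):
    print(problem)
```

To check a real input file, load it with json.load and pass the resulting list to the same function.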

If you have a tab-delimited file (no header) with the following columns: full path to summary stat file, phenocode, number of cases, number of controls, use tab2pheno-list.py -i [file.tsv] to create pheno-list.json.
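To illustrate the conversion, here is a rough sketch of how such a headerless table maps onto pheno-list.json entries. It is an illustrative stand-in, not the actual source of tab2pheno-list.py; the function name tsv_to_pheno_list is hypothetical, and the column order is assumed to be exactly the one listed above.

```python
import csv
import io
import json


def tsv_to_pheno_list(tsv_handle):
    """Convert headerless rows (path, phenocode, cases, controls) to entries."""
    entries = []
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        path, phenocode, num_cases, num_controls = row
        entries.append({
            "assoc_files": [path],  # exactly one file per trait
            "phenocode": phenocode,
            "num_cases": int(num_cases),
            "num_controls": int(num_controls),
        })
    return entries


# Small in-memory example; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "/home/watman/ear-length.epacts.gz\tear-length\t10000\t100\n"
    "/home/watman/eats-kimchi.autosomal.epacts.gz\teats-kimchi\t14000\t100\n"
)
print(json.dumps(tsv_to_pheno_list(sample), indent=1))
```

A row with more or fewer than four columns raises a ValueError during unpacking, which surfaces malformed input early.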

Further details on how to create the input file are at https://github.com/statgen/pheweb.

Run LDSC

- Configuration

Before running the pipeline you may need to change your LDSC/nextflow.config file:

  • Specify path to the directory where LDSC is installed in the LDSC field.
  • Specify path to the HapMap SNP list (i.e. w_hm3.snplist) in the LDSC_snplist field.
  • Specify path to the LD scores directory (i.e. eur_w_ld_chr/) in the LDSC_scores field.
  • Provide column names inside the columns configuration scope.
  • Set no_effect to 0 if analyzing regression coefficients or 1 if analyzing odds-ratios.

- Locally

Inside the LDSC/nextflow.config file, set the number of cpus you want to use via the cpus parameter:

...
$local {
  cpus = 4
}
...

Place your input file pheno-list.json inside the directory where you want to save results (this will also be the working directory for all intermediate files). Then, in the same directory run:

nextflow run /path/to/LDSC.nf
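If you launch runs from a script, the step above can be wrapped as in the sketch below. The helper names build_nextflow_command and run_pipeline are hypothetical; the only assumptions are that nextflow is on your PATH and that pheno-list.json must sit in the directory the pipeline is started from, as described above.

```python
import subprocess
import tempfile
from pathlib import Path


def build_nextflow_command(workdir, pipeline_script):
    """Return the nextflow command, failing early if the input file is missing."""
    workdir = Path(workdir)
    if not (workdir / "pheno-list.json").is_file():
        raise FileNotFoundError(f"pheno-list.json not found in {workdir}")
    return ["nextflow", "run", str(pipeline_script)]


def run_pipeline(workdir, pipeline_script):
    """Run the pipeline with workdir as the current directory."""
    command = build_nextflow_command(workdir, pipeline_script)
    # check=True raises CalledProcessError if nextflow exits with an error.
    return subprocess.run(command, cwd=workdir, check=True)


# Demonstration that only builds the command, so nextflow is not required here.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "pheno-list.json").write_text("[]")
    print(build_nextflow_command(demo_dir, "/path/to/LDSC.nf"))
```

The same wrapper works for the SumHer pipeline by passing the path to SumHer.nf instead.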

- With SLURM

Inside the LDSC/nextflow.config file, uncomment the executor = "slurm" line and comment out the executor = "local" line:

executor = "slurm"
// executor = "local"

Set the SLURM queue name via the queue parameter.

Place your input file pheno-list.json inside the directory where you want to save results (this will also be the working directory for all intermediate files). Then, in the same directory run:

nextflow run /path/to/LDSC.nf

Run SumHer

- Configuration

Before running the pipeline you may need to change your SumHer/nextflow.config file:

  • Specify path to the LDAK executable in the LDAK field.
  • Specify path to the directory with the reference panel (in PLINK bed/bim/fam format; may be split by chromosome) in the ref_panel field.

- Locally

Inside the SumHer/nextflow.config file, set the number of cpus you want to use via the cpus parameter, e.g.:

...
$local {
  cpus = 4
}
...

Place your input file pheno-list.json inside the directory where you want to save results (this will also be the working directory for all intermediate files). Then, in the same directory run:

nextflow run /path/to/SumHer.nf

- With SLURM

Inside the SumHer/nextflow.config file:

  1. Uncomment the executor = "slurm" line and comment out the executor = "local" line, e.g.:
executor = "slurm"
// executor = "local"
  2. Set the SLURM queue name via the queue parameter.
  3. Set the maximum number of parallel SLURM jobs via the queueSize parameter, e.g.:
$slurm {
  queueSize = 1000
}

Place your input file pheno-list.json inside the directory where you want to save results (this will also be the working directory for all intermediate files). Then, in the same directory run:

nextflow run /path/to/SumHer.nf

Output

The final output, a matrix of genetic correlations among all traits, is written to result/ALL.RG.txt inside the directory where you ran the pipeline.

The pipeline creates the following directories:

  • work: Nextflow working directory with output files from all steps.
  • result: directory with the final merged result.
