#!/usr/bin/env cwl-runner

class: CommandLineTool
id: Seqware-Sanger-Somatic-Workflow
label: Seqware-Sanger-Somatic-Workflow

$namespaces:
  foaf: http://xmlns.com/foaf/0.1/

dct:creator:
  '@id': http://sanger.ac.uk/...
  foaf:name: Keiran Raine
  foaf:mbox: mailto:[email protected]

dct:contributor:
  - foaf:name: Brian O'Connor
    foaf:mbox: mailto:[email protected]
  - foaf:name: Denis Yuen
    foaf:mbox: mailto:[email protected]

requirements:
  - class: DockerRequirement
    dockerPull: quay.io/pancancer/pcawg-sanger-cgp-workflow:2.1.0

cwlVersion: v1.0

inputs:
  tumor:
    type: File
    inputBinding:
      position: 1
      prefix: --tumor
    secondaryFiles:
      - .bai
  refFrom:
    type: File
    inputBinding:
      position: 3
      prefix: --refFrom
  bbFrom:
    type: File
    inputBinding:
      position: 4
      prefix: --bbFrom
  normal:
    type: File
    inputBinding:
      position: 2
      prefix: --normal
    secondaryFiles:
      - .bai
  coreNum:
    type: int?
    inputBinding:
      position: 5
      prefix: --coreNum
  memGB:
    type: int?
    inputBinding:
      position: 6
      prefix: --memGB
  run-id:
    type: string?
    inputBinding:
      position: 7
      prefix: --run-id

outputs:
  somatic_cnv_vcf_gz:
    type: File
    outputBinding:
      glob: '*.somatic.cnv.vcf.gz'
    secondaryFiles:
      - .md5
      - .tbi
      - .tbi.md5
  somatic_cnv_tar_gz:
    type: File
    outputBinding:
      glob: '*.somatic.cnv.tar.gz'
    secondaryFiles:
      - .md5
  somatic_indel_vcf_gz:
    type: File
    outputBinding:
      glob: '*.somatic.indel.vcf.gz'
    secondaryFiles:
      - .md5
      - .tbi
      - .tbi.md5
  somatic_indel_tar_gz:
    type: File
    outputBinding:
      glob: '*.somatic.indel.tar.gz'
    secondaryFiles:
      - .md5
  somatic_sv_vcf_gz:
    type: File
    outputBinding:
      glob: '*.somatic.sv.vcf.gz'
    secondaryFiles:
      - .md5
      - .tbi
      - .tbi.md5
  somatic_sv_tar_gz:
    type: File
    outputBinding:
      glob: '*.somatic.sv.tar.gz'
    secondaryFiles:
      - .md5
  somatic_snv_mnv_vcf_gz:
    type: File
    outputBinding:
      glob: '*.somatic.snv_mnv.vcf.gz'
    secondaryFiles:
      - .md5
      - .tbi
      - .tbi.md5
  somatic_snv_mnv_tar_gz:
    type: File
    outputBinding:
      glob: '*.somatic.snv_mnv.tar.gz'
    secondaryFiles:
      - .md5
  somatic_genotype_tar_gz:
    type: File
    outputBinding:
      glob: '*.somatic.genotype.tar.gz'
    secondaryFiles:
      - .md5
  somatic_imputeCounts_tar_gz:
    type: File
    outputBinding:
      glob: '*.somatic.imputeCounts.tar.gz'
    secondaryFiles:
      - .md5
  somatic_verifyBamId_tar_gz:
    type: File
    outputBinding:
      glob: '*.somatic.verifyBamId.tar.gz'
    secondaryFiles:
      - .md5
  bas_tar_gz:
    type: File
    outputBinding:
      glob: '*.bas.tar.gz'
    secondaryFiles:
      - .md5
  qc_metrics:
    type: File
    outputBinding:
      glob: '*.qc_metrics.tar.gz'
    secondaryFiles:
      - .md5
  timing_metrics:
    type: File
    outputBinding:
      glob: '*.timing_metrics.tar.gz'
    secondaryFiles:
      - .md5

baseCommand: [/start.sh, python, /home/seqware/CGP-Somatic-Docker/scripts/run_seqware_workflow.py]
doc: |
The PCAWG Sanger variant calling workflow was developed by the Wellcome Trust Sanger Institute
(http://www.sanger.ac.uk/). It consists of software components that call somatic substitutions,
indels and structural variants from uniformly aligned tumour / normal WGS sequences.
The workflow has been dockerized and packaged using the CWL workflow language. The source code
is available on GitHub at: https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker.
## Run the workflow with your own data
### Prepare compute environment and install software packages
The workflow has been tested in an Ubuntu 16.04 Linux environment with the following hardware and software settings.
#### Hardware requirements (assuming 30X coverage whole genome sequence)
- CPU core: 16
- Memory: 64GB
- Disk space: 1TB
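
Before launching the workflow, it can help to compare the host against the figures listed above. The following is a minimal Python sketch and not part of the workflow itself: `REQUIRED` only restates the values from the list, the function names are hypothetical, and `measure_host` assumes a Linux host where `os.sysconf` exposes the page size and page count.

```python
import os
import shutil

# Minimums restated from the hardware list above (30X WGS assumed).
REQUIRED = {"cores": 16, "mem_gb": 64, "disk_gb": 1000}


def shortfalls(cores, mem_gb, disk_gb, required=REQUIRED):
    """Return one message for every resource that is below its minimum."""
    measured = {"cores": cores, "mem_gb": mem_gb, "disk_gb": disk_gb}
    return [
        f"{name}: have {measured[name]:.0f}, need {minimum}"
        for name, minimum in required.items()
        if measured[name] < minimum
    ]


def measure_host(workdir="."):
    """Measure CPU cores, physical memory and free disk space (Linux assumed)."""
    cores = os.cpu_count() or 0
    mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    free_bytes = shutil.disk_usage(workdir).free
    return cores, mem_bytes / 1e9, free_bytes / 1e9
```

Calling `shortfalls(*measure_host())` returns an empty list when the host meets all three minimums. Treating 1TB as 1000 GB is an assumption of this sketch.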
#### Software installation
- Docker (1.12.6): follow the installation instructions at https://docs.docker.com/engine/installation
- CWL tool
```
pip install cwltool==1.0.20180116213856
```
### Prepare input data
#### Input aligned tumor / normal BAM files
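The workflow uses a pair of aligned BAM files as input: one BAM for the tumor and one for the normal,
both from the same donor. Each BAM file must be accompanied by its index file (the BAM file name with
*.bai* appended), which the tool declares as a secondary file. Here we assume the file names are
*tumor_sample.bam* and *normal_sample.bam*, and that both files are under the *bams* subfolder.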
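
The tool definition lists `.bai` under `secondaryFiles` for both BAM inputs, so each BAM is expected to have an index next to it whose name is the BAM file name with `.bai` appended. A minimal Python sketch for checking this before a run is shown below. The function name `missing_inputs` is hypothetical, and the paths in the usage note are the example names assumed in this section.

```python
from pathlib import Path


def missing_inputs(bam_paths, index_suffix=".bai"):
    """Return the BAM and index paths that do not exist on disk."""
    missing = []
    for bam in map(Path, bam_paths):
        # A secondaryFiles pattern such as ".bai" is appended to the full
        # file name, e.g. tumor_sample.bam -> tumor_sample.bam.bai.
        index = bam.with_name(bam.name + index_suffix)
        for path in (bam, index):
            if not path.is_file():
                missing.append(str(path))
    return missing
```

For example, `missing_inputs(["bams/tumor_sample.bam", "bams/normal_sample.bam"])` returns an empty list when both BAM files and both indexes are in place.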
#### Reference data files
The workflow also uses two precompiled reference files (*GRCh37d5_CGP_refBundle.tar.gz*,
*GRCh37d5_battenberg.tar.gz*) as input. They can be downloaded from the
ICGC Data Portal under https://dcc.icgc.org/releases/PCAWG/reference_data/pcawg-sanger.
We assume the two reference files have been downloaded and placed under the *reference* subfolder.
#### Job JSON file for CWL
Finally, we need to prepare a JSON file that specifies the input, reference and output files. Please replace
the *tumor* and *normal* parameters with your real BAM file names. The output parameters are file name
suffixes and usually do not need to be changed.
Name the JSON file: *pcawg-sanger-variant-caller.job.json*
```
{
  "tumor": {
    "path": "bams/tumor_sample.bam",
    "class": "File"
  },
  "normal": {
    "path": "bams/normal_sample.bam",
    "class": "File"
  },
  "refFrom": {
    "path": "reference/GRCh37d5_CGP_refBundle.tar.gz",
    "class": "File"
  },
  "bbFrom": {
    "path": "reference/GRCh37d5_battenberg.tar.gz",
    "class": "File"
  },
  "somatic_snv_mnv_tar_gz": {
    "path": "somatic_snv_mnv_tar_gz",
    "class": "File"
  },
  "somatic_cnv_tar_gz": {
    "path": "somatic_cnv_tar_gz",
    "class": "File"
  },
  "somatic_sv_tar_gz": {
    "path": "somatic_sv_tar_gz",
    "class": "File"
  },
  "somatic_indel_tar_gz": {
    "path": "somatic_indel_tar_gz",
    "class": "File"
  },
  "somatic_imputeCounts_tar_gz": {
    "path": "somatic_imputeCounts_tar_gz",
    "class": "File"
  },
  "somatic_genotype_tar_gz": {
    "path": "somatic_genotype_tar_gz",
    "class": "File"
  },
  "somatic_verifyBamId_tar_gz": {
    "path": "somatic_verifyBamId_tar_gz",
    "class": "File"
  }
}
```
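
When many tumor / normal pairs need to be processed, the job file can be generated rather than edited by hand. The following Python sketch reproduces the structure of the example job file above. The helper names (`file_entry`, `build_job`, `write_job`) are hypothetical, and the output entries simply repeat the parameter name as the path, as in the example.

```python
import json

# Output parameters that appear in the example job file above.
OUTPUT_KEYS = (
    "somatic_snv_mnv_tar_gz",
    "somatic_cnv_tar_gz",
    "somatic_sv_tar_gz",
    "somatic_indel_tar_gz",
    "somatic_imputeCounts_tar_gz",
    "somatic_genotype_tar_gz",
    "somatic_verifyBamId_tar_gz",
)


def file_entry(path):
    """Build the File object used for every parameter in the job file."""
    return {"path": path, "class": "File"}


def build_job(tumor, normal, ref_from, bb_from):
    """Return a job dictionary with the same keys as the example job file."""
    job = {
        "tumor": file_entry(tumor),
        "normal": file_entry(normal),
        "refFrom": file_entry(ref_from),
        "bbFrom": file_entry(bb_from),
    }
    # As in the example, each output path is just the parameter name.
    for key in OUTPUT_KEYS:
        job[key] = file_entry(key)
    return job


def write_job(job, path="pcawg-sanger-variant-caller.job.json"):
    """Write the job dictionary to disk as indented JSON."""
    with open(path, "w") as handle:
        json.dump(job, handle, indent=2)
```

Calling `write_job(build_job("bams/tumor_sample.bam", "bams/normal_sample.bam", "reference/GRCh37d5_CGP_refBundle.tar.gz", "reference/GRCh37d5_battenberg.tar.gz"))` writes a file equivalent to the example above.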
### Run the workflow
#### Option 1: Run with CWL tool
- Download the CWL workflow definition file
```
wget -O pcawg-sanger-variant-caller.cwl "https://raw.githubusercontent.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/2.0.3/Dockstore.cwl"
```
- Run `cwltool` to execute the workflow
```
nohup cwltool --debug --non-strict pcawg-sanger-variant-caller.cwl pcawg-sanger-variant-caller.job.json > pcawg-sanger-variant-caller.log 2>&1 &
```
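
Once the run has finished, the tool definition expects each output file to have a companion `.md5` file. The Python sketch below compares each output against its checksum file. It is an illustration only: the function names are hypothetical, and it assumes that the first whitespace-separated token in each `.md5` file is the hexadecimal digest, which this document does not specify.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_outputs(outdir=".", pattern="*.tar.gz"):
    """Map each matching file name to True, False, or None (no .md5 file)."""
    results = {}
    for output in sorted(Path(outdir).glob(pattern)):
        checksum_file = output.with_name(output.name + ".md5")
        if not checksum_file.is_file():
            results[output.name] = None
            continue
        # Assumption: the first whitespace-separated token is the hex digest.
        tokens = checksum_file.read_text().split()
        expected = tokens[0].lower() if tokens else ""
        results[output.name] = md5_of(output) == expected
    return results
```

Calling `verify_outputs(pattern="*.vcf.gz")` applies the same check to the VCF outputs.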
#### Option 2: Run with the Dockstore CLI
See the *Launch with* section below for details.