-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathdocs.html
362 lines (354 loc) · 24.5 KB
/
docs.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="description" content="A tool for systematic identification and comparison of processes, phenotypes and diseases associated with RNA-binding proteins from condition-specific CLIP-seq profiles">
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
<title>Documentation – Seten</title>
<link rel="shortcut icon" href="assets/img/favicon.png" type="image/png">
<link rel="stylesheet" href="assets/css/libs/bootstrap.min.css" charset="utf-8">
<link rel="stylesheet" href="assets/css/libs/font-awesome.css" charset="utf-8">
<link rel="stylesheet" href="assets/css/app.css" charset="utf-8">
</head>
<body>
<div class="header">
<div class="container">
<h1>Seten</h1>
<p>
A tool for systematic identification and comparison of processes, phenotypes and diseases associated with RNA-binding proteins from condition-specific CLIP-seq profiles
</p>
</div>
</div>
<nav class="navbar navbar-inverse navbar-static-top">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="index.html">Seten</a>
</div>
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav">
<li><a href="index.html"><i class="fa fa-home"></i> Start</a></li>
<li><a href="tutorial.html"><i class="fa fa-question-circle"></i> Tutorial</a></li>
<li class="active"><a href="docs.html"><i class="fa fa-book"></i> Docs</a></li>
<li><a href="contact.html"><i class="fa fa-envelope"></i> Contact</a></li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li>
<a href="https://github.com/gungorbudak/seten" title="GitHub page for Seten web interface" target="_blank">
<i class="fa fa-github"></i> GitHub
</a>
</li>
<li>
<a href="https://github.com/gungorbudak/seten-cli" title="GitHub page for Seten command line interface" target="_blank">
<i class="fa fa-terminal"></i> Seten CLI
</a>
</li>
</ul>
</div>
<!--/.nav-collapse -->
</div>
</nav>
<div class="container text-justify">
<h2>Documentation</h2>
<hr>
<div class="row">
<div class="col-sm-8">
<a name="introduction"></a>
<h3>Introduction</h3>
<p>
Seten is a tool for systematic identification and comparison of processes, phenotypes and diseases associated with RNA-binding proteins from condition-specific CLIP-seq profiles. It has a <a href="#features-browser-user-interface">browser user interface</a> and <a href="#features-command-line-interface">command line interface</a> both of which <a href="#implementation">implement the same method</a>.
</p>
<a name="features"></a>
<h3>Features</h3>
<a name="features-web-interface"></a>
<h4>Web interface</h4>
<p>
Seten's web interface (this website) provides an on-demand application for Seten's implementation.
</p>
<ul>
<li>Runs almost all latest versions of modern browsers across all operating systems</li>
<li>Requires no installation</li>
<li>Utilizes HTML5 File API to read local files (fast and safe)</li>
<li>Utilizes HTML5 Web Workers for parallel execution of tasks on multiple cores</li>
<li>Visualizes multiple results as bar charts and bubble charts (for comparison)</li>
<li>Provides options to visualize results with different thresholds</li>
<li>Provides export buttons for bar charts, tables and bubble charts of the results</li>
</ul>
<a name="features-command-line-interface"></a>
<h4>Command line interface</h4>
<p>
Seten also has a command line interface that implements the same method but also lets you analyze multiple datasets in a single command.
</p>
<ul>
<li>Runs on all operating systems</li>
<li>Requires single-line command installation (requires Python 2.7+, Python package manager)</li>
<li>Utilizes standard Python multiprocessing module for parallel execution of tasks on multiple cores</li>
<li>Inputs single BED file or multiple BED files in a directory</li>
<li>Outputs results in TSV formatted tables</li>
</ul>
<p>
Please refer to <a href="https://github.com/gungorbudak/seten-cli" target="_blank">its Github page</a> for install and more detailed usage.
</p>
<a name="implementation"></a>
<h3>Implementation</h3>
<p>
<a href="assets/img/docs_1.png" target="_blank">
<img class="img-responsive" src="assets/img/docs_1.png" alt="Seten implementation" />
</a>
</p>
<p>
Binding sites from the input BED file are mapped onto their corresponding HGNC symbols using a mapping table from Ensembl. After mapping is complete, if multiple scores are available for a gene, highest of the scores is taken to represent that gene, which results in distinct set of genes and their corresponding scores representing the extent of binding by an RBP. Next, for every gene set in a gene set collection; (i) functional enrichment is performed by using Fisher’s Exact test, (ii) gene set enrichment is performed by implementing a permutation based test where for each gene set, the common genes between the gene set and the dataset are found, multiple times (e.g. 1000), the scores of common genes are compared with the scores of randomly picked genes using Mann Whitney U test, and a final p-value is computed by <i>max</i>(1 – # sign. tests / # total tests, 1 / # total tests), and (iii) after correcting p-values from functional enrichment part, p-values for every gene set from functional enrichment and gene set enrichment are combined to have a single p-value for each gene set by using Fisher’s method of p-value integration. Fisher’s exact test is one of the common ways of measuring functional enrichment of a gene set in a dataset, which takes the number of common genes (but not the extent of binding in the context of CLIP-seq profiles) between a gene set and the input dataset. However, since each gene in the input resulting from CLIP-seq experiments can have a different score, by permuting multiple times and checking if the scores of common genes are significantly higher than randomly picked scores (p < 0.05; 1000 iterations), we’re also taking the scores into account which can’t be handled by standard functional enrichment tests. We have validated that p-values obtained from functional enrichment and gene set enrichment are independent and hence employed Fisher’s method for combining these p-values.
</p>
<a name="datasets"></a>
<h3>Datasets</h3>
<p>
The following datasets are used to test Seten during development and also their gene set association results are given in Seten's browser user interface (this website) under <strong>Explore</strong> panel. We downloaded human RBP datasets including target binding affinity scores from CLIPdb <sup>1</sup> and ENCODE project <sup>2</sup>. Since there are multiple samples for some RBP-cell line pairs in CLIPdb and ENCODE, we simply merged them before the gene set association analyses. After preprocessing these datasets, we obtained 68 unique RBP-cell line pairs for CLIPdb and 138 unique RBP-cell line pairs for the ENCODE project.
</p>
<a name="inputs"></a>
<h3>Inputs</h3>
<a name="inputs-file"></a>
<h4>File</h4>
<p>
The input file for Seten can be a UCSC BED formatted text file or Tab-separated (two-column) gene - score file. The BED file is the most common format used by peak callers. However, if you have a two-column gene - score file, you can also use it but please make sure you use the official gene name for the corresponding organism (e.g. FlyBase Gene Name for fruit fly datasets) and don't include the header line.
</p>
<p>
A BED file is a UCSC BED formatted text file which stores chromosomal location data as lines. Seten requires (at least) 5-column BED file having chrom, chromStart, chromEnd, name, score. Go to <a href="https://genome.ucsc.edu/FAQ/FAQformat.html#format1" target="_blank">UCSC's website to obtain more information about BED format</a>.
</p>
<p>
The first 10 lines of a sample BED file is as follows:
</p>
<pre>
chr1 566740 566760 binding_site_1 25 . 2.83477e-06
chr1 566860 566880 binding_site_2 10 . 0.00657314
chr1 569040 569060 binding_site_3 11 . 0.00469793
chr1 881160 881180 binding_site_4 9 . 0.00843972
chr1 1035940 1035960 binding_site_5 8 . 0.00997733
chr1 1152440 1152460 binding_site_6 8 . 0.00997733
chr1 1242120 1242140 binding_site_7 9 . 0.00843972
chr1 1328880 1328900 binding_site_8 10 . 0.00657314
chr1 1337360 1337380 binding_site_9 8 . 0.00997733
chr1 1438640 1438660 binding_site_10 12 . 0.00315889
...</pre>
<a name="inputs-organism"></a>
<h4>Organism</h4>
<p>
There are six available organisms provided in Seten's web interface:
</p>
<ul>
<li>Human (hg19 build and hg38 build)</li>
<li>Mouse (mm10 build)</li>
<li>Rat (rn6 build)</li>
<li>Fruit fly (dme6 build)</li>
<li>Worm (cel235 build)</li>
<li>Yeast (r64-1-1 build)</li>
</ul>
<p>
It's possible to extend the Seten for more organisms; however, this is only available for Seten's command line interface. Please refer to <a href="https://github.com/gungorbudak/seten-cli" target="_blank">its Github page</a> for more information.
</p>
<a name="inputs-gene-set-collections"></a>
<h4>Gene set collections</h4>
<p>
Following gene set collections are available in Seten's both web interface and command line interface. In parenthesis, the availability of that gene set collection for the corresponding organisms is given.
</p>
<ul>
<li>Pathways: BIOCARTA (human, mouse)</li>
<li>Pathways: REACTOME (human, mouse, rat, fruit fly, worm, yeast)</li>
<li>Pathways: KEGG (human, mouse, rat, fruit fly, worm, yeast)</li>
<li>GO: Biological Process (human, mouse, rat, fruit fly, worm, yeast)</li>
<li>GO: Molecular Function (human, mouse, rat, fruit fly, worm, yeast)</li>
<li>GO: Cellular Compartment (human, mouse, rat, fruit fly, worm, yeast)</li>
<li>Human Phenotype Ontology (human)</li>
<li>MalaCards Disease Ontology (human)</li>
</ul>
<a name="inputs-enrichment-method"></a>
<h4>Enrichment method</h4>
<p>
By default, Seten uses a gene set enrichment method based on permutation. It also provides a traditional functional enrichment analysis method (implementing Fisher's exact test or FET).
</p>
<a name="inputs-scoring-method"></a>
<h4>Scoring method</h4>
<p>
The scoring method is used to compute single gene level score from multiple scores for the same gene (after mapping to genes from chromosomal locations or given multiple scores in the two column gene - score file) in CLIP-seq datasets. By default, maximum method is selected, which gets the maximum score among all scores for the same gene and does the gene set enrichment based on that score. Other methods are minimum, mean, median, sum.
</p>
<p>
Note that it's only available for gene set enrichment analysis.
</p>
<a name="inputs-correction-method"></a>
<h4>Correction method</h4>
<p>
The correction method is used to correct p-values obtained from functional enrichment analysis (Fisher's exact test or FET). Currently, Seten's web interface has only one method, which is false discovery rate (FDR) or Benjamini - Hochberg. Seten's command line interface includes several more methods or correcting these p-values.
</p>
<p>
Note that it's only available for functional enrichment analysis.
</p>
<a name="inputs-gene-set-cutoff"></a>
<h4>Gene set cutoff</h4>
<p>
The gene set cutoff (defaulting to 350) is used to set an upper limit to gene set enrichment analysis or functional analysis for doing the analysis on the gene sets that are more specific.
</p>
<a name="inputs-overlap-cutoff"></a>
<h4>Overlap cutoff</h4>
<p>
The overlap cutoff (defaulting to 5) is used to set a lower limit to number of overlapping genes between each gene set and the CLIP-seq dataset to eliminate the gene sets that have a few genes in common with the given datasets.
</p>
<p>
For getting statistically meaningful results, it's better to set it as at least 5.
</p>
<a name="inputs-significance-cutoff"></a>
<h4>Significance cutoff</h4>
<p>
The significance cutoff (defaulting to 0.05) is used to set an upper limit to the p-value obtained from Mann-Whitney U test done in each permutation of gene set enrichment analysis to check if the scores (of genes common between each gene set and the given dataset) are significantly higher than randomly picked scores the same dataset.
</p>
<p>
Note that it's only available for gene set enrichment analysis.
</p>
<a name="inputs-number-of-iterations"></a>
<h4>Number of iterations</h4>
<p>
The number of iterations (defaulting to 1000) is the number of times to permutate and check for significance using randomly picked scores.
</p>
<p>
Note that it's only available for gene set enrichment analysis.
</p>
<a name="outputs"></a>
<h3>Outputs</h3>
<a name="outputs-bar-charts"></a>
<h4>Bar charts</h4>
<p>
Bar charts are available only in web interface. They represent significant results based on percent (defaults to > 5%) and p-value (defaults to < 0.01) thresholds. These thresholds can be configured in the options. X-axis of bar charts has the names of gene sets and y-axis has the significance values obtained by –<i>Log<sub>10</sub></i>(Combined p-value of the gene set).
</p>
<a name="outputs-tables"></a>
<h4>Tables</h4>
<p>
Tables are available in both interfaces of Seten. They are provided for each gene set collection, and they represent gene set association results. In browser user interface, tables can show significant results based on percent (defaults to > 5%) and p-value (defaults to < 0.01) thresholds. Its columns are;
<ul>
<li>Gene set name</li>
<li>Overlapping genes</li>
<li>Number of overlapping genes</li>
<li>Number of genes in the gene set</li>
<li>Percent of overlapping genes in the genes of the gene set</li>
<li>p-value for functional enrichment</li>
<li>Corrected p-value for functional enrichment</li>
<li>p-value for gene set enrichment</li>
<li>Combined p-value from functional (after correction) and gene set enrichment</li>
</ul>
</p>
<a name="outputs-bubble-charts"></a>
<h4>Bubble charts</h4>
<p>
Bubble charts are available only in web interface. They represent significant results based on percent (defaults to > 5%) and p-value (defaults to < 0.01) thresholds. These thresholds can be configured in the options. X-axis of bubble charts has the names of RBPs and y-axis has the gene set names. The radius of bubbles comes from their corresponding significance values obtained by –<i>Log<sub>10</sub></i>(Combined p-value of the gene set).
</p>
<a name="tutorial"></a>
<h3>Tutorial</h3>
<p>
Please go to <a href="tutorial.html" target="_blank">tutorial page</a> that can guide you to use Seten.
</p>
<a name="support"></a>
<h3>Support</h3>
<p>
Please go to <a href="contact.html" target="_blank">contact page</a> to report any issues or reach to us.
</p>
<a name="source-code"></a>
<h3>Source code</h3>
<p>
Source code of the two interfaces available on GitHub.
</p>
<ul>
<li>
<a href="https://github.com/gungorbudak/seten" target="_blank">
Seten (web interface) source code
</a>
</li>
<li>
<a href="https://github.com/gungorbudak/seten-cli" target="_blank">
Seten (command line interface) source code
</a>
</li>
</ul>
<a name="references"></a>
<h3>References</h3>
<ol>
<li>Yang, Y. C. T., Di, C., Hu, B., Zhou, M., Liu, Y., Song, N., ... & Lu, Z. J. (2015). CLIPdb: a CLIP-seq database for protein-RNA interactions. BMC genomics, 16(1), 51.</li>
<li>Lovci, M. T., Ghanem, D., Marr, H., Arnold, J., Gee, S., Parra, M., ... & Massirer, K. B. (2013). Rbfox proteins regulate alternative mRNA splicing through evolutionarily conserved RNA bridges. Nature structural & molecular biology, 20(12), 1434-1442.</li>
<li>Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., ... & Mesirov, J. P. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15545-15550.</li>
<li>Köhler, S., Doelken, S. C., Mungall, C. J., Bauer, S., Firth, H. V., Bailleul-Forestier, I., ... & FitzPatrick, D. R. (2013). The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic acids research, gkt1026.</li>
<li>Rappaport, N., Twik, M., Nativ, N., Stelzer, G., Bahir, I., Stein, T. I., ... & Lancet, D. (2014). MalaCards: A Comprehensive Automatically‐Mined Database of Human Diseases. Current Protocols in Bioinformatics, 1-24.</li>
</ol>
</div>
<div class="col-sm-4">
<h3>Content</h3>
<ul>
<li><a href="#introduction">Introduction</a></li>
<li>
<a href="#features">Features</a>
<ul>
<li><a href="#features-web-interface">Web interface</a></li>
<li><a href="#features-command-line-interface">Command line interface</a></li>
</ul>
</li>
<li><a href="#implementation">Implementation</a></li>
<li><a href="#datasets">Datasets</a></li>
<li>
<a href="#inputs">Inputs</a>
<ul>
<li><a href="#inputs-file">File</a></li>
<li><a href="#inputs-organism">Organism</a></li>
<li><a href="#inputs-gene-set-collections">Gene set collections</a></li>
<li><a href="#inputs-enrichment-method">Enrichment method</a></li>
<li><a href="#inputs-scoring-method">Scoring method</a></li>
<li><a href="#inputs-correction-method">Correction method</a></li>
<li><a href="#inputs-gene-set-cutoff">Gene set cutoff</a></li>
<li><a href="#inputs-overlap-cutoff">Overlap cutoff</a></li>
<li><a href="#inputs-significance-cutoff">Significance cutoff</a></li>
<li><a href="#inputs-number-of-iterations">Number of iterations</a></li>
</ul>
</li>
<li>
<a href="#outputs">Outputs</a>
<ul>
<li><a href="#outputs-bar-charts">Bar charts</a></li>
<li><a href="#outputs-tables">Tables</a></li>
<li><a href="#outputs-bubble-charts">Bubble charts</a></li>
</ul>
</li>
<li><a href="#tutorial">Tutorial</a></li>
<li><a href="#support">Support</a></li>
<li><a href="#source-code">Source code</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>
</div>
</div>
<div class="container">
<div class="panel panel-success">
<div class="panel-heading"><h3 class="panel-title">Publication</h3></div>
<div class="panel-body">
<p>Please cite Seten in your publications as:</p>
<p>Budak, G., Srivastava, R., & Janga, S. C. (2017). <a href="https://rnajournal.cshlp.org/content/23/6/836.short" target="_blank">Seten: a tool for systematic identification and comparison of processes, phenotypes, and diseases associated with RNA-binding proteins from condition-specific CLIP-seq profiles.</a> <i>RNA</i>, 23(6), 836-846.</p>
</div>
</div>
</div>
<div class="footer text-center">
<div class="container">
<span>
Seten: A tool for systematic identification and comparison of processes, phenotypes and diseases associated with RNA-binding proteins from condition-specific CLIP-seq profiles
</span>
</div>
</div>
<script src="assets/js/libs/jquery.min.js" charset="utf-8"></script>
<script src="assets/js/libs/bootstrap.min.js" charset="utf-8"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-48281803-10', 'auto');
ga('send', 'pageview');
</script>
</body>
</html>