SYNOPSIS
miRkwood is a tool for the fast and easy identification of microRNAs, designed specifically for plant microRNAs.
INSTALL
See file miRkwood_installation.md.
USAGE
miRkwood provides two pipelines, depending on the type of input data.
-mirkwood.pl (ab initio pipeline): scans a genomic sequence and finds all potential microRNA precursors.
Input: a FASTA file.
-mirkwood-bed.pl (smallRNAseq pipeline): analyses small RNA deep sequencing data and finds all potential microRNAs.
Input: a BED file.
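Both pipelines are started through perl with miRkwood's library directory on the include path, using the syntax listed under OPTIONS. The following is a minimal Python sketch that assembles such a command line and checks the mandatory options. The function names build_mirkwood_command and output_dir_ok are hypothetical helpers, not part of miRkwood. The sketch also assumes that option values are passed as a separate argument ("--input genome.fa"), which is the usual Perl Getopt convention but is not stated in this README. It passes the script name as given, so you still need to point it at wherever mirkwood.pl lives in your installation.

```python
import os
import shlex

# Mandatory options per pipeline, as listed under OPTIONS.
REQUIRED = {
    "mirkwood.pl": ("input", "output"),
    "mirkwood-bed.pl": ("input", "genome", "output"),
}

def build_mirkwood_command(mirkwood_path, script, options):
    """Return the argv list for one miRkwood pipeline (hypothetical helper).

    mirkwood_path: installation root (the README's {miRkwood_path})
    script:        "mirkwood.pl" (ab initio) or "mirkwood-bed.pl" (smallRNAseq)
    options:       option names without the leading "--"; True adds a bare flag,
                   False/None skips it, any other value (including 0) is passed
                   as a separate argument (assumed Getopt-style syntax)
    """
    if script not in REQUIRED:
        raise ValueError(f"unknown pipeline script: {script!r}")
    missing = [name for name in REQUIRED[script] if name not in options]
    if missing:
        raise ValueError(f"missing mandatory options: {missing}")
    argv = ["perl", f"-I{mirkwood_path.rstrip('/')}/cgi-bin/lib/", script]
    for name, value in options.items():
        if value is True:
            argv.append(f"--{name}")
        elif value is not None and value is not False:  # 0 is kept on purpose
            argv.extend([f"--{name}", str(value)])
    return argv

def output_dir_ok(path):
    """True if path does not exist yet or is an empty directory."""
    if not os.path.exists(path):
        return True
    return os.path.isdir(path) and not os.listdir(path)

cmd = build_mirkwood_command(
    "/opt/miRkwood", "mirkwood.pl",
    {"input": "genome.fa", "output": "results", "both-strands": True},
)
print(shlex.join(cmd))
```

With the example values above (a hypothetical install under /opt/miRkwood), this prints `perl -I/opt/miRkwood/cgi-bin/lib/ mirkwood.pl --input genome.fa --output results --both-strands`. To launch the job, call `subprocess.run(cmd, check=True)` after confirming `output_dir_ok("results")`. How several files are passed to the bed pipeline's --gff option is not documented here, so the sketch passes one value per option.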
OPTIONS
-mirkwood.pl: perl -I/{miRkwood_path}/cgi-bin/lib/ mirkwood.pl [options]
Mandatory options:
--input
Path to the FASTA file.
--output
Output directory. It is created if it does not exist; if it already
exists, it must be empty.
Additional options:
--both-strands
Scan both strands.
--species-mask
Mask coding regions against the given organism.
--shuffles
Compute thermodynamic stability (shuffled sequences).
--filter-mfei
Select only sequences with MFEI < -0.6.
--filter-rrna
Filter out ribosomal RNAs (using RNAmmer).
--filter-trna
Filter out tRNAs (using tRNAscan-SE).
--align
Flag conserved mature miRNAs (alignment with miRBase + miRdup).
--varna
Generate secondary-structure images with VARNA.
--help
Prints a brief help message and exits.
--man
Prints the manual page and exits.
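The --filter-mfei option keeps only sequences with an MFEI (minimal folding free energy index) below -0.6, but this README does not define MFEI. The following is a minimal sketch that assumes the definition common in the plant miRNA literature: AMFE = MFE / length × 100, then MFEI = AMFE / GC%. Here MFE is the (negative) minimum free energy in kcal/mol and GC% is a percentage. miRkwood's own computation may differ in details, so treat this only as an illustration of the threshold. The MFE would normally come from a folding program such as ViennaRNA's RNAfold; here it is supplied by hand. All function names are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in an RNA/DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)

def mfei(seq, mfe):
    """MFEI under the assumed definition (MFE / length * 100) / GC%.

    mfe: minimum free energy in kcal/mol, negative for a stable hairpin.
    """
    gc = gc_percent(seq)
    if gc == 0:
        raise ValueError("MFEI is undefined for a sequence without G or C")
    amfe = mfe / len(seq) * 100.0  # adjusted MFE per 100 nt
    return amfe / gc

def passes_mfei_filter(seq, mfe, threshold=-0.6):
    """Keep a sequence only if its MFEI is strictly below the threshold."""
    return mfei(seq, mfe) < threshold
```

For example, a 40 nt sequence at 50% GC ("GC" * 10 + "AU" * 10) with an MFE of -20 kcal/mol gives an AMFE of -50 and an MFEI of -1.0, so it passes. The same sequence at -8 kcal/mol gives about -0.4 and is filtered out.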
-mirkwood-bed.pl: perl -I/{miRkwood_path}/cgi-bin/lib/ mirkwood-bed.pl [options]
Mandatory options:
--input
Path to the BED file (created with our script mirkwood-bam2bed.pl).
--genome
Path to the genome (FASTA format).
--output
Output directory. It is created if it does not exist; if it already
exists, it must be empty.
Additional options:
--shuffles
Compute thermodynamic stability (shuffled sequences).
--align
Flag conserved mature miRNAs (alignment with miRBase + miRdup).
--no-filter-mfei
Do not filter out sequences with MFEI >= -0.6. Default: only
sequences with MFEI < -0.6 are kept.
--mirbase
If you have a GFF file of known miRNAs for this assembly, use this
option to give its path.
--gff
List of annotation files (GFF or GFF3 format). Reads matching an
element of these files are filtered out. For instance, you can
filter out CDS by providing a suitable GFF file.
--no-filter-bad-hairpins
By default, candidates with a quality score of 0 and no
conservation are removed from the results and stored in a BED
file. Use this option to keep all results.
--min-read-positions-nb
Minimum number of mapping positions a read must have to be kept. Default: 0.
--max-read-positions-nb
Maximum number of mapping positions a read may have to be kept. Default: 5
(reads that map at more than 5 positions are filtered out).
--varna
Generate secondary-structure images with VARNA.
--help
Prints a brief help message and exits.
--man
Prints the manual page and exits.
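The two read-position options define an inclusive window. A read is kept when its number of mapping positions n satisfies min <= n <= max. Reads outside the window end up in the multimapped output described below. The following is a minimal Python sketch of that rule. It assumes a hypothetical BED layout with one alignment per line and the read identifier in column 4. The BED produced by mirkwood-bam2bed.pl may encode reads differently, for example collapsed with counts, so the parsing part is illustrative only.

```python
from collections import defaultdict

def filter_by_positions(bed_lines, min_positions=0, max_positions=5):
    """Split BED alignment lines into (kept, multimapped) lists.

    Assumed (hypothetical) layout: chrom, start, end, name[, score, strand],
    one alignment per line, where `name` identifies the read. A read maps at
    n positions, counted as distinct (chrom, start, end, strand) tuples.
    """
    records = []
    positions = defaultdict(set)
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        fields = line.split("\t")
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        strand = fields[5] if len(fields) > 5 else "."
        positions[name].add((chrom, start, end, strand))
        records.append((name, line))
    kept, multimapped = [], []
    for name, line in records:
        n = len(positions[name])
        # Inclusive bounds: only reads outside [min, max] are filtered out.
        (kept if min_positions <= n <= max_positions else multimapped).append(line)
    return kept, multimapped
```

With the defaults (0 and 5), a read mapping at six distinct positions would move to the multimapped list. Raising the minimum to 2 would do the same for reads that map only once.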
OUTPUT
For both pipelines:
alignments: folder containing all alignment files
(only if option --align is on).
images: folder containing images created by VARNA
(only if option --varna is on).
results: folder containing all results files, in several
formats (csv, fa, gff, html and txt).
sequences: folder containing the sequence of each candidate
in FASTA and dot-bracket format, alternative sequences
if they exist, and the optimal structure if it differs
from the stem-loop structure.
YML: folder containing all candidates data in YAML format.
basic_candidates.yml: contains a summary of all candidates
with basic information (this file is needed to create
the results files).
log.log: log file (hey, what did you expect?)
run_options.cfg: config file with the chosen options.
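The sequences folder stores candidate structures in dot-bracket notation. In this notation, each "(" pairs with the matching ")" and "." marks an unpaired base. The following is a minimal stack-based parser that turns such a string into base-pair indices. This README does not describe the exact layout of miRkwood's files. The sketch therefore assumes you pass it a single structure line. If that line carries a trailing free-energy annotation, as RNAfold-style output does, only the first whitespace-separated token is used. The function name is hypothetical.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of 0-based (i, j) base pairs, i < j.

    '(' opens a pair, ')' closes the most recently opened one and '.'
    is an unpaired base. Anything after the first whitespace (e.g. an
    energy such as "(-35.20)") is ignored.
    """
    tokens = structure.split()
    token = tokens[0] if tokens else ""
    stack, pairs = [], []
    for pos, char in enumerate(token):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For a simple stem-loop such as "((...))", this returns [(0, 6), (1, 5)]: two stacked pairs closing a three-base loop.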
ab initio pipeline only:
masks: folder containing results of BlastX, RNAmmer and tRNAscan-SE.
input_sequences.fas: your sequences.
smallRNAseq pipeline only:
read_clouds: folder containing a text file describing the read
cloud of each candidate.
bed_sizes.txt: tabulated file with the number of reads in each BED file.
summary.txt: contains a summary of your options and of the results.
Depending on the options you chose for your job, you may find
some of the following files:
your_bed_your_GFF.tar.gz: a compressed BED containing all reads matching
features from your GFF file, one archive for each GFF file you
provided.
your_bed_multimapped.tar.gz: a compressed BED containing all reads from your
input BED file mapping at fewer than --min-read-positions-nb positions
or more than --max-read-positions-nb positions.
your_bed_miRNAs.tar.gz: a compressed BED containing all reads from your
input BED file corresponding to miRNAs present in miRBase.
your_bed_orphan_clusters.tar.gz: a compressed BED containing all reads from your
input BED file that fall into a peak but do not correspond to
a valid miRNA candidate.
your_bed_orphan_hairpins.tar.gz: a compressed BED containing all candidates
with a quality score of 0 and no conservation. By default
these candidates are excluded from final results, but you can
change this behaviour with the --no-filter-bad-hairpins option.
your_bed_filtered.bed: a BED containing all reads from your
input BED file that have not been filtered out in one of the
previous categories.
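The --gff option sends reads that match an annotated element to a separate archive instead of keeping them for analysis. This README does not say what "matching" means. The following is a minimal sketch that assumes any overlap of at least one base, ignoring strand, counts as a match; that rule is hypothetical. The detail that matters when combining the two formats is that BED intervals are 0-based and half-open, while GFF intervals are 1-based and closed. A GFF feature spanning start..end therefore corresponds to the BED interval [start - 1, end). Function names are hypothetical, and the linear scan is only suitable for small inputs; an interval tree or a sorted sweep would be needed for genome-scale files.

```python
def gff_to_half_open(gff_line):
    """Parse a GFF/GFF3 feature line into (seqid, start, end) using
    0-based half-open coordinates, as in BED."""
    fields = gff_line.rstrip("\n").split("\t")
    seqid, start, end = fields[0], int(fields[3]), int(fields[4])
    return seqid, start - 1, end

def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end

def split_reads_by_features(bed_lines, gff_lines):
    """Return (kept, matching): BED lines not overlapping / overlapping any feature."""
    features = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and ## directives
        seqid, start, end = gff_to_half_open(line)
        features.setdefault(seqid, []).append((start, end))
    kept, matching = [], []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        hit = any(overlaps(start, end, fs, fe) for fs, fe in features.get(chrom, []))
        (matching if hit else kept).append(line)
    return kept, matching
```

For example, a CDS annotated at 101..200 in GFF becomes [100, 200). A read at BED [90, 100) covers bases 91..100 and does not overlap it, whereas a read at [199, 220) covers base 200 and does.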