Workflow Best Practices #29

cschu · 2016-06-25T18:08:49Z

Build Workflows
- RNA-seq quantification
- RNA-seq variant calling
- ChIP-seq
Associated Datasets/Training Data
Interactive Tours based on content
Share with GTN/GOBLET
Galaxy Flavour for Workflow

Associated Coding Hack Tasks

Workflow improvements @bgruening @yvanlebras @kpoterlo @jennaj @firaan1 @MoHeydarian @ssander5 @kmurat @cschu

Submit workflows to ToolShed via the Galaxy workflow interface #4 (Submit workflows from workflow designer)
Creation/Export of High Res Images for workflows
Workflow Export to Unix @ssander5 @frederikcoppens

firaan1 · 2016-06-25T18:11:54Z

follow

MoHeydarian · 2016-06-25T18:12:44Z

follow

ghost · 2016-06-25T18:12:59Z

follow

ssander5 · 2016-06-25T18:26:12Z

follow

ssander5 · 2016-06-25T19:02:09Z

Any thoughts on including a RADtag workflow? I know Stacks/Tuxedo are part of the tool list on Galaxy, and RRL genome information is useful to a lot of folks. I'm not sure there is a workflow on Galaxy for it yet!

frederikcoppens · 2016-06-25T19:43:11Z

FreeBayes vs GATK

https://bcbio.wordpress.com/2013/10/21/updated-comparison-of-variant-detection-methods-ensemble-freebayes-and-minimal-bam-preparation-pipelines/

yvanlebras · 2016-06-25T19:58:00Z

Follow, and maybe @devikaatgit wants to as well ;)

yvanlebras · 2016-06-25T19:58:55Z

yes @ssander5 I have Stacks workflows ;)

MoHeydarian · 2016-06-25T20:18:09Z

We've talked about generating basic workflows for:

- RNA-seq for reference transcript quantification
- ChIP-seq to call peaks and ID occupied promoters

The idea is to provide standard workflows to give an idea of the basics for end-users. We can couple this with a short discussion of alternative options for mapping and quantification for each workflow and list the various benefits/shortcomings of these options, as it is likely impossible for any two bioinformaticians to agree on what the best practice would be.

Ideally we will contact GigaScience and see if they are interested in a 'best practices' paper of Galaxy workflows. They were very interested in this last year.

There are versions of these on the cloud under Shared Data -> Workflows.

devikaatgit · 2016-06-25T20:43:03Z

I am very interested in this, but I have a point to raise: separate workflows are needed for prokaryotic and eukaryotic genomes/transcriptomes, since splicing (introns/exons) is not a thing in prokaryotes. Also, I am not sure whether paired-end and single-end reads can be run with the same workflow.

I will share 2 datasets for 2 conditions as samples for prokaryotic RNA-seq and, if required, the reference genome for them too.

My idea for a microbial RNA-seq quantification workflow is Bowtie --> StringTie --> Ballgown, after the required trimming and quality check. (Since prokaryotic transcripts are not spliced, TopHat/HISAT and similar spliced aligners are not needed; I feel Bowtie itself does an efficient job.)

Since Ballgown is not yet available in Galaxy, it could be modified to Bowtie --> StringTie --> Cuffmerge and Cuffdiff. (StringTie seems to be faster than Cufflinks?)

Of course, HTSeq is also a good option. This is just a basic idea that I have; I am not sure how Galaxy workflows work, so suggestions/comments are welcome and expected.

MoHeydarian · 2016-06-25T21:16:02Z

@devikaatgit you are correct that we would need different workflows for different RNA-seq purposes (SE v PE, transcript quantification v transcript reconstruction, prokaryotic v eukaryotic, etc). If we make a workflow for all of the conditions we end up with an overwhelming list of options.

I propose instead to illustrate the general steps in a given RNA-seq/ChIP-seq workflow and discuss the alternative tools that can be used at the various analysis steps, with their benefits and shortcomings. Among our group we have lots of experience with these alternative options and can highlight these nicely in a manuscript (for a preprint, or to submit to GigaScience, or both). The end user can then easily modify our basic workflows to fit their specific experimental design in the workflow editor.

We should discuss it as a group, here or in person when we gather next.

ghost · 2016-06-25T21:17:55Z

I think @devikaatgit raised a good point about separate workflows for prokaryotes.

yvanlebras · 2016-06-25T21:31:58Z

I'm OK to begin basic workflows based on Stacks for:

RAD-seq
- for population genomics without reference genome
- for genetic map without reference genome
- for genetic map with reference genome

ssander5 · 2016-06-25T23:24:56Z

@yvanlebras I have some test data that might be good to use for a Shared Data Library to go with your stacks pipeline.

RADseq data from oaks (chosen because the genome is available, the data sets are small, and it has a unique endpoint analysis for RADseq)
Publication: Hipp AL et al., "A framework phylogeny of the American oak clade based on sequenced RAD data.", PLoS One, 2014 Apr 4;9(4):e93975

Libraries:

SRR1873227 - 124M
SRR1873257 - 167M
SRR1873259 - 172M

Typical Issues with RADseq Captured:

Chloroplast contamination
Double digest (weird adapter placement - on 3’ side, must remove)
barcodes still present

Aims Captured:

Small, representative data - Y
Highlights typical issues - Y
Published analyses - Y (uses Stacks pipeline - test Yvan's)
Publicly accessible - Y
Data reduction methods captured - Y
- Took only three of the several species to reduce the size of the analysis
- No manipulation was required.

cschu · 2016-06-26T01:01:49Z

@devikaatgit I have checked with Dave B., who started to wrap Ballgown last year. He confirmed that he has not finished the wrapper yet. I remember finding this particular wrapper quite challenging myself, so I opened a ticket requesting the wrapper to be created again. #41

devikaatgit · 2016-06-26T13:31:44Z

Thanks for that @cschu .

Can anybody tell me why Bowtie is not integrated into Galaxy? Or is it that I just missed it?

Just now noticed: I couldn't find Bowtie to build an index and align the fastq reads that I have (not fastqsanger/fastqillumina). As I had discussed with some of you earlier, I used the Bowtie command-line tool on my desktop and moved to Galaxy for the remaining steps only, as it was easier and took less time to upload BAM files than fastq files.

yvanlebras · 2016-06-26T13:34:56Z

Hi @devikaatgit,

Bowtie is on the usegalaxy server: https://usegalaxy.org/root?tool_id=toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.2.6.2 and I also found it on the dedicated cloud one.

devikaatgit · 2016-06-26T13:53:14Z

Hi, thank you for that, but I was looking for Bowtie specifically (since my reads are shorter than 50 bp, Bowtie is faster and more sensitive than Bowtie2). There is the option to map with Bowtie for fastqillumina reads, but not normal fastq reads. Does anybody know how FASTQ Groomer functions? Is it used to convert fastq reads to fastqillumina?
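
On the FASTQ Groomer question: as far as I know, it converts between FASTQ quality encodings, and the usual target is Sanger (Phred+33, Galaxy's fastqsanger) rather than fastqillumina. A minimal sketch of that idea, assuming the input uses the Illumina 1.3+ offset of 64; the function name is hypothetical and this is not the Groomer's actual code:

```python
def illumina64_to_sanger33(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33.

    Assumption: the input really is Phred+64 (Illumina 1.3+ style).
    A character below '@' (ASCII 64) would give a negative score,
    which suggests the reads are already Phred+33.
    """
    converted = []
    for ch in quality:
        score = ord(ch) - 64          # decode with the Illumina offset
        if score < 0:
            raise ValueError(f"{ch!r} is not a valid Phred+64 character")
        converted.append(chr(score + 33))  # re-encode with the Sanger offset
    return "".join(converted)


# 'h' encodes Q40 in Phred+64; Q40 is 'I' in Phred+33.
print(illumina64_to_sanger33("hhh@"))
```

If the function raises on your reads, they are probably Sanger-encoded already and only need the datatype set accordingly.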

frederikcoppens · 2016-06-26T14:11:05Z

I'm installing the original Bowtie on the cloud instance; it should appear in a bit.

devikaatgit · 2016-06-26T14:22:02Z

That would be great @frederikcoppens THUMBS UP!!!

frederikcoppens · 2016-06-26T14:23:10Z

Which reference do you need? The tool's there, but without a reference it's not very useful ;-)

devikaatgit · 2016-06-26T14:24:42Z

http://bacteria.ensembl.org/Staphylococcus_aureus_subsp_aureus_str_jkd6008/Info/Index

devikaatgit · 2016-06-26T14:27:33Z

Does anybody have some good preloaded workflows for RNA-seq in Galaxy? Please share...

frederikcoppens · 2016-06-26T14:38:53Z

The reference is there; let me know if you have issues.
@MoHeydarian do you have an RNA-seq workflow available?

yvanlebras · 2016-06-26T15:22:22Z

@ssander5 thanks for the RADseq data!

@MoHeydarian or @frederikcoppens, we just finished a first complete STACKS 1.4.0 Galaxy integration yesterday. Can you install the corresponding tools (through the suite_stacks main TS repo) on the dedicated AWS VM?

devikaatgit · 2016-06-26T16:18:00Z

Pls see the following link https://github.com/nekrut/galaxy/wiki/Reference-based-RNA-seq

devikaatgit · 2016-06-26T16:22:57Z

https://usegalaxy.org/u/devikasub/w/workflow-constructed-from-history-gccworkflow-1

This is a workflow that I created for the purpose. Sample input datasets are available at

https://usegalaxy.org/u/devikasub/h/bacterial-rna-seq-2-condition-single-replicate-datasets

@MoHeydarian @yvanlebras please take a look and comment, and also let me know what's to be done next.

ghost · 2016-06-26T17:00:25Z

Hi, I am repeatedly getting the following while trying to run the ChIP-seq workflow on the cloud:

Internal Server Error
Galaxy was unable to successfully complete your request

An error occurred.
This may be an intermittent problem due to load or other unpredictable factors, reloading the page may address the problem.

The error has been logged to our team.

MoHeydarian · 2016-06-26T17:41:26Z

@kpoterlo I think the issue was that the wrong Trimmomatic output was chained to BWA in the workflow. I corrected the Trimmomatic output chained to BWA and the workflow now runs to completion, though an error with MACS is appearing now.

The corrected version is under shared data -> workflows and indicated with 'v2'.
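
A mis-chained output like this can also be spotted outside the editor by listing the connections in an exported workflow. A minimal sketch, assuming the exported .ga file is JSON with a top-level "steps" mapping whose entries carry "id", "name" and "input_connections" (each connection naming a source step "id" and "output_name"). The helper names and the step, input and output names in the example are made up for illustration:

```python
import json


def load_workflow(path):
    """Read an exported workflow file (assumed to be plain JSON)."""
    with open(path) as handle:
        return json.load(handle)


def list_connections(workflow):
    """Return (source step, source output, target step, target input) tuples."""
    steps = workflow["steps"]
    names = {str(step["id"]): step.get("name", "?") for step in steps.values()}
    edges = []
    for step in sorted(steps.values(), key=lambda s: s["id"]):
        connections = step.get("input_connections", {})
        for input_name, sources in sorted(connections.items()):
            # A connection may be a single dict or a list of dicts.
            if isinstance(sources, dict):
                sources = [sources]
            for src in sources:
                edges.append((names[str(src["id"])], src["output_name"],
                              step.get("name", "?"), input_name))
    return edges


# Hypothetical three-step workflow, written inline instead of loaded from disk.
example = {
    "steps": {
        "0": {"id": 0, "name": "Input dataset", "input_connections": {}},
        "1": {"id": 1, "name": "Trimmomatic",
              "input_connections": {"reads": {"id": 0, "output_name": "output"}}},
        "2": {"id": 2, "name": "BWA",
              "input_connections": {"reads": {"id": 1, "output_name": "trimmed_reads"}}},
    }
}

for source, output, target, input_name in list_connections(example):
    print(f"{source}[{output}] -> {target}[{input_name}]")
```

Reading the printed edges makes it easy to check which Trimmomatic output actually feeds the aligner.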

ghost · 2016-06-26T18:18:57Z

@MoHeydarian thanks, running it for mouse p63 chip-seq in keratinocytes

MoHeydarian · 2016-06-26T18:20:28Z

We have working versions of the RNAseq and chIPseq workflows under shared data -> workflows.

The RNAseq workflow got a bit messy in order for DESeq2 to use HTSeq input. For the purposes of showing an introductory example workflow, it may be worth using Cuffdiff to quantify transcript expression (for a clearer example).

I have also shared two histories with example data that works with these workflows. You can find these under the tool icon in the history panel under 'Histories shared with me'.

Comments and discussion, go!

yvanlebras · 2016-06-26T21:01:57Z

@MoHeydarian @frederikcoppens I'm testing the Stacks tools on the VM but am running into difficulties; it appears the installation of the binaries failed.

error message:

Fatal error: Exit code 127 (Error in Stacks execution)
/mnt/galaxy/tmp/job_working_directory/001/1007/tool_script.sh: line 9: denovo_map.pl: command not found

I think this is due to the galaxy config of the AWS and related to conda. Can you try this:

sudo cp /mnt/galaxy/galaxy-app/config/dependency_resolvers_conf.xml.sample /mnt/galaxy/galaxy-app/config/dependency_resolvers_conf.xml

Put conda first in the list in:

sudo nano /mnt/galaxy/galaxy-app/config/dependency_resolvers_conf.xml

then add in config/galaxy.ini:

conda_auto_install = True
conda_auto_init = True

Then restart and uninstall / reinstall Stacks?
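
The "put conda first" step can also be done without hand-editing. A minimal sketch, assuming the resolver file is a flat list of elements under one root and that the conda resolver element is tagged `conda`; the function name is hypothetical and the sample XML is a cut-down stand-in, not the real file:

```python
import xml.etree.ElementTree as ET


def move_conda_first(xml_text: str) -> str:
    """Return the resolver config with all <conda> elements moved to the top.

    The relative order of the conda elements, and of everything else,
    is kept; only whitespace between elements may shift.
    """
    root = ET.fromstring(xml_text)
    conda_elements = [child for child in root if child.tag == "conda"]
    for element in conda_elements:
        root.remove(element)
    # Insert in reverse so the original conda ordering is preserved.
    for element in reversed(conda_elements):
        root.insert(0, element)
    return ET.tostring(root, encoding="unicode")


sample = (
    "<dependency_resolvers>"
    "<tool_shed_packages />"
    "<galaxy_packages />"
    "<conda />"
    "</dependency_resolvers>"
)
reordered = move_conda_first(sample)
print([child.tag for child in ET.fromstring(reordered)])
```

Write the result back to dependency_resolvers_conf.xml only after keeping a copy of the original.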

yvanlebras · 2016-06-29T05:40:10Z

@MoHeydarian @frederikcoppens Can you also add the MultiQC tool to the AWS instance? I just finished a MultiQC Galaxy tour that you can add to the instance too. Maybe MultiQC can be added as a final step of the Data Hackathon workflow, using compatible tools like FastQC, cutadapt, TopHat2, featureCounts, Samtools stats, Picard, Bismark...

cschu assigned frederikcoppens Jun 25, 2016

ssander5 mentioned this issue Jun 25, 2016

Enhancing Shared Data Libraries #30

Open

jennaj mentioned this issue Jun 26, 2016

2016 GCC Data Hackathon Hub #31

Open

26 tasks

yvanlebras mentioned this issue Jun 29, 2016

Wrapping up the GCC2016 Datathon #54

Open
