Workflow Best Practices #29

Open
cschu opened this issue Jun 25, 2016 · 33 comments

@cschu
Collaborator

cschu commented Jun 25, 2016

  1. Build Workflows
    • RNA-seq quantification
    • RNA-seq variant calling
    • ChIP-seq
  2. Associated Datasets/Training Data
  3. Interactive Tours based on content
  4. Share with GTN/GOBLET
  5. Galaxy Flavour for Workflow

Associated Coding Hack Tasks

  1. Workflow improvements @bgruening @yvanlebras @kpoterlo @jennaj @firaan1 @MoHeydarian @ssander5 @kmurat @cschu
@firaan1

firaan1 commented Jun 25, 2016

follow

@MoHeydarian

follow

@ghost

ghost commented Jun 25, 2016

follow

@ssander5
Collaborator

follow

@ssander5
Collaborator

Any thoughts on including a RADtag workflow? I know Stacks/Tuxedo is part of the program list on Galaxy, and RRL genome information is useful to a lot of folks, but I'm not sure there is a Galaxy workflow for it yet!

@yvanlebras
Collaborator

Follow, and maybe @devikaatgit wants to as well ;)

@yvanlebras
Collaborator

Yes @ssander5, I have Stacks workflows ;)

@MoHeydarian

We've talked about generating basic workflows for:

  • RNA-seq for reference transcript quantification
  • ChIP-seq to call peaks and identify occupied promoters

The idea is to provide standard workflows to give an idea of the basics for end-users. We can couple this with a short discussion of alternative options for mapping and quantification for each workflow and list the various benefits/shortcomings of these options, as it is likely impossible for any two bioinformaticians to agree on what the best practice would be.

Ideally we will contact GigaScience and see if they are interested in a 'best practices' paper on Galaxy workflows. They were very interested in this last year.

There are versions of these on the cloud under Shared Data -> Workflows.

@devikaatgit

I am very much interested in this, but I have one point to raise: separate workflows are needed for prokaryotic and eukaryotic genomes/transcriptomes, since prokaryotic genes have no introns (so there is no exon/intron structure to handle). Also, I am not sure whether paired-end and single-end reads can be run through the same workflow.

I will be sharing 2 datasets for 2 conditions as samples for prokaryotic RNA-seq, and, if required, the reference genome for them too.

My idea for a microbial RNA-seq quantification workflow is Bowtie --> StringTie --> Ballgown, after the required trimming and quality check. (Since prokaryotic transcripts are not spliced, TopHat/HISAT and similar spliced aligners are not needed; I feel Bowtie itself does an efficient job.)

Since Ballgown is not yet available in Galaxy, this could be modified to Bowtie --> StringTie --> Cuffmerge and Cuffdiff. (StringTie seems to be faster than Cufflinks?)

Of course, HTSeq is also a good option. This is just a basic idea; I am not sure how Galaxy workflows work, so suggestions and comments are welcome and expected.
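
A rough sketch of the Bowtie --> StringTie chain described above, written as a small Python script that only assembles the commands and prints them (it does not run anything). The file names and the `build_plan` helper are made up for illustration. The flags shown (`bowtie -q -S`, `samtools sort -o`, `stringtie -G ... -o ...`) are the usual command-line ones, but trimming, quality checks, and the downstream Ballgown or Cuffdiff step are left out, and the Galaxy tool wrappers may expose different options.

```python
import shlex


def build_plan(reference_fasta, annotation_gtf, reads_fastq, prefix):
    """Return the commands for an unspliced Bowtie -> StringTie run as argument lists.

    All paths are placeholders; nothing is executed here.
    """
    index = f"{prefix}_index"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # Build the Bowtie (version 1) index from the reference genome.
        ["bowtie-build", reference_fasta, index],
        # Align single-end FASTQ reads (-q) and write SAM output (-S).
        ["bowtie", "-q", "-S", index, reads_fastq, sam],
        # StringTie expects coordinate-sorted alignments.
        ["samtools", "sort", "-o", bam, sam],
        # Assemble/quantify transcripts guided by the reference annotation.
        ["stringtie", bam, "-G", annotation_gtf, "-o", f"{prefix}.gtf"],
    ]


plan = build_plan("genome.fa", "genes.gtf", "sample.fastq", "condA")
for cmd in plan:
    print(shlex.join(cmd))
```

In a Galaxy workflow each of these commands would correspond to one tool step, with the output of one step connected to the input of the next in the workflow editor.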

@MoHeydarian

@devikaatgit you are correct that we would need different workflows for different RNA-seq purposes (SE v PE, transcript quantification v transcript reconstruction, prokaryotic v eukaryotic, etc). If we make a workflow for all of the conditions we end up with an overwhelming list of options.

I propose instead to illustrate the general steps in a given RNA-seq/ChIP-seq workflow and discuss the alternative tools that can be used at the various analysis steps, with their benefits and shortcomings. Among our group we have lots of experience with these alternative options and can highlight them nicely in a manuscript (for a preprint, a submission to GigaScience, or both). The end user can then easily adapt our basic workflows to their specific experimental design in the workflow editor.

We should discuss it as a group, here or in person when we gather next.

@ghost

ghost commented Jun 25, 2016

I think @devikaatgit raised a good point about separate workflows for prokaryotes.

@yvanlebras
Collaborator

I'm OK to start on basic Stacks-based workflows for:

  • RAD-seq
    • for population genomics without reference genome
    • for genetic map without reference genome
    • for genetic map with reference genome

@ssander5
Collaborator

ssander5 commented Jun 25, 2016

@yvanlebras I have some test data that might be good to use for a Shared Data Library to go with your stacks pipeline.

RADseq data from oaks (chosen because a genome is available, the data sets are small, and the endpoint analysis is unusual for RADseq)
Publication: Hipp AL et al., "A framework phylogeny of the American oak clade based on sequenced RAD data." PLoS One, 2014 Apr 4;9(4):e93975

Libraries:

  • SRR1873227 - 124M
  • SRR1873257 - 167M
  • SRR1873259 - 172M

Typical Issues with RADseq Captured:

  • Chloroplast contamination
  • Double digest (weird adapter placement - on 3’ side, must remove)
  • barcodes still present

Aims Captured:

  • Small, representative data - Y
  • Highlights typical issues - Y
  • Published analyses - Y (uses the Stacks pipeline; a test for Yvan's)
  • Publicly accessible - Y
  • Data reduction methods captured - Y
    • Took only three of the several species for reduced size of analysis
    • No manipulation was required.

@cschu
Collaborator Author

cschu commented Jun 26, 2016

@devikaatgit I have checked with Dave B., who started to wrap Ballgown last year. He confirmed that he has not finished the wrapper yet. I remember finding this particular wrapper quite challenging myself, so I opened a ticket requesting that the wrapper be created again: #41

@devikaatgit

Thanks for that @cschu .

Can anybody tell me why Bowtie is not integrated into Galaxy? Or did I just miss it?

I just noticed that I couldn't find Bowtie to build an index and align the fastq reads that I have (plain fastq, not fastqsanger/fastqillumina). As I had discussed with some of you earlier, I ran the Bowtie command-line tool on my desktop and used Galaxy only for the remaining steps, as it was easier and quicker to upload BAM files than fastq files.

@yvanlebras
Collaborator

Hi @devikaatgit,

Bowtie is on the usegalaxy.org server (https://usegalaxy.org/root?tool_id=toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.2.6.2), and I also found it on the dedicated cloud instance.

Yvan Le Bras, PhD @Yvan2935 <°))))><
e-Biogenouest project http://www.e-biogenouest.org
CNRS UMR 6074 IRISA-INRIA, Campus de Beaulieu, 35042 Rennes Cedex
tél.: +33 (0) 2 99 84 71 79 / +33 (0) 6.10.43.96.51
[email protected]

@devikaatgit

Hi, thank you for that, but I was looking for Bowtie specifically (since my reads are shorter than 50 bp, Bowtie is faster and more sensitive than Bowtie2). There is an option to map with Bowtie for fastqillumina reads, but not for plain fastq reads. Does anybody know how FASTQ Groomer works? Is it used to convert fastq reads to fastqillumina?
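
On the FASTQ Groomer question above: the tool checks FASTQ records and converts between quality-score encodings, with Sanger (Phred+33, Galaxy's fastqsanger) as the usual target. The core of such a conversion can be sketched in a few lines of Python. This is only an illustration of the re-encoding idea, not the Groomer's actual code, and the function names are made up here.

```python
def illumina64_to_sanger(quality):
    """Re-encode a Phred+64 (Illumina 1.3+) quality string as Phred+33 (Sanger)."""
    scores = [ord(ch) - 64 for ch in quality]
    if any(score < 0 for score in scores):
        # Characters below '@' cannot occur in a Phred+64 string.
        raise ValueError("quality string is not valid Phred+64")
    return "".join(chr(score + 33) for score in scores)


def convert_record(record):
    """Convert one 4-line FASTQ record; only the quality line changes."""
    header, sequence, separator, quality = record
    return [header, sequence, separator, illumina64_to_sanger(quality)]


example = ["@read1", "ACGT", "+", "hhhh"]
converted = convert_record(example)
print("\n".join(converted))
```

Here `h` encodes a score of 40 in Phred+64, which becomes `I` in Phred+33. If the reads are already Sanger-encoded, no conversion is needed and setting the dataset's datatype to fastqsanger should be enough.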

@frederikcoppens
Collaborator

I'm installing the original Bowtie on the cloud instance; it should appear in a bit.

@devikaatgit

That would be great @frederikcoppens THUMBS UP!!!

@frederikcoppens
Collaborator

Which reference do you need? The tool is there, but without a reference it's not very useful ;-)

@devikaatgit

Does anybody have some good preloaded workflows for RNA-seq in Galaxy? Please share.

@frederikcoppens
Collaborator

The reference is there; let me know if you have issues.
@MoHeydarian do you have an RNA-seq workflow available?

@yvanlebras
Collaborator

@ssander5 thanks for the RADseq data!

@MoHeydarian or @frederikcoppens, yesterday we finished a first complete STACKS 1.4.0 Galaxy integration. Can you install the corresponding tools (through the suite_stacks repo on the main Tool Shed) on the dedicated AWS VM?

@devikaatgit

Please see the following link: https://github.com/nekrut/galaxy/wiki/Reference-based-RNA-seq

@devikaatgit

https://usegalaxy.org/u/devikasub/w/workflow-constructed-from-history-gccworkflow-1

This is a workflow that I created for the purpose. Sample input datasets are available at

https://usegalaxy.org/u/devikasub/h/bacterial-rna-seq-2-condition-single-replicate-datasets

@MoHeydarian @yvanlebras please take a look and comment, and also let me know what needs to be done next.

@ghost

ghost commented Jun 26, 2016

Hi, I am repeatedly getting the following while trying to run the ChIP-seq workflow on the cloud:

Internal Server Error
Galaxy was unable to successfully complete your request

An error occurred.
This may be an intermittent problem due to load or other unpredictable factors, reloading the page may address the problem.

The error has been logged to our team.

@MoHeydarian

@kpoterlo I think the issue was that the wrong Trimmomatic output was chained to BWA in the workflow. I corrected the connection and the workflow now runs to completion, though an error with MACS is appearing now.

The corrected version is under shared data -> workflows and indicated with 'v2'.

@ghost

ghost commented Jun 26, 2016

@MoHeydarian thanks, running it on mouse p63 ChIP-seq data from keratinocytes

@MoHeydarian

We have working versions of the RNA-seq and ChIP-seq workflows under shared data -> workflows.

The RNA-seq workflow got a bit messy in order for DESeq2 to use HTSeq input. For the purposes of showing an introductory example workflow, it may be worth using Cuffdiff to quantify transcript expression instead, as a clearer example.

I have also shared two histories with example data that work with these workflows. You can find these under the tool icon in the history panel under 'Histories shared with me'.

Comments and discussion, go!

@yvanlebras
Collaborator

yvanlebras commented Jun 26, 2016

@MoHeydarian @frederikcoppens I'm testing the Stacks tools on the VM but running into difficulties: it appears the installation of the binaries failed?

error message:

Fatal error: Exit code 127 (Error in Stacks execution)
/mnt/galaxy/tmp/job_working_directory/001/1007/tool_script.sh: line 9: denovo_map.pl: command not found

I think this is due to the galaxy config of the AWS and related to conda. Can you try this:

sudo cp /mnt/galaxy/galaxy-app/config/dependency_resolvers_conf.xml.sample /mnt/galaxy/galaxy-app/config/dependency_resolvers_conf.xml

Put conda first in the list in:

sudo nano /mnt/galaxy/galaxy-app/config/dependency_resolvers_conf.xml

Then add to config/galaxy.ini:

conda_auto_install = True
conda_auto_init = True

Then restart Galaxy and uninstall/reinstall Stacks?
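
The "put conda first" step above could also be scripted instead of edited by hand in nano. A minimal Python sketch, assuming the resolvers file has a `<dependency_resolvers>` root with one child element per resolver and that one of them is `<conda />`; the sample XML below is a simplified stand-in, not the real contents of the `.sample` file, and `move_conda_first` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def move_conda_first(xml_text):
    """Return the resolver config with any <conda> element moved to the front."""
    root = ET.fromstring(xml_text)
    conda = [el for el in root if el.tag == "conda"]
    others = [el for el in root if el.tag != "conda"]
    # Detach every child, then re-attach with conda leading.
    for el in list(root):
        root.remove(el)
    for el in conda + others:
        root.append(el)
    return ET.tostring(root, encoding="unicode")


# Simplified, assumed example of a resolver list (not the shipped sample file).
sample = """<dependency_resolvers>
  <tool_shed_packages />
  <galaxy_packages />
  <conda />
</dependency_resolvers>"""

result = move_conda_first(sample)
print(result)
```

The order of the child elements is what matters here, since the first resolver listed is the first one tried. The output is not pretty-printed, and a file with no `<conda>` element comes back with its resolver order unchanged.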

@yvanlebras
Collaborator

@MoHeydarian @frederikcoppens Can you also add the MultiQC tool to the AWS instance? I just finished a MultiQC Galaxy tour that you can add to the instance too. Maybe MultiQC can be added as a final step of the Data hackathon workflow, using compatible tools like FastQC, cutadapt, TopHat2, featureCounts, Samtools stats, Picard, Bismark...
