Workflow Best Practices #29
Comments
follow |
follow |
follow |
follow |
Any thoughts on including a RADtag workflow? I know Stacks/Tuxedo is part of the program list on Galaxy, and RRL genome information is useful to a lot of folks, but I'm not sure there is a workflow on Galaxy for it yet! |
Following, and maybe @devikaatgit wants to as well ;) |
yes @ssander5 I have Stacks workflows ;) |
We've talked about generating basic workflows for:
The idea is to provide standard workflows that give end-users an idea of the basics. We can couple this with a short discussion of alternative options for mapping and quantification for each workflow and list the benefits and shortcomings of these options, as it is likely impossible for any two bioinformaticians to agree on what the best practice would be. Ideally we will contact GigaScience and see if they are interested in a 'best practices' paper of Galaxy workflows. They were very interested in this last year. There are versions of these on the cloud under Shared Data -> Workflows. |
I am very much interested in this, but I have one point to raise: separate workflows are needed for prokaryotic and eukaryotic genomes/transcriptomes, because prokaryotic transcripts are not spliced. I am also not sure whether paired-end and single-end reads can be run through the same workflow. I will share 2 datasets for 2 conditions as samples for prokaryotic RNA-seq and, if required, the reference genome for them too. My idea for a microbial RNA-seq quantification workflow is Bowtie --> StringTie --> Ballgown, after the required trimming and quality check. (Since prokaryotic reads do not span splice junctions, TopHat/HISAT and similar spliced aligners are not needed; I feel Bowtie itself does an efficient job.) Since Ballgown is not yet available in Galaxy, this could be modified to Bowtie --> StringTie --> Cuffmerge and Cuffdiff. (StringTie seems to be faster than Cufflinks?) Of course, htseq is also a good option. This is just a basic idea, and I am not sure how Galaxy workflows work, so suggestions and comments are welcome and expected. |
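The proposal above can be pictured as an ordered list of steps with two decision points. The following is only a sketch of that reasoning, not an actual Galaxy workflow: the function name and the choice of HISAT as the stand-in spliced aligner are assumptions made for illustration.

```python
def rnaseq_steps(prokaryotic: bool, ballgown_available: bool) -> list[str]:
    """Return an ordered list of tool names for a basic RNA-seq quantification run.

    Hypothetical helper: it only mirrors the choices discussed in the thread.
    """
    steps = ["quality check", "trimming"]
    # A spliced aligner is only needed when reads can span splice junctions.
    steps.append("Bowtie" if prokaryotic else "HISAT")
    steps.append("StringTie")
    if ballgown_available:
        steps.append("Ballgown")
    else:
        # Fallback suggested in the thread while Ballgown has no Galaxy wrapper.
        steps += ["Cuffmerge", "Cuffdiff"]
    return steps


print(" --> ".join(rnaseq_steps(prokaryotic=True, ballgown_available=False)))
```

Swapping either flag shows how few steps actually differ between the variants, which is the argument for documenting alternatives per step rather than shipping many near-identical workflows.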
@devikaatgit you are correct that we would need different workflows for different RNA-seq purposes (SE v PE, transcript quantification v transcript reconstruction, prokaryotic v eukaryotic, etc). If we make a workflow for every condition, we end up with an overwhelming list of options. I propose instead to illustrate the general steps in a given RNA-seq/ChIP-seq workflow and discuss the alternative tools that can be used at each analysis step, with their benefits and shortcomings. Among our group we have lots of experience with these alternatives and can highlight them nicely in a manuscript (for a preprint, or to submit to GigaScience, or both). The end user can then easily adapt our basic workflows to their specific experimental design in the workflow editor. We should discuss it as a group, here or in person when we next gather. |
I think @devikaatgit raised a good point about separate workflows for prokaryotes |
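To make the "overwhelming list of options" concrete, here is a small sketch that enumerates one workflow per combination of the three axes named above. The `AXES` dictionary and `workflow_variants` function are illustrative names, and the "etc" in the comment means the real list would be longer.

```python
from itertools import product

# The three axes mentioned in the comment above; each has two options.
AXES = {
    "layout": ["single-end", "paired-end"],
    "goal": ["transcript quantification", "transcript reconstruction"],
    "organism": ["prokaryotic", "eukaryotic"],
}


def workflow_variants(axes: dict) -> list[dict]:
    """Return one dict per combination of axis values."""
    names = list(axes)
    combos = product(*(axes[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


variants = workflow_variants(AXES)
print(len(variants), "separate workflows would be needed")
for variant in variants:
    print(variant)
```

Each additional two-way choice doubles the count, so describing the alternatives at each step scales better than maintaining every combination.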
I'm OK to begin basic workflows based on Stacks for:
|
@yvanlebras I have some test data that might be good to use for a Shared Data Library to go with your Stacks pipeline: RADseq data from oaks (chosen because the genome is available, the data sets are small, and it has a unique endpoint analysis for RADseq). Libraries:
Typical Issues with RADseq Captured:
Aims Captured:
|
@devikaatgit I have checked with Dave B., who started to wrap Ballgown last year. He confirmed that he has not finished the wrapper yet. I remember finding this particular wrapper quite challenging myself, so I opened a ticket requesting that the wrapper be created again. #41 |
Thanks for that @cschu. Can anybody tell me why Bowtie is not integrated into Galaxy, or did I just miss it? I only just noticed that I couldn't find Bowtie to build an index and align the fastq reads that I have (plain fastq, not fastqsanger/fastqillumina). As I discussed with some of you earlier, I ran the Bowtie command-line tool on my desktop and used Galaxy for the remaining steps only, as uploading BAM files was easier and took less time than uploading fastq files. |
Hi @devikaatgit, Bowtie is on the usegalaxy server https://usegalaxy.org/root?tool_id=toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.2.6.2 and I also found it on the dedicated cloud one. |
Hi, thank you for that, but I was looking for Bowtie specifically (since my reads are shorter than 50 bp, Bowtie is faster and more sensitive than Bowtie2). There is an option to map fastqillumina reads with Bowtie, but not plain fastq reads. Does anybody know how FASTQ Groomer works? Is it used to convert fastq reads to fastqillumina? |
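On the FASTQ Groomer question: as far as I understand it, the tool re-encodes quality strings between FASTQ variants so that a plain `fastq` dataset ends up with a specific datatype (commonly `fastqsanger`). The core of such a conversion is an offset shift. The sketch below assumes the input uses the Illumina 1.3+ Phred+64 encoding and converts it to Sanger Phred+33; it is not the Groomer's own code, it does not handle the older Solexa scale, and the function name is made up for illustration.

```python
def phred64_to_phred33(quality: str) -> str:
    """Convert a Phred+64 quality string to Phred+33 (Sanger) encoding."""
    converted = []
    for char in quality:
        score = ord(char) - 64  # decode with the Illumina 1.3+ offset
        if score < 0:
            # Characters below '@' cannot be valid Phred+64 scores.
            raise ValueError(f"{char!r} is not a valid Phred+64 quality character")
        converted.append(chr(score + 33))  # re-encode with the Sanger offset
    return "".join(converted)


# 'h' encodes Q40 in Phred+64; Q40 is 'I' in Phred+33.
print(phred64_to_phred33("hhhh"))
```

Only the quality line of each record changes; the sequence and header lines are left as they are.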
I'm installing the original Bowtie on the cloud instance; it should appear in a bit |
That would be great @frederikcoppens THUMBS UP!!! |
Which reference do you need? The tool is there, but without a reference it's not very useful ;-) |
Does anybody have some good preloaded workflows for RNA-seq in Galaxy? Please share... |
ref is there, let me know if you have issues |
@ssander5 thanks for the RADseq data! @MoHeydarian or @frederikcoppens, yesterday we finished a first complete STACKS 1.4.0 Galaxy integration. Can you install the corresponding tools (through the suite_stacks main TS repo) on the dedicated AWS VM? |
Pls see the following link https://github.com/nekrut/galaxy/wiki/Reference-based-RNA-seq |
https://usegalaxy.org/u/devikasub/w/workflow-constructed-from-history-gccworkflow-1 This is a workflow that I created for the purpose. Sample input datasets are available at https://usegalaxy.org/u/devikasub/h/bacterial-rna-seq-2-condition-single-replicate-datasets @MoHeydarian @yvanlebras please have a look and comment, and also let me know what is to be done next. |
Hi, I am repeatedly getting the following error while trying to run the ChIP-seq workflow on the cloud: Internal Server Error An error occurred. The error has been logged to our team. |
@kpoterlo I think the issue was that the wrong Trimmomatic output was chained to BWA in the workflow. I corrected the connection and the workflow now runs to completion, though there is an error with MACS appearing now. The corrected version is under Shared Data -> Workflows and is marked 'v2'. |
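A mis-chained output like this one can be spotted by listing every connection in an exported workflow. The sketch below assumes the `.ga` export is JSON with a `steps` mapping in which each step has a `name` and an `input_connections` entry of the form `{"id": ..., "output_name": ...}`; that layout is my understanding of the format and should be checked against a real export. The step, input, and output names in the example are hypothetical, not the actual Trimmomatic or BWA wrapper names.

```python
def list_connections(workflow: dict) -> list[tuple[str, str, str, str]]:
    """Return (source step, source output, target step, target input) tuples."""
    steps = workflow["steps"]
    rows = []
    for step in steps.values():
        for input_name, source in step.get("input_connections", {}).items():
            # An input may be wired to one source (dict) or several (list).
            sources = source if isinstance(source, list) else [source]
            for src in sources:
                source_step = steps[str(src["id"])]
                rows.append(
                    (source_step["name"], src["output_name"], step["name"], input_name)
                )
    return rows


# Hypothetical three-step workflow; all input/output names are placeholders.
example = {
    "steps": {
        "0": {"name": "Input dataset", "input_connections": {}},
        "1": {
            "name": "Trimmomatic",
            "input_connections": {"reads": {"id": 0, "output_name": "output"}},
        },
        "2": {
            "name": "BWA",
            "input_connections": {
                "fastq_input": {"id": 1, "output_name": "trimmed_unpaired"}
            },
        },
    }
}

for src_step, src_out, dst_step, dst_in in list_connections(example):
    print(f"{src_step}.{src_out} -> {dst_step}.{dst_in}")
```

For a real file, the dictionary would come from `json.load` on the exported workflow; reading the printed list makes it easy to see when an aligner is fed an unintended output.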
@MoHeydarian thanks, running it for mouse p63 chip-seq in keratinocytes |
We have working versions of the RNA-seq and ChIP-seq workflows under Shared Data -> Workflows. The RNA-seq workflow got a bit messy in order for DESeq2 to use htseq input. For the purposes of showing an introductory example workflow, it may be worth using Cuffdiff to quantify transcript expression (for a clearer example). I have also shared two histories with example data that work with these workflows. You can find these under the tool icon in the history panel under 'Histories shared with me'. Comments and discussion, go! |
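For context on the htseq-to-DESeq2 hand-off: htseq-count writes one two-column, tab-separated file per sample (feature ID, count) that ends with summary rows whose names start with `__` (for example `__no_feature`), whereas DESeq2 works on a gene-by-sample count matrix. The sketch below shows one way to combine per-sample files outside Galaxy. It is not how the shared workflow does it; the function names and sample labels are illustrative, and filling a missing gene with 0 is an assumption.

```python
import csv
import io


def read_htseq_counts(handle) -> dict[str, int]:
    """Parse htseq-count output, skipping the '__' summary rows."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2 or row[0].startswith("__"):
            continue
        counts[row[0]] = int(row[1])
    return counts


def build_count_matrix(samples: dict[str, dict[str, int]]):
    """Return (header, rows) with one row per gene and one column per sample."""
    genes = sorted(set().union(*samples.values()))
    header = ["gene"] + list(samples)
    # Assumption: a gene absent from a sample's file is treated as zero counts.
    rows = [[gene] + [samples[name].get(gene, 0) for name in samples] for gene in genes]
    return header, rows


# In-memory stand-ins for two per-sample htseq-count files.
cond1 = read_htseq_counts(io.StringIO("geneA\t10\ngeneB\t0\n__no_feature\t5\n"))
cond2 = read_htseq_counts(io.StringIO("geneA\t3\ngeneB\t7\n__ambiguous\t1\n"))

header, rows = build_count_matrix({"cond1": cond1, "cond2": cond2})
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

With real data, each `io.StringIO` would be replaced by an open file handle for one sample's htseq-count output.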
@MoHeydarian @frederikcoppens I'm testing the Stacks tools on the VM but am running into difficulties: it appears that the installation of the binaries fails. Error message:
I think this is due to the Galaxy config of the AWS instance and is related to conda. Can you try this:
put conda first in the list in:
then add in config/galaxy.ini:
then restart, and uninstall / reinstall Stacks? |
@MoHeydarian @frederikcoppens Can you also add the MultiQC tool to the AWS instance? I have just finished a MultiQC Galaxy tour that you can add to the instance too. Maybe MultiQC can be added as a final step of the Data hackathon workflow, using compatible tools like FastQC, cutadapt, Tophat2, Featurecounts, Samtools stats, Picard, Bismark... |
Associated Coding Hack Tasks