Standard workflow for the "Introduction to RNA-seq" episode #16
Hi Fabricio, thanks a lot - that would be great! On my side, I agree with your suggested workflow (my preference though would be to directly use tximeta rather than |
Thank you for your feedback, Charlotte. Indeed, tximeta would be better. I will see what I can do to avoid having to download huge FASTQ files from ENA to use in the episode. I think there might be some nice FASTQ files on ExperimentHub that I can use. I will also try to write a short intro on what RNA-seq is.
Hi Fabricio,
Thank you for your interest in helping to develop the rnaseq workshop materials – they definitely need it!
One issue almost everyone struggles with in RNA-seq analysis is that the first part (trimming, alignment, and count generation) is typically done outside of R (although this isn’t absolute) and requires more computing resources than a laptop has. The second part (QC, statistical analyses, and data mining) can very easily be done in R on a laptop.
So how can we teach the first part within the scope of a 2-day Carpentries workshop? I am not sure it is possible to fully do so in the time frame provided. Many workshops/workflows/vignettes (including rnaseqGene<https://bioconductor.org/packages/release/workflows/html/rnaseqGene.html> that you cited) only give example code that can be used on a cluster, but do not go through how to actually use it. There is also the issue of cluster resources – some people like me have access to institutional resources, but each institution will have very specific ways to access them, different schedulers, and may or may not already have the necessary software installed. There is also the possibility of doing it all on AWS, like the Data Carpentry Genomics Curriculum does. However, a scheduled workshop will already have the AWS instance set up with all software installed, so learners do not learn how to do it on their own. The set-up instructions<https://datacarpentry.org/genomics-workshop/setup.html> do go through how to set up your own AWS instance, with very detailed instructions here<https://datacarpentry.org/genomics-workshop/AMI-setup/>, and also cover ways to run everything on your own local machine (macOS or Linux only), but say nothing about memory/processing requirements. And both of these, IMO, would be very difficult for a beginner to actually do on their own.
How were you thinking of incorporating fastp and salmon into the current workshop? We could just do what others do: talk about the issues/things to be aware of and give example code that could be run, but not actually try to run it in the workshop. This is still valuable knowledge and maybe all we can do in this context.
I have long had the idea to develop a practical workshop going through actually setting up an AWS instance, getting the required software, uploading FASTQ files, running trimming + quantification, and downloading counts. This would fit the spirit of democratizing bioinformatics, although in practice it would require people to have access to a credit card. But I have very little experience with AWS myself, because I have access to a cluster where someone else installs all the software I need, and that is what I teach others at my institution to use.
I tagged you on a Slack thread we had yesterday discussing this very issue. I would love to continue this discussion there, over email and/or in the bioc-teaching monthly calls (although I will not be able to attend next Monday due to the holiday here). We should also get the new Carpentries instructors involved, especially those not in Westernized countries, to discuss how these skills can be taught locally.
I look forward to future discussions on this!
Jenny
Thank you for bringing these points up, Jenny. fastp and salmon can be run on a laptop without problems (I myself have done it on an Ubuntu laptop with 8 GB RAM). The issue here might be compatibility with multiple platforms. I have not tried installing fastp and salmon on Windows and macOS, so I'm not sure if that would be an issue. I will try that asap and let you know. I saw the discussion on Slack, and I will try to think of solutions to this issue, from using Orchestra to Desktop.
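For readers following the thread, the fastp and salmon steps mentioned here could be scripted roughly as below. This is only a minimal sketch, not the workflow agreed on in this issue: the function names, file names, index directory, and thread count are all hypothetical placeholders. It assumes paired-end reads and uses only the basic fastp options (`-i`, `-I`, `-o`, `-O`) and the basic salmon subcommands (`index -t -i`, `quant -i -l A -1 -2 -p -o`). The functions only build the command lines; nothing is executed unless the commands are passed to `subprocess.run`, so they can be inspected on any platform before fastp and salmon are installed.

```python
import shlex
from pathlib import PurePosixPath


def fastp_cmd(r1, r2, out_dir):
    """Build a fastp command that trims one paired-end sample (hypothetical file layout)."""
    out = PurePosixPath(out_dir)
    return [
        "fastp",
        "-i", str(r1),                           # read 1 input
        "-I", str(r2),                           # read 2 input
        "-o", str(out / "trimmed_R1.fastq.gz"),  # read 1 output
        "-O", str(out / "trimmed_R2.fastq.gz"),  # read 2 output
    ]


def salmon_index_cmd(transcripts_fasta, index_dir):
    """Build a salmon command that indexes a transcriptome FASTA file."""
    return ["salmon", "index", "-t", str(transcripts_fasta), "-i", str(index_dir)]


def salmon_quant_cmd(index_dir, r1, r2, out_dir, threads=2):
    """Build a salmon command that quantifies one paired-end sample."""
    return [
        "salmon", "quant",
        "-i", str(index_dir),
        "-l", "A",            # let salmon infer the library type
        "-1", str(r1),
        "-2", str(r2),
        "-p", str(threads),
        "-o", str(out_dir),
    ]


if __name__ == "__main__":
    # Placeholder paths, assumed for illustration only.
    commands = [
        fastp_cmd("sample_R1.fastq.gz", "sample_R2.fastq.gz", "trimmed"),
        salmon_index_cmd("transcripts.fa", "salmon_index"),
        salmon_quant_cmd(
            "salmon_index",
            "trimmed/trimmed_R1.fastq.gz",
            "trimmed/trimmed_R2.fastq.gz",
            "quant/sample",
        ),
    ]
    for cmd in commands:
        # Print instead of running; use subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

The resulting `quant.sf` files in each salmon output directory are what tximeta would then import on the R side.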
Hello, everyone.
The "Introduction to RNA-seq" episode is currently empty, so I would like to contribute to it.
As far as I understood it, this episode should contain instructions on how to go from raw FASTQ files to a matrix of transcript abundances, including pre-processing steps (e.g., sequence QC, trimming adapters and low-quality sequences, etc).
However, as there are several software tools to choose from at each step of the pipeline, I think we should first agree on a workflow to use. I think we can build on the Bioc workflow package rnaseqGene. My suggested workflow would be:
I'd love to hear what you all think.
Best,
Fabricio