Add profile for ARM compatibility #1425
Changes from 40 commits

@@ -116,6 +116,10 @@ If you would like to reduce the number of reads used in the analysis, for exampl

## Alignment options

:::note
The `--aligner hisat2` option is not currently supported on ARM architectures (`-profile arm`).
:::

By default, the pipeline uses [STAR](https://github.com/alexdobin/STAR) (i.e. `--aligner star_salmon`) to map the raw FastQ reads to the reference genome, project the alignments onto the transcriptome, and perform the downstream BAM-level quantification with [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html). STAR is fast but requires a lot of memory to run, typically around 38 GB for the Human GRCh37 reference genome. Since the [RSEM](https://github.com/deweylab/RSEM) (i.e. `--aligner star_rsem`) workflow in the pipeline also uses STAR, you should use the [HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml) aligner (i.e. `--aligner hisat2`) if you have memory limitations.

You also have the option to pseudoalign and quantify your data directly with [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) or [Kallisto](https://pachterlab.github.io/kallisto/) by passing `salmon` or `kallisto` to the `--pseudo_aligner` parameter. The selected pseudoaligner will then be run in addition to the standard alignment workflow defined by `--aligner`, mainly because it allows you to obtain QC metrics with respect to the genomic alignments. However, you can provide the `--skip_alignment` parameter if you would like to run Salmon or Kallisto in isolation. By default, the pipeline will use the genome FASTA and GTF files to generate the transcripts FASTA file, and then build the Salmon index. You can override these steps using the `--transcript_fasta` and `--salmon_index` parameters, respectively.
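
As an illustration of running a pseudoaligner in isolation, the sketch below composes (but does not execute) a command line that combines `--pseudo_aligner` with `--skip_alignment`. The helper name `build_pseudo_cmd`, the file names, and the choice of `-profile docker` are assumptions made for the example and are not part of the pipeline.

```bash
#!/usr/bin/env bash
# Sketch only: print a pseudoalignment-only nf-core/rnaseq command.
# File names below are hypothetical placeholders.

build_pseudo_cmd() {
    local pseudo_aligner="$1"   # expected: "salmon" or "kallisto"

    # Reject anything other than the two values named in the text above.
    case "$pseudo_aligner" in
        salmon|kallisto) ;;
        *)
            echo "unsupported pseudoaligner: $pseudo_aligner" >&2
            return 1
            ;;
    esac

    # Print each word of the command followed by a space, then a newline.
    printf '%s ' nextflow run nf-core/rnaseq \
        --input samplesheet.csv \
        --outdir results \
        --fasta genome.fa \
        --gtf genes.gtf \
        --pseudo_aligner "$pseudo_aligner" \
        --skip_alignment \
        -profile docker
    printf '\n'
}

# Dry run: show the command rather than launching the pipeline.
build_pseudo_cmd salmon
```

If you already have a transcriptome FASTA or a prebuilt Salmon index, `--transcript_fasta` or `--salmon_index` could be appended to the printed command in the same way.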

@@ -298,6 +302,10 @@ By default, the input GTF file will be filtered to ensure that sequence names co

## Contamination screening options

:::note
The `--contaminant_screening` option is not currently available on ARM architectures (`-profile arm`).
:::

The pipeline provides the option to scan unaligned reads for contamination from other species using [Kraken2](https://ccb.jhu.edu/software/kraken2/), with the possibility of applying corrections from [Bracken](https://ccb.jhu.edu/software/bracken/). Since running Bracken is not computationally expensive, we recommend always using it to refine the abundance estimates generated by Kraken2.

It is important to note that the accuracy of Kraken2 is [highly dependent on the database](https://doi.org/10.1099/mgen.0.000949) used. Specifically, it is [crucial](https://doi.org/10.1128/mbio.01607-23) to ensure that the host genome is included in the database. If you are particularly concerned about certain contaminants, it may be beneficial to use a smaller, more focused database containing primarily those contaminants instead of the full standard database. Various pre-built databases [are available for download](https://benlangmead.github.io/aws-indexes/k2), and instructions for building a custom database can be found in the [Kraken2 documentation](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown). Additionally, genomes of contaminants detected in previous sequencing experiments are available on the [OpenContami website](https://openlooper.hgc.jp/opencontami/help/help_oct.php).

@@ -356,6 +364,26 @@ genome: 'GRCh37'

You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).

### Running on Linux ARM architectures

The pipeline can be executed in an ARM-compatible mode by specifying the ARM profile, for example:

```bash
nextflow run \
    nf-core/rnaseq \
    --input <SAMPLESHEET> \
    --outdir <OUTDIR> \
    --gtf <GTF> \
    --fasta <GENOME FASTA> \
    -profile docker,arm
```

This will use ARM-compatible containers, and apply a small number of overrides to Conda definitions to support ARM operation.

:::warning
Please note that the ARM profile is experimental. It is expected to function correctly in all cases unless explicitly indicated otherwise; current exceptions are the use of the HISAT2 aligner and contaminant screening via Kraken2. However, because testing is presently conducted manually, we cannot guarantee its reliability.
:::
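
A wrapper script could choose the profile from the host architecture and refuse the aligner that is listed as an exception for the ARM profile. The following is a minimal sketch under stated assumptions: the function names `profile_for_arch` and `check_arm_aligner` are hypothetical, and treating both `aarch64` and `arm64` (the strings commonly printed by `uname -m`) as ARM hosts is a choice made for illustration rather than behaviour of the pipeline itself.

```bash
#!/usr/bin/env bash
# Sketch only: pick a -profile value from the machine architecture and
# guard against the aligner that the ARM profile does not support.

# Map an architecture string (as printed by `uname -m`) to a -profile value.
# Assumption: "aarch64" and "arm64" both indicate an ARM host.
profile_for_arch() {
    local arch="$1"
    case "$arch" in
        aarch64|arm64) echo "docker,arm" ;;
        *)             echo "docker" ;;
    esac
}

# Return non-zero when hisat2 is requested together with the arm profile.
check_arm_aligner() {
    local profile="$1" aligner="$2"
    # Wrap in commas so "arm" only matches a whole profile name.
    case ",$profile," in
        *,arm,*)
            if [ "$aligner" = "hisat2" ]; then
                echo "hisat2 is not supported with -profile arm" >&2
                return 1
            fi
            ;;
    esac
    return 0
}

profile="$(profile_for_arch "$(uname -m)")"
aligner="star_salmon"

# Dry run: print the command instead of launching the pipeline.
if check_arm_aligner "$profile" "$aligner"; then
    echo "nextflow run nf-core/rnaseq --input <SAMPLESHEET> --outdir <OUTDIR> --aligner $aligner -profile $profile"
fi
```

A similar guard could be added for `--contaminant_screening`, which is the other exception noted for the ARM profile.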

pinin4fjords marked this conversation as resolved.

### Updating the pipeline

When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available, even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, regularly update the cached version:

@@ -420,6 +448,12 @@ If `-profile` is not specified, the pipeline will run locally and expect all sof
  - A generic configuration profile to enable [Wave](https://seqera.io/wave/) containers. Use together with one of the above (requires Nextflow `24.03.0-edge` or later).
- `conda`
  - A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort, i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter, Charliecloud, or Apptainer.
- `arm`
  - A configuration profile that sets `docker.runOptions` appropriately for ARM architectures, and applies overrides supplying ARM-compatible containers and Conda environments.

pinin4fjords marked this conversation as resolved.

:::warning
Please note that the ARM profile is experimental. It is expected to function correctly in all cases unless explicitly indicated otherwise; current exceptions are the use of the HISAT2 aligner and contaminant screening via Kraken2. However, because testing is presently conducted manually, we cannot guarantee its reliability.
:::

Maybe we just link to the new section we created in the

### `-resume`