add subworkflow for functional enrichment analysis #7254

Merged: 75 commits merged on Jan 27, 2025

Commits
4ea1b92
[functional_enrichment] create a first template
suzannejin Dec 11, 2024
a434377
Merge branch 'nf-core:master' into functional_analysis
suzannejin Dec 13, 2024
1567d0a
[functional_analysis] Add original code from the enrichment subworkfl…
suzannejin Dec 13, 2024
318898d
[functional_analysis] simplify code for grea and gprofiler2
suzannejin Dec 13, 2024
e8a1f7f
[functional_analysis] add basic test for deseq2+gprofiler2
suzannejin Dec 13, 2024
b3ce956
[functional_analysis] pass test for deseq2 + gprofiler
suzannejin Dec 13, 2024
c0833c3
[functional_analysis] add test for limma-voom + gprofiler2
suzannejin Dec 13, 2024
aa95ebe
Merge branch 'nf-core:master' into functional_analysis
suzannejin Dec 17, 2024
88ff6c8
Merge branch 'nf-core:master' into functional_analysis
suzannejin Dec 18, 2024
9837349
[functional_analysis] add optional inputs and set them to null. Updat…
suzannejin Dec 18, 2024
c9b8a88
[functional_analysis] count elements in channel as condition to run a…
suzannejin Dec 18, 2024
643b7f7
[functional_analysis] updated the code to handle gsea stuff. tested g…
suzannejin Dec 19, 2024
e86ece4
[functional_analysis] deseq2+gprofiler2 works
suzannejin Dec 19, 2024
7030f37
[functional_analysis] added test for gsea, but still need to make it …
suzannejin Dec 19, 2024
a7b7bd9
[functional_analysis] last changes, need to solve bugs
suzannejin Dec 19, 2024
a36f2fd
[functional_analysis] GSEA works now. Added snapshot. Still need to c…
suzannejin Dec 20, 2024
8d9c31a
[functional_analysis] add comments
suzannejin Dec 20, 2024
6c3df56
[functional_analysis] clean the code related to empty optional inputs
suzannejin Dec 20, 2024
e8f105c
[functional_analysis] add test for limmavoom+gsea
suzannejin Dec 20, 2024
e3eb9df
[functional_analysis] fill meta.yml
suzannejin Dec 20, 2024
99f81ba
correct errata
suzannejin Dec 20, 2024
294b1de
Merge branch 'master' into functional_analysis
suzannejin Jan 9, 2025
e175066
Merge branch 'master' into functional_analysis
suzannejin Jan 9, 2025
77ed5ea
update meta.yml
suzannejin Jan 9, 2025
64ac34f
update tests and snapshots
suzannejin Jan 9, 2025
01b1791
Merge branch 'master' into functional_analysis
suzannejin Jan 13, 2025
df8945f
remove weird addition of module
suzannejin Jan 13, 2025
954fde6
fix the tests that runs gprofiler2. Checked that it produces same res…
suzannejin Jan 13, 2025
b187583
updated gsea test. Need to check why output files are empty though
suzannejin Jan 13, 2025
392dc08
update deseq2+gprofiler2 test to uncheck the unstable html file content
suzannejin Jan 13, 2025
79965f4
update snapshots for gsea
suzannejin Jan 13, 2025
c15d4cd
.
suzannejin Jan 13, 2025
3f4aecb
update versions test snapshot
suzannejin Jan 13, 2025
de82a8f
add test for propd + grea
suzannejin Jan 13, 2025
8017e21
work in process.. need to modify abundance_differential_filter to use…
suzannejin Jan 13, 2025
da67af8
modified abundance_differential_filter to use meta_exp in deseq_norm …
suzannejin Jan 14, 2025
d42f2da
update multimapcriteria. gsea tests pass
suzannejin Jan 14, 2025
fb3f1ae
update abundance_differential_filter
suzannejin Jan 14, 2025
c439740
add meta to modules'input
suzannejin Jan 14, 2025
1759a86
update all.config
suzannejin Jan 14, 2025
877946b
.
suzannejin Jan 14, 2025
e9f069b
.
suzannejin Jan 14, 2025
93345a6
Merge branch 'master' into functional_analysis
suzannejin Jan 14, 2025
75399ec
Merge branch 'master' into functional_analysis
suzannejin Jan 16, 2025
12b7f8c
revert homer snap
suzannejin Jan 16, 2025
ca61db0
Merge branch 'master' into functional_analysis
suzannejin Jan 16, 2025
170e870
revert abundance_differential_filter
suzannejin Jan 16, 2025
51229b9
.
suzannejin Jan 16, 2025
af0a0a7
.
suzannejin Jan 16, 2025
4f24202
Merge branch 'master' into functional_analysis
suzannejin Jan 16, 2025
cac0b07
update snapshot
suzannejin Jan 16, 2025
2b78395
update gprofiler2_gost meta and test
suzannejin Jan 16, 2025
709abba
update gsea_gsea meta and test
suzannejin Jan 16, 2025
aeedb60
add view to check output
suzannejin Jan 16, 2025
4b3013b
modify the tests to have inputs without 'method' in meta
suzannejin Jan 16, 2025
65ea9e5
replace meta.remove by meta - [...]
suzannejin Jan 16, 2025
5454a7d
update test snapshots
suzannejin Jan 16, 2025
f536a4e
update meta
suzannejin Jan 16, 2025
05edb99
Merge branch 'master' into functional_analysis
suzannejin Jan 20, 2025
85d60f5
update test snapshots as custom modules were updated in nf-core/modules
suzannejin Jan 20, 2025
5888db2
Merge branch 'master' into functional_analysis
suzannejin Jan 20, 2025
3003d04
remove view
suzannejin Jan 20, 2025
d795904
Merge branch 'master' into functional_analysis
suzannejin Jan 21, 2025
d88df5d
modify gprofiler2 snapshot to assert unstable png
suzannejin Jan 21, 2025
3b3a05f
update test in gprofiler2
suzannejin Jan 21, 2025
1b49463
Merge branch 'master' into functional_analysis
suzannejin Jan 21, 2025
99d5669
replace collect
suzannejin Jan 21, 2025
c4e4bca
Merge branch 'master' into functional_analysis
suzannejin Jan 24, 2025
25f94ed
Simplify input channels by adding genesets and background to ch_input.
suzannejin Jan 24, 2025
6a4b1c7
add stub to test and update snapshots.
suzannejin Jan 24, 2025
dbb6637
fix indent
suzannejin Jan 24, 2025
4f7cae8
fix small bug
suzannejin Jan 24, 2025
6e5e1a6
add comments and update meta
suzannejin Jan 24, 2025
3f10e16
add stub snapshot
suzannejin Jan 24, 2025
ecd84ed
Merge branch 'master' into functional_analysis
suzannejin Jan 27, 2025
4 changes: 2 additions & 2 deletions modules/nf-core/gprofiler2/gost/main.nf
@@ -9,8 +9,8 @@ process GPROFILER2_GOST {
 
     input:
     tuple val(meta), path(de_file)
-    path(gmt_file)
-    path(background_file)
+    tuple val(meta2), path(gmt_file)
+    tuple val(meta3), path(background_file)
 
     output:
     tuple val(meta), path("*.gprofiler2.all_enriched_pathways.tsv") , emit: all_enrich
10 changes: 8 additions & 2 deletions modules/nf-core/gprofiler2/gost/meta.yml
@@ -27,12 +27,18 @@ input:
         pattern: "*.{csv,tsv}"
         description: |
           CSV or TSV-format tabular file with differential analysis outputs
-  - - gmt_file:
+  - - meta2:
+        type: map
+        description: Groovy map
+    - gmt_file:
         type: file
         pattern: "*.gmt"
         description: |
           Path to a GMT file downloaded from g:profiler that should be queried instead of the online databases
-  - - background_file:
+  - - meta3:
+        type: map
+        description: Groovy map
+    - background_file:
         type: file
         pattern: "*.{csv,tsv,txt}"
         description: |
24 changes: 18 additions & 6 deletions modules/nf-core/gprofiler2/gost/tests/main.nf.test
@@ -55,8 +55,14 @@ nextflow_process {
                     ['id':'Condition_genotype_WT_KO', 'variable':'Condition genotype', 'reference':'WT', 'target':'KO', 'blocking':'batch'],
                     file(params.modules_testdata_base_path + "genomics/mus_musculus/rnaseq_expression/Condition_genotype_WT_KO.deseq2.results_filtered.tsv", checkIfExists: true)
                 ]
-                input[1] = file(params.modules_testdata_base_path + "genomics/mus_musculus/gene_set_analysis/mh.all.v2022.1.Mm.symbols.gmt", checkIfExists: true)
-                input[2] = file(params.modules_testdata_base_path + "genomics/mus_musculus/rnaseq_expression/study.filtered.tsv", checkIfExists: true)
+                input[1] = [
+                    ['id': 'test'],
+                    file(params.modules_testdata_base_path + "genomics/mus_musculus/gene_set_analysis/mh.all.v2022.1.Mm.symbols.gmt", checkIfExists: true)
+                ]
+                input[2] = [
+                    ['id': 'test'],
+                    file(params.modules_testdata_base_path + "genomics/mus_musculus/rnaseq_expression/study.filtered.tsv", checkIfExists: true)
+                ]
                 """
             }
         }
@@ -66,9 +72,9 @@ nextflow_process {
                 { assert process.success },
                 { assert snapshot(
                     process.out.all_enrich,
-                    process.out.plot_png,
                     process.out.sub_enrich,
-                    process.out.sub_plot,
+                    file(process.out.plot_png[0][1]).name, //assert unstable file
+                    process.out.sub_plot[0][1].collect{ file(it).name }, //assert unstable file
                     process.out.filtered_gmt,
                     process.out.session_info.collect{ meta,session_info -> file(session_info).name }, //assert unstable file
                     process.out.versions,
@@ -94,8 +100,14 @@ nextflow_process {
                     ['id':'Condition_genotype_WT_KO', 'variable':'Condition genotype', 'reference':'WT', 'target':'KO', 'blocking':'batch'],
                     file(params.modules_testdata_base_path + "genomics/mus_musculus/rnaseq_expression/Condition_genotype_WT_KO.deseq2.results_filtered.tsv", checkIfExists: true)
                 ]
-                input[1] = file(params.modules_testdata_base_path + "genomics/mus_musculus/gene_set_analysis/mh.all.v2022.1.Mm.symbols.gmt", checkIfExists: true)
-                input[2] = file(params.modules_testdata_base_path + "genomics/mus_musculus/rnaseq_expression/study.filtered.tsv", checkIfExists: true)
+                input[1] = [
+                    ['id': 'test'],
+                    file(params.modules_testdata_base_path + "genomics/mus_musculus/gene_set_analysis/mh.all.v2022.1.Mm.symbols.gmt", checkIfExists: true)
+                ]
+                input[2] = [
+                    ['id': 'test'],
+                    file(params.modules_testdata_base_path + "genomics/mus_musculus/rnaseq_expression/study.filtered.tsv", checkIfExists: true)
+                ]
                 """
             }
         }
46 changes: 12 additions & 34 deletions modules/nf-core/gprofiler2/gost/tests/main.nf.test.snap
@@ -13,18 +13,6 @@
                     "Condition_genotype_WT_KO.gprofiler2.all_enriched_pathways.tsv:md5,1134a02ca061c463bcbff277eefbfb19"
                 ]
             ],
-            [
-                [
-                    {
-                        "id": "Condition_genotype_WT_KO",
-                        "variable": "Condition genotype",
-                        "reference": "WT",
-                        "target": "KO",
-                        "blocking": "batch"
-                    },
-                    "Condition_genotype_WT_KO.gprofiler2.gostplot.png:md5,4b83d1bcf7dc9dbf6cef3d545e440c5b"
-                ]
-            ],
             [
                 [
                     {
@@ -47,27 +35,17 @@
                     ]
                 ]
             ],
+            "Condition_genotype_WT_KO.gprofiler2.gostplot.png",
             [
-                [
-                    {
-                        "id": "Condition_genotype_WT_KO",
-                        "variable": "Condition genotype",
-                        "reference": "WT",
-                        "target": "KO",
-                        "blocking": "batch"
-                    },
-                    [
-                        "Condition_genotype_WT_KO.gprofiler2.GO:BP.sub_enriched_pathways.png:md5,d89498267e985adf0ad1266e2deb9f48",
-                        "Condition_genotype_WT_KO.gprofiler2.GO:CC.sub_enriched_pathways.png:md5,e04cdd51b200671613254d021d3af242",
-                        "Condition_genotype_WT_KO.gprofiler2.GO:MF.sub_enriched_pathways.png:md5,33ea0652d78111978677acde0fe7f807",
-                        "Condition_genotype_WT_KO.gprofiler2.HP.sub_enriched_pathways.png:md5,6c040ac4baba73ae5637b00650e6aea1",
-                        "Condition_genotype_WT_KO.gprofiler2.KEGG.sub_enriched_pathways.png:md5,fbd232c4eeced95ceda60b43a02dbe1f",
-                        "Condition_genotype_WT_KO.gprofiler2.MIRNA.sub_enriched_pathways.png:md5,956880d3bf4852a06b0ffaaaba565732",
-                        "Condition_genotype_WT_KO.gprofiler2.REAC.sub_enriched_pathways.png:md5,0e8f9217d275668986771dc7fede3170",
-                        "Condition_genotype_WT_KO.gprofiler2.TF.sub_enriched_pathways.png:md5,0697164bc87e95e6508db966df94e01e",
-                        "Condition_genotype_WT_KO.gprofiler2.WP.sub_enriched_pathways.png:md5,09976762c7541ff9e5009e8763986845"
-                    ]
-                ]
+                "Condition_genotype_WT_KO.gprofiler2.GO:BP.sub_enriched_pathways.png",
+                "Condition_genotype_WT_KO.gprofiler2.GO:CC.sub_enriched_pathways.png",
+                "Condition_genotype_WT_KO.gprofiler2.GO:MF.sub_enriched_pathways.png",
+                "Condition_genotype_WT_KO.gprofiler2.HP.sub_enriched_pathways.png",
+                "Condition_genotype_WT_KO.gprofiler2.KEGG.sub_enriched_pathways.png",
+                "Condition_genotype_WT_KO.gprofiler2.MIRNA.sub_enriched_pathways.png",
+                "Condition_genotype_WT_KO.gprofiler2.REAC.sub_enriched_pathways.png",
+                "Condition_genotype_WT_KO.gprofiler2.TF.sub_enriched_pathways.png",
+                "Condition_genotype_WT_KO.gprofiler2.WP.sub_enriched_pathways.png"
             ],
             [
 
@@ -89,7 +67,7 @@
             "nf-test": "0.9.2",
             "nextflow": "24.10.3"
         },
-        "timestamp": "2025-01-09T13:43:18.555455129"
+        "timestamp": "2025-01-21T11:29:54.746689985"
     },
     "stub": {
         "content": [
@@ -298,6 +276,6 @@
             "nf-test": "0.9.2",
             "nextflow": "24.10.3"
         },
-        "timestamp": "2025-01-09T13:43:36.462475057"
+        "timestamp": "2025-01-21T11:31:33.394855046"
     }
 }
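The snapshot change above replaces md5 checksums with bare file names for the PNG plots, whose bytes differ between otherwise identical runs. Deterministic tables stay pinned by checksum. Below is a minimal Python sketch of that policy, for illustration only. `snapshot_entry` and `snapshot` are hypothetical helpers, not part of nf-test. Treating `.png` and `.html` as the unstable suffixes is an assumption drawn from the commits in this PR. The `name:md5,<hex>` format mirrors the entries in the `.snap` file.

```python
import hashlib
from pathlib import PurePosixPath

# Assumption: rendered plots and HTML reports are the "unstable" outputs,
# i.e. their bytes can differ between otherwise identical runs.
UNSTABLE_SUFFIXES = {".png", ".html"}


def snapshot_entry(name: str, content: bytes) -> str:
    """Hypothetical helper: the string a snapshot would record for one file.

    Stable files are pinned by checksum ("name:md5,<hex>"); unstable files
    are pinned by name only, so only their presence is asserted.
    """
    if PurePosixPath(name).suffix in UNSTABLE_SUFFIXES:
        return name
    return f"{name}:md5,{hashlib.md5(content).hexdigest()}"


def snapshot(outputs: dict[str, bytes]) -> list[str]:
    """Snapshot a set of outputs in a deterministic (sorted-by-name) order."""
    return [snapshot_entry(name, data) for name, data in sorted(outputs.items())]


if __name__ == "__main__":
    run_a = {"res.tsv": b"pathway\tp\n", "res.gostplot.png": b"\x89PNG run A"}
    run_b = {"res.tsv": b"pathway\tp\n", "res.gostplot.png": b"\x89PNG run B"}
    # The PNG bytes differ between runs, but the two snapshots are identical.
    assert snapshot(run_a) == snapshot(run_b)
    print(snapshot(run_a))
```

The trade-off is the same as in the nf-test change: a name-only entry still catches a missing or renamed plot, but not a change in its content.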
2 changes: 1 addition & 1 deletion modules/nf-core/gsea/gsea/main.nf
@@ -10,7 +10,7 @@ process GSEA_GSEA {
     input:
     tuple val(meta), path(gct), path(cls), path(gene_sets)
     tuple val(reference), val(target)
-    path(chip) // Optional identifier mapping file
+    tuple val(meta2), path(chip) // Optional identifier mapping file
 
     output:
     tuple val(meta), path("*.rpt") , emit: rpt
5 changes: 4 additions & 1 deletion modules/nf-core/gsea/gsea/meta.yml
@@ -40,7 +40,10 @@ input:
         description: |
           String indicating which of the classes in the cls file should be used
           as the target level of the comparison.
-  - - chip:
+  - - meta2:
+        type: map
+        description: Groovy map
+    - chip:
         type: file
         description: |
           optional Broad-style chip file mapping identifiers in gct to
10 changes: 8 additions & 2 deletions modules/nf-core/gsea/gsea/tests/main.nf.test
@@ -17,7 +17,10 @@ nextflow_process {
                 """
                 input[0] = [['id':'Condition_genotype_WT_KO', 'variable':'Condition genotype', 'reference':'WT', 'target':'KO', 'blocking':'batch'], file("https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/mus_musculus/gene_set_analysis/Condition_treatment_Control_Treated.gct", checkIfExists:true), file("https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/mus_musculus/gene_set_analysis/Condition_genotype_WT_KO.cls", checkIfExists:true), file("https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/mus_musculus/gene_set_analysis/mh.all.v2022.1.Mm.symbols.gmt", checkIfExists:true)]
                 input[1] = ['WT', 'KO']
-                input[2] = file("https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/mus_musculus/gene_set_analysis/Mus_musculus.anno.feature_metadata.chip", checkIfExists:true)
+                input[2] = [
+                    ['id': 'test'],
+                    file("https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/mus_musculus/gene_set_analysis/Mus_musculus.anno.feature_metadata.chip", checkIfExists:true)
+                ]
                 """
             }
         }
@@ -63,7 +66,10 @@ nextflow_process {
                 """
                 input[0] = [['id':'Condition_genotype_WT_KO', 'variable':'Condition genotype', 'reference':'WT', 'target':'KO', 'blocking':'batch'], file("https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/mus_musculus/gene_set_analysis/Condition_treatment_Control_Treated.gct", checkIfExists:true), file("https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/mus_musculus/gene_set_analysis/Condition_genotype_WT_KO.cls", checkIfExists:true), file("https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/mus_musculus/gene_set_analysis/mh.all.v2022.1.Mm.symbols.gmt", checkIfExists:true)]
                 input[1] = ['WT', 'KO']
-                input[2] = file("https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/mus_musculus/gene_set_analysis/Mus_musculus.anno.feature_metadata.chip", checkIfExists:true)
+                input[2] = [
+                    ['id': 'test'],
+                    file("https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/mus_musculus/gene_set_analysis/Mus_musculus.anno.feature_metadata.chip", checkIfExists:true)
+                ]
                 """
             }
         }
150 changes: 150 additions & 0 deletions subworkflows/nf-core/differential_functional_enrichment/main.nf
@@ -0,0 +1,150 @@

//
// Perform enrichment analysis
//
include { GPROFILER2_GOST } from "../../../modules/nf-core/gprofiler2/gost/main.nf"
include { CUSTOM_TABULARTOGSEAGCT } from '../../../modules/nf-core/custom/tabulartogseagct/main.nf'
include { CUSTOM_TABULARTOGSEACLS } from '../../../modules/nf-core/custom/tabulartogseacls/main.nf'
include { CUSTOM_TABULARTOGSEACHIP } from '../../../modules/nf-core/custom/tabulartogseachip/main.nf'
include { GSEA_GSEA } from '../../../modules/nf-core/gsea/gsea/main.nf'
include { PROPR_GREA } from "../../../modules/nf-core/propr/grea/main.nf"

// Combine meta maps, including merging non-identical values of shared keys (e.g. 'id')
def mergeMaps(meta, meta2){
    (meta + meta2).collectEntries { k, v ->
        meta[k] && meta[k] != v ? [k, "${meta[k]}_${v}"] : [k, v]
    }
}

workflow DIFFERENTIAL_FUNCTIONAL_ENRICHMENT {
    take:
    // input data for functional analysis
    // They can be the results from differential expression analysis or abundance matrix
    // The functional analysis method to run should be explicitly provided
    ch_input         // [ meta_input, input file, method to run ]

    // gene sets and background
    ch_gene_sets     // [ meta_gmt, gmt file ]

Member:
Would this necessarily need to be a GMT file?
E.g. decoupler uses weighted gene sets for PROGENy and collecTRI analyses, which are typically provided as a long-form data frame. If we included deconvolution as a functional analysis tool, it would typically use a signature matrix.

It really depends a bit on the scope of this subworkflow. But if the plan is to support a wide range of functional analysis tools as suggested in nf-core/differentialabundance#367, it would be good to keep this generic.

Member:
Agreed, but I'm always wary of premature over-engineering. I don't object to gmt in the first instance.

Member:
Basically, my suggestion comes down to making the gene sets + background method-specific. The current workflow logic already doesn't care whether it's a GMT file or not; that's only specified in the comment.

Contributor Author (@suzannejin, Jan 23, 2025):
Do you think it is possible to standardize the gene set input format for all methods, and then each module deals with the reformatting to the proper format specifically needed for the method?

Otherwise, I imagine it would become confusing from the pipeline user's perspective to have to provide gene sets in a particular format depending on the chosen method.

Member:
  • The "long-form dataframe" format used by decoupler is quite universal. It can cover signature matrices, weighted and signed gene sets.
  • This is also the format that can be obtained from omnipathdb via API. Omnipathdb contains most of the commonly used signatures, such as MSigDB, GO, Dorothea, Progeny. Like that users wouldn't need to obtain the genesets themselves, but just specify the name.
  • I agree that a common format makes sense, but I might still want to couple certain signatures to certain methods and not necessarily run all-vs-all. E.g. With PROGENy signatures, I'd typically use the recommended MLM algorithm in decoupler, while with MSigDB signatures, I'd rather use GSEA.
  • Not entirely sure yet if deconvolution is in the scope of this subworkflow, but here the methods are often shipped together with a signature matrix, so they wouldn't require any input signature at all.

Contributor Author:
I see your point. It would be really nice to include omnipathdb; then the coupling between signature and method could be automated based on name.
For now, we could have a method-specific gene sets channel, just like the input channel.

Member:
I'm not thrilled by the prospect of a multiplicity of gene_set inputs per method.

@suzannejin Maybe the gene sets should actually be part of ch_input. That way, there could be method-specific gene set files in that channel, and we wouldn't need multiple input channels.
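A plain GMT gene set fits the long-form table discussed above as a special case in which every gene carries weight 1. The Python below is one way a single standardized gene-set input could be reformatted per method: it converts GMT text into long-form rows. It is a sketch under stated assumptions. The column names `source`, `target` and `weight` are assumed to match decoupler's default network columns. `gmt_to_long` and `long_to_tsv` are hypothetical helpers, not part of any nf-core module.

```python
import csv
import io


def gmt_to_long(gmt_text: str) -> list[dict]:
    """Convert GMT text into long-form rows, one per (gene set, gene) pair.

    GMT lines are tab-separated: set name, description, then member genes.
    Unweighted sets get weight 1.0; duplicate genes within a set are dropped.
    """
    rows = []
    for lineno, line in enumerate(gmt_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected at least a name and a description")
        name, genes = fields[0], fields[2:]
        # dict.fromkeys de-duplicates while preserving order; empty fields
        # (e.g. from a trailing tab) are skipped
        for gene in dict.fromkeys(g for g in genes if g):
            rows.append({"source": name, "target": gene, "weight": 1.0})
    return rows


def long_to_tsv(rows: list[dict]) -> str:
    """Serialize long-form rows as a TSV with a source/target/weight header."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["source", "target", "weight"], delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    gmt = "HALLMARK_EXAMPLE\tna\tGeneA\tGeneB\n"  # hypothetical gene set
    print(long_to_tsv(gmt_to_long(gmt)), end="")
```

Weighted or signed sets, which GMT cannot express, would simply carry weights other than 1.0 in the same three columns.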

    ch_background    // [ meta_background, background file ]

    // other - for the moment these files are only needed for GSEA
    ch_contrasts     // [ meta_contrast, contrast_variable, reference, target ]

Member:
(1) When providing results from differential expression analysis, a contrast would not be needed.
(2) When providing gene expression as input, differential testing is necessary that is not unlike DE analysis (it needs a model and a contrast definition).

IMO it would make sense to keep differential testing entirely out of this workflow. In case (1), it's not needed, and in case (2), a sample x signature matrix is produced. This matrix could just be fed into a differential analysis workflow (e.g. limma) again.

Like that we can keep the complexity of this subworkflow low.

Member (@grst, Jan 23, 2025):
I'm aware that the gsea module does support providing a contrast directly when using gene expression data. However, I believe that there's no point in using this mode.

In this case, GSEA just computes a metric (signal2noise, t-statistic, ...) from these variables anyway (see docs), so we might as well provide a fold change or DE statistic directly. This has the advantage that we can give the DE method a full model definition, including covariates.

Member:
> I'm aware that the gsea module does support providing a contrast directly when using gene expression data. However, I believe that there's no point in using this mode.

Fair point, but it's what differentialabundance does right now. I'd recognised the possibility of a switch, but hadn't got round to it: nf-core/differentialabundance#36

Member:
Fair point that we could model the current matrix-driven GSEA as part of the differential subworkflow, alongside LIMMA et al though.

Contributor Author:
I wanted the current subworkflow to reproduce exactly how the modules are used in the DA pipeline, hence GSEA takes gene expression data instead of DE results. Not sure why this mode was chosen for the pipeline; maybe @pinin4fjords can step in here?

But yes, I do agree that it would be conceptually cleaner if the subworkflow just took DE output, including for GSEA.

Member:
I'd be in favor of taking this chance to streamline the workflow, now that we are changing quite a few things anyway. We can help with implementing a module for preranked GSEA if required [1].

(alternatively, the decoupler module is kind of ready, and it comes with a very fast GSEA implementation -- it should generate the same results in terms of scores and p-values, but it doesn't produce all outputs such as the "leading edge" plots).

Footnotes

  1. In a few weeks. Waiting for the contract extension with our external developers to be signed.

Member:
Not to be a pain, but I do like those leading edge plots, they are useful.

Member:
I'm not against a preranked GSEA module. But what I really would like to get rid of is the contrast specification in this subworkflow.

Contributor Author (@suzannejin, Jan 24, 2025):
I would actually like to start integrating the subworkflows into the current DA pipeline, hopefully next week.
If the switch to preranked GSEA would take some time, maybe we could first agree on the current subworkflow version? I think it could serve as a nice starting point, and you are welcome to add modifications on top of it afterwards.
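The thread above argues that, in expression mode, GSEA first reduces the matrix to one ranking score per gene, which a DE statistic could replace in a preranked run. The Python below sketches one such score, the signal-to-noise ratio: the difference of group means divided by the sum of group standard deviations. It is illustrative only. It uses sample standard deviations and omits any variance floor the GSEA implementation may apply. `signal_to_noise` and `rank_genes` are hypothetical names.

```python
from statistics import mean, stdev


def signal_to_noise(target: list[float], reference: list[float]) -> float:
    """(mean_target - mean_reference) / (sd_target + sd_reference).

    Uses sample standard deviations, so each group needs at least two values.
    """
    noise = stdev(target) + stdev(reference)
    if noise == 0:
        raise ValueError("both groups have zero variance")
    return (mean(target) - mean(reference)) / noise


def rank_genes(expr, target_samples, reference_samples):
    """Rank genes by signal-to-noise, highest (most target-enriched) first.

    expr maps gene -> {sample: expression value}.
    """
    scores = {
        gene: signal_to_noise(
            [values[s] for s in target_samples],
            [values[s] for s in reference_samples],
        )
        for gene, values in expr.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    expr = {  # toy values, not taken from the test data
        "G1": {"KO_1": 4.0, "KO_2": 6.0, "WT_1": 1.0, "WT_2": 3.0},
        "G2": {"KO_1": 1.0, "KO_2": 3.0, "WT_1": 4.0, "WT_2": 6.0},
    }
    for gene, score in rank_genes(expr, ["KO_1", "KO_2"], ["WT_1", "WT_2"]):
        print(gene, round(score, 3))
```

In a preranked setup, the `scores` dictionary would instead be filled from a column of the DE results, such as a test statistic. That column can come from a model with covariates, which this two-group metric cannot express.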

    ch_samplesheet   // [ meta_exp, samples sheet ]
    ch_featuresheet  // [ meta_exp, features sheet, features id, features symbol ]

    main:

    ch_versions = Channel.empty()

    // Add method information into meta map of ch_input
    // This information is used later to determine which method to run for each input

    ch_input = ch_input
        .combine(ch_gene_sets)
        .combine(ch_background)
        .multiMap {
            meta_input, file, analysis_method, meta_gmt, gmt, meta_background, background ->
                def meta_new = meta_input + [ 'method': analysis_method ]
                input:
                    [ meta_new, file ]
                gene_sets:
                    [ meta_new, gmt ]  // NOTE here we assume that the modules will not make use of meta_gmt and meta_background
                background:
                    [ meta_new, background ]
        }

    // GSEA needs additional files from other channels that the other methods don't use,
    // so here we define the input channel for the GSEA section

    def criteria = multiMapCriteria { meta_input, input, gmt, meta_exp, samplesheet, featuresheet, features_id, features_symbol, meta_contrasts, variable, reference, target ->
        def meta_contrasts_new = meta_contrasts + [ 'variable': variable, 'reference': reference, 'target': target ] // make sure variable, reference, target are in the meta
        def meta_all = mergeMaps(meta_contrasts_new, meta_input)
        input:
            [ meta_all, input ]
        gene_sets:
            [ meta_all, gmt ]
        contrasts_and_samples:
            [ meta_all, samplesheet ]
        features:
            [ meta_exp, featuresheet ]
        features_cols:
            [ features_id, features_symbol ]
    }
    ch_preinput_for_gsea = ch_input.input
        .join(ch_input.gene_sets)
        .filter{ it[0].method == 'gsea' }
        .combine(ch_samplesheet.join(ch_featuresheet))
        .combine(ch_contrasts)
        .multiMap(criteria)

    // ----------------------------------------------------
    // Perform enrichment analysis with gprofiler2
    // ----------------------------------------------------

    GPROFILER2_GOST(
        ch_input.input.filter{ it[0].method == 'gprofiler2' },
        ch_input.gene_sets.filter{ it[0].method == 'gprofiler2'},
        ch_input.background.filter{ it[0].method == 'gprofiler2'}
    )

    // ----------------------------------------------------
    // Perform enrichment analysis with GSEA
    // ----------------------------------------------------

    // NOTE: there can be more than one GCT input if they come from different tools (e.g. limma, deseq2).
    // There can be as many CLS inputs as there are input x contrast combinations,
    // whereas there can be only one features file.

    CUSTOM_TABULARTOGSEAGCT(ch_preinput_for_gsea.input)

    CUSTOM_TABULARTOGSEACLS(ch_preinput_for_gsea.contrasts_and_samples)

    CUSTOM_TABULARTOGSEACHIP(
        ch_preinput_for_gsea.features.first(),
        ch_preinput_for_gsea.features_cols.first()
    )

    ch_input_for_gsea = CUSTOM_TABULARTOGSEAGCT.out.gct
        .join(CUSTOM_TABULARTOGSEACLS.out.cls)
        .join( ch_preinput_for_gsea.gene_sets )

    GSEA_GSEA(
        ch_input_for_gsea,
        ch_input_for_gsea.map{ tuple(it[0].reference, it[0].target) },
        CUSTOM_TABULARTOGSEACHIP.out.chip.first()
    )

    // ----------------------------------------------------
    // Perform enrichment analysis with GREA
    // ----------------------------------------------------

    PROPR_GREA(
        ch_input.input.filter{ it[0].method == 'grea' },
        ch_input.gene_sets.filter{ it[0].method == 'grea' }
    )

    emit:
    // here we emit the outputs that will be useful afterwards in the
    // nf-core/differentialabundance pipeline

    // gprofiler2-specific outputs
    gprofiler2_all_enrich = GPROFILER2_GOST.out.all_enrich
    gprofiler2_sub_enrich = GPROFILER2_GOST.out.sub_enrich
    gprofiler2_plot_html  = GPROFILER2_GOST.out.plot_html

    // gsea-specific outputs
    gsea_report = GSEA_GSEA.out.report_tsvs_ref
        .join(GSEA_GSEA.out.report_tsvs_target)

Member:
strange indent


    // grea-specific outputs
    grea_results = PROPR_GREA.out.results

    // tool versions
    versions = ch_versions
        .mix(GPROFILER2_GOST.out.versions)

Member:
strange indents

        .mix(CUSTOM_TABULARTOGSEAGCT.out.versions)
        .mix(CUSTOM_TABULARTOGSEACLS.out.versions)
        .mix(CUSTOM_TABULARTOGSEACHIP.out.versions)
        .mix(GSEA_GSEA.out.versions)
        .mix(PROPR_GREA.out.versions)

Member:
Why are there no emissions?

}
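Two pieces of channel logic in this subworkflow carry most of the bookkeeping. `mergeMaps` joins the contrast and input meta maps. The `combine` + `multiMap` fan-out tags each input with its `method` so that each tool can `filter` on it. The Python below mirrors both so their semantics are easy to check. It is a sketch under simplifying assumptions: channels are modeled as plain lists, ignoring Nextflow's asynchronous ordering. `merge_maps`, `fan_out` and `for_method` are hypothetical names.

```python
from itertools import product


def merge_maps(meta: dict, meta2: dict) -> dict:
    """Mirror of the Groovy mergeMaps: meta2 wins on key conflicts, except
    that a truthy, differing value in meta is concatenated as '<meta>_<meta2>'.
    (Python truthiness stands in for Groovy truth here.)"""
    merged = {**meta, **meta2}
    return {
        k: f"{meta[k]}_{v}" if meta.get(k) and meta[k] != v else v
        for k, v in merged.items()
    }


def fan_out(inputs, gene_sets, backgrounds):
    """Mirror of ch_input.combine(ch_gene_sets).combine(ch_background).multiMap{...}.

    Every input is paired with every gene set and background. The input's
    method is copied into its meta map, and meta_gmt / meta_background are
    dropped, as stated in the NOTE in the workflow.
    """
    routed = {"input": [], "gene_sets": [], "background": []}
    for (meta_in, path, method), (_meta_gmt, gmt), (_meta_bg, bg) in product(
        inputs, gene_sets, backgrounds
    ):
        meta_new = {**meta_in, "method": method}
        routed["input"].append((meta_new, path))
        routed["gene_sets"].append((meta_new, gmt))
        routed["background"].append((meta_new, bg))
    return routed


def for_method(items, method):
    """Mirror of .filter{ it[0].method == method }."""
    return [item for item in items if item[0]["method"] == method]


if __name__ == "__main__":
    meta = merge_maps(
        {"id": "Condition_genotype_WT_KO", "variable": "Condition genotype"},
        {"id": "deseq2", "method": "gsea"},
    )
    print(meta["id"])  # Condition_genotype_WT_KO_deseq2
```

Because every input is crossed with every gene set and background before filtering, each method's branch receives one gene set and one background per input. This is also why the reviewers suggest moving method-specific gene sets into `ch_input` itself.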