
Commit

Merge pull request #2 from d3b-center/rerun-modules
Rerun modules
aadamk authored Oct 8, 2024
2 parents c733899 + aa7854c commit b8343e3
Showing 149 changed files with 110 additions and 33,880 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -377,4 +377,7 @@ Icon
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
.apdisk

# add data file to gitignore
data/v15
1 change: 0 additions & 1 deletion analyses/data_preparation/.Renviron
@@ -1,2 +1 @@
R_MAX_VSIZE=100Gb

3 changes: 2 additions & 1 deletion analyses/data_preparation/.gitignore
@@ -1 +1,2 @@
data/*
# outputs
results/*
@@ -7,6 +7,8 @@ suppressPackageStartupMessages({
library(rtracklayer)
})

mem.maxVSize(vsize = 102400)

# parse command line options
option_list <- list(
make_option(c("--histology_file"), type = "character", help = "Histology file (.tsv)"),
26 changes: 11 additions & 15 deletions analyses/data_preparation/README.md
@@ -1,16 +1,14 @@

### Author: Komal S. Rathi
### Author: Komal S. Rathi, Adam Kraya

### Purpose

The purpose of this module is:
1. Subset input data matrices (RNA, CNV, SNV, Methylation and Splicing data) to short histology of interest.
2. Format PSI matrix to a format accepted by PEGASAS.
3. Filter/Transform data matrices which can be used as inputs for multi-modal clustering packages.
1. Subset input data matrices (RNA, CNV, SNV, Methylation and Splicing data) to short histology of interest i.e `HGAT`.
2. Filter/Transform data matrices which can be used as inputs for multi-modal clustering packages.

### Data version

- OPC v15 for histologies (molecular subtype etc), RNA, Methylation, Splicing
- OPC v15 for histologies (with clinical variables like molecular subtype), RNA, Methylation, Splicing

### Run Analysis
```
@@ -29,29 +27,27 @@ Features were selected from OpenPedCan-analysis v15 datasets using the following

1) **RNA**:

- The expected counts dataset was first filtered to Medulloblatoma samples.
- The expected counts dataset was first filtered to `HGAT` samples.
- Features were reduced to `Top 1000 most variable protein coding genes` followed by `Rank transformation`.

2) **Methylation**:

- Methylation beta-values matrix was first filtered to Medulloblatoma samples.
- Methylation beta-values matrix was first filtered to `HGAT` samples.
- Features were reduced to `Top 1000 most variable probes`.

3) **Splicing**:

- Splice matrix was first filtered to Medulloblatoma samples.
- Splice matrix was first filtered to `HGAT` samples.
- Features were reduced to `Top 1000 most variable splice variants`.
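The selection steps listed above (keep the most variable features, then rank-transform) can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the module's own R code. The function names `top_variable_features` and `rank_transform`, the toy gene names, and the choice to rank each feature's values across samples with ties averaged are all assumptions, since the README does not state the ranking axis or how ties are handled.

```python
from statistics import variance


def top_variable_features(matrix, k):
    """Return the names of the k features with the highest sample variance.

    `matrix` maps a feature name to its list of values across samples.
    """
    ranked = sorted(matrix, key=lambda name: variance(matrix[name]), reverse=True)
    return ranked[:k]


def rank_transform(values):
    """Replace each value by its 1-based rank; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of values tied with the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


# Toy expression matrix (hypothetical values): 3 features x 3 samples.
expr = {
    "GENE_A": [1.0, 9.0, 5.0],
    "GENE_B": [4.0, 4.0, 4.0],
    "GENE_C": [2.0, 3.0, 2.0],
}

selected = top_variable_features(expr, k=2)
# Assumption: ranks are computed per feature, across samples.
ranked_expr = {name: rank_transform(expr[name]) for name in selected}
```

On the real matrices `k` would be 1000, applied after subsetting to the samples of interest.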

#### Output

The script resulted in `152 Medulloblastoma samples` that have all 3 data modalities available. The filtered/transformed data matrices are written out to individual .tsv files. Additionally a mapping between `Kids_First_Biospecimen_ID` identifiers from each modality and `sample_id` is written out to `samples_map.tsv`.
The script resulted in `228 HGAT samples` that have all 3 data modalities available. The filtered/transformed data matrices are written out to individual .tsv files. Additionally a mapping between `Kids_First_Biospecimen_ID` identifiers from each modality and `sample_id` is written out to `samples_map.tsv`.
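The overlap across modalities and the identifier mapping described above can be sketched as follows. This is an illustrative Python sketch, not the module's R implementation. The function name `build_samples_map`, the output column names, and all identifiers in the toy input are made up, and the sketch assumes one biospecimen per sample within each modality.

```python
def build_samples_map(modalities):
    """Map each sample_id present in every modality to its biospecimen IDs.

    `modalities` maps a modality name to a dict of {biospecimen_id: sample_id}.
    """
    # Sample IDs that appear in all modalities.
    shared = set.intersection(*(set(ids.values()) for ids in modalities.values()))
    rows = []
    for sample_id in sorted(shared):
        row = {"sample_id": sample_id}
        for name, ids in modalities.items():
            # Assumption: one biospecimen per sample within a modality.
            row[f"Kids_First_Biospecimen_ID_{name}"] = next(
                bs_id for bs_id, sid in ids.items() if sid == sample_id
            )
        rows.append(row)
    return rows


# Made-up identifiers, for illustration only.
modalities = {
    "RNA": {"BS_R1": "S1", "BS_R2": "S2", "BS_R3": "S3"},
    "Methyl": {"BS_M1": "S1", "BS_M2": "S2"},
    "Splice": {"BS_P2": "S2", "BS_P1": "S1", "BS_P4": "S4"},
}

samples_map = build_samples_map(modalities)
```

Only `S1` and `S2` survive here because they are the only samples seen in all three toy modalities; each returned row corresponds to one line of a mapping table such as `samples_map.tsv`.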

```
results
├── cnv_data.tsv # cnv data
├── methyl_data.tsv # methylation data
├── norm_counts.tsv # expression data
├── snv_data.tsv # snv data
├── splice_data.tsv # splice data
├── methyl_data.tsv # methylation data
├── rna_data.tsv # expression data
├── splice_data.tsv # splice data
└── samples_map.tsv # biospecimens + cohort identifiers for samples used for each modality
```
3 changes: 3 additions & 0 deletions analyses/dge_pathway_analysis/.gitignore
@@ -0,0 +1,3 @@
# outputs
results/*
plots/*
9 changes: 4 additions & 5 deletions analyses/dge_pathway_analysis/README.md
@@ -17,11 +17,11 @@ bash run_analysis.sh
#### Inputs

```
../../data
├── c2.cp.kegg_medicus.v2023.2.Hs.symbols.gmt # KEGG MEDICUS gmt file
└── gencode.v39.primary_assembly.annotation.gtf.gz # gencode v39
# gencode v39
../../data/v15
└── gencode.v39.primary_assembly.annotation.gtf.gz
# cohort specific files
# gene expression file
../data_preparation/data
└── gene-counts-rsem-expected_count-collapsed.rds
@@ -59,4 +59,3 @@ plots/intNMF/deseq
├── cluster_{n}_vs_rest_gsea_cnet.pdf
└── cluster_{n}_vs_rest_gsea_dotplot.pdf
```

Binary file not shown.
30,195 changes: 0 additions & 30,195 deletions analyses/dge_pathway_analysis/results/intNMF/deseq/diffexpr_output_per_cluster.tsv

This file was deleted.

