GRIMER v1.0.0 (#3)
* v1.0.0

* metadata code, sorted panels user

* fix small typos, print md

* updated external reference files

* fix

* option to input table with cumulative values

* check md data length, fix typos, env decv

* update readme
pirovc authored Jul 21, 2022
1 parent 43d2feb commit 2a5bd2f
Showing 17 changed files with 7,233 additions and 5,225 deletions.
102 changes: 96 additions & 6 deletions README.md
@@ -2,23 +2,30 @@

![GRIMER](grimer/img/logo.png)

GRIMER performs analysis of microbiome data and generates a portable and interactive dashboard integrating annotation, taxonomy and metadata.
GRIMER performs analysis of microbiome data and generates a portable and interactive dashboard integrating annotation, taxonomy and metadata, with a focus on contamination detection. More information about the method can be found in the [pre-print](https://doi.org/10.1101/2021.06.22.449360).

## Examples

Online examples of reports generated with GRIMER: https://pirovc.github.io/grimer-reports/

## Installation

Via conda

```bash
conda install -c bioconda -c conda-forge grimer
```

or locally installing only dependencies via conda:

```bash
git clone https://github.com/pirovc/grimer.git
cd grimer
conda env create -f env.yaml
conda activate grimer # source activate grimer
conda env create -f env.yaml # or mamba env create -f env.yaml
conda activate grimer # or source activate grimer
python setup.py install --record files.txt # Uninstall: xargs rm -rf < files.txt
grimer -h
```
***Soon GRIMER will be available as a package in BioConda.***

## Usage

@@ -52,11 +59,94 @@ grimer -i input_table.tsv -m metadata.tsv -t ncbi #optional -b taxdump.tar.gz
grimer -i input_table.tsv -m metadata.tsv -t ncbi -c config/default.yaml -d -g
```

### List all options
### Analyzing any MGnify public study

```bash
grimer -h
./grimer-mgnify.py -i MGYS00006024 -o output_folder/
```

## Parameters

grimer

optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit

required arguments:
-i INPUT_FILE, --input-file INPUT_FILE
Main input table with counts (Observation table, Count table, Contingency Tables, ...) or .biom file. By default rows contain observations and columns contain
samples (use --transpose if your file is reversed). First column and first row are used as headers.

main arguments:
-m METADATA_FILE, --metadata-file METADATA_FILE
Input metadata file in simple tabular format with samples in rows and metadata fields in columns. QIIME 2 metadata format is also accepted, with an extra row to
define categorical and numerical fields. If not provided and --input-file is a .biom file, GRIMER will attempt to get metadata from it.
-t {ncbi,gtdb,silva,greengenes,ott}, --taxonomy {ncbi,gtdb,silva,greengenes,ott}
Taxonomy used to convert entries and annotate samples. Taxonomy files are downloaded and parsed automatically unless provided with --tax-files.
-b [TAX_FILES ...], --tax-files [TAX_FILES ...]
Optional specific taxonomy files to use.
-r [RANKS ...], --ranks [RANKS ...]
Taxonomic ranks to generate visualizations. Use 'default' to use entries from the table directly. Default: default
-c CONFIG, --config CONFIG
Configuration file with definitions of references, controls and external tools.

output arguments:
-g, --mgnify Plot MGnify chart
-d, --decontam Run and plot DECONTAM
-l TITLE, --title TITLE
Title to display on the header of the report.
-p [{overview,samples,heatmap,correlation} ...], --output-plots [{overview,samples,heatmap,correlation} ...]
Plots to generate. Default: overview,samples,heatmap,correlation
-o OUTPUT_HTML, --output-html OUTPUT_HTML
File to output report. Default: output.html
--full-offline Embed the JavaScript library in the output file. The file will be around 1.5MB bigger but will also work without an internet connection. That way your report will live forever.

general data options:
-f LEVEL_SEPARATOR, --level-separator LEVEL_SEPARATOR
If provided, consider --input-file to be a hierarchical multi-level table where the observation headers are separated by the indicated separator character
(usually ';' or '|')
-y VALUES, --values VALUES
Force 'count' or 'normalized' data parsing. Empty to auto-detect.
-w, --cumm-levels Activate if input table already has cumulative values among levels.
-s, --transpose Transpose --input-file (if samples are listed on columns and observations on rows)
-u [UNASSIGNED_HEADER ...], --unassigned-header [UNASSIGNED_HEADER ...]
Define one or more header names containing unassigned/unclassified counts.
--obs-replace [OBS_REPLACE ...]
Replace values on table observations labels/headers (support regex). Example: '_' ' ' will replace underscore with spaces, '^.+__' '' will remove the matching
regex.
--sample-replace [SAMPLE_REPLACE ...]
Replace values on table sample labels/headers (support regex). Example: '_' ' ' will replace underscore with spaces, '^.+__' '' will remove the matching regex.
-z REPLACE_ZEROS, --replace-zeros REPLACE_ZEROS
INT (add 'smallest count'/INT to every raw count), FLOAT (add FLOAT to every raw count). Default: 1000
--min-frequency MIN_FREQUENCY
Define minimum number/percentage of samples containing an observation to keep the observation [values between 0-1 for percentage, >1 specific number].
--max-frequency MAX_FREQUENCY
Define maximum number/percentage of samples containing an observation to keep the observation [values between 0-1 for percentage, >1 specific number].
--min-count MIN_COUNT
Define minimum number/percentage of counts to keep an observation [values between 0-1 for percentage, >1 specific number].
--max-count MAX_COUNT
Define maximum number/percentage of counts to keep an observation [values between 0-1 for percentage, >1 specific number].

Samples options:
-j TOP_OBS_BARS, --top-obs-bars TOP_OBS_BARS
Top abundant observations to show in the bars.

Heatmap and clustering options:
-a TRANSFORMATION, --transformation TRANSFORMATION
none (counts), norm (percentage), log (log10), clr (centre log ratio). Default: log
-e METADATA_COLS, --metadata-cols METADATA_COLS
How many metadata columns to show on the heatmap. Higher values make the plot slower to navigate.
--optimal-ordering Activate optimal_ordering on linkage, takes longer for large number of samples.
--show-zeros Do not skip zeros on heatmap. File will be bigger and interaction with heatmap slower.
--linkage-methods [{single,complete,average,centroid,median,ward,weighted} ...]
--linkage-metrics [{braycurtis,canberra,chebyshev,cityblock,correlation,cosine,dice,euclidean,hamming,jaccard,jensenshannon,kulsinski,mahalanobis,minkowski,rogerstanimoto,russellrao,seuclidean,sokalmichener,sokalsneath,sqeuclidean,wminkowski,yule} ...]
--skip-dendrogram Disable dendrogram. Will create smaller files.

Correlation options:
-x TOP_OBS_CORR, --top-obs-corr TOP_OBS_CORR
Top abundant observations to build the correlation matrix, based on the avg. percentage counts/sample. 0 for all
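
The `--obs-replace` and `--sample-replace` options above take a regex pattern followed by its replacement. The sketch below reproduces the two examples from the help text with Python's standard `re` module. It is only an illustration: `replace_labels` is a hypothetical helper, the labels are made up, and GRIMER's internal implementation may differ.

```python
import re


def replace_labels(labels, pattern, replacement):
    """Apply one regex substitution to every label (hypothetical helper)."""
    return [re.sub(pattern, replacement, label) for label in labels]


# Made-up observation labels with a rank prefix and underscores.
labels = ["k__Bacteria", "g__Escherichia_coli"]

# '^.+__' '' removes everything up to and including the last double underscore.
no_prefix = replace_labels(labels, r"^.+__", "")

# '_' ' ' replaces the remaining underscores with spaces.
spaced = replace_labels(no_prefix, "_", " ")

print(spaced)  # ['Bacteria', 'Escherichia coli']
```

Because `.+` is greedy, a multi-level label such as `k__Bacteria;g__Bacillus` would be reduced to `Bacillus` by the first pattern, which may or may not be what is wanted for hierarchical tables.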

## Powered by

[<img src="https://static.bokeh.org/branding/logos/bokeh-logo.png" height="60">](https://bokeh.org)
2 changes: 1 addition & 1 deletion config/default.yaml
@@ -7,7 +7,7 @@ references:
# "Negative Controls": "path/file1.tsv"

external:
mgnify: "files/mgnify.tsv"
mgnify: "files/mgnify5989.tsv"
decontam:
threshold: 0.1 # [0-1] P* hyperparameter
method: "frequency" # frequency, prevalence, combined
7 changes: 3 additions & 4 deletions env.yaml
@@ -9,10 +9,9 @@ dependencies:
- numpy
- scipy>=1.6.0
- scikit-bio>=0.5.6
- multitax==1.1.0
- multitax==1.1.1
- markdown
- biom-format>=2.1.10
- r-base>=4.0.0 #DECONTAM
- bioconductor-decontam==1.10.0 #DECONTAM
- r-optparse==1.6.6 #DECONTAM
- biom-format>=2.1.10 #biom
- jsonapi-client>=0.9.7 #mgnify scripts
- r-optparse==1.6.6 #DECONTAM
82 changes: 53 additions & 29 deletions files/README.md
@@ -1,4 +1,4 @@
# GRIMER References and aux. files
# GRIMER References and other files

## Reference file format

@@ -27,33 +27,57 @@ references:

### contaminants.yml

Last update: 2021-04-01
Last update: 2022-03-09

| Organism group | Genus | Species |
|----------------|-------|---------|
| Bacteria | 6 | 0 | 1998 Tanner, M.A. et al. |
| Bacteria | 4 | 0 | 2003 Grahn, N. et al. |
| Bacteria | 16 | 0 | 2006 Barton, H.A. et al. |
| Bacteria | 11 | 1 | 2014 Laurence, M. et al. |
| Bacteria | 92 | 0 | 2014 Salter, S.J. et al. |
| Bacteria | 7 | 0 | 2015 Jervis-Bardy, J. et al. |
Manually curated from diverse publications:

| Organism group | Genus | Species | Reference |
|----------------|-------|---------|-----------|
| Bacteria | 6 | 0 | 1998 Tanner, M.A. et al. |
| Bacteria | 0 | 10 | 2002 Kulakov, L.A. et al. |
| Bacteria | 4 | 0 | 2003 Grahn, N. et al. |
| Bacteria | 16 | 0 | 2006 Barton, H.A. et al. |
| Bacteria | 11 | 1 | 2014 Laurence, M. et al.|
| Bacteria | 92 | 0 | 2014 Salter, S.J. et al. |
| Bacteria | 7 | 0 | 2015 Jervis-Bardy, J. et al. |
| Bacteria | 28 | 0 | 2015 Jousselin, E. et al. |
| Bacteria | 23 | 0 | 2016 Lauder, A.P. et al. |
| Bacteria | 77 | 127 | 2016 Glassing, A. et al.|
| Bacteria | 23 | 0 | 2016 Lauder, A.P. et al. |
| Bacteria | 6 | 0 | 2016 Lazarevic, V. et al. |
| Bacteria | 77 | 127 | 2016 Glassing, A. et al. |
| Bacteria | 62 | 0 | 2017 Salter, S.J. et al. |
| Bacteria | 0 | 122 | 2018 Kirstahler, P. et al. |
| Bacteria | 62 | 0 | 2017 Salter, S.J. et al. |
| Bacteria | 0 | 122 | 2018 Kirstahler, P. et al. |
| Bacteria | 34 | 0 | 2018 Stinson, L.F. et al. |
| Bacteria | 18 | 0 | 2019 Stinson, L.F. et al. |
| Bacteria | 52 | 2 | 2019 Weyrich, L.S. et al. |
| Bacteria | 8 | 26 | 2019 de Goffau, M.C. et al. |
| Bacteria | 52 | 2 | 2019 Weyrich, L.S. et al. |
| Bacteria | 15 | 93 | 2020 Nejman D. et al. |
| Viruses | 0 | 1 | 2015 Mukherjee, S. et al. |
| Viruses | 0 | 1 | 2015 Kjartansdóttir, K.R. et al. |
| Viruses | 0 | 301 | 2019 Asplund, M. et al. |
| Total (unique) | 201 | 625 | |
| Bacteria | 15 | 93 | 2020 Nejman D. et al. |
| Viruses | 0 | 1 | 2015 Kjartansdóttir, K.R. et al. |
| Viruses | 0 | 1 | 2015 Mukherjee, S. et al. |
| Viruses | 0 | 291 | 2019 Asplund, M. et al. |
| Eukaryota | 0 | 3 | 2016 Czurda, S. et al. |
| Eukaryota | 0 | 1 | PRJNA168|
| Total (unique) | 210 | 627 | |

### human-related.yml

BacDive and eHOMD dump date: 2021-04-13
Last update: 2022-03-09

Manually curated from: Byrd, A., Belkaid, Y. & Segre, J. The human skin microbiome. Nat Rev Microbiol 16, 143–155 (2018). https://doi.org/10.1038/nrmicro.2017.157

```yaml
"Top organisms form the human skin microbiome":
"Bacteria":
url: "https://doi.org/10.1038/nrmicro.2017.157"
ids: [257758, 225324, 169292, 161879, 146827, 43765, 38304, 38287, 38286, 29466, 29388, 28037, 1747, 1305, 1303, 1290, 1282, 1270]
"Eukarya":
url: "https://doi.org/10.1038/nrmicro.2017.157"
ids: [2510778, 1047171, 379413, 119676, 117179, 76777, 76775, 76773, 44058, 41880, 36894, 34391, 31312, 5480, 5068, 3074, 2762]
"Viruses":
url: "https://doi.org/10.1038/nrmicro.2017.157"
ids: [185639, 746832, 10566, 493803, 10279, 746830, 746831, 46771]
```
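
As a rough illustration of how a reference file with this layout (group, then subgroup, then `url` and `ids`) could be queried, the following sketch looks up a taxonomy id in an in-memory copy of the structure. The dictionary is a hand-written, truncated copy of the YAML above, and `find_reference` is a hypothetical helper, not part of GRIMER.

```python
# Hand-written, truncated copy of the YAML structure (assumption: ids shortened).
URL = "https://doi.org/10.1038/nrmicro.2017.157"
references = {
    "Top organisms form the human skin microbiome": {
        "Bacteria": {"url": URL, "ids": [1747, 1282, 1270]},
        "Viruses": {"url": URL, "ids": [10566, 10279]},
    }
}


def find_reference(taxid, references):
    """Return (group, subgroup, url) for every entry listing the given id."""
    hits = []
    for group, subgroups in references.items():
        for subgroup, entry in subgroups.items():
            if taxid in entry["ids"]:
                hits.append((group, subgroup, entry["url"]))
    return hits


print(find_reference(1282, references))
print(find_reference(9606, references))  # id not listed, so no hits
```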

BacDive and eHOMD specific subsets. Dump date: 2022-03-09

```bash
scripts/bacdive_download.py
@@ -64,15 +88,15 @@ scripts/ehomd_download.py

The downloaded MGnify database file should be provided in the main configuration file for grimer as follows:

external:
mgnify: "files/mgnify.tsv"

## mgnify.tsv
```yaml
external:
mgnify: "files/mgnify5989.tsv"
```
### mgnify.tsv

MGnify dump date: 2021-04-08 (latest study accession MGYS00005724)
MGnify dump date: 2022-03-09 (latest study accession MGYS00005989)

```bash
seq -f "MGYS%08g" 256 5724 | xargs -P 24 -I {} scripts/mgnify_download.py {} mgnify_dump_20210408/ > mgnify_dump_20210408.log 2>&1
scripts/mgnify_extract.py -f mgnify_dump_20210408 -t 10 -o files/mgnify.tsv
seq -f "MGYS%08g" 256 5989 | xargs -P 24 -I {} scripts/mgnify_download.py -i {} -v -g -o mgnify_dump_5989/ > mgnify_dump_5989.log 2>&1
scripts/mgnify_extract.py -f mgnify_dump_5989 -t 10 -o files/mgnify.tsv
```
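
The `seq -f "MGYS%08g" 256 5989` call in the commands above expands to one zero-padded study accession per number in the range. A minimal Python sketch of the same expansion is shown here to make the generated identifiers explicit; `mgnify_accessions` is a hypothetical helper name, and not every accession in the range necessarily corresponds to an existing public study.

```python
def mgnify_accessions(first, last):
    """Build MGnify study accessions (MGYS + 8 zero-padded digits), inclusive range."""
    return [f"MGYS{number:08d}" for number in range(first, last + 1)]


accessions = mgnify_accessions(256, 5989)
print(accessions[0], accessions[-1], len(accessions))
# MGYS00000256 MGYS00005989 5734
```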