From 5d63ca72a3a092728eec21a05ddbf9de55960331 Mon Sep 17 00:00:00 2001
From: "Dr. K.D. Murray"
Date: Thu, 1 Feb 2024 11:01:18 +0100
Subject: [PATCH] paper: final comments from Brice

---
 paper/paper.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/paper/paper.md b/paper/paper.md
index 2023222..f459939 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -37,7 +37,7 @@ bibliography: paper.bib
 
 # Summary
 
-Acanthophis is a comprehensive pipeline for the joint discovery and analysis of both plant genetic variation and variation in the composition and abundance of plant-associated microbiomes.
+Acanthophis is a comprehensive pipeline for the joint analysis of both plant genetic variation and variation in the composition and abundance of plant-associated microbiomes (together, the "hologenome").
 Implemented in Snakemake [@koster12_snakemakescalable], Acanthophis handles data from raw FASTQ read files through quality control, alignment of the reads to a plant reference, variant calling, taxonomic classification and quantification of microbes, and metagenome analysis.
 The workflow contains numerous practical optimisations, both to reduce disk space usage and maximise utilisation of computational resources.
 Acanthophis is available under the Mozilla Public Licence v2 at as a python package installable from conda or PyPI (`pip install acanthophis`).
@@ -48,13 +48,13 @@ Understanding plant biology benefits from ecosystem-scale analysis of genetic va
 Such analyses are often data intensive, particularly at the scale required for quantitative analyses, i.e. hundreds to thousands of host individuals [@regalado20_combining].
 They demand computationally-efficient pipelines that perform both host genotyping and host-associated microbiome characterisation in a consistent, flexible, and reproducible fashion.
 
-Currently, no such unified pipelines exist. Previous pipelines perform only a subset of these tasks (e.g. Snakemake's variant calling pipeline; @koster21_snakemakeworkflows). In addition, most host-aware microbiome analysis pipelines do not allow for host genotyping and/or assume an animal host (e.g. Taxprofiler; @yates23_nfcore). Acanthophis has attracted many users, and has been referred to in peer-reviewed journal articles and preprints (e.g. @murray19_landscapedrivers; @ahrens21_genomicconstraints).
+Currently, no such unified pipelines exist. Previous pipelines perform only a subset of these tasks (e.g. Snakemake's variant calling pipeline; @koster21_snakemakeworkflows). In addition, most host-aware microbiome analysis pipelines do not allow for genotyping and/or assume an animal host (e.g. Taxprofiler; @yates23_nfcore). Acanthophis has attracted many users, and has been referred to in peer-reviewed journal articles and preprints (e.g. @murray19_landscapedrivers; @ahrens21_genomicconstraints).
 
 # Components and Features
 
-Acanthophis is a pipeline for the analysis of plant population resequencing data. It expects short-read shotgun whole (meta-)genome sequencing data, typically of plants collected in the field (nothing fundamentally prevents Acanthophis operating on long-read data, however additional tools would need to be incorporated, which will happen given sufficient user demand). A typical dataset might be 10s-1000s of samples from one or multiple closely related species, sequenced with 2x150bp paired-end short read sequencing. In a plant-microbe interaction genomics study, these plants and therefore sequencing libraries can contain microbes (a "hologenome"), however datasets focusing only on host genome variation are also catered for. Acanthophis can be configured to do any of the following analyses: mapping reads to a reference, calling variants, annotating variant effects, estimating genetic distances *de novo*, and profiling and/or assembling metagenomes. While we developed Acanthophis to handle plant data, there is no reason why it cannot be applied to other taxa, however some parameters may need adjustment (see below). Philosophically, Acanthophis aims to for maximum efficiency and flexibility, and therefore does not bake any particular biological question into its outputs. As such, each user should for example filter the resulting variant files as appropriate for their biological question(s), and likewise apply other post-processing as needed.
+Acanthophis is a pipeline for the analysis of plant population resequencing data. It expects short-read shotgun whole (meta-)genome sequencing data, typically of plants collected in the field (nothing fundamentally prevents Acanthophis operating on long-read data, however additional tools would need to be incorporated, which will happen given sufficient user demand). A typical dataset might be 10s-1000s of samples from one or multiple closely related species, sequenced with 2x150bp paired-end short read sequencing. In a plant-microbe interaction genomics study, these plants and therefore sequencing libraries can contain microbes (a "hologenome"), however datasets focusing only on host genome variation are also catered for. Acanthophis can be configured to do any of the following analyses: mapping reads to a reference, calling variants, annotating variant effects, estimating genetic distances directly from sequence reads (*de novo*), and profiling and/or assembling metagenomes. While we developed Acanthophis to handle plant data, there is no reason why it cannot be applied to other taxa, however some parameters may need adjustment (see below). Philosophically, Acanthophis aims to for maximum efficiency and flexibility, and therefore does not bake any particular biological question into its outputs. As such, each user should for example filter the resulting variant files as appropriate for their biological question(s), and likewise apply other post-processing as needed.
 
-Across the entire pipeline, Acanthophis operates on 'sample sets', named groups of one or more samples, and each sample can be in any number of sample sets. The pipeline is configured via a global `config.yaml` file, in which one can configure the pipeline per sample-set. This way, one can configure the analyses to be run (most can be disabled if not needed), as well as tool-specific settings or thresholds. We provide a documented template as well as a reproducible workflow to simulate test data, which can be used as a basis for customisation. While Acanthophis is cross-platform, most of the underlying tools are only packaged for and/or only operate on GNU/Linux operating systems. Therefore, Acanthophis is only actively supported for users on Linux systems.
+Across the entire pipeline, Acanthophis operates on 'sample sets', named groups of one or more samples, and each sample can be in any number of sample sets. The pipeline is configured via a global `config.yaml` file, in which one can configure the pipeline per sample-set. This way, one can configure the analyses to be run (most of the below analysis stages can be skipped if not needed), as well as tool-specific settings or thresholds. We provide a documented template as well as a reproducible workflow to simulate test data, which can be used as a basis for customisation. While Acanthophis is cross-platform, most of the underlying tools are only packaged for and/or only operate on GNU/Linux operating systems. Therefore, Acanthophis is only actively supported for users on Linux systems.
 
 ## Stage 1: Raw reads to per-sample reads
 
@@ -88,6 +88,6 @@ Throughout all pipeline stages, various tools output summaries of their actions
 
 # Acknowledgements
 
-We thank Luisa Teasdale, Anne-Cecile Colin, Rose Andrew, Johannes Köster, and Scott Ferguson for comments or advice on Acanthophis and/or on this manuscript. KDM is supported by a Marie Skłodowska-Curie Actions fellowship. KDM and DW are supported by
+We thank Brice Letcher, George Bouras, Abhishek Tiwari, Luisa Teasdale, Anne-Cecile Colin, Rose Andrew, Johannes Köster, and Scott Ferguson for comments or advice on Acanthophis and/or on this manuscript. KDM is supported by a Marie Skłodowska-Curie Actions fellowship. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No. 951444-PATHOCOM to DW). This work was supported financially by the Australian Research Council (CE140100008; DP150103591; DE190100326). The research was undertaken with the assistance of resources from the National Computational Infrastructure (NCI), which is supported by the Australian Government.
 
 # References