From 46bd1815811a9d4cf0596d4ccc78212cf39a8e7d Mon Sep 17 00:00:00 2001
From: Jared Lumpe
Date: Thu, 23 Sep 2021 23:33:34 -0600
Subject: [PATCH] Documentation updates

---
 docs/source/cli.rst        | 152 ++++++++++++++++++++++++++++++++-----
 docs/source/conf.py        |   7 +-
 docs/source/index.rst      |   7 +-
 docs/source/install.rst    |  23 +++++-
 docs/source/quickstart.rst |   5 ++
 5 files changed, 168 insertions(+), 26 deletions(-)
 create mode 100644 docs/source/quickstart.rst

diff --git a/docs/source/cli.rst b/docs/source/cli.rst
index bb090d5..145a1de 100644
--- a/docs/source/cli.rst
+++ b/docs/source/cli.rst
@@ -1,6 +1,7 @@
 Command Line Interface
 **********************

+
 Root command group
 ==================

@@ -10,27 +11,29 @@ Root command group

     gambit [OPTIONS] COMMAND [ARGS]...

+Some top-level options are set at the root command group, and should be specified `before` the name
+of the subcommand to run.

 Options
 -------

-.. option:: -d DB_DIR
-
-   Path to directory containing GAMBIT database files. Must contain exactly one ``.db`` and one
-   ``.h5`` file. Required by most subcommands. As an alternative you can specify the database
-   location with the :envvar:`GAMBIT_DB_PATH` environment variable.
+.. option:: -d, --db DIR

+   Path to directory containing GAMBIT database files. Required by most subcommands.
+   As an alternative you can specify the database location with the :envvar:`GAMBIT_DB_PATH`
+   environment variable.

-Environment
------------
+Environment variables
+---------------------

 .. envvar:: GAMBIT_DB_PATH

    Alternative to :option:`-d` for specifying path to database.


-Commands
-========
+
+Querying the database
+=====================

 query
 -----
@@ -39,15 +42,20 @@ query

 ::

-    gambit query [OPTIONS] FILES...
+    gambit query [OPTIONS] GENOMES...


 Predict taxonomy of microbial samples from genome sequences.

-Files must contain assembled genome sequences, but may have multiple contigs.
+``GENOMES`` must contain assembled genome sequences, but may have multiple contigs. Alternatively
+a file containing pre-calculated signatures may be used with the ``--sigfile`` option. The
+reference database must be specified from the root command group.

+
+Options
+.......

-.. option:: -o, --output OUTFILE
+.. option:: -o, --output FILE

    File to write output to. If omitted will write to stdout.

@@ -55,20 +63,122 @@ Files must contain assembled genome sequences, but may have multiple contigs.

    Format of genome sequence files. Currently only FASTA is supported.

-.. option:: -f, --outfmt {json|csv}
+.. option:: -f, --outfmt {csv|json|archive}

-   Output format.
+   Results format (see next section).

+.. option:: --sigfile FILE

-Output Formats
-==============
+   Path to file containing query signatures.

-JSON
-----

-TODO
+Result Formats
+--------------

 CSV
----
+...
+
+A .csv file with one row per query. Contains the following columns:
+
+* ``query.name`` - Name of query.
+* ``query.path`` - Path to query file, if any.
+* ``predicted.name`` - Name of predicted taxon.
+* ``predicted.rank`` - Rank of predicted taxon.
+* ``predicted.ncbi_id`` - ID of taxon in NCBI taxonomy database.
+* ``predicted.threshold`` - Distance threshold of predicted taxon.
+* ``closest.distance`` - Distance to closest genome.
+* ``closest.description`` - Description of closest genome.
+
+
+JSON
+....
+
+A machine-readable format meant to be used in pipelines.
+
+.. todo::
+   Document schema
+
+
+Archive
+.......
+
+A more verbose JSON-based format used for testing and development.
+
+
+
+Generating and inspecting k-mer signatures
+==========================================
+
+signatures info
+---------------
+
+.. program:: gambit signatures info
+
+::
+
+    gambit signatures info [OPTIONS] FILE
+
+
+Print information about a GAMBIT signatures file. Defaults to a basic human-readable format.
+
+
+Options
+.......
+
+.. option:: -j, --json
+
+   Print information in JSON format. Includes more information than standard output.
+
+.. option:: -p, --pretty
+
+   Prettify JSON output to make it more human-readable.
+
+.. option:: -i, --ids
+
+   Print IDs of all signatures in file.
+
+
+signatures create
+-----------------
+
+.. program:: gambit signatures create
+
+::
+
+    gambit signatures create [OPTIONS] GENOMES
+
+Calculate GAMBIT signatures of ``GENOMES`` and write to file.
+
+The ``-k`` and ``--prefix`` options may be omitted if a reference database is specified through the
+root command group, in which case the parameters of the database will be used.
+
+
+Options
+.......
+
+.. option:: -o, --output FILE
+
+   Path to write file to (required).
+
+.. option:: -k INTEGER
+
+   Length of k-mers to find (does not include length of prefix).
+
+.. option:: -p, --prefix STRING
+
+   K-mer prefix to match, a non-empty string of DNA nucleotide codes.
+
+.. option:: -s, --seqfmt {fasta}
+
+   Format of genome sequence files. Currently only FASTA is supported.
+
+.. option:: -i, --ids FILE
+
+   File containing IDs to assign to signatures in file metadata. Should contain one ID per line.
+
+.. option:: -m, --meta-json FILE
+
+   JSON file containing metadata to attach to file.

-(Not yet implemented)
+   .. todo::
+      Document schema
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 4788649..9932cbc 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -34,7 +34,7 @@
     'sphinx.ext.autodoc',
     # 'sphinx.ext.doctest',
     'sphinx.ext.intersphinx',
-    # 'sphinx.ext.todo',
+    'sphinx.ext.todo',
     # 'sphinx.ext.coverage',
     # 'sphinx.ext.mathjax',
     # 'sphinx.ext.viewcode',
@@ -61,3 +61,8 @@
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
 html_static_path = ['_static']
+
+
+# -- Additional options ------------------------------------------------------
+
+todo_include_todos = True
diff --git a/docs/source/index.rst b/docs/source/index.rst
index b0a2f7d..d7fcf66 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -1,11 +1,14 @@
 GAMBIT Documentation
 ********************

+Contents
+========
+
 .. toctree::
-   :maxdepth: 2
-   :hidden:
+   :maxdepth: 1

    install
+   quickstart
    cli
    api/api

diff --git a/docs/source/install.rst b/docs/source/install.rst
index 1a1a3ff..575e8e4 100644
--- a/docs/source/install.rst
+++ b/docs/source/install.rst
@@ -2,13 +2,32 @@
 Installation and Setup
 ======================

+Install from bioconda
+---------------------
+
+The recommended way to install the tool is through the conda package manager (available
+`here `_)::
+
+    conda install -c bioconda hesslab-gambit
+
+
 Install from source
 -------------------

-TODO
+Installing from source requires the ``cython`` package as well as a C compiler be installed on your
+system. Clone the repository and navigate to the directory, and then run::
+
+    pip install .
+
+Or do an editable development install with::
+
+    pip install -e .


 Database files
 --------------

-TODO
+Download the following files and place them in a directory of your choice:
+
+* `gambit-genomes-1.0b1-210719.db `_
+* `gambit-signatures-1.0b1-210719.h5 `_
diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
new file mode 100644
index 0000000..bf24f8a
--- /dev/null
+++ b/docs/source/quickstart.rst
@@ -0,0 +1,5 @@
+Quick Start
+***********
+
+.. todo::
+
\