diff --git a/pipelines/optimus/Optimus.changelog.md b/pipelines/optimus/Optimus.changelog.md
index 38c6aa177..61db5a2f4 100644
--- a/pipelines/optimus/Optimus.changelog.md
+++ b/pipelines/optimus/Optimus.changelog.md
@@ -1,16 +1,17 @@
 # optimus_v3.0.0
 2020-06-10 (Date of Last Commit)
 
-* Removed zarr formatted matrix and metrics outputs and replaced with Loom
-* Removed emptyDrops for sn_rna mode
-* Updated Loom file attribute names: CellID to cell_names, Gene to gene_names, and Accession to ensembl_ids
+* Removed the Zarr formatted matrix and metrics outputs and replaced with Loom
+* Removed EmptyDrops for sn_rna mode
+* Updated the Loom file attribute names: CellID to cell_names, Gene to gene_names, and Accession to ensembl_ids
 * Added metrics for mitochondrial reads
+* Added an optional input for the BAM basename; this input is listed as ‘bam_output_basename’and the default is 'sample_id'
 
 # optimus_v2.0.0
 2020-02-08 (Date of Last Commit)
 
-* Fixed bug that resulted in emptyDrops output being incorrect
-* Updated workflow to WDL 1.0
+* Fixed a bug that resulted in emptyDrops output being incorrect
+* Updated the workflow to WDL 1.0
 
 # optimus_v1.4.0
 
diff --git a/pipelines/optimus/Optimus.wdl b/pipelines/optimus/Optimus.wdl
index e028246f9..b2f241ea1 100644
--- a/pipelines/optimus/Optimus.wdl
+++ b/pipelines/optimus/Optimus.wdl
@@ -31,6 +31,7 @@ workflow Optimus {
     Array[File] r2_fastq
     Array[File]? i1_fastq
     String sample_id
+    String? output_bam_basename = sample_id
 
     # organism reference parameters
     File tar_star_reference
@@ -222,7 +223,7 @@ workflow Optimus {
   call Merge.MergeSortBamFiles as MergeSorted {
     input:
       bam_inputs = PreMergeSort.bam_output,
-      output_bam_filename = sample_id + ".bam",
+      output_bam_filename = output_bam_basename + ".bam",
       sort_order = "coordinate"
   }
 
diff --git a/pipelines/optimus/README.md b/pipelines/optimus/README.md
index 327a3ae1e..4cd2236f1 100644
--- a/pipelines/optimus/README.md
+++ b/pipelines/optimus/README.md
@@ -37,6 +37,10 @@ Optimus is a pipeline developed by the Data Coordination Platform (DCP) of the [
 
 Optimus has been validated for analyzing both [human](https://github.com/HumanCellAtlas/skylab/blob/master/benchmarking/optimus/optimus_report.rst) and [mouse](https://docs.google.com/document/d/1_3oO0ZQSrwEoe6D3GgKdSmAQ9qkzH_7wrE7x6_deL10/edit) data sets. More details about the human validation can be found in the [in the original file](https://docs.google.com/document/d/158ba_xQM9AYyu8VcLWsIvSoEYps6PQhgddTr9H0BFmY/edit).
 
+| **Update on Single Nuclei RNAseq (sn_rna) Pipeline** |
+| --- |
+| We are in the process of validating Optimus for snRNAseq using `sn_rna` parameter. These changes are detailed in the documentation. Once the pipeline is validated for snRNAseq, we will provide the validation report link in the above section. |
+
 ## Quick Start Table
 
 | Pipeline Features | Description | Source |
@@ -90,6 +94,7 @@ The JSON file also contains metadata for the reference information in the follow
 | Annotations_gtf | Cloud path to GTF containing gene annotations used for gene tagging (must match GTF in STAR reference) | NA |
 | Chemistry | Optional string description of whether data was generated with 10x v2 or v3 chemistry. Optimus validates this string. If the string does not match one of the optional strings, the pipeline will fail. You can remove the checks by setting "force_no_check = true" in the input JSON | "tenX_v2" (default) or "tenX_v3" |
 | Counting_mode | String description of whether data is single-cell or single-nuclei | "sc_rna" or "sn_rna" |
+| Output_bam_basename | Optional string used for the output BAM file basename; the default is sample_id | NA |
 
 ### Sample Inputs for Analyses in a Terra Workspace
 
@@ -193,12 +198,12 @@ Output files of the pipeline include:
 3. Cell metadata, including cell metrics
 4. Gene metadata, including gene metrics
 
-The following table lists the output files produced from the pipeline. For samples that have sequenced over multiple lanes, the pipeline will output one merged version of each listed file.
+The following table lists the output files produced from the pipeline. For samples that have sequenced over multiple lanes, the pipeline will output one merged version of each listed file.
 
 | Output Name | Filename, if applicable | Output Type |Output Format |
 | ------ |------ | ------ | ------ |
 | pipeline_version | | Version of the processing pipeline run on this data | String |
-| bam | merged.bam | aligned bam | bam |
+| bam | .bam | Aligned BAM | BAM |
 | matrix_row_index | sparse_counts_row_index.npy | Index of cells in expression matrix | Numpy array index |
 | matrix_col_index | sparse_counts_col_index.npy | Index of genes in expression matrix | Numpy array index |
 | cell_metrics | merged-cell-metrics.csv.gz | cell metrics | compressed csv | Matrix of metrics by cells |
@@ -206,10 +211,7 @@ The following table lists the output files produced from the pipeline. For sampl
 | loom_output_file | output.loom | Loom | Loom | Loom file with expression data and metadata | N/A |
 
 
-The Loom is the default output. See the [create_loom_optimus.py](https://github.com/HumanCellAtlas/skylab/blob/master/docker/loom-output/create_loom_optimus.py) for the detailed code.
-
-
-The final Loom output contains the unnormalized (unfiltered), UMI-corrected expression matrices, as well as the gene and cell metrics detailed in the [Loom_schema documentation](https://github.com/HumanCellAtlas/skylab/blob/master/pipelines/optimus/Loom_schema.md).
+The Loom is the default output. See the [create_loom_optimus.py](https://github.com/HumanCellAtlas/skylab/blob/master/docker/loom-output/create_loom_optimus.py) for the detailed code. The final Loom output contains the unnormalized (unfiltered), UMI-corrected expression matrices, as well as the gene and cell metrics detailed in the [Loom_schema documentation](https://github.com/HumanCellAtlas/skylab/blob/master/pipelines/optimus/Loom_schema.md).
 
 | Zarr Array Deprecation Notice June 2020 |
 | --- |