diff --git a/README.md b/README.md
index de1aea1..7556db1 100644
--- a/README.md
+++ b/README.md
@@ -10,13 +10,12 @@ The DCP welcomes any contributed notebooks or other tutorials to the list below.
 
 ## Vignettes Table of Contents:
 
-* [Explore an HCA Data Set in Scanpy (May 2019)](Explore%20an%20HCA%20Data%20Set%20in%20Scanpy%20(May%202019)/README.md)
-* [Download Expression Matrix for Scanpy](Download%20Expression%20Matrix%20for%20Scanpy/README.md)
-* [Explore an HCA Data Set in Scanpy (Nov 2018)](Explore%20an%20HCA%20Data%20Set%20in%20Scanpy%20(Nov%202018)/README.md)
-* [SPARK Example](SPARK%20Example/README.md)
-* [Login to the DSS](Login%20to%20the%20DSS/README.md)
-* [Download all .fastq files in a project](Download%20all%20.fastq%20files%20in%20a%20project/README.md)
-* [Download Any BAM File](Download%20Any%20BAM%20File/README.md)
-* [Find Cell Type Count](Find%20Cell%20Type%20Count/README.md)
-* [Install the HCA CLI tools](Install%20the%20HCA%20CLI%20tools/README.md)
-* [Download 10x Seq T-cell Bundles](Download%2010x%20Seq%20T-cell%20Bundles/README.md)
+* [Install the HCA CLI tools](install-hca/install-hca.ipynb)
+* [Login to the DSS](login/login.ipynb)
+* [Writing ElasticSearch Queries](elasticsearch-queries/elasticsearch-queries.ipynb)
+* [Download 10x Liver Cell Data](download-10x-seq-tcell/download-10x-liver.ipynb)
+* [Download All Fastq Files in a Project](download-all-fastq-files/download-all-fastq-files.ipynb)
+* [Download Any BAM File](download-any-bam-file/download-any-bam-file.ipynb)
+* [Download Smartseq2 Expression Matrix as an Input to Scanpy](download-smartseq2-matrix-scanpy/download-smartseq2-matrix-scanpy.ipynb)
+* [Explore an HCA Data Set in Scanpy](explore-hca-dataset-scanpy/explore-hca-dataset-scanpy.ipynb)
+* [Find Cell Type Count](find-cell-type-count/find-cell-type-count.ipynb)
diff --git a/Download 10x Seq T-cell Bundles/README.md b/download-10x-seq-tcell/README.md
similarity index 98%
rename from Download 10x Seq T-cell Bundles/README.md
rename to download-10x-seq-tcell/README.md
index 2218061..1da2d53 100644
--- a/Download 10x Seq T-cell Bundles/README.md
+++ b/download-10x-seq-tcell/README.md
@@ -1,5 +1,5 @@
-# Download All Bundles for T-Cells Sequenced with 10x
-
-In this task, the goal is to download all bundles containing data on T-cells and 10x sequencing.
-
-See the [notebook](https://github.com/HumanCellAtlas/data-consumer-vignettes/blob/jk-download-t-cells-with-10x/tasks/Download%2010x%20Seq%20T-cell%20Bundles/Download%20All%20Bundles%20for%20T-cells%20Sequenced%20with%2010x.ipynb).
+# Download All Bundles for T-Cells Sequenced with 10x
+
+In this task, the goal is to download all bundles containing data on T-cells and 10x sequencing.
+
+See the [notebook](https://github.com/HumanCellAtlas/data-consumer-vignettes/blob/jk-download-t-cells-with-10x/tasks/Download%2010x%20Seq%20T-cell%20Bundles/Download%20All%20Bundles%20for%20T-cells%20Sequenced%20with%2010x.ipynb).
diff --git a/Download 10x Seq T-cell Bundles/Download All Bundles for T-cells Sequenced with 10x.ipynb b/download-10x-seq-tcell/download-10x-seq-tcell.ipynb
similarity index 100%
rename from Download 10x Seq T-cell Bundles/Download All Bundles for T-cells Sequenced with 10x.ipynb
rename to download-10x-seq-tcell/download-10x-seq-tcell.ipynb
diff --git a/Download all .fastq files in a project/README.md b/download-all-fastq-files/README.md
similarity index 100%
rename from Download all .fastq files in a project/README.md
rename to download-all-fastq-files/README.md
diff --git a/Download all .fastq files in a project/Download all .fastq files in a project.ipynb b/download-all-fastq-files/download-all-fastq-files.ipynb
similarity index 100%
rename from Download all .fastq files in a project/Download all .fastq files in a project.ipynb
rename to download-all-fastq-files/download-all-fastq-files.ipynb
diff --git a/Download Any BAM File/README.md b/download-any-bam-file/README.md
similarity index 100%
rename from Download Any BAM File/README.md
rename to download-any-bam-file/README.md
diff --git a/Download Any BAM File/Download Any BAM.ipynb b/download-any-bam-file/download-any-bam-file.ipynb
similarity index 100%
rename from Download Any BAM File/Download Any BAM.ipynb
rename to download-any-bam-file/download-any-bam-file.ipynb
diff --git a/Download Any BAM File/requirements.txt b/download-any-bam-file/requirements.txt
similarity index 100%
rename from Download Any BAM File/requirements.txt
rename to download-any-bam-file/requirements.txt
diff --git a/Download Expression Matrix for Scanpy/README.md b/download-smartseq2-matrix-scanpy/README.md
similarity index 98%
rename from Download Expression Matrix for Scanpy/README.md
rename to download-smartseq2-matrix-scanpy/README.md
index e8879f6..50629d2 100644
--- a/Download Expression Matrix for Scanpy/README.md
+++ b/download-smartseq2-matrix-scanpy/README.md
@@ -1,5 +1,5 @@
-# Download SmartSeq2 Expression Matrix as an Input to Scanpy
-
-In this task, suppose you want to download an expression matrix for use in Python's scanpy module.
-
-See the [notebook](https://github.com/HumanCellAtlas/data-consumer-vignettes/blob/feature/download-expression-matrix/tasks/Download%20Expression%20Matrix%20for%20Scanpy/Download%20SmartSeq2%20Expression%20Matrix%20as%20Input%20to%20Scanpy.ipynb).
+# Download SmartSeq2 Expression Matrix as an Input to Scanpy
+
+In this task, suppose you want to download an expression matrix for use in Python's scanpy module.
+
+See the [notebook](https://github.com/HumanCellAtlas/data-consumer-vignettes/blob/feature/download-expression-matrix/tasks/Download%20Expression%20Matrix%20for%20Scanpy/Download%20SmartSeq2%20Expression%20Matrix%20as%20Input%20to%20Scanpy.ipynb).
diff --git a/Download Expression Matrix for Scanpy/Download SmartSeq2 Expression Matrix as Input to Scanpy.ipynb b/download-smartseq2-matrix-scanpy/download-smartseq2-matrix-scanpy.ipynb
similarity index 100%
rename from Download Expression Matrix for Scanpy/Download SmartSeq2 Expression Matrix as Input to Scanpy.ipynb
rename to download-smartseq2-matrix-scanpy/download-smartseq2-matrix-scanpy.ipynb
diff --git a/Explore an HCA Data Set in Scanpy (May 2019)/ENSG_to_name.clean.csv b/explore-hca-dataset-scanpy-may-2019/ENSG_to_name.clean.csv
similarity index 100%
rename from Explore an HCA Data Set in Scanpy (May 2019)/ENSG_to_name.clean.csv
rename to explore-hca-dataset-scanpy-may-2019/ENSG_to_name.clean.csv
diff --git a/Explore an HCA Data Set in Scanpy (May 2019)/README.md b/explore-hca-dataset-scanpy-may-2019/README.md
similarity index 100%
rename from Explore an HCA Data Set in Scanpy (May 2019)/README.md
rename to explore-hca-dataset-scanpy-may-2019/README.md
diff --git a/Explore an HCA Data Set in Scanpy (May 2019)/f9e363cd-7fa5-4349-add2-1c9bd86d10c8.loom b/explore-hca-dataset-scanpy-may-2019/f9e363cd-7fa5-4349-add2-1c9bd86d10c8.loom
similarity index 100%
rename from Explore an HCA Data Set in Scanpy (May 2019)/f9e363cd-7fa5-4349-add2-1c9bd86d10c8.loom
rename to explore-hca-dataset-scanpy-may-2019/f9e363cd-7fa5-4349-add2-1c9bd86d10c8.loom
diff --git a/Explore an HCA Data Set in Scanpy (May 2019)/notebooks_hca_demo_scanpy.ipynb b/explore-hca-dataset-scanpy-may-2019/notebooks_hca_demo_scanpy.ipynb
similarity index 100%
rename from Explore an HCA Data Set in Scanpy (May 2019)/notebooks_hca_demo_scanpy.ipynb
rename to explore-hca-dataset-scanpy-may-2019/notebooks_hca_demo_scanpy.ipynb
diff --git a/Explore an HCA Data Set in Scanpy (May 2019)/requirements.txt b/explore-hca-dataset-scanpy-may-2019/requirements.txt
similarity index 100%
rename from Explore an HCA Data Set in Scanpy (May 2019)/requirements.txt
rename to explore-hca-dataset-scanpy-may-2019/requirements.txt
diff --git a/Explore an HCA Data Set in Scanpy (May 2019)/terra.sh b/explore-hca-dataset-scanpy-may-2019/terra.sh
similarity index 100%
rename from Explore an HCA Data Set in Scanpy (May 2019)/terra.sh
rename to explore-hca-dataset-scanpy-may-2019/terra.sh
diff --git a/Explore an HCA Data Set in Scanpy (Nov 2018)/ENSG_to_name.csv b/explore-hca-dataset-scanpy-nov-2018/ENSG_to_name.csv
similarity index 100%
rename from Explore an HCA Data Set in Scanpy (Nov 2018)/ENSG_to_name.csv
rename to explore-hca-dataset-scanpy-nov-2018/ENSG_to_name.csv
diff --git a/Explore an HCA Data Set in Scanpy (Nov 2018)/README.md b/explore-hca-dataset-scanpy-nov-2018/README.md
similarity index 96%
rename from Explore an HCA Data Set in Scanpy (Nov 2018)/README.md
rename to explore-hca-dataset-scanpy-nov-2018/README.md
index 4c9029b..e7de869 100644
--- a/Explore an HCA Data Set in Scanpy (Nov 2018)/README.md
+++ b/explore-hca-dataset-scanpy-nov-2018/README.md
@@ -1,6 +1,6 @@
 # HCA Scanpy Demo
-The vignette in this folder demonstrates the ability to load an expression matrix downloaded
+The vignette in this folder demonstrates the ability to load an expression matrix downloaded
 from the HCA browser into scanpy and briefly explore the data. The downloaded file, `pancreas.loom`, is the result of end-to-end processing in the Data Coordination Platform workflow, and is included in this repo.
 The file `ENSG_to_name.csv` contains the Gencode identifiers for the genes in the HCA expression matrices, along with the more commonly used gene symbol (i.e. `ENSG00000115263.14,GCG`). This might be generally useful in your exploration of HCA expression matrices.
@@ -8,7 +8,7 @@ The file `ENSG_to_name.csv` contains the Gencode identifiers for the genes in th
 ## Installation
-This notebook assumes that you have installed python 3 on your system,
+This notebook assumes that you have installed python 3 on your system,
 and that the `pip` executable installs packages for that python distribution.
 For some users, it may be necessary to use `pip3` instead.
diff --git a/Explore an HCA Data Set in Scanpy (Nov 2018)/hca_demo_scanpy.ipynb b/explore-hca-dataset-scanpy-nov-2018/hca_demo_scanpy.ipynb
similarity index 100%
rename from Explore an HCA Data Set in Scanpy (Nov 2018)/hca_demo_scanpy.ipynb
rename to explore-hca-dataset-scanpy-nov-2018/hca_demo_scanpy.ipynb
diff --git a/Explore an HCA Data Set in Scanpy (Nov 2018)/pancreas.loom b/explore-hca-dataset-scanpy-nov-2018/pancreas.loom
similarity index 100%
rename from Explore an HCA Data Set in Scanpy (Nov 2018)/pancreas.loom
rename to explore-hca-dataset-scanpy-nov-2018/pancreas.loom
diff --git a/Explore an HCA Data Set in Scanpy (Nov 2018)/requirements.txt b/explore-hca-dataset-scanpy-nov-2018/requirements.txt
similarity index 100%
rename from Explore an HCA Data Set in Scanpy (Nov 2018)/requirements.txt
rename to explore-hca-dataset-scanpy-nov-2018/requirements.txt
diff --git a/Find Cell Type Count/README.md b/find-cell-type-count/README.md
similarity index 97%
rename from Find Cell Type Count/README.md
rename to find-cell-type-count/README.md
index ee3ec7c..923330a 100644
--- a/Find Cell Type Count/README.md
+++ b/find-cell-type-count/README.md
@@ -1,6 +1,6 @@
-# Find Cell Type Count
-
-In this task, you want to count the number of available cells of a certain cell type. For instance,
-you might ask, "How many _hematopoietic system_ cells are available in the database?"
-
-See the [notebook](Find%20Cell%20Type%20Count.ipynb).
+# Find Cell Type Count
+
+In this task, you want to count the number of available cells of a certain cell type. For instance,
+you might ask, "How many _hematopoietic system_ cells are available in the database?"
+
+See the [notebook](Find%20Cell%20Type%20Count.ipynb).
diff --git a/Find Cell Type Count/Find Cell Type Count.ipynb b/find-cell-type-count/find-cell-type-count.ipynb
similarity index 100%
rename from Find Cell Type Count/Find Cell Type Count.ipynb
rename to find-cell-type-count/find-cell-type-count.ipynb
diff --git a/Install the HCA CLI tools/README.md b/install-hca/README.md
similarity index 100%
rename from Install the HCA CLI tools/README.md
rename to install-hca/README.md
diff --git a/Install the HCA CLI tools/Install.ipynb b/install-hca/install-hca.ipynb
similarity index 100%
rename from Install the HCA CLI tools/Install.ipynb
rename to install-hca/install-hca.ipynb
diff --git a/Login to the DSS/README.md b/login/README.md
similarity index 100%
rename from Login to the DSS/README.md
rename to login/README.md
diff --git a/Login to the DSS/Log In.ipynb b/login/login.ipynb
similarity index 100%
rename from Login to the DSS/Log In.ipynb
rename to login/login.ipynb
diff --git a/SPARK Example/README.md b/spark-example/README.md
similarity index 99%
rename from SPARK Example/README.md
rename to spark-example/README.md
index 4d4093f..8683816 100644
--- a/SPARK Example/README.md
+++ b/spark-example/README.md
@@ -193,7 +193,7 @@ d03e1fb5-0bd6-41ad-9744-c87af0fbdc33 40 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCT
 This is an extremely basic example of how to access HCA data on AWS using SPARK.
 A future iteration of this code could do something much more interesting. I'm
-thinking of implementing what Titus Brown covers in this [blog post](http://ivory.idyll.org/blog/2016-sourmash.html).
+thinking of implementing what Titus Brown covers in this [blog post](http://ivory.idyll.org/blog/2016-sourmash.html).
 Specifically it would be super cool to calculate MinHash signature for each of
 the biomaterials in HCA and use these to quickly compare samples by samples.
 Perhaps this would be useful in building a feature to ask "what cells are
diff --git a/SPARK Example/img/emr1.png b/spark-example/img/emr1.png
similarity index 100%
rename from SPARK Example/img/emr1.png
rename to spark-example/img/emr1.png
diff --git a/SPARK Example/img/emr2.png b/spark-example/img/emr2.png
similarity index 100%
rename from SPARK Example/img/emr2.png
rename to spark-example/img/emr2.png
diff --git a/SPARK Example/img/emr3.png b/spark-example/img/emr3.png
similarity index 100%
rename from SPARK Example/img/emr3.png
rename to spark-example/img/emr3.png
diff --git a/SPARK Example/img/emr4.png b/spark-example/img/emr4.png
similarity index 100%
rename from SPARK Example/img/emr4.png
rename to spark-example/img/emr4.png
diff --git a/SPARK Example/img/launch_job.png b/spark-example/img/launch_job.png
similarity index 100%
rename from SPARK Example/img/launch_job.png
rename to spark-example/img/launch_job.png
diff --git a/SPARK Example/pom.xml b/spark-example/pom.xml
similarity index 100%
rename from SPARK Example/pom.xml
rename to spark-example/pom.xml
diff --git a/SPARK Example/src/main/java/kmer/DataStoreClient.java b/spark-example/src/main/java/kmer/DataStoreClient.java
similarity index 100%
rename from SPARK Example/src/main/java/kmer/DataStoreClient.java
rename to spark-example/src/main/java/kmer/DataStoreClient.java
diff --git a/SPARK Example/src/main/java/kmer/HttpUtil.java b/spark-example/src/main/java/kmer/HttpUtil.java
similarity index 100%
rename from SPARK Example/src/main/java/kmer/HttpUtil.java
rename to spark-example/src/main/java/kmer/HttpUtil.java
diff --git a/SPARK Example/src/main/java/kmer/Kmer.java b/spark-example/src/main/java/kmer/Kmer.java
similarity index 100%
rename from SPARK Example/src/main/java/kmer/Kmer.java
rename to spark-example/src/main/java/kmer/Kmer.java
diff --git a/SPARK Example/src/main/java/kmer/Retriable.java b/spark-example/src/main/java/kmer/Retriable.java
similarity index 99%
rename from SPARK Example/src/main/java/kmer/Retriable.java
rename to spark-example/src/main/java/kmer/Retriable.java
index 132c6e4..2b6ae14 100644
--- a/SPARK Example/src/main/java/kmer/Retriable.java
+++ b/spark-example/src/main/java/kmer/Retriable.java
@@ -47,4 +47,3 @@ static Optional runWithRetries(
         return runWithRetries(maxRetries, maxInterval, supplier, t -> true);
     }
 };
-
diff --git a/SPARK Example/src/main/java/kmer/SparkUtil.java b/spark-example/src/main/java/kmer/SparkUtil.java
similarity index 88%
rename from SPARK Example/src/main/java/kmer/SparkUtil.java
rename to spark-example/src/main/java/kmer/SparkUtil.java
index f86f166..a13e6d8 100644
--- a/SPARK Example/src/main/java/kmer/SparkUtil.java
+++ b/spark-example/src/main/java/kmer/SparkUtil.java
@@ -4,14 +4,14 @@ import org.apache.spark.api.java.JavaSparkContext;
 /**
- * This is a utility class to create JavaSparkContext
- * and other objects required by Spark. There are many
- * ways to create JavaSparkContext object. Here we offer
+ * This is a utility class to create JavaSparkContext
+ * and other objects required by Spark. There are many
+ * ways to create JavaSparkContext object. Here we offer
  * 2 ways to create it:
  *
  * 1. by using YARN's resource manager host name
  *
- * 2. by using spark master URL, which is expressed as:
+ * 2. by using spark master URL, which is expressed as:
  *
  * spark://:7077
  *
diff --git a/SPARK Example/src/test/java/kmer/DataStoreClientTest.java b/spark-example/src/test/java/kmer/DataStoreClientTest.java
similarity index 99%
rename from SPARK Example/src/test/java/kmer/DataStoreClientTest.java
rename to spark-example/src/test/java/kmer/DataStoreClientTest.java
index 559201c..6922cb6 100644
--- a/SPARK Example/src/test/java/kmer/DataStoreClientTest.java
+++ b/spark-example/src/test/java/kmer/DataStoreClientTest.java
@@ -70,4 +70,3 @@ public void testSerialize() throws IOException {
         oos.writeObject(new DataStoreClient("https://somewhere.com/"));
     }
 }
-
diff --git a/SPARK Example/src/test/java/kmer/HttpUtilTest.java b/spark-example/src/test/java/kmer/HttpUtilTest.java
similarity index 100%
rename from SPARK Example/src/test/java/kmer/HttpUtilTest.java
rename to spark-example/src/test/java/kmer/HttpUtilTest.java
diff --git a/SPARK Example/src/test/java/kmer/RetriableTest.java b/spark-example/src/test/java/kmer/RetriableTest.java
similarity index 99%
rename from SPARK Example/src/test/java/kmer/RetriableTest.java
rename to spark-example/src/test/java/kmer/RetriableTest.java
index 2f5679e..65c005d 100644
--- a/SPARK Example/src/test/java/kmer/RetriableTest.java
+++ b/spark-example/src/test/java/kmer/RetriableTest.java
@@ -22,5 +22,3 @@ public void runWithRetriesShouldReturnSuccessfulValue() throws InterruptedExcept
         assertTrue(opt.get() == 2);
     }
 }
-
-
diff --git a/toc.md.j2 b/toc.md.j2
deleted file mode 100644
index 8157c44..0000000
--- a/toc.md.j2
+++ /dev/null
@@ -1,13 +0,0 @@
-# Data Consumer Vignettes
-
-[![Build Status](https://travis-ci.com/HumanCellAtlas/data-consumer-vignettes.svg?branch=master)](https://travis-ci.com/HumanCellAtlas/data-consumer-vignettes)
-
-## Overview
-
-Welcome to the DCP Vignette repository, containing walkthrough tutorials to help you get started with the DCP primarily via command-line access. For downstream application development, please refer to the [HCA DCP API documentation](https://prod.data.humancellatlas.org/apis).
-
-The DCP welcomes any contributed notebooks or other tutorials to the list below. You can create your own branch and submit a pull request.
-
-## Vignettes Table of Contents:
-
-{{toc}}
diff --git a/tocgen.py b/tocgen.py
deleted file mode 100644
index d672620..0000000
--- a/tocgen.py
+++ /dev/null
@@ -1,29 +0,0 @@
-from jinja2 import Environment, FileSystemLoader
-import os, glob
-
-# Sets template to toc.md.j2 located at the root of project
-env = Environment(loader=FileSystemLoader(os.path.dirname('./')))
-template = env.get_template('toc.md.j2')
-
-# Sets dir to root of project
-rootDir = '.'
-
-# Gets a list of dirs with README
-list_dirs = glob.glob("./**/*.md")
-
-content = ""
-
-# For list of items, perform split to get the wanted parts of the path and append it to
-# content var
-for items in list_dirs:
-    root, dirName, readMe = items.split("/")
-    link = os.path.join(dirName, readMe).replace(" ","%20")
-    if dirName != 'test':
-        content += "* [{}]({})\n".format(dirName, link)
-
-# Saves content to object that will be used to write to template
-vars = {"toc": content}
-
-# Writes to README.md based on template
-with open('README.md', 'w') as f:
-    f.write(template.render(**vars))