Modify siphon example to illustrate indexing and looping over datasets #118

Open
wants to merge 1 commit into
base: master
116 changes: 112 additions & 4 deletions pages/workshop/Siphon/Siphon Overview.ipynb
@@ -108,6 +108,105 @@
"cat.datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Above, `cat.datasets` is a dictionary, and the output above lists its keys, which are also the filenames of the datasets. Accessing `cat.datasets` with one of these keys returns the `Dataset` that can be used to download the data, as you'll see below.\n",
"\n",
"However, `cat.datasets` is a special kind of dictionary that can also be indexed by position. It may look like an array or a list, but it is not one. Let's get the first dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# You can also try using a filename from the print statement above in place of the 0.\n",
"ds = cat.datasets[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now download the dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.download()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Look in your file explorer panel or run the cell below to verify that we did actually download the file!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.listdir()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you loop through `cat.datasets` instead, you get a sequence of keys: the filenames.\n",
"\n",
"The example below shows how to loop through the keys, retrieve each `Dataset`, and download it.\n",
"\n",
"We'll only download the second and third files."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for ds_name in cat.datasets[1:3]:\n",
"    print('Downloading', ds_name)\n",
"    cat.datasets[ds_name].download()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ahhh, what happened? Indexing with a slice works like indexing with an integer, so it gives us a list of `Dataset` objects, not a list of keys, and using those objects as keys fails. Instead, we'll have to count the files manually as we loop over the keys."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Positions 1 and 2 are the second and third files.\n",
"for di, ds_name in enumerate(cat.datasets):\n",
"    if 1 <= di < 3:\n",
"        print('Downloading', ds_name)\n",
"        cat.datasets[ds_name].download()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now have the first three files in the dataset catalog."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -128,7 +227,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We *could* manually look through that list above and figure out what dataset we're looking for and generate that name (or index). Siphon provides some helpers to simplify this process, provided the names of the dataset follow a pattern with the timestamp in the name:"
"We *could* manually look through that list above and figure out what dataset we're looking for and generate that name (or index). Or we could adapt our loop to filter the datasets further based on each dataset's name.\n",
"\n",
"Siphon provides some helpers to simplify this process, provided the file times match a standard pattern. (If not, the methods below can take an additional `regex=` argument to specify the time encoding.)"
]
},
{
@@ -233,6 +334,13 @@
"datasets = cat.datasets.filter_time_range(request_time, request_time + timedelta(hours=6))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unlike `cat.datasets`, `datasets` here is a plain list of `Dataset` objects."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -250,7 +358,7 @@
}
},
"source": [
"We can ask Siphon to download the file locally:"
"As before, we can ask Siphon to download the file locally:"
]
},
{
@@ -266,7 +374,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Look in your file explorer panel or run the cell below to verify that we did actually download the file!"
"Again, look in your file explorer panel or run the cell below to verify that we did actually download the file!"
]
},
{
@@ -368,7 +476,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
"version": "3.6.12"
}
},
"nbformat": 4,
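
The indexing behavior described in the new notebook cells can be tried without a THREDDS server. The sketch below is a minimal stand-in and not Siphon's actual implementation: the class name `IndexableDict` and the placeholder strings standing in for `Dataset` objects are assumptions made for illustration. It shows why a slice returns values rather than keys, and uses `itertools.islice` as one possible way to loop over a range of keys without counting by hand.

```python
from itertools import islice


class IndexableDict(dict):
    """Hypothetical stand-in: a dict that also accepts positions and slices."""

    def __getitem__(self, item):
        if isinstance(item, (int, slice)):
            # Positional access returns values, not keys.
            return list(self.values())[item]
        return super().__getitem__(item)


# Placeholder strings stand in for Dataset objects (assumption for illustration).
datasets = IndexableDict(
    (name, f"<Dataset {name}>") for name in ["a.nc", "b.nc", "c.nc", "d.nc"]
)

# Integer index: a single value.
first = datasets[0]

# Slice: a list of values, so its items cannot be used as keys.
sliced = datasets[1:3]

# Iterating the mapping yields keys; islice picks positions 1 and 2.
names = list(islice(datasets, 1, 3))
selected = [(name, datasets[name]) for name in names]

print(first)
print(sliced)
print(selected)
```

In the notebook, the equivalent loop would iterate over `islice(cat.datasets, 1, 3)`, assuming `cat.datasets` iterates over its keys as the text in this PR states.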