diff --git a/_sources/part1/visualization.ipynb b/_sources/part1/visualization.ipynb index 3dfdb07..d49a2da 100644 --- a/_sources/part1/visualization.ipynb +++ b/_sources/part1/visualization.ipynb @@ -17,7 +17,7 @@ "\n", "### Authors\n", "\n", - "- Pier Lorenzo Marasco, Ispra (Italy), [@pl-marasco](https://github.com/pl-marasco)\n", + "- Pier Lorenzo Marasco, Provare LTD (UK), [@pl-marasco](https://github.com/pl-marasco)\n", "\n", "### Contributors\n", "\n", diff --git a/_sources/part1/xarray_pitfalls.ipynb b/_sources/part1/xarray_pitfalls.ipynb index e7c80da..48cd389 100644 --- a/_sources/part1/xarray_pitfalls.ipynb +++ b/_sources/part1/xarray_pitfalls.ipynb @@ -17,7 +17,7 @@ "\n", "### Authors\n", "\n", - "- Pier Lorenzo Marasco, Ispra (Italy), [@pl-marasco](https://github.com/pl-marasco)\n", + "- Pier Lorenzo Marasco, Provare LTD (UK), [@pl-marasco](https://github.com/pl-marasco)\n", "\n", "### Contributors\n", "\n", diff --git a/_sources/part3/chunking_introduction.ipynb b/_sources/part3/chunking_introduction.ipynb index b0f59ac..b1779d6 100644 --- a/_sources/part3/chunking_introduction.ipynb +++ b/_sources/part3/chunking_introduction.ipynb @@ -16,13 +16,12 @@ "## Authors & Contributors\n", "### Authors\n", "- Tina Odaka, UMR-LOPS Ifremer (France), [@tinaok](https://github.com/tinaok)\n", - "- Pier Lorenzo Marasco, Ispra (Italy), [@pl-marasco](https://github.com/pl-marasco)\n", + "- Pier Lorenzo Marasco, Provare LTD (UK), [@pl-marasco](https://github.com/pl-marasco)\n", "\n", "### Contributors\n", - "- Alejandro Coca-Castro, The Alan Turing Institure, [acocac](https://github.com/acocac)\n", + "- Alejandro Coca-Castro, The Alan Turing Institute, [acocac](https://github.com/acocac)\n", "- Anne Fouilloux, Simula Research Laboratory (Norway), [@annefou](https://github.com/annefou)\n", - "- Guillaume Eynard-Bontemps, CNES (France), [@guillaumeeb](https://github.com/guillaumeeb)\n", - "\n" + "- Guillaume Eynard-Bontemps, CNES (France), [@guillaumeeb](https://github.com/guillaumeeb)" ] }, { diff --git a/_sources/part3/data_exploitability_pangeo.ipynb b/_sources/part3/data_exploitability_pangeo.ipynb index 70cec08..9f02b7a 100644 --- a/_sources/part3/data_exploitability_pangeo.ipynb +++ b/_sources/part3/data_exploitability_pangeo.ipynb @@ -2,9 +2,7 @@ "cells": [ { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "# How to exploit data on Pangeo\n", "\n", @@ -14,9 +12,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "## Authors & Contributors\n", "### Authors\n", @@ -24,7 +20,7 @@ "- Pier Lorenzo Marasco, Provare LTD (UK), [@pl-marasco](https://github.com/pl-marasco) (author of the conversion)\n", "\n", "### Contributors\n", - "- Alejandro Coca-Castro, The Alan Turing Institure, [acocac](https://github.com/acocac) (author)\n", + "- Alejandro Coca-Castro, The Alan Turing Institute, [acocac](https://github.com/acocac) (author)\n", "- Anne Fouilloux, Simula Research Laboratory (Norway), [@annefou](https://github.com/annefou)\n", "- Justus Magin, UMR-LOPS CNRS(France), [@justusmagin](https://github.com/justusmagin)\n", "- Tina Odaka, UMR-LOPS Ifremer (France), [@tinaok](https://github.com/tinaok)\n", @@ -43,9 +39,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "### Relevant resources\n", "\n", @@ -55,9 +49,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "## Import libraries" ] 
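The import cell itself falls outside the diff hunks below. As a rough, hypothetical sketch of the libraries the rest of this notebook relies on (names inferred from calls used later, e.g. stackstac.stack, xr.apply_ufunc, np.where, the .rio and .hvplot accessors, and the Dask client setup); the actual cell may differ:

```python
# Hypothetical import cell (inferred from usage later in the notebook).
import numpy as np
import xarray as xr
import geopandas as gpd          # reading the catchment GeoJSON (AOI)
import rioxarray                 # registers the .rio accessor used for clipping
import stackstac                 # stacks STAC items into an xarray DataArray
import pystac_client             # searching the STAC API for Sentinel-2 items
import hvplot.xarray             # registers the .hvplot accessor for plotting
from dask.distributed import Client, LocalCluster
```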
@@ -66,7 +58,11 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, + "tags": [] }, "outputs": [], "source": [ @@ -99,111 +95,51 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "## Connect to the Dask cluster" ] }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "Dask cluster can be deployed on HPC clusters, cloud computing services, or on a local machine. More on how to deploy over different platforms can be found here: https://docs.dask.org/en/stable/deploying.html\n", "Here we creates a Dask client, which is essential for managing and executing parallel computations efficiently in the subsequent parts of the notebook. " ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "# Local cluster with multiprocessing\n", - "# cluster = LocalCluster()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "# If runned over the EOSC JupyterHub, you can connect to the Dask Gateway\n", - "from dask_gateway import Gateway\n", - "gateway = Gateway()" - ] - }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ - "
WARNING! \n", - "In case you already created gateway cluster, you will see list of your clusters. \n", - "And this cell will kill all your orphan clusters.\n", - "Please clean them before you make a new cluster using following command\n", + "
WARNING !\n", + "If you are going to use dask_gateway (on pangeo-EOSC), please activate next cell. If you are using your local PC, or you do not use dask_gateway, leave it as it is.\n", "
" ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, "outputs": [], "source": [ + "# Connect to the Dask Gateway\n", + "from dask_gateway import Gateway\n", + "gateway = Gateway()\n", + "\n", + "#List activated cluster\n", + "\n", "clusters = gateway.list_clusters()\n", "print(clusters)\n", "\n", - "#Clean clusters running on your name\n", + "#Clean clusters running on your name before starting a new one\n", "for cluster in clusters:\n", " cluster = gateway.connect(cluster.name)\n", - " cluster.shutdown()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "Create a new cluster and scale it to four workers" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ + " cluster.shutdown()\n", + "\n", + "#Create a new cluster and scale it to four workers\n", "cluster = gateway.new_cluster()\n", "cluster.scale(4)\n", "cluster" - ] - }, - { - "cell_type": "markdown", - "source": [ - "Let's setup the Dask Dashboard with your new cluster.\n", - "\n", - "** *This time, just click on the link to open the dashboard into another tab.Then copy and past\n", - "the link of web site appearing to the dask lab - extension ** *\n", - "\n", - "### Get a client from the Dask Gateway Cluster\n", - "\n", - "As stated above,creating a Dask `Client` is mandatory in order to perform following Daks computations on your Dask Cluster.\n" ], "metadata": { "collapsed": false @@ -211,31 +147,46 @@ }, { "cell_type": "markdown", + "metadata": {}, "source": [ "
WARNING !\n", - "Please don't execute this cell below, it is needed for building the Jupyter Book\n", + "Please deactivate cell below if you use dask_gateway (on pangeo-EOSC). If you are using your local PC, or you do not use dask_gateway, leave it as it is.\n", "
" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, + "tags": [] + }, "outputs": [], "source": [ "cluster = None" - ], - "metadata": { - "collapsed": false - } + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get a client from the Dask Cluster\n", + "\n", + "As stated above,creating a Dask `Client` is mandatory in order to perform following Daks computations on your Dask Cluster." + ] }, { "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, + "tags": [] }, "outputs": [], "source": [ @@ -244,26 +195,24 @@ "if cluster:\n", " client = Client(cluster) # create a dask Gateway cluster\n", "else:\n", - " client = Client() # create a local dask cluster on the machine.\n", + " cluster = LocalCluster()\n", + " client = Client(cluster) # create a local dask cluster on the machine.\n", "client" ] }, { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], + "cell_type": "markdown", + "metadata": {}, "source": [ - "client=Client(cluster)" + "Let's setup the Dask Dashboard with your new cluster, as explained in the former section (Parallel computing with Dask)\n", + "Reminder: \n", + "- ***If you use Dask-Gateway: just click on the link to open the dashboard into another tab. Then copy and paste the link of web site appearing to the dask lab - extension***\n", + "- ...\n" ] }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "## Load data\n", "\n", @@ -272,9 +221,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "We will use the catchment as our area of interest (AOI) for the analysis. The catchment is defined by a polygon, which we will load from a GeoJSON file. \n", "The GeoJSON file contains the geometry of the catchment in the WGS84 coordinate reference system (EPSG:4326) and that has to be defined. " @@ -284,7 +231,11 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, + "tags": [] }, "outputs": [], "source": [ @@ -294,9 +245,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "### Satellite collections\n", "\n", @@ -314,7 +263,11 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, + "tags": [] }, "outputs": [], "source": [ @@ -330,9 +283,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "#### Get bands information\n", "As the original data provides bands with different names than the original Sentinel 2 bands, we need to get the information about the bands." @@ -342,7 +293,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -354,9 +308,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "#### Load data\n", "We will use the stackstac library to load the data. 
The stackstac library is a library that allows loading data from a STAC API into an xarray dataset.\n", @@ -370,7 +322,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -383,9 +338,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "## Calculate snow cover\n", "\n", @@ -398,7 +351,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -409,9 +365,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "Let's compute the NDSI and mask out the clouds." ] @@ -420,7 +374,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -430,9 +387,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "
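The body of the NDSI cell sits outside the hunk context shown above. As a hedged sketch of the computation it presumably performs (the band names, the `da` array from stackstac.stack, and the 0.4 threshold are taken from the calculate_ndsi_snow_cloud function defined later in this notebook):

```python
# Sketch: NDSI from the green and SWIR bands of the stacked DataArray `da`,
# followed by a binary snow map (1 = snow, 0 = no snow), NaNs left masked.
green = da.sel(band="green")
swir = da.sel(band="swir16")

ndsi = (green - swir) / (green + swir)

# NDSI above 0.4 is classified as snow, mirroring the threshold used in
# calculate_ndsi_snow_cloud further down.
snowmap = xr.where(ndsi > 0.4, 1, 0).where(ndsi.notnull())
```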
Dask Method Differences: `.compute()` vs `.persist()`\n", "\n", @@ -456,7 +411,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -465,9 +423,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "We will mask out the clouds, which are identified by the values 8 and 9 in the scene classification layer (scl). The scl contains information about the type of land cover. We will mask out the clouds, which are identified by the values 8 and 9 in the scl layer.\n", "\n", @@ -478,7 +434,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -488,9 +447,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "As the SCL layer contains information about the type of land cover, we will mask out the clouds, which are identified by the values 8 and 9 in the scl layer." ] @@ -499,7 +456,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "tags": [] }, "outputs": [], "source": [ @@ -509,9 +466,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "## Process snow cover data\n", "\n", @@ -524,7 +479,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -534,9 +492,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "As we are going to use the `RioXarray` library to mask out the data, we need to add some more information to the data. The RioXarray library is a library that allows to manipulate geospatial data in xarray datasets. Underneath it uses the rasterio library that is a library built on top of GDAL.\n", "\n", @@ -547,7 +503,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -557,9 +516,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "Let's clip the snow_cloud object using the catchment geometry in the UTM32N coordinate reference system." ] @@ -568,7 +525,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -577,9 +537,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "It's time to persist the data in memory. We will use the persist method to load the data in memory and keep it there until the end of the analysis." ] @@ -588,7 +546,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -597,18 +558,14 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "### Aggregate data" ] }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "Data aggregation is a very important step in the analysis. 
It allows to reduce the amount of data and to make the analysis more efficient. Moreover as in this case we are going to aggregate the date to daily values, this will allow use to compute statistic on the data at the basin scale later on.\n", "\n", @@ -619,7 +576,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -630,7 +590,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -639,9 +602,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "As the data has been aggregated to daily values, we need to rename the floor method to something more meaningful as date." ] @@ -650,7 +611,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -661,7 +625,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -670,9 +637,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "### Visualize data\n", "We will use the `hvplot` library to visualize the data. The library allows to visualize data in `xarray` datasets. It is based on the holoviews library, which is a library that allows to visualize multidimensional data.\n", @@ -684,7 +649,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -703,9 +671,123 @@ }, { "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Calculate snow cover with apply_ufunc\n", + "\n", + "
\n", + " Calculate snow cover using Xarray's apply_ufunc \n", + "
\n", + "
    \n", + "
  • The procedure for computing snow cover can also be summed up in the following Python function. \n", + "
  • \n", + "
  • We first verify that green, swir16 and scl are the 0th, 1st and 2nd variables along the band dimension. Then we simply copy and paste the Python code into a function.
  • \n", + "
\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, "metadata": { - "collapsed": false + "tags": [] + }, + "outputs": [], + "source": [ + "def calculate_ndsi_snow_cloud(data):\n", + " green = data[0]\n", + " swir = data[1]\n", + " scl = data[2]\n", + " ndsi = (green - swir) / (green + swir)\n", + " ndsi_mask = ( ndsi > 0.4 )& ~np.isnan(ndsi)\n", + " snow = np.where(ndsi_mask, 1, ndsi)\n", + " snowmap = np.where((snow <= 0.42) & ~np.isnan(snow), 0, snow)\n", + " mask = ~( (scl == 8) | (scl == 9) | (scl == 3) )\n", + " snow_cloud = np.where(mask, snowmap, 2)\n", + " return snow_cloud" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + " Apply mask then persist the data, then apply_ufunc to perform computation. \n", + "
\n", + "
    \n", + "
  • The masking procedure can also be applied before the computation. \n", + "
  • \n", + "
  • The function passed to Xarray's apply_ufunc is applied to each chunk of the xarray.DataArray. \n", + "
  • \n", + "
  • The chunksize for stackstac is specified so that the data is only chunked (sliced) in time. All data along the band, x and y dimensions is in one chunk (slice).\n", + "
  • \n", + "
\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] }, + "outputs": [], + "source": [ + "%%time\n", + "da = stackstac.stack(items,\n", + " bounds_latlon=aoi.iloc[0].geometry.bounds,\n", + " resolution=20,\n", + " chunksize=(1,-1,-1,-1),\n", + " assets=['green', 'swir16', 'scl'])\n", + "#Mask data\n", + "geom_utm32 = aoi.to_crs(epsg=32632).iloc[0]['geometry']\n", + "da.rio.write_crs(\"EPSG:32632\", inplace=True)\n", + "da.rio.set_nodata(np.nan, inplace=True)\n", + "da = da.rio.clip([geom_utm32])\n", + "\n", + "snow_cloud_clipped=xr.apply_ufunc(\n", + " calculate_ndsi_snow_cloud\n", + " ,da\n", + " ,input_core_dims=[[\"band\",\"y\",\"x\"]]\n", + " ,output_core_dims=[[\"y\",\"x\"]]\n", + " ,exclude_dims=set([\"band\"])\n", + " ,vectorize=True\n", + " ,dask=\"parallelized\"\n", + " ,output_dtypes=[da.dtype]\n", + " ).assign_attrs({'long_name': 'snow_cloud'}).to_dataset(name='snow_cloud')\n", + "\n", + "snow_cloud_clipped\n", + "#snow_cloud_clipped_date = snow_cloud_clipped.groupby(snow_cloud_clipped.time.dt.floor('D')).max(skipna=True).rename({'floor': 'date'})\n", + "#snow_cloud_clipped_date" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + " Inspect the data dimentions! \n", + "
\n", + "
    \n", + "
  • How did the dimensions change from the input (da) to the output (snow_cloud_clipped)?\n", + "
  • \n", + "
  • What is set as input_core_dims? \n", + "
  • \n", + "
  • What is set as output_core_dims? \n", + "
  • \n", + "
  • What is set as exclude_dims? \n", + "
  • \n", + "
  • Did you see the 'time' dimension?\n", + "
  • \n", + "
  • We will get back to apply_ufunc in the next OpenEO example. \n", + "
  • \n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, "source": [ "## Compute statistics\n", "\n", @@ -720,7 +802,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -732,7 +817,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -744,7 +832,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -756,7 +847,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -770,9 +864,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "We are going to get the same information for the snow cover." ] @@ -781,7 +873,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -792,7 +887,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -803,7 +901,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -815,7 +916,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -827,7 +931,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -838,7 +945,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -848,9 +958,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "Let's compare the date with the discharge data." ] @@ -859,7 +967,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -868,9 +979,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "Let's refine a little bit the data so that we can compare it with the snow cover data." 
] @@ -879,7 +988,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -893,7 +1005,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -902,9 +1017,7 @@ }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, + "metadata": {}, "source": [ "## Conclusion\n", "\n", @@ -934,7 +1047,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" + "version": "3.11.6" } }, "nbformat": 4, diff --git a/part1/visualization.html b/part1/visualization.html index 5efcb59..79abbfc 100644 --- a/part1/visualization.html +++ b/part1/visualization.html @@ -500,7 +500,7 @@

Authors & Contributors

Authors#

@@ -971,7 +971,7 @@

Open local dataset @@ -1573,7 +1573,7 @@

Read a shapefile with the Area Of Interest (AOI) @@ -2197,11 +2197,11 @@

Visualization with HoloViews
-
+
- + @@ -511,13 +511,13 @@

Authors & ContributorsAuthors#

  • Tina Odaka, UMR-LOPS Ifremer (France), @tinaok

  • -
  • Pier Lorenzo Marasco, Ispra (Italy), @pl-marasco

  • +
  • Pier Lorenzo Marasco, Provare LTD (UK), @pl-marasco

Contributors#

    -
  • Alejandro Coca-Castro, The Alan Turing Institure, acocac

  • +
  • Alejandro Coca-Castro, The Alan Turing Institute, acocac

  • Anne Fouilloux, Simula Research Laboratory (Norway), @annefou

  • Guillaume Eynard-Bontemps, CNES (France), @guillaumeeb

@@ -607,8 +607,8 @@

Global LTS -
CPU times: user 321 ms, sys: 70 ms, total: 391 ms
-Wall time: 3.02 s
+
CPU times: user 511 ms, sys: 82.4 ms, total: 594 ms
+Wall time: 2.44 s
 
@@ -1000,9 +1000,9 @@

Global LTS @@ -1081,7 +1081,7 @@

Global LTS @@ -1160,7 +1160,7 @@

Global LTS @@ -1239,7 +1239,7 @@

Global LTS @@ -1318,7 +1318,7 @@

Global LTS @@ -1397,7 +1397,7 @@

Global LTS @@ -1476,7 +1476,7 @@

Global LTSopen_mfdataset automatically switch from Numpy Arrays to Dask Arrays as the data structure used by Xarray.

test.data is the backend array Python representation of Xarray’s Data Array, Dask Array when using chunking, Numpy by default.

We will introduce Dask arrays and Dask graphs visualization in the next section Scaling with Dask.

@@ -3224,7 +3224,7 @@

Zarr storage format - @@ -3788,9 +3788,9 @@

Extract chunk information @@ -3869,7 +3869,7 @@

Extract chunk information @@ -3948,7 +3948,7 @@

Extract chunk information @@ -4027,7 +4027,7 @@

Extract chunk information @@ -4106,7 +4106,7 @@

Extract chunk information @@ -4185,7 +4185,7 @@

Extract chunk information @@ -4264,7 +4264,7 @@

Extract chunk information
working on  foss4g-data/CGLS_LTS_1999_2019/c_gls_NDVI-LTS_1999-2019-0721_GLOBE_VGT-PROBAV_V3.0.1.nc
 

-
CPU times: user 905 ms, sys: 306 ms, total: 1.21 s
-Wall time: 33.8 s
+
CPU times: user 1.36 s, sys: 346 ms, total: 1.71 s
+Wall time: 23.6 s
 
@@ -4399,8 +4399,8 @@

We have 36 files to process, but for this chunking_introduction example, we

-
CPU times: user 34.5 ms, sys: 589 µs, total: 35.1 ms
-Wall time: 33.8 ms
+
CPU times: user 45.9 ms, sys: 1.17 ms, total: 47.1 ms
+Wall time: 48.3 ms
 
@@ -4424,8 +4424,8 @@

We have 36 files to process, but for this chunking_introduction example, we

-
CPU times: user 19.9 ms, sys: 0 ns, total: 19.9 ms
-Wall time: 338 ms
+
CPU times: user 25.5 ms, sys: 3.49 ms, total: 29 ms
+Wall time: 253 ms
 
@@ -4818,9 +4818,9 @@

We have 36 files to process, but for this chunking_introduction example, we source: Derived from EO satellite imagery time_coverage_end: 2019-12-31T23:59:59Z time_coverage_start: 1999-01-01T00:00:00Z - title: Normalized Difference Vegetation Index: Long Term S...

+ dtype='float64', name='lon', length=40320))
  • Conventions :
    CF-1.6
    archive_facility :
    VITO
    copyright :
    Copernicus Service information 2021
    history :
    2021-03-01 - Processing line NDVI LTS
    identifier :
    urn:cgls:global:ndvi_stats_all:NDVI-LTS_1999-2019-0701_GLOBE_V3.0.1
    institution :
    VITO NV
    long_name :
    Normalized Difference Vegetation Index
    orbit_type :
    LEO
    parent_identifier :
    urn:cgls:global:ndvi_stats_all
    platform :
    SPOT-4, SPOT-5, Proba-V
    processing_level :
    L4
    processing_mode :
    Offline
    product_version :
    V3.0.1
    references :
    https://land.copernicus.eu/global/products/ndvi
    sensor :
    VEGETATION-1, VEGETATION-2, VEGETATION
    source :
    Derived from EO satellite imagery
    time_coverage_end :
    2019-12-31T23:59:59Z
    time_coverage_start :
    1999-01-01T00:00:00Z
    title :
    Normalized Difference Vegetation Index: Long Term Statistics 1KM: GLOBE 1999-2019 0701
  • We can save the consolidated metadata for our dataset in a file, and reuse it later to access the dataset. We used json for next step, but we can also use parquet.

    @@ -5259,9 +5259,9 @@

    We have 36 files to process, but for this chunking_introduction example, we source: Derived from EO satellite imagery time_coverage_end: 2019-12-31T23:59:59Z time_coverage_start: 1999-01-01T00:00:00Z - title: Normalized Difference Vegetation Index: Long Term S... + dtype='float64', name='lon', length=40320))

  • Conventions :
    CF-1.6
    archive_facility :
    VITO
    copyright :
    Copernicus Service information 2021
    history :
    2021-03-01 - Processing line NDVI LTS
    identifier :
    urn:cgls:global:ndvi_stats_all:NDVI-LTS_1999-2019-0701_GLOBE_V3.0.1
    institution :
    VITO NV
    long_name :
    Normalized Difference Vegetation Index
    orbit_type :
    LEO
    parent_identifier :
    urn:cgls:global:ndvi_stats_all
    platform :
    SPOT-4, SPOT-5, Proba-V
    processing_level :
    L4
    processing_mode :
    Offline
    product_version :
    V3.0.1
    references :
    https://land.copernicus.eu/global/products/ndvi
    sensor :
    VEGETATION-1, VEGETATION-2, VEGETATION
    source :
    Derived from EO satellite imagery
    time_coverage_end :
    2019-12-31T23:59:59Z
    time_coverage_start :
    1999-01-01T00:00:00Z
    title :
    Normalized Difference Vegetation Index: Long Term Statistics 1KM: GLOBE 1999-2019 0701
  • The catalog (json file we created) can be shared on the cloud (or GitHub, etc.) and anyone can load it from there too.

    This approach allows anyone to easily access LTS data and select the Area of Interest for their own study.

    @@ -5692,11 +5692,11 @@

    We have 36 files to process, but for this chunking_introduction example, we source: Derived from EO satellite imagery time_coverage_end: 2019-12-31T23:59:59Z time_coverage_start: 1999-01-01T00:00:00Z - title: Normalized Difference Vegetation Index: Long Term S... + dtype='float64', name='time'))
  • Conventions :
    CF-1.6
    archive_facility :
    VITO
    copyright :
    Copernicus Service information 2021
    history :
    2021-03-01 - Processing line NDVI LTS
    identifier :
    urn:cgls:global:ndvi_stats_all:NDVI-LTS_1999-2019-0101_GLOBE_V3.0.1
    institution :
    VITO NV
    long_name :
    Normalized Difference Vegetation Index
    orbit_type :
    LEO
    parent_identifier :
    urn:cgls:global:ndvi_stats_all
    platform :
    SPOT-4, SPOT-5, Proba-V
    processing_level :
    L4
    processing_mode :
    Offline
    product_version :
    V3.0.1
    references :
    https://land.copernicus.eu/global/products/ndvi
    sensor :
    VEGETATION-1, VEGETATION-2, VEGETATION
    source :
    Derived from EO satellite imagery
    time_coverage_end :
    2019-12-31T23:59:59Z
    time_coverage_start :
    1999-01-01T00:00:00Z
    title :
    Normalized Difference Vegetation Index: Long Term Statistics 1KM: GLOBE 1999-2019 0101
  • The kerchunk catalogues can be placed in an intake catalogue, then loading multiple NetCDF file in the cloud can be just done in following 3 lines, chunked and fast.

    @@ -6119,11 +6119,11 @@

    We have 36 files to process, but for this chunking_introduction example, we source: Derived from EO satellite imagery time_coverage_end: 2019-12-31T23:59:59Z time_coverage_start: 1999-01-01T00:00:00Z - title: Normalized Difference Vegetation Index: Long Term S...