From f073ee24e8451884e39d2784d3c12cbc0a410694 Mon Sep 17 00:00:00 2001 From: jfmartinez4 Date: Tue, 28 May 2024 15:10:40 -0400 Subject: [PATCH] updating wsim 101a and 101b --- m101a-wsim-gldas-acquisition.qmd | 74 ++++++++++++++++++++++-- m101b-wsim-gldas-vis.qmd | 96 +++++++++++++++++++++++++++----- 2 files changed, 151 insertions(+), 19 deletions(-) diff --git a/m101a-wsim-gldas-acquisition.qmd b/m101a-wsim-gldas-acquisition.qmd index 2014b97..528bc76 100644 --- a/m101a-wsim-gldas-acquisition.qmd +++ b/m101a-wsim-gldas-acquisition.qmd @@ -23,9 +23,9 @@ After completing this lesson, you should be able to: ## Introduction -The water cycle is the constant process of circulation of water on, above, and under the Earth's surface [@NOAA2019]. Human activities produce greenhouse gas emissions, land use changes, dam and reservoir development, and groundwater extraction which have affected the natural water cycle in recent decades [@intergovernmentalpanelonclimatechange2023]. The influence of these human activities on the water cycle have consequential impacts on oceanic, groundwater, and land processes, influencing phenomena such as droughts and floods [@Zhou2016]. +The water cycle is the process of circulation of water on, above, and under the Earth’s surface [@NOAA2019]. Human activities produce greenhouse gas emissions, land use changes, dam and reservoir development, and groundwater extraction, which have affected the natural water cycle in recent decades [@intergovernmentalpanelonclimatechange2023]. The influence of these human activities on the water cycle has consequential impacts on oceanic, groundwater, and land processes, influencing phenomena such as droughts and floods [@Zhou2016]. -Precipitation deficits, or periods of below average rainfall, can lead to drought, which is characterized by prolonged periods of little to no rainfall and resulting water shortages. 
Droughts often trigger environmental stresses and can create cycles of reinforcement, impacting both ecosystems and people [@Rodgers2023]. For example, while California frequently experiences drought, the combination of prolonged dry spells and sustained high temperatures prevented the replenishment of cool fresh water to the Klamath river, which led to severe water shortages in 2003 and again from 2012 to 2014. These shortages affect agricultural areas like the Central Valley, which grows almonds, one of California's most important crops, with the state producing 80% of the world's almonds. These severe droughts, coupled with competition for limited fresh water resources, resulted in declining populations of [Chinook salmon](https://www.fisheries.noaa.gov/species/chinook-salmon) due to heat stress and gill rot disease disrupting the food supply for Klamath basin tribal groups [@guillen2002; @Bland2014]. +Precipitation deficits, or periods of below-average rainfall, can lead to drought, characterized by prolonged periods of little to no rainfall and resulting water shortages. Droughts often trigger environmental stresses and can create cycles of reinforcement impacting ecosystems and people [@Rodgers2023]. For example, California frequently experiences drought but the combination of prolonged dry spells and sustained high temperatures prevented the replenishment of cool fresh water to the Klamath River, which led to severe water shortages in 2003 and again from 2012 to 2014. These shortages affect agricultural areas like the Central Valley, which grows almonds, one of California’s most important crops, with the state producing 80% of the world’s almonds. 
These severe droughts, coupled with competition for limited freshwater resources, resulted in declining populations of [Chinook salmon](https://www.fisheries.noaa.gov/species/chinook-salmon) due to heat stress and gill rot disease disrupting the food supply for Klamath basin tribal groups [@guillen2002; @Bland2014]. ![](docs/images/watercycle_rc.png)[^1] @@ -35,12 +35,34 @@ Precipitation deficits, or periods of below average rainfall, can lead to drough ::: {.callout-tip style="color: #5a7a2b;"} ## Data Science Review -A [raster](https://docs.qgis.org/2.18/en/docs/gentle_gis_introduction/raster_data.html) dataset is a type of geographic data in digital image format which has numerical information stored in each pixel. (Rasters are often referred to as grids because of their regularly-shaped matrix data structure.) Rasters can store many types of information, and they usually have dimensions that include latitude, longitude, and time. NetCDF is one format for raster data; others include Geotiff, ASCII and many more. Several raster formats like NetCDF can store multiple raster layers, or a "raster stack," which can be useful for storing and analyzing a series of rasters. +A [raster](https://docs.qgis.org/2.18/en/docs/gentle_gis_introduction/raster_data.html) dataset is a type of geographic data in digital image format with numerical information stored in each pixel. (Rasters are often called grids because of their regularly-shaped matrix data structure.) Rasters can store many types of information and can have dimensions that include latitude, longitude, and time. NetCDF is one format for raster data; others include Geotiff, ASCII, and many more. Several raster formats like NetCDF can store multiple raster layers, or a “raster stack,” which can be useful for storing and analyzing a series of rasters. 
::: ::: The **Water Security (WSIM-GLDAS) Monthly Grids, v1 (1948 - 2014)** The Water Security (WSIM-GLDAS) Monthly Grids, v1 (1948 - 2014) dataset "identifies and characterizes surpluses and deficits of freshwater, and the parameters determining these anomalies, at monthly intervals over the period January 1948 to December 2014" [@isciences2022]. The dataset can be downloaded from the [NASA SEDAC](https://sedac.ciesin.columbia.edu/data/set/water-wsim-gldas-v1) website. Downloads of the WSIM-GLDAS data are organized by a combination of thematic variables (composite surplus/deficit, temperature, PETmE, runoff, soil moisture, precipitation) and integration periods (a temporal aggregation) (1, 3, 6, 12 months). Each variable-integration combination consists of a NetCDF raster (.nc) file ( with a time dimension that contains a raster layer for each of the 804 months between January, 1948 and December, 2014. Some variables also contain multiple attributes each with their own time series. Hence, this is a large file that can take a lot of time to download and may cause computer memory issues on certain systems. This is considered BIG data. +::: {.callout-note} +## Knowledge Check + +1. How would you best describe the water cycle? + a. A prolonged period of little to no rainfall. + b. Low precipitation combined with record temperatures. + c. The circulation of water on and above Earth’s surface. + d. A cycle that happens due to drought. + +2. What human interventions affect the water cycle? (select all that apply) + a. Greenhouse gas emissions + b. Land use changes + c. Dam and reservoir development + d. Groundwater overexploitation + +3. What is a precipitation deficit? + a. A period of rainfall below the average. + b. A prolonged period of little to no rainfall. + c. A period of chain reactions. + d. A period of rainfall above the average. 
+::: + ## Acquiring the Data ::: {.callout-tip style="color: #5a7a2b;"} @@ -63,7 +85,11 @@ For this lesson, we will work with the **WSIM-GLDAS data set Composite Anomaly T ::: {.callout-tip style="color: #5a7a2b;"} ## Data Science Review -This lesson uses the [`stars`](https://r-spatial.github.io/stars/), [`sf`](https://r-spatial.github.io/sf/), [`lubridate`](https://lubridate.tidyverse.org/), and [cubelyr](https://cran.r-project.org/web/packages/cubelyr/index.html) packages. Make sure they are installed before you begin working with the code in this document. If you'd like to learn more about the functions used in this lesson you can use the help guides on their package websites. +This lesson uses the [`stars`](https://r-spatial.github.io/stars/), [`sf`](https://r-spatial.github.io/sf/), [`lubridate`](https://lubridate.tidyverse.org/), and [cubelyr](https://cran.r-project.org/web/packages/cubelyr/index.html) packages. + +The `stars` package in R helps you work with large and complex spatial data, making it easier to analyze and visualize maps and satellite images. The `sf` package lets you handle and analyze spatial data in a simple way, allowing you to work with maps and geographic information seamlessly. The `lubridate` package makes it really easy to handle dates and times in R, so you can effortlessly convert, manipulate, and perform calculations with them. The `cubelyr` package helps you create and analyze multidimensional data cubes, making it easier to explore complex datasets and discover patterns. + +Make sure they are installed before you begin working with the code in this document. If you'd like to learn more about the functions used in this lesson you can use the help guides on their package websites. ::: ::: @@ -138,6 +164,25 @@ wsim_gldas_anoms |> Although we have now reduced the data to a single attribute with a restricted time of interest, we can take it a step further and limit the spatial extent to a country or state of interest. 
+::: {.callout-note} +## Knowledge Check +1. Which of these best describes a raster dataset? + a. A type of geographic data in digital image format. + b. A table or list of numbers. + c. A geographic region of interest. + d. An attribute with a time period of interest. +2. Which of the following are true about the information that rasters can store? (select all that apply) + a. Attributes (thematic content) + b. Dimensions (information expressing spatial or temporal extent) + c. Geographic coordinates + d. A list of numbers +3. In the R programming language, what does the term vector refer to? + a. A grid of geographic data. + b. A collection or list of numbers. + c. A geographic region of interest. + d. An attribute with a time period of interest. +::: + ## Spatial Selection ::: column-margin @@ -283,6 +328,25 @@ You can download this image if you are running this script locally. **Not for 2i Once you run this code you can find the file in the file location... This allows you to share your findings. +::: {.callout-note} +## Knowledge Check +1. There are several options for spatially subsetting (or clipping) a raster/raster stack to a region of interest. What method was used in this lesson? + a. Using a vector of dates. + b. Using another raster object. + c. Specifying a bounding box. + d. Using a vector boundary dataset. +2. When running into memory issues, what is something you can do to reduce the computational load? + a. Work with one time frame or region at a time. + b. Save it as a new file. + c. Subset the data to a region of interest/time frame. + d. Find other data to work with. +3. What is the importance of subsetting data? + a. Freeing up space. + b. Analyzing a certain time or area of interest. + c. Making code run faster. + d. All of the above. +::: + ## In this Lesson, You Learned... Congratulations! 
Now you should be able to: @@ -299,4 +363,4 @@ In the next lesson, we will create more advanced visualizations and extract data [Lesson 1b: WSIM-GLDAS Visualizations and Data Extraction](https://ciesin-geospatial.github.io/TOPSTSCHOOL-water/m101b-wsim-gldas-vis.html){.btn .btn-primary .btn role="button"} -# References +# References \ No newline at end of file diff --git a/m101b-wsim-gldas-vis.qmd b/m101b-wsim-gldas-vis.qmd index 23530f7..2828441 100644 --- a/m101b-wsim-gldas-vis.qmd +++ b/m101b-wsim-gldas-vis.qmd @@ -17,10 +17,11 @@ After completing this lesson, you should be able to: - Subset the WSIM-GLDAS raster data for a region and time period of interest. - Perform visual exploration with histograms. -- Integrate gridded population with WSIM-GLDAS data to perform analyses and construct visualizations. +- Integrate gridded population data with WSIM-GLDAS data to perform analyses and construct visualizations to understand how people are impacted. - Make choropleth maps visualizing WSIM-GLDAS data by administrative vector boundaries. - Summarize WSIM-GLDAS and population raster data using zonal statistics. + ## Introduction ::: column-margin @@ -35,6 +36,13 @@ This lesson uses the [`stars`](https://r-spatial.github.io/stars/), [`sf`](https We'll begin with the **WSIM-GLDAS Composite Anomaly Twelve-Month Return Period** Composite Anomaly Twelve-Month Return Period file from SEDAC. We will spatially subset the data to cover only the Continental United States (CONUSA) which will help to minimize our memory footprint. We can further reduce our memory overhead by reading in just the variable we want to analyze. In this instance we can read in just the `deficit` attribute from the WSIM-GLDAS Composite Anomaly Twelve-Month Return Period file, rather than reading the entire NetCDF with all of its attributes. 
+::: column-margin +::: {.callout-tip style="color: #5a7a2b;"} +## Coding Review +Random Access Memory (RAM) is where data and programs that are currently in use are stored temporarily. It allows the computer to quickly access data, making everything you do on the computer faster and more efficient. Unlike the hard drive, which stores data permanently, RAM loses all its data when the computer is turned off. RAM is like a computer’s short-term memory, helping it to handle multiple tasks at once. + +::: + ```{r include=FALSE, warning=FALSE} # read in the wsim-gldas layer from SEDAC wsim_gldas <- aws.s3::s3read_using(FUN = stars::read_stars, @@ -70,7 +78,7 @@ wsim_gldas <- dplyr::filter(wsim_gldas, time %in% keeps) print(wsim_gldas) ``` -Next, we can clip the WSIM-GLDAS dataset using the USA country boundary from geoBoundaries. As in lesson 1, we acquire the boundary vector data using the geoBoundaries API. +Next, we can clip the WSIM-GLDAS dataset using the USA country boundary from geoBoundaries. As in lesson 1, we acquire the vector boundary data using the geoBoundaries API. ```{r warning=FALSE} #directly acquire the boundary from geoBoundaries API @@ -111,6 +119,16 @@ You will want to review the printout to make sure it looks okay. Other basic descriptive analyses are useful to verify and understand your data. One of these is to produce a frequency distribution (also known as a histogram), which is reviewed below. + +:::{.callout-note} +## Knowledge Check +1. What are the two ways we reduced our memory footprint when we loaded the WSIM-GLDAS data set for this lesson? (select all that apply) + a. Spatially subsetting the data to the CONUSA + b. Reading in only one year of data + c. Reading in only the attribute of interest (i.e., ‘deficit’) + d. All of the above. +::: + ## Annual CONUSA Time Series The statistical properties reviewed in the previous step are useful for exploratory data analysis, but we should also inspect the data's spatial characteristics. 
We can start our visual exploration of annual drought in the CONUSA by creating a map visualization depicting the deficit return period for each of the years in the subset dataset we loaded in the previous step. @@ -164,6 +182,17 @@ ggplot2::ggplot(usa)+ This visualization shows that there were several significant drought events (as indicated by return-period values) throughout 2000-2014. Significant drought events included the southeast in 2000, the southwest in 2002, the majority of the western 3rd in 2007, Texas-Oklahoma in 2011, Montana-Wyoming-Colorado in 2012, and the entirety of the California coast in 2014. The droughts of 2012 and 2011 are particularly severe and widespread with return periods greater than 50 years covering multiple states. Based on historical norms, we should only expect droughts this strong every 50-60 years! +:::{.callout-note} +## Knowledge Check + +4. The output maps of Annual Mean Deficit Anomalies for the CONUSA indicate that… + a. The most significant deficit in 2004 is located in the western United States. + b. The least significant deficit in 2003 is located in the Midwest. + c. The most significant deficit in 2011 is located around Texas and neighboring states. + d. The least significant deficit in 2000 is located in the southeast. + +::: + ## Monthly Time Series We can get a more detailed look at these drought events by using the 1-month composite WSIM-GLDAS dataset and clipping the data to a smaller spatial extent. Let's examine the 2014 California drought. @@ -295,7 +324,7 @@ A [data frame](https://www.rdocumentation.org/packages/base/versions/3.6.2/topic ::: ::: -We can explore the data further by creating a frequency distribution (also called a histogram) of the deficit anomalies for any given spatial extent; here we are still looking at the distributions in California. We extract the data from the raster time series and create a data frame of values that are easier to manipulate into a histogram. 
[R data frames](https://www.w3schools.com/r/r_data_frames.asp) are data displayed in table format, which can be plotted on graphs or charts. +We can explore the data further by creating a frequency distribution (also called a histogram) of the deficit anomalies for any given spatial extent; here we are still looking at the deficit anomalies in California. We start by extracting the data from the raster time series and then create a data frame of values that are easier to manipulate into a histogram. [R data frames](https://www.w3schools.com/r/r_data_frames.asp) are data displayed in table format, which can be plotted on graphs or charts. ```{r} # extract the raster values into a dataframe @@ -318,6 +347,15 @@ ggplot2::ggplot(deficit_hist, ggplot2::aes(deficit))+ The histograms start to quantify what we saw in the time series maps. Whereas the map shows where the deficits occur, the frequency distribution indicates the number of raster cells for each return period of the deficit range. The number of raster cells under a 60-year deficit (return period) is very high in most months, far exceeding any other value in the range. +:::{.callout-note} +## Knowledge Check +5. What's another term for ‘histogram’? + a. Choropleth + b. Frequency Distribution + c. Data array + d. Box plot + + ## Zonal Summaries The previous section describes the 2014 California drought, examining the state as a whole. Although we have a sense of what's happening in different cities or counties by looking at the maps, the maps do not provide quantitative summaries of those local areas. @@ -406,7 +444,7 @@ cali_county_summaries<- names(cali_county_summaries)<-lubridate::month(keeps, label = TRUE, abbr = FALSE) ``` -*exactextractr* will return summary statistics in the same order of the input boundary file, therefore we can join the California county names to the exactextractr output and join the summary statistics for visualization. We also make a version to view as a table to inspect the raw data. 
We can take a quick look at the first 10 counties to see their mean deficit return period for January-June. +`exactextractr` will return summary statistics in the same order as the input boundary file; therefore, we can join the California county names to the `exactextractr` summary statistics output for visualization. We will also make a version to view as a table to inspect the raw data. We can take a quick look at the first 10 counties to see their mean deficit return period for January-June. ```{r} # bind the extracted means with the california boundary @@ -422,6 +460,15 @@ kableExtra::kbl(cali_county_table[c(1:10),c(1:7)]) |> This confirms the widespread distribution of high deficit values (all the bright red) in our exploratory maps. The data is currently in wide format, which makes for easy viewing of a time series, but more advanced programmatic visualization typically requires data to be in a normalized, or long, format (more on that later). +:::{.callout-note} +## Knowledge Check +1. What can zonal statistics be used for? + a. Subsetting data to a time period of interest. + b. Creating a time series. + c. Creating summary statistics from the values of cells that lie within a boundary. + d. All of the above. +::: + ## County Choropleths Now that we've inspected the raw data we can make a choropleth out of the mean deficit return period data. We previously demonstrated more complex maps using *ggplot2*, *sf*, and *stars*, but you can also make quick plots of *sf* objects with the base plotting function. By default *sf* will make a map for every column listed in the dataset. In this case we only want to look at the monthly means so we will just plot columns 11 through 23. You can make simple alterations to the color palette and position of the legend, but custom map titles and legend titles are not easily accomplished with multi-panel maps (multiple maps in one) like the one pictured below. 
@@ -452,18 +499,29 @@ plot(cali_counties[c(11:23)], key.width = 0.3) ``` -Due to the widespread water deficits in the raw data, the mean values do not appear much different from the raw deficit raster layer, however, thematic (also called choropleth) maps can make it easier for users to survey the landscape by visualizing familiar geographies (like counties) that place themselves and their lived experiences alongside the data. +Due to the widespread water deficits in the raw data, the mean values do not appear much different from the raw deficit raster layer, however, choropleth maps, also called thematic maps, can make it easier for users to survey the landscape by visualizing familiar geographies (like counties) that place themselves and their lived experiences alongside the data. + +While this paints a striking picture of widespread water deficits, how many people are affected by this drought? Although the land area appears rather large, if one is not familiar with the distribution of population and urban centers in California it can be difficult to get a sense of the direct human impact. (This is partly because more populous locations are usually represented by smaller land areas and the less populous locations are usually represented by large administrative boundaries containing much more land area. Normalizing a given theme by land area may be something an analyst wants to do but we cover another approach below.) + +:::{.callout-note} +## Knowledge Check +1. Choropleth maps, aka thematic maps, are useful for + a. Visualizing data in familiar geographies like counties or states + b. Finding directions from one place to another + c. Visualizing data in uniform geographic units like raster grid cells. + d. Calculating return periods. +::: -While this paints a striking picture of widespread water deficits, how many people are affected by this drought? 
Although the land area appears rather large, if one is not familiar with the distribution of population and urban centers in California it can be difficult to get a sense of the direct human impact. (This is partly because more populous locations are usually represented by smaller land areas and the less populous locations are usually represented by large administrative boundaries containing much more land area. Normalizing a given theme by land area may be something an analyst wants to do but we cover another approach below.) ## Integrating Population Data -**Gridded Population of the World** (GPW) is a dataset collection in SEDAC that models the distribution of the global human population as counts and densities in a raster format [@CIESIN2018]. We will take full advantage of exactextractr to integrate across WSIM-GLDAS, geoBoundaries, and GPW. To begin, we need to download the 15 minute 2010 population *density* GPWv4. This most closely matches our time period (2014) and the resolution of WSIM-GLDAS. Although it may seem more intuitive to use GPW's population *count* data layers, you can achieve more accurate results (especially along coastlines) by using population density in conjunction with land area estimates derived from exactextractr. +**Gridded Population of the World (GPW)** is a data collection from SEDAC that models the distribution of the global human population as counts and densities in a raster format [@CIESIN2018]. We will take full advantage of exactextractr to integrate across WSIM-GLDAS, geoBoundaries, and GPW. To begin, we need to download the 15-minute resolution (cells roughly 28 kilometers on a side at the equator) population density data for the year 2015 from GPWv4. This most closely matches our time period (2014) and the resolution of WSIM-GLDAS. 
Although in many applications one might choose to use GPW’s population count data layers, because we are using exactextractr we can achieve more accurate results (especially along coastlines) by using population density in conjunction with land area estimates from the exactextractr package. ::: {.callout-tip style="color: #5a7a2b;"} ## Data Review -The Gridded Population of the World Version 4 is available in multiple target metrics (e.g. counts, density), time periods (2000, 2005, 2010, 2015, 2020), and spatial resolutions (30 sec, 2.5 min, 15 min, 30 min, 60 min). Read more about GPW at the [collection home page on SEDAC](https://sedac.ciesin.columbia.edu/data/collection/gpw-v4). GPW is one of four global datasets available in raster format: Data sets vary in the degree to which they use additional information as ancillary variables to model the spatial distribution of population from the administrative units (vector polygons) in which they originate. A review of these data sets and their underlying models is found in a paper by Leyk and colleagues [@leyk2019]. Fitness-for-use is an important principle in determining the best dataset to use for a specific analysis. Because the question we ask here is --- what is the population exposure to different levels of water deficit in California? --- uses spatially coarse inputs and is for a place with high-quality data inputs, GPW is a good choice for this analysis. Users with vector-format census data (at county or sub-county level) could also adapt this approach for those data. In the case of California, the US Census data and GPW will produce nearly identical estimates because GPW is based on the census inputs. +The Gridded Population of the World Version 4 is available in multiple target metrics (e.g. counts, density), periods (2000, 2005, 2010, 2015, 2020), and spatial resolutions (30 sec, 2.5 min, 15 min, 30 min, 60 min). 
Read more about GPW at the [collection home page on SEDAC](https://sedac.ciesin.columbia.edu/data/collection/gpw-v4). GPW is one of four global gridded population datasets available in raster format. These data sets vary in the degree to which they use additional information as ancillary variables to model the spatial distribution of the population from the administrative units (vector polygons) in which they originate. A review of these data sets and their underlying models is found in a paper by Leyk and colleagues [@leyk2019]. You can learn more about gridded population data sets at POPGRID.org. Fitness-for-use is an important principle in determining the best dataset to use for a specific analysis. Because the question we ask here (what is the population exposure to different levels of water deficit in California?) uses spatially coarse inputs and is for a place with high-quality data inputs, GPW is a good choice for this analysis. Users with vector-format census data (at the county or sub-county level) could also adopt this approach for those data. In the case of California, the US Census data and GPW will produce nearly identical estimates because GPW is based on the census inputs. + ::: Load in the population density layers. @@ -540,10 +598,10 @@ head(pop_by_rp) - `shapeISO`: The label of the polygon boundary where the cell is located. This was passed on from the California geojson boundary as specified in the `include_cols = 'shapeISO'` argument. In this instance, it's not very helpful because we used the state-level California boundary, but if we passed the ADM2 boundary with counties it would provide the name of the county where the cell is located. - `2014-01-01` to `2014-12-01`: The next 12 columns list the deficit return period classification value for the cell in each of the 12 months corresponding to the time dimension of the `wsim_gldas_1mo` raster layer. 
-- `weight`: The `weight` column lists the corresponding population density value (persons per km\^2) for that WSIM-GLDAS cell. The WSIM-GLDAS and GPW raster layers have the same projection and resolution. Therefore, because they are perfectly aligned, each WSIM return period cell has a corresponding GPW population weight "right on top of it". -- `coverage_area`: The total area (m\^2) of the WSIM-GLDAS cell that is covered by the California boundary layer. Given the total area of the WSIM cell that is covered, and the GPW persons per unit area weight, we can calculate the number of people estimated to be living within this cell under this WSIM deficit return period. +- `weight`: The `weight` column lists the corresponding population density value (persons per km^2) for that WSIM-GLDAS cell. The WSIM-GLDAS and GPW raster layers have the same projection and resolution. Therefore, because they are perfectly aligned, each WSIM return period cell has a corresponding GPW population weight that can be layered exactly on it. +- `coverage_area`: The total area (m^2) of the WSIM-GLDAS cell that is covered by the California boundary layer. Given the total area of the WSIM cell that is covered, and the GPW persons per unit area weight, we can calculate the number of people estimated to be living within this cell under this WSIM deficit return period. -We will need to perform a few more processing steps to prepare this `data.frame` for a time series visualization integrating all of the data. We will use the melt function to transform the data from wide format to long format in order to produce a visualization in *ggplot2.* Specifically, we need to "melt" the 12 month columns (`2014-01-01` to `2014-12-01`) into 2 new columns: 1) specifying the WSIM-GLDAS deficit return period value and 2) the month it came from. +We will need to perform a few more processing steps to prepare this `data.frame` for a time series visualization integrating all of the data. 
We will use the `melt` function to transform the data from wide format to long format in order to produce a visualization in `ggplot2`. Specifically, we need to use `melt` to reshape the 12 month columns (`2014-01-01` to `2014-12-01`) into 2 new columns: 1) the WSIM-GLDAS deficit return period value and 2) the month it came from. ::: column-margin ::: {.callout-tip style="color: #5a7a2b;"} @@ -596,7 +654,7 @@ pop_by_rp[, pop_frac := pop_rp / total_pop][, total_pop := NULL] head(pop_by_rp) ``` -Before plotting we'll make the month labels more legible for plotting, convert the WSIM-GLDAS return period class into a factor, set the WSIM-GLDAS class palette. +Before plotting, we’ll make the month labels more legible, convert the WSIM-GLDAS return period class into a factor, and set the WSIM-GLDAS class palette. ::: column-margin ::: {.callout-tip style="color: #5a7a2b;"} @@ -672,11 +730,21 @@ file.remove(c("data/gpw_v4_population_density_rev11_2015_15_min.tif", "data/composite_12mo.nc")) ``` +:::{.callout-note} +## Knowledge Check +1. Gridded population datasets… (select all that apply) + a. Show where river tributaries are. + b. Model the distribution of the global human population in raster grid cells. + c. Allow analyses of the number of persons impacted by a hazard such as drought. + d. Vary in the extent to which ancillary data are used in their production. + + ## Congratulations! In this Lesson You Learned How To... -- Identify hot spots of drought and select these hotspots for further analysis. +- Identify areas of severe drought and select these areas for further analysis. - Summarize data by county using the exactextractr tool. -- Integrate WSIM-GLDAS deficit, GPW population, and geoBoundaries administrative data to create complex time series visualizations. +- Integrate WSIM-GLDAS deficit, GPW population, and geoBoundaries administrative boundary data to create complex time series visualizations. + ## Lesson 2