---
output:
github_document
---
# rhdx
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
warning = FALSE,
message = FALSE,
eval = FALSE,
fig.width = 10,
fig.path = "inst/img/",
comment = "#> "
)
```
[![Project Status: WIP - Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](http://www.repostatus.org/badges/latest/wip.svg)](http://www.repostatus.org/#wip)
[![GitLab CI Build Status](https://gitlab.com/dickoa/rhdx/badges/master/pipeline.svg)](https://gitlab.com/dickoa/rhdx/pipelines)
[![Travis build status](https://api.travis-ci.org/dickoa/rhdx.svg?branch=master)](https://travis-ci.org/dickoa/rhdx)
[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/gitlab/dickoa/rhdx?branch=master&svg=true)](https://ci.appveyor.com/project/dickoa/rhdx)
[![Codecov Code Coverage](https://codecov.io/gh/dickoa/rhdx/branch/master/graph/badge.svg)](https://codecov.io/gh/dickoa/rhdx)
[![CRAN status](https://www.r-pkg.org/badges/version/rhdx)](https://cran.r-project.org/package=rhdx)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
`rhdx` is an R client for the Humanitarian Data Exchange (HDX) platform.
## Introduction
The [Humanitarian Data Exchange platform](https://data.humdata.org/) is an open platform for finding and analyzing humanitarian data.
## Installation
This package is not yet on CRAN. To install it, you will need the [`remotes`](https://github.com/r-lib/remotes) package.
You can get `rhdx` from GitLab or from the GitHub mirror:
```{r}
## install.packages("remotes")
## install from GitLab
remotes::install_gitlab("dickoa/rhdx")
## or, alternatively, from the GitHub mirror
## remotes::install_github("dickoa/rhdx")
```
## rhdx: A quick tutorial
```{r}
library("rhdx")
```
The first step is usually to connect to HDX using the `set_rhdx_config` function and check the configuration using `get_rhdx_config`:
```{r}
set_rhdx_config(hdx_site = "prod")
get_rhdx_config()
## <HDX Configuration>
## HDX site: prod
## HDX site url: https://data.humdata.org/
## HDX API key:
```
Now that we are connected to HDX, we can search for datasets using `search_datasets`, access the resources within a dataset page with the `get_resources` function, and finally read the data directly into the `R` session using `read_resource`.
The `magrittr` pipe operator is also supported:
```{r}
library(tidyverse)
search_datasets("ACLED Mali", rows = 2) %>% ## search datasets in HDX, limit the results to two rows
  pluck(1) %>% ## select the first dataset
  get_resource(1) %>% ## pick the first resource
  read_resource() ## read this HXLated data into R
## # A tibble: 2,516 x 30
## data_id iso event_id_cnty event_id_no_cnty event_date year
## * <dbl> <dbl> <chr> <dbl> <date> <dbl>
## 1 2942561 466 MLI2605 2605 2019-01-26 2019
## 2 2942562 466 MLI2606 2606 2019-01-26 2019
## 3 2942557 466 MLI2601 2601 2019-01-25 2019
## 4 2942558 466 MLI2602 2602 2019-01-25 2019
## 5 2942559 466 MLI2603 2603 2019-01-25 2019
## 6 2942560 466 MLI2604 2604 2019-01-25 2019
## 7 2942555 466 MLI2599 2599 2019-01-24 2019
## 8 2942556 466 MLI2600 2600 2019-01-24 2019
## 9 2942553 466 MLI2597 2597 2019-01-23 2019
## 10 2942554 466 MLI2598 2598 2019-01-23 2019
## # … with 2,506 more rows, and 24 more variables:
## # time_precision <dbl>, event_type <chr>, actor1 <chr>,
## # assoc_actor_1 <chr>, inter1 <dbl>, actor2 <chr>,
## # assoc_actor_2 <chr>, inter2 <dbl>, interaction <dbl>,
## # region <chr>, country <chr>, admin1 <chr>, admin2 <chr>,
## # admin3 <chr>, location <chr>, latitude <dbl>,
## # longitude <dbl>, geo_precision <dbl>, source <chr>,
## # source_scale <chr>, notes <chr>, fatalities <dbl>,
## # timestamp <dbl>, iso3 <chr>
```
`read_resource` will not work with every resource in HDX. So far the following formats are supported: `csv`, `xlsx`, `xls`, `json`, `geojson`, `zipped shapefile`, `kmz`, `zipped geodatabase` and `zipped geopackage`.
I will consider adding more data types in the future. Feel free to file an issue if it doesn't work as expected or if you want support for another format.
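
Before calling `read_resource`, it can be handy to check whether a resource's format is in the supported list. The sketch below is a hypothetical helper, not part of `rhdx`; it assumes the format label is available as a plain character string (such as the `Format:` field printed for a resource) and compares it case-insensitively against the formats listed above.

```r
## Formats listed above as readable by `read_resource`
## (written in lower case for comparison).
supported_formats <- c(
  "csv", "xlsx", "xls", "json", "geojson", "zipped shapefile",
  "kmz", "zipped geodatabase", "zipped geopackage"
)

## Hypothetical helper (not part of rhdx): TRUE when a format label
## matches one of the formats above, ignoring case and surrounding spaces.
is_supported_format <- function(format) {
  tolower(trimws(format)) %in% supported_formats
}

is_supported_format(c("JSON", "CSV", "PDF"))
## [1]  TRUE  TRUE FALSE
```
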
### Reading a dataset directly
We can also use `pull_dataset` to directly read and access a dataset object.
```{r, eval = FALSE}
pull_dataset("acled-data-for-mali") %>%
  get_resource(1) %>%
  read_resource()
## # A tibble: 3,990 x 31
## data_id iso event_id_cnty event_id_no_cnty event_date year
## <dbl> <dbl> <chr> <dbl> <date> <dbl>
## 1 7173324 466 MLI4111 4111 2020-07-31 2020
## 2 7173322 466 MLI4109 4109 2020-07-29 2020
## 3 7173323 466 MLI4110 4110 2020-07-29 2020
## 4 7173423 466 MLI4107 4107 2020-07-28 2020
## 5 7173761 466 MLI4108 4108 2020-07-28 2020
## 6 7173702 466 MLI4104 4104 2020-07-27 2020
## 7 7173732 466 MLI4103 4103 2020-07-27 2020
## 8 7173319 466 MLI4102 4102 2020-07-27 2020
## 9 7173320 466 MLI4105 4105 2020-07-27 2020
## 10 7173321 466 MLI4106 4106 2020-07-27 2020
## # … with 3,980 more rows, and 25 more variables:
## # time_precision <dbl>, event_type <chr>,
## # sub_event_type <chr>, actor1 <chr>, assoc_actor_1 <chr>,
## # inter1 <dbl>, actor2 <chr>, assoc_actor_2 <chr>,
## # inter2 <dbl>, interaction <dbl>, region <chr>,
## # country <chr>, admin1 <chr>, admin2 <chr>, admin3 <chr>,
## # location <chr>, latitude <dbl>, longitude <dbl>,
## # geo_precision <dbl>, source <chr>, source_scale <chr>,
## # notes <chr>, fatalities <dbl>, timestamp <dbl>, iso3 <chr>
```
## A step-by-step tutorial to getting data with rhdx
### Connect to a server
To connect to HDX, we can use the `set_rhdx_config` function:
```{r, eval = FALSE}
set_rhdx_config(hdx_site = "prod")
```
### Search datasets
Once a server is chosen, we can search for datasets using the `search_datasets` function.
In this case we will limit the search to two results (`rows` parameter).
```{r, eval = FALSE}
list_of_ds <- search_datasets("displaced Nigeria", rows = 2)
list_of_ds
## [[1]]
## <HDX Dataset> 4fbc627d-ff64-4bf6-8a49-59904eae15bb
## Title: Nigeria - Internally displaced persons - IDPs
## Name: idmc-idp-data-for-nigeria
## Date: 01/01/2009-12/31/2016
## Tags (up to 5): displacement, idmc, population
## Locations (up to 5): nga
## Resources (up to 5): displacement_data, conflict_data, disaster_data
## [[2]]
## <HDX Dataset> 4adf7874-ae01-46fd-a442-5fc6b3c9dff1
## Title: Nigeria Baseline Assessment Data [IOM DTM]
## Name: nigeria-baseline-data-iom-dtm
## Date: 01/31/2018
## Tags (up to 5): adamawa, assessment, baseline-data, baseline-dtm, bauchi
## Locations (up to 5): nga
## Resources (up to 5): DTM Nigeria Baseline Assessment Round 21, DTM Nigeria Baseline Assessment Round 20, DTM Nigeria Baseline Assessment Round 19, DTM Nigeria Baseline Assessment Round 18, DTM Nigeria Baseline Assessment Round 17
```
### Choose the dataset you want to manipulate in R
The result of `search_datasets` is a list of HDX datasets, which you can manipulate like any other `list` in `R`.
We can use `purrr::pluck` to select the element we want from the list; in this case we take the first one.
```{r, eval = FALSE}
ds <- pluck(list_of_ds, 1)
ds
## <HDX Dataset> 4fbc627d-ff64-4bf6-8a49-59904eae15bb
## Title: Nigeria - Internally displaced persons - IDPs
## Name: idmc-idp-data-for-nigeria
## Date: 01/01/2009-12/31/2016
## Tags (up to 5): displacement, idmc, population
## Locations (up to 5): nga
## Resources (up to 5): displacement_data, conflict_data, disaster_data
```
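
Because the search result behaves like an ordinary list, base R tools work as well. The sketch below uses a stand-in list of plain lists (not real HDX dataset objects, whose fields may be accessed differently) with the dataset names from the output above, and shows selection by position and by a field value.

```r
## Stand-in for the result of `search_datasets`: plain lists, not real
## HDX dataset objects, with names copied from the output above.
mock_results <- list(
  list(name = "idmc-idp-data-for-nigeria"),
  list(name = "nigeria-baseline-data-iom-dtm")
)

## Positional selection with base R: the same element `pluck(x, 1)` returns.
first <- mock_results[[1]]

## Selection by a field value instead of by position.
wanted <- Filter(
  function(d) d$name == "nigeria-baseline-data-iom-dtm",
  mock_results
)

first$name
length(wanted)
```

Selecting by name rather than by position can be more robust when the order of search results changes over time.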
### List all resources in the dataset
With our dataset, the next step is to list all the resources. If you are not familiar with CKAN terminology, `resources` refers to the actual files shared in a
dataset page that you can download. Each dataset page contains one or more resources.
```{r, eval = FALSE}
get_resources(ds)
## [[1]]
## <HDX Resource> f57be018-116e-4dd9-a7ab-8002e7627f36
## Name: displacement_data
## Description: Internally displaced persons - IDPs (new displacement associated with conflict and violence)
## Size:
## Format: JSON
## [[2]]
## <HDX Resource> 6261856c-afb9-4746-b340-9cf531cbd38f
## Name: conflict_data
## Description: Internally displaced persons - IDPs (people displaced by conflict and violence)
## Size:
## Format: JSON
## [[3]]
## <HDX Resource> b8ff1f4b-105c-4a6c-bf54-a543a486ab7e
## Name: disaster_data
## Description: Internally displaced persons - IDPs (new displacement associated with disasters)
## Size:
## Format: JSON
```
### Choose a resource we need to download/read
For this example, we are looking for the displacement data, which is the first resource in the dataset page. We can use `pluck` on the list of resources or the helper
function `get_resource`, which takes the dataset and the index of the resource, to select the resource we want to use.
The selected resource can then be downloaded and stored for further use, or read directly into your R session using the `read_resource` function.
The resource is a `json` file and it can be read directly using the `jsonlite` package; we added a `simplify_json` option to get a `vector` or a `data.frame` when possible instead of a `list`.
```{r, eval = FALSE}
idp_nga_rs <- get_resource(ds, 1)
idp_nga_df <- read_resource(idp_nga_rs, simplify_json = TRUE, download_folder = tempdir())
idp_nga_df
## # A tibble: 11 x 7
## ISO3 Name Year `Conflict Stock… `Conflict New D…
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 NGA Nige… 2009 NA 5000
## 2 NGA Nige… 2010 NA 5000
## 3 NGA Nige… 2011 NA 65000
## 4 NGA Nige… 2012 NA 63000
## 5 NGA Nige… 2013 3300000 471000
## 6 NGA Nige… 2014 1075000 975000
## 7 NGA Nige… 2015 2096000 737000
## 8 NGA Nige… 2016 1955000 501000
## 9 NGA Nige… 2017 1707000 279000
## 10 NGA Nige… 2018 2216000 541000
## 11 NGA Nige… 2019 2583000 248000
## # … with 2 more variables: `Disaster New Displacements` <dbl>,
## # `Disaster Stock Displacement` <dbl>
```
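
To see what simplification means in practice, the sketch below calls `jsonlite::fromJSON` directly on a small illustrative JSON string. It assumes that `simplify_json` behaves like `jsonlite`'s `simplifyVector` argument; that mapping is an assumption and is not confirmed here.

```r
library(jsonlite)

## Small illustrative payload: an array of two records.
txt <- '[{"Year": 2009, "New": 5000}, {"Year": 2010, "New": 5000}]'

## With simplification, an array of records becomes a data.frame.
simplified <- fromJSON(txt, simplifyVector = TRUE)

## Without it, the same payload stays a nested list of records.
nested <- fromJSON(txt, simplifyVector = FALSE)

class(simplified)
nrow(simplified)
class(nested)
length(nested)
```
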
### Using `magrittr` pipe
All these operations can be chained using the pipe operator `%>%`, which gives a powerful grammar for easily getting humanitarian data into R.
```{r, eval = FALSE}
library(tidyverse)
set_rhdx_config(hdx_site = "prod")
idp_nga_df <-
  search_datasets("displaced Nigeria", rows = 2) %>%
  pluck(1) %>%
  get_resource(1) %>% ## get the first resource
  read_resource(simplify_json = TRUE, download_folder = tempdir()) ## the file will be downloaded in a temporary directory
idp_nga_df
## # A tibble: 11 x 7
## ISO3 Name Year `Conflict Stock… `Conflict New D…
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 NGA Nige… 2009 NA 5000
## 2 NGA Nige… 2010 NA 5000
## 3 NGA Nige… 2011 NA 65000
## 4 NGA Nige… 2012 NA 63000
## 5 NGA Nige… 2013 3300000 471000
## 6 NGA Nige… 2014 1075000 975000
## 7 NGA Nige… 2015 2096000 737000
## 8 NGA Nige… 2016 1955000 501000
## 9 NGA Nige… 2017 1707000 279000
## 10 NGA Nige… 2018 2216000 541000
## 11 NGA Nige… 2019 2583000 248000
## # … with 2 more variables: `Disaster New Displacements` <dbl>,
## # `Disaster Stock Displacement` <dbl>
```
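
If you run this chain often, it can be wrapped in a small function. The sketch below is a hypothetical wrapper, not part of `rhdx`; it only uses the functions and arguments shown in this README, and it assumes that a search without matches returns a zero-length list. Defining the function does not contact HDX; calling it needs network access and a configured connection.

```r
## Hypothetical wrapper (not part of rhdx) around the steps above.
read_first_resource <- function(query, folder = tempdir()) {
  results <- rhdx::search_datasets(query, rows = 1)
  ## Assumption: an empty search yields a zero-length list.
  if (length(results) == 0) {
    stop("No dataset found for query: ", query, call. = FALSE)
  }
  ds <- results[[1]]              ## first dataset
  rs <- rhdx::get_resource(ds, 1) ## first resource of that dataset
  rhdx::read_resource(rs, simplify_json = TRUE, download_folder = folder)
}

## Usage (needs network access and a configured HDX connection):
## set_rhdx_config(hdx_site = "prod")
## idp_nga_df <- read_first_resource("displaced Nigeria")
```
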
## Meta
* Please [report any issues or bugs](https://gitlab.com/dickoa/rhdx/issues).
* License: MIT
* Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.