forked from wlandau/targets-tutorial
---
title: "Runtime savings amid rapid change"
output: html_notebook
---
# About
In this chapter, we will put `targets` to the test. We will modify code and data and watch `tar_make()` run the minimum amount of computation required to bring the results back up to date. This saves time when an ambitious project is still under development and rapidly changing.
# Instructions
Throughout this chapter, you will test your understanding with quiz questions. Supply your choices to the `answer3_*()` functions. Run the code chunk below to set up the quiz questions. Please try not to peek in advance. It is okay to answer incorrectly at first.
```{r}
source("R/quiz.R")
source("3-changes/answers.R")
```
Let's clear out the old `_targets/` data store.
```{r}
library(targets)
tar_destroy() # Start fresh.
```
# Setup
Run the line below to begin with a new `_targets.R` file.
```{r}
tmp <- file.copy("3-changes/initial_targets.R", "_targets.R", overwrite = TRUE)
```
Then, open `_targets.R` in the RStudio IDE.
```{r}
tar_edit()
```
What do you see?
A. A call to `source("3-changes/functions.R")` to load our custom functions.
B. A call to `tar_option_set()` to declare the packages the targets need when they run.
C. The full pipeline from `2-pipelines.Rmd` at the very bottom of the file.
D. All of the above.
```{r}
answer3_setup("E") # Supply your own answer here.
```
# First target
Run the whole pipeline with `tar_make()`.
```{r}
tar_make()
```
Which target ran first? Why?
A. `churn_data`, because the rest of the targets depend on it.
B. `churn_file`, because the rest of the targets depend on it.
C. `churn_data`, because it appears first in `_targets.R`.
D. `churn_file`, because it appears first in `_targets.R`.
```{r}
answer3_first("E")
```
# Last target
Which target ran last? Why?
A. `best_model`, because it depends on the results of the other targets.
B. `best_model`, because it is listed last in the pipeline.
C. `best_run`, because it depends on the results of the other targets.
D. `best_run`, because it is listed last in the pipeline.
```{r}
answer3_last("E")
```
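To check your answers to the last two questions, it can help to look at the pipeline itself. The chunk below is a minimal sketch, assuming the pipeline from `_targets.R` is still in place and that the optional `visNetwork` package is installed for the graph.

```{r}
library(targets)
# List each target together with the command that builds it.
tar_manifest(fields = c("name", "command"))
# Draw the dependency graph of the targets (requires the visNetwork package).
tar_visnetwork(targets_only = TRUE)
```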
# Inspection
Load the `best_model` target from the `_targets/` data store using the `tar_load()` function. Print the object.
```{r}
tar_load(best_model)
print(best_model)
```
What do you see?
A. A function.
B. A trained Keras model.
C. A one-row data frame.
D. A recipe object.
```{r}
answer3_inspect("E")
```
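`tar_load()` assigns the target to a variable in your environment. As an alternative sketch, `tar_read()` returns the stored value instead, which is convenient for a quick look without creating a variable. This assumes the pipeline has already run in this project.

```{r}
library(targets)
# Read the stored value directly from _targets/ and report its class.
class(tar_read(best_model))
```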
# Rerun `tar_make()`
Run `tar_make()` again.
```{r}
tar_make()
```
What happened? Why?
A. No target reran because the previous results are already in storage (`_targets/`).
B. `tar_make()` reran the best model.
C. All the targets reran.
D. No target reran because all the targets are already up to date with their dependencies (the underlying code and data, including upstream targets).
```{r}
answer3_rerun("E")
```
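You can also ask `targets` which targets it currently considers out of date, without running anything. A minimal sketch, assuming the pipeline has already been run at least once in this project:

```{r}
library(targets)
# Names of the targets that tar_make() would rerun.
# An empty character vector means nothing is outdated.
outdated <- tar_outdated()
print(outdated)
```

The same check works after each change later in this chapter: run it before `tar_make()` and compare the result with your quiz answer.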
# Restart the session
Restart your R session.
```{r}
rstudioapi::restartSession()
```
Restore the session.
```{r}
source("R/quiz.R")
source("3-changes/answers.R")
```
Then, run `tar_make()` again.
```{r}
library(targets)
tar_make()
```
Which targets reran? Why?
A. None. The targets are in storage (`_targets/objects/`) and are still up to date with their dependencies.
B. I get a confusing error message.
C. All targets reran because I restarted the R session.
D. All the models reran.
```{r}
answer3_restart("E")
```
# Change the data.
Remove the last row of the dataset file `data/churn.csv`.
```{r, message = FALSE}
library(tidyverse)
"data/churn.csv" %>%
  read_csv(col_types = cols()) %>%
  head(n = nrow(.) - 1) %>%
  write_csv("data/churn.csv")
```
Run the pipeline again.
```{r}
tar_make()
```
Which targets reran? Why?
A. All of them. `targets` notices it has been a while since you ran the pipeline.
B. All of them. The data file changed, and `targets` automatically knows `data/churn.csv` is a data file.
C. All of them. The data file changed, and the pipeline is configured to automatically track `data/churn.csv`. The pipeline tracks the data file because the `churn_file` target returns the value `"data/churn.csv"` and the call to `tar_target()` has `format = "file"`.
D. None of them, because `data/churn.csv` is neither a target nor a function in the global environment.
```{r}
answer3_data("E")
```
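To build intuition for how a change to a file can be detected, here is a small base R sketch that fingerprints a file before and after dropping its last row. It uses `tools::md5sum()` on a temporary CSV file. The file and its contents are made up for illustration, and `targets` has its own hashing machinery that may differ in detail.

```{r}
# Hypothetical stand-in for data/churn.csv, written to a temporary file.
path <- tempfile(fileext = ".csv")
churn <- data.frame(id = 1:3, churn = c("yes", "no", "no"))
write.csv(churn, path, row.names = FALSE)
hash_before <- unname(tools::md5sum(path))

# Drop the last row and overwrite the file.
churn <- read.csv(path)
write.csv(head(churn, n = -1), path, row.names = FALSE)
hash_after <- unname(tools::md5sum(path))

# Compare the fingerprints of the two versions of the file.
hash_before != hash_after
```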
# Change a command.
Open `_targets.R` for editing in the RStudio IDE.
```{r}
library(targets)
tar_edit()
```
In the command of target `best_run`, change the function from `rbind()` to the preferred `bind_rows()` from `dplyr`. The target definition should look like this:
```{r, eval = FALSE}
# Do not run here.
tar_target(
  best_run,
  bind_rows(run_relu, run_sigmoid, run_softmax) %>%
    top_n(1, accuracy) %>%
    head(1)
)
```
Then, rerun the pipeline.
```{r}
tar_make()
```
Which targets ran? Why?
A. None. `bind_rows()` and `rbind()` return the same value.
B. None. `targets` can predict that `bind_rows()` and `rbind()` will return the same value.
C. `best_run` and `best_model`. The command of `best_run` changed, and `best_model` is directly downstream.
D. `best_run` only. The command of `best_run` changed, but `best_run` returned the same value, so `best_model` is still up to date.
```{r}
answer3_command("E")
```
# Change a function.
Open `3-changes/functions.R` for editing in the RStudio IDE. In the body of `define_model()`, change the `rate` argument to 0.2 in the first call to `layer_dropout()`.
```{r, eval = FALSE}
# Do not run here.
layer_dropout(rate = 0.2) # previously 0.1
```
Then, rerun the pipeline.
```{r}
tar_make()
```
Which targets ran? Why?
A. None. No target directly calls `define_model()`.
B. None. The model definition did not change much.
C. `run_relu`, `run_sigmoid`, `run_softmax`, `best_run`, and possibly `best_model`. `best_model` might not run because it only depends on `best_run`.
D. `run_relu`, `run_sigmoid`, `run_softmax`, `best_run`, and `best_model`.
```{r}
answer3_function("E")
```
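To explore how a function can be connected to the other functions it calls, here is a sketch of static code analysis with `codetools::findGlobals()`, which ships with R. The functions below are hypothetical stand-ins, not the tutorial's real `define_model()`, and the dependency detection inside `targets` may differ in detail.

```{r}
library(codetools)

# Hypothetical helper, standing in for a function like define_model().
define_model_demo <- function(rate) {
  rate * 2
}

# Hypothetical function that a target's command might call.
train_model_demo <- function(rate) {
  define_model_demo(rate) + 1
}

# Global symbols referenced in the body of train_model_demo().
globals <- findGlobals(train_model_demo)
print(globals)

# Is the helper among them?
"define_model_demo" %in% globals
```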
# Make a trivial change.
Open `3-changes/functions.R` for editing in the RStudio IDE. Add a comment in the body of `define_model()` and change nothing else.
```{r, eval = FALSE}
# Do not run here.
define_model <- function(churn_recipe, units1, units2, act1, act2, act3) {
  # This is an arbitrary comment.
  input_shape <- ncol(
    juice(churn_recipe, all_predictors(), composition = "matrix")
  )
  # Leave the rest of the function alone.
}
```
Then, run the pipeline again.
```{r}
tar_make()
```
Which targets ran? Why?
A. `run_relu`, `run_sigmoid`, `run_softmax`, `best_run`, and `best_model`. The `define_model()` function changed, which affects all these targets.
B. None. `tar_make()` ignores trivial changes like comments and white space: anything that does not show up in `deparse(define_model)`.
C. None. `tar_make()` ignores all changes to functions.
D. None. No target directly calls `define_model()`.
```{r}
answer3_trivial("E")
```
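To explore how `deparse()` treats comments and white space, independently of `targets`, here is a base R sketch. The two functions are hypothetical and differ only in a comment and in spacing. Run it and compare the result with your answer above.

```{r}
# Two hypothetical versions of a function: same code, different comment and spacing.
code_plain <- "function(x) {\n  x + 1\n}"
code_commented <- "function(x) {\n  # This is an arbitrary comment.\n  x   +   1\n}"

# Parse and evaluate both, keeping the original source text attached.
f_plain <- eval(parse(text = code_plain, keep.source = TRUE))
f_commented <- eval(parse(text = code_commented, keep.source = TRUE))

# Compare the deparsed code of the two functions.
identical(deparse(f_plain), deparse(f_commented))
```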