This repository has been archived by the owner on Jan 12, 2025. It is now read-only.

Add presentation on the drake package #3

Open
wants to merge 1 commit into
base: master
4 changes: 4 additions & 0 deletions _files/slides/will-drake/abstract.txt
@@ -0,0 +1,4 @@
Reproducibility and high-performance computing with drake

There is room to improve the conversation and the landscape of reproducibility in the R and Statistics communities. At a more basic level than scientific replicability, literate programming, and version control, reproducibility carries an implicit promise that the alleged results of an analysis really do match the code. Drake helps keep this promise by tracking the relationships among the components of the analysis, a rare and effective approach that also saves time. And with multiple parallel computing options that switch on automatically, drake is also a convenient and powerful high-performance computing solution. Drake is published on CRAN, and you can follow the development at https://github.com/wlandau-lilly/drake.

13 changes: 13 additions & 0 deletions _files/slides/will-drake/demo.R
@@ -0,0 +1,13 @@
library(drake)
load_basic_example()
plot_graph(my_plan)
outdated(my_plan)
max_useful_jobs(my_plan)
make(my_plan)
plot_graph(my_plan)
reg2 = function(d){ # Change one of your functions.
  d$x3 = d$x^3
  lm(y ~ x3, data = d)
}
outdated(my_plan) # Some targets depend on reg2().
plot_graph(my_plan)
278 changes: 278 additions & 0 deletions _files/slides/will-drake/drake.Rmd
@@ -0,0 +1,278 @@
---
title: "Data frames in R for Make"
subtitle: "Reproducibility and high-performance computing"
output:
  revealjs::revealjs_presentation:
    template: include/template.html
    includes:
      in_header: include/header.html
    css: style.css
author: Will Landau
---

## drake in action

<h1 align="center">
<img src="images/demo.gif" alt="demo" style="border: none; box-shadow: none">
</h1>


## Workflow plan data frame

```{r plandf}
library(drake)
load_basic_example()
my_plan
```

## Network graph

```r
# The graph is interactive! Hover, click, drag, zoom, pan.
plot_graph(my_plan)
```

```{r graph, echo = FALSE}
plot_graph(my_plan, width = "100%", height = "400px", verbose = FALSE,
  file = "graph1.html", font_size = 25)
```

<iframe src="graph1.html" width = "100%" height = "500px" allowtransparency="true"></iframe>


## Just the targets

```r
plot_graph(my_plan, targets_only = TRUE)
```

```{r graphtargs, echo = FALSE}
plot_graph(my_plan, targets_only = TRUE, width = "100%", height = "400px", verbose = FALSE, file = "graph2.html", font_size = 25)
```

<iframe src="graph2.html" width = "100%" height = "500px" allowtransparency="true"></iframe>


## Execution

```{r make1}
make(my_plan)
```

## Results

```{r}
loadd(small)
small
readd(coef_regression2_large)
```

## Reproducibility


```r
plot_graph(my_plan)
```

```{r make2, echo = FALSE}
plot_graph(my_plan, width = "100%", height = "400px", verbose = FALSE, file = "graph3.html", font_size = 25)
```

<iframe src="graph3.html" width = "100%" height = "500px" allowtransparency="true"></iframe>




## Reproducibility

```{r make3}
reg2 = function(d){ # Change one of your functions.
  d$x3 = d$x^3
  lm(y ~ x3, data = d)
}
outdated(my_plan, verbose = FALSE) # Some targets are now out of date.
missed(my_plan, verbose = FALSE) # But our workspace has all we need.
```
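
<div class="left">Why does one changed function outdate several targets? A toy sketch in base R, not drake's actual implementation. The names `toy_plan`, `toy_deps`, and `toy_outdated` are hypothetical and exist only for this illustration.</div>

```r
# Hypothetical miniature plan: target names mapped to command strings.
toy_plan <- c(
  small = "simulate(5)",
  regression2_small = "reg2(small)",
  coef_regression2_small = "coef(regression2_small)"
)

# Direct dependencies of each target: every symbol mentioned in its command.
toy_deps <- lapply(toy_plan, function(cmd) all.names(parse(text = cmd)[[1]]))

# Walk forward from the changed names until no new stale targets appear.
toy_outdated <- function(changed, deps) {
  stale <- character(0)
  repeat {
    hit <- names(deps)[vapply(
      deps, function(d) any(d %in% c(changed, stale)), logical(1)
    )]
    new <- setdiff(hit, stale)
    if (length(new) == 0) break
    stale <- c(stale, new)
  }
  stale
}

toy_outdated("reg2", toy_deps)
# "regression2_small" "coef_regression2_small"
```

<div class="left">`small` stays up to date because its command never mentions `reg2`, directly or indirectly.</div>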

## Reproducibility

```r
plot_graph(my_plan)
```

```{r plotgraphreg3, echo = FALSE}
plot_graph(my_plan, width = "100%", height = "400px", verbose = FALSE, file = "graph4.html", font_size = 25)
```

<iframe src="graph4.html" width = "100%" height = "500px" allowtransparency="true"></iframe>




## Reproducibility
```{r rebuildreg3}
make(my_plan) # Only rebuild the outdated targets.
```

## High-performance computing

<div class="left">How many jobs could help?</div>

```r
max_useful_jobs(my_plan)
```

<div class="left">Parallel processes: low overhead, light weight</div>

```r
make(my_plan, jobs = 2) # Backend chosen based on platform.
make(my_plan, parallelism = "mclapply", jobs = 2) # Mac/Linux
make(my_plan, parallelism = "parLapply", jobs = 2) # Windows too
```

<div class="left">Parallel R sessions: high overhead, heavy duty</div>

```{r eval = FALSE}
make(my_plan, parallelism = "Makefile", jobs = 2)
make(my_plan, parallelism = "Makefile", command = "make",
  args = c("--jobs=2", "--silent"))
```
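
<div class="left">One way to think about "how many jobs could help": a toy sketch in base R under the assumption that targets at the same depth of the dependency graph can build at the same time. It is not how `max_useful_jobs()` is implemented; `toy_deps`, `toy_level`, and `toy_levels` are hypothetical names.</div>

```r
# Hypothetical dependency list: each target maps to the targets it needs.
toy_deps <- list(
  small = character(0),
  large = character(0),
  regression1_small = "small",
  regression1_large = "large",
  coef_regression1_small = "regression1_small"
)

# Depth of a target: 1 if it needs nothing, else 1 + its deepest parent.
toy_level <- function(target, deps) {
  parents <- deps[[target]]
  if (length(parents) == 0) return(1L)
  1L + max(vapply(parents, toy_level, integer(1), deps = deps))
}

toy_levels <- vapply(names(toy_deps), toy_level, integer(1), deps = toy_deps)

# The widest level bounds how many jobs can be busy at once.
max(table(toy_levels))
# 2
```

<div class="left">In this toy graph no level holds more than two targets, so a third job would sit idle.</div>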

## Supercomputing

<div style="font-size: 0.9em">
<div class="left">`my_script.R`</div>
```{r shell, eval = FALSE}
# Your setup...
make(my_plan, parallelism = "Makefile", jobs = 8,
  prepend = "SHELL = ./shell.sh")
```

<div class="left">`shell.sh` (write with shell_file())</div>
```{r shellfile, eval = FALSE}
#!/bin/bash
shift
echo "module load R; $*" | qsub -sync y -cwd -j y
```

<div class="left">Run on a cluster or supercomputer.</div>
```{r submit, eval = FALSE}
chmod +x shell.sh
nohup nice -19 R CMD BATCH my_script.R &
```
</div>

## Utilities

<section>
<div style="float:left; width: 50%; text-align: center">
Workflow plan
```r
plan()
analyses()
summaries()
evaluate()
expand()
gather()
```

Dependency network
```r
outdated()
missed()
plot_graph()
read_graph()
dataframes_graph()
deps()
tracked()
max_useful_jobs()
```
</div>
<div style="float:right; width: 50%; text-align: center">
Cache
```r
clean()
cached()
imported()
built()
readd()
loadd()
find_project()
find_cache()
```

Debugging
```r
check()
session()
in_progress()
progress()
read_config()
```

</div>
</section>

## Learning

<div class="left">- Basic example</div>
```r
load_basic_example()
examples_drake() # List examples.
example_drake("basic") # Generate code files.
```

<div class="left">- Tutorials</div>
```{r getbasic, eval = FALSE}
vignette("drake") # High-level overview.
vignette("quickstart") # Deep dive.
vignette("caution") # Pitfalls.
```

- Rendered tutorials: [https://CRAN.R-project.org/package=drake/vignettes](https://CRAN.R-project.org/package=drake/vignettes)
- Bug reports, issues, feature requests: [https://github.com/wlandau-lilly/drake/issues](https://github.com/wlandau-lilly/drake/issues)

```{r cleanup, echo = FALSE}
clean(destroy = TRUE)
unlink("report.Rmd")
```

## Similar work

- Main inspiration: [remake](https://github.com/richfitz/remake) (FitzJohn)
- [GNU Make](https://www.gnu.org/software/make) (GNU Project)
- Packages for caching and tracking:
    - [archivist](https://cran.r-project.org/package=archivist) (Biecek et al.)
    - [memoise](https://cran.r-project.org/package=memoise) (Wickham et al.)
    - [R.cache](https://cran.r-project.org/package=R.cache) (Bengtsson)
    - [trackr](https://github.com/gmbecker/recordr) (Moore and Becker)
- CRAN task views:
    - [reproducible research](https://CRAN.R-project.org/view=ReproducibleResearch)
    - [high-performance computing](https://CRAN.R-project.org/view=HighPerformanceComputing)

## Sources

<ul style = "font-size: 0.65em">
<li>
Bengtsson, Henrik. "R.cache: Fast and light-weight caching (memoization) of objects and results to speed up computations." 2015. R package version 0.12.0. [https://CRAN.R-project.org/package=R.cache](https://CRAN.R-project.org/package=R.cache).
</li>
<li>
Biecek, Przemyslaw and Kosinski, Marcin. "archivist: an R package for managing, recording, and restoring data analysis results." 2016. R package version 2.1.2. [https://CRAN.R-project.org/package=archivist](https://CRAN.R-project.org/package=archivist).
</li>
<li>
FitzJohn, Rich. "remake: Make-like declarative workflows in R." 2017. R package version 0.3.0. GitHub repository, [https://github.com/richfitz/remake](https://github.com/richfitz/remake).
</li>
<li>
Landau, William M. "Drake: data frames in R for Make." 2017. R package version 4.0.0. [https://CRAN.R-project.org/package=drake](https://CRAN.R-project.org/package=drake).
</li>
<li>
Moore, Sara and Becker, Gabriel. "trackr: Semantic annotation and discoverability system for R-based artifacts." 2017. R package version 0.7.4. [https://github.com/gmbecker/recordr](https://github.com/gmbecker/recordr).
</li>
<li>
Müller, Kirill. "Reproducible workflows with R." Zurich R user meetup. April 10, 2017. [https://krlmlr.github.io/remake-slides](https://krlmlr.github.io/remake-slides).
</li>
<li>
Stallman, Richard M. and McGrath, Roland and Smith, Paul D. <u>GNU Make: A Program for Directing Recompilation, for version 3.81</u>. Free Software Foundation, 2004.
</li>
<li>
Wickham, Hadley and Hester, Jim and Müller, Kirill. "memoise: Memoisation of functions." R package version 1.0.0. [https://CRAN.R-project.org/package=memoise](https://CRAN.R-project.org/package=memoise).
</li>
</ul>
850 changes: 850 additions & 0 deletions _files/slides/will-drake/drake.html

Large diffs are not rendered by default.

18 changes: 18 additions & 0 deletions _files/slides/will-drake/graph1.html
@@ -0,0 +1,18 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<script src="graph1_files/htmlwidgets-0.9/htmlwidgets.js"></script>
<link href="graph1_files/vis-4.20.0/vis.css" rel="stylesheet" />
<script src="graph1_files/vis-4.20.0/vis.min.js"></script>
<script src="graph1_files/visNetwork-binding-2.0.0/visNetwork.js"></script>

</head>
<body style="background-color:white;">
<div id="htmlwidget_container">
<div id="htmlwidget-ec897849c5c50a6f7846" style="width:100%;height:400px;" class="visNetwork html-widget"></div>
</div>
<script type="application/json" data-for="htmlwidget-ec897849c5c50a6f7846">{"x":{"nodes":{"id":["'report.md'","small","large","report_dependencies","regression1_small","regression1_large","regression2_small","regression2_large","summ_regression1_small","summ_regression1_large","summ_regression2_small","summ_regression2_large","coef_regression1_small","coef_regression1_large","coef_regression2_small","coef_regression2_large","my_knit","simulate","reg1","reg2","'report.Rmd'","c","summary","suppressWarnings","coef","knit","data.frame","rpois","stats::rnorm","lm"],"label":["'report.md'","small","large","report_dependencies","regression1_small","regression1_large","regression2_small","regression2_large","summ_regression1_small","summ_regression1_large","summ_regression2_small","summ_regression2_large","coef_regression1_small","coef_regression1_large","coef_regression2_small","coef_regression2_large","my_knit","simulate","reg1","reg2","'report.Rmd'","c","summary","suppressWarnings","coef","knit","data.frame","rpois","stats::rnorm","lm"],"level":[7,3,3,6,4,4,4,4,5,5,5,5,5,5,5,5,2,2,2,2,1,1,1,1,1,1,1,1,1,1],"font.size":[25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25],"status":["outdated","outdated","outdated","outdated","outdated","outdated","outdated","outdated","outdated","outdated","outdated","outdated","outdated","outdated","outdated","outdated","import","import","import","import","import","import","import","import","import","import","import","import","import","import"],"color":["#aa0000","#aa0000","#aa0000","#aa0000","#aa0000","#aa0000","#aa0000","#aa0000","#aa0000","#aa0000","#aa0000","#aa0000","#aa0000","#aa0000","#aa0000","#aa0000","#1874cd","#1874cd","#1874cd","#1874cd","#1874cd","#1874cd","#1874cd","#1874cd","#1874cd","#1874cd","#1874cd","#1874cd","#1874cd","#1874cd"],"shape":["square","dot","dot","dot","dot","dot","dot","dot","dot","dot","dot","dot","dot","dot","dot","dot","triangle","triangle","triangle","triangle","square","triangle","triangle","triangle","triangle","triangle","triangle","triangle","triangle","triangle"],"hover_label":["my_knit('report.Rmd', report_dependencies)","simulate(5)","simulate(50)","c(small, large, coef_regression2_small)","reg1(small)","reg1(large)","reg2(small)","reg2(large)","suppressWarnings(summary(regression1_small))","suppressWarnings(summary(regression1_large))","suppressWarnings(summary(regression2_small))","suppressWarnings(summary(regression2_large))","coef(regression1_small)","coef(regression1_large)","coef(regression2_small)","coef(regression2_large)","my_knit","simulate","reg1","reg2","'report.Rmd'","c","summary","suppressWarnings","coef","knit","data.frame","rpois","stats::rnorm","lm"],"x":[-1,0.288888888888889,-0.0222222222222223,-0.2,0.866666666666667,-0.111111111111111,0.288888888888889,0.555555555555556,0.955555555555555,0.377777777777778,0.511111111111111,0.288888888888889,0.6,-0.822222222222222,0.0666666666666667,0.777777777777778,-0.911111111111111,-0.155555555555556,0.288888888888889,0.555555555555556,-1,-0.644444444444444,1,0.466666666666667,0.288888888888889,-0.911111111111111,-0.288888888888889,-0.377777777777778,-0.555555555555556,0.866666666666667],"y":[1,-0.333333333333333,-0.333333333333333,0.666666666666667,0,0,0,0,0.333333333333333,0.333333333333333,0.333333333333333,0.333333333333333,0.333333333333333,0.333333333333333,0.333333333333333,0.333333333333333,-0.666666666666667,-0.666666666666667,-0.666666666666667,-0.333333333333333,-1,-1,-1,0,-1,-1,-1,-1,-1,-1]},"edges":{"from":["'report.Rmd'","my_knit","report_dependencies","simulate","simulate","c","coef_regression2_small","large","small","reg1","small","large","reg1","reg2","small","large","reg2","regression1_small","summary","suppressWarnings","regression1_large","summary","suppressWarnings","regression2_small","summary","suppressWarnings","regression2_large","summary","suppressWarnings","coef","regression1_small","coef","regression1_large","coef","regression2_small","coef","regression2_large","knit","data.frame","rpois","stats::rnorm","lm","lm"],"to":["'report.md'","'report.md'","'report.md'","small","large","report_dependencies","report_dependencies","report_dependencies","report_dependencies","regression1_small","regression1_small","regression1_large","regression1_large","regression2_small","regression2_small","regression2_large","regression2_large","summ_regression1_small","summ_regression1_small","summ_regression1_small","summ_regression1_large","summ_regression1_large","summ_regression1_large","summ_regression2_small","summ_regression2_small","summ_regression2_small","summ_regression2_large","summ_regression2_large","summ_regression2_large","coef_regression1_small","coef_regression1_small","coef_regression1_large","coef_regression1_large","coef_regression2_small","coef_regression2_small","coef_regression2_large","coef_regression2_large","my_knit","simulate","simulate","simulate","reg1","reg2"],"arrows":["to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to","to"]},"nodesToDataframe":true,"edgesToDataframe":true,"options":{"width":"100%","height":"100%","nodes":{"shape":"dot","physics":false},"manipulation":{"enabled":false},"layout":{"hierarchical":{"enabled":true,"direction":"LR"}},"edges":{"smooth":false},"physics":{"stabilization":false},"interaction":{"navigationButtons":true,"hover":true}},"groups":null,"width":"100%","height":"400px","idselection":{"enabled":false},"byselection":{"enabled":false},"main":null,"submain":null,"footer":null,"legend":{"width":0.2,"useGroups":false,"position":"left","ncol":1,"stepX":100,"stepY":100,"nodes":{"label":["Up to date","In progress","Outdated","Imported","Missing","Object","Function","File"],"color":["#228b22","#ff7221","#aa0000","#1874cd","#9a32cd","gray","gray","gray"],"shape":["dot","dot","dot","dot","dot","dot","triangle","square"],"font.color":["black","black","black","black","black","black","black","black"],"font.size":[25,25,25,25,25,25,25,25],"id":[1,2,3,4,5,6,7,8]},"nodesToDataframe":true},"igraphlayout":{"type":"square"},"tooltipStay":300,"tooltipStyle":"position: fixed;visibility:hidden;padding: 5px;white-space: nowrap;font-family: verdana;font-size:14px;font-color:#000000;background-color: #f5f4ed;-moz-border-radius: 3px;-webkit-border-radius: 3px;border-radius: 3px;border: 1px solid #808074;box-shadow: 3px 3px 10px rgba(0, 0, 0, 0.2);","events":{"hoverNode":"function(e){\n var label_info = this.body.data.nodes.get({\n fields: ['label', 'hover_label'],\n filter: function (item) {\n return item.id === e.node\n },\n returnType :'Array'\n });\n this.body.data.nodes.update({id: e.node, label : label_info[0].hover_label, hover_label : label_info[0].label});\n }","blurNode":"function(e){\n var label_info = this.body.data.nodes.get({\n fields: ['label', 'hover_label'],\n filter: function (item) {\n return item.id === e.node\n },\n returnType :'Array'\n });\n this.body.data.nodes.update({id: e.node, label : label_info[0].hover_label, hover_label : label_info[0].label});\n }"}},"evals":["events.hoverNode","events.blurNode"],"jsHooks":[]}</script>
<script type="application/htmlwidget-sizing" data-for="htmlwidget-ec897849c5c50a6f7846">{"viewer":{"width":"100%","height":"400px","padding":15,"fill":false},"browser":{"width":"100%","height":"400px","padding":40,"fill":false}}</script>
</body>
</html>