---
output:
  md_document:
    variant: markdown_github
bibliography: refs.bibtex
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
options(width = 60, digits = 3)
```
# mice
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/mice)](https://cran.r-project.org/package=mice)
[![](http://cranlogs.r-pkg.org/badges/mice)](https://cran.r-project.org/package=mice)
[![](https://img.shields.io/badge/github%20version-3.3.0-orange.svg)](https://github.com/stefvanbuuren/mice)
## [Multivariate Imputation by Chained Equations](http://stefvanbuuren.github.io/mice/)
The [`mice`](https://cran.r-project.org/package=mice) package
implements a method to deal with missing data. The package creates
multiple imputations (replacement values) for multivariate missing
data. The method is based on Fully Conditional Specification, where
each incomplete variable is imputed by a separate model. The `MICE`
algorithm can impute mixes of continuous, binary, unordered
categorical and ordered categorical data. In addition, MICE can impute
continuous two-level data, and maintain consistency between
imputations by means of passive imputation. Many diagnostic plots are
implemented to inspect the quality of the imputations.
## Installation
The `mice` package can be installed from CRAN as follows:
```{r eval = FALSE}
install.packages("mice")
```
The latest version can be installed from GitHub as follows:
```{r eval = FALSE}
install.packages("devtools")
devtools::install_github(repo = "stefvanbuuren/mice")
```
## Minimal example
```{r pattern, fig.cap = "Missing data pattern of `nhanes` data. Blue is observed, red is missing."}
library(mice, warn.conflicts = FALSE)
# show the missing data pattern
md.pattern(nhanes)
```
The table and the graph summarize where the missing data occur in
the `nhanes` dataset.
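As a complement to the pattern plot, the same information can be tallied per variable with base R. This is only a minimal sketch; the names `n_missing` and `prop_complete` are introduced here for illustration.

```{r}
library(mice, warn.conflicts = FALSE)

# number of missing cells in each column of nhanes
n_missing <- colSums(is.na(nhanes))
n_missing

# proportion of rows without any missing value
prop_complete <- mean(complete.cases(nhanes))
prop_complete
```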
```{r stripplot, fig.cap = "Distribution of `chl` per imputed data set."}
# multiply impute the missing values
imp <- mice(nhanes, maxit = 2, m = 2, seed = 1)
# inspect quality of imputations
stripplot(imp, chl, pch = 19, xlab = "Imputation number")
```
In general, we would like the imputations to be plausible, i.e.,
values that could have been observed if they had not been missing.
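One rough numerical check of plausibility, sketched below, compares the imputed `chl` values with the observed ones. It assumes the default method for numeric variables, predictive mean matching (`pmm`), which draws imputations from observed donor values. The object names `observed` and `imputed` are illustrative only.

```{r}
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, maxit = 2, m = 2, seed = 1, printFlag = FALSE)

# observed chl values and all imputed chl values across imputations
observed <- nhanes$chl[!is.na(nhanes$chl)]
imputed <- unlist(imp$imp$chl)

# compare the ranges
range(observed)
range(imputed)

# under pmm, every imputed value should also occur among the observed values
all(imputed %in% observed)
```

Such a check does not replace the diagnostic plots; it only flags imputations that fall outside the observed data.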
```{r}
# fit complete-data model
fit <- with(imp, lm(chl ~ age + bmi))
# pool and summarize the results
summary(pool(fit))
```
The complete-data model is fit to each imputed dataset, and the
results are combined to arrive at estimates that properly
account for the missing data.
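To make the combination step concrete, the sketch below recomputes the pooled estimates and their variances by hand using Rubin's rules: the pooled estimate is the mean over imputations, and the total variance is the average within-imputation variance plus `(1 + 1/m)` times the between-imputation variance. This is an illustration, not the internal code of `pool()`; the names `est`, `u`, `qbar`, `ubar`, `b` and `total_var` are chosen here.

```{r}
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, maxit = 2, m = 2, seed = 1, printFlag = FALSE)
fit <- with(imp, lm(chl ~ age + bmi))
m <- imp$m

# coefficients and their sampling variances, one column per imputed dataset
est <- sapply(fit$analyses, coef)
u <- sapply(fit$analyses, function(f) diag(vcov(f)))

qbar <- rowMeans(est)                 # pooled estimate
ubar <- rowMeans(u)                   # within-imputation variance
b <- apply(est, 1, var)               # between-imputation variance
total_var <- ubar + (1 + 1 / m) * b   # total variance

cbind(estimate = qbar, std.error = sqrt(total_var))
```

The estimates and standard errors should agree with the corresponding columns of `summary(pool(fit))`.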
## `mice 3.0`
Version 3.0 represents a major update that implements the
following features:
1. `blocks`: The main algorithm iterates over blocks. A block is
simply a collection of variables. In the common MICE algorithm each
block is equivalent to one variable, which remains the default.
The `blocks` argument allows mixing univariate imputation methods
with multivariate imputation methods. The `blocks` feature bridges
two seemingly disparate approaches, joint modeling and fully
conditional specification, into one framework;
2. `where`: The `where` argument is a logical matrix of the same size
as `data` that specifies which cells should be imputed. This makes
it possible to overimpute observed values or to skip imputation
for selected missing values;
3. Multivariate tests: The new functions `D1()`, `D2()`, `D3()`
and `anova()` perform multivariate parameter tests on the
repeated analyses of multiply-imputed data;
4. `formulas`: The old `form` argument has been redesigned and is now
named `formulas`. This provides an alternative way to specify
imputation models that exploits the full power of R's native
formulas;
5. Better integration with the `tidyverse` framework, especially
for packages `dplyr`, `tibble` and `broom`;
6. Improved numerical algorithms for low-level imputation functions.
Better handling of duplicate variables;
7. Last but not least: A brand new edition AND online version of
[Flexible Imputation of Missing Data. Second Edition.](https://stefvanbuuren.name/fimd/)
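
Two of the features above, `formulas` and the multivariate tests, can be sketched on the `nhanes` data as follows. The particular imputation models, the choice of `m = 5` and the object names (`form`, `imp_form`, `fit1`, `fit0`) are assumptions made for illustration, not recommendations.

```{r}
library(mice, warn.conflicts = FALSE)

# one imputation formula per incomplete variable;
# name.formulas() labels each formula by the variable it imputes
form <- name.formulas(list(
  bmi ~ age + hyp + chl,
  hyp ~ age + bmi + chl,
  chl ~ age + bmi
))
imp_form <- mice(nhanes, formulas = form, m = 5, maxit = 2,
                 seed = 1, printFlag = FALSE)

# multivariate Wald test (D1): does bmi add to a model with age only?
fit1 <- with(imp_form, lm(chl ~ age + bmi))
fit0 <- with(imp_form, lm(chl ~ age))
D1(fit1, fit0)
```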
See [MICE: Multivariate Imputation by Chained Equations](http://stefvanbuuren.github.io/mice/)
for more resources.
I'll be happy to take feedback and discuss suggestions. Please submit these
through GitHub's issues facility.
## Resources
### Books
1. Van Buuren, S. (2018). [Flexible Imputation of Missing Data. Second Edition](https://stefvanbuuren.name/fimd/). Chapman & Hall/CRC, Boca Raton, FL.
### Course materials
1. [Handling Missing Data in `R` with `mice`](https://stefvanbuuren.github.io/Winnipeg/)
2. [Statistical Methods for combined data sets](https://stefvanbuuren.github.io/RECAPworkshop/)
### Vignettes
1. [Ad hoc methods and the MICE algorithm](https://gerkovink.github.io/miceVignettes/Ad_hoc_and_mice/Ad_hoc_methods.html)
2. [Convergence and pooling](https://gerkovink.github.io/miceVignettes/Convergence_pooling/Convergence_and_pooling.html)
3. [Inspecting how the observed data and missingness are related](https://gerkovink.github.io/miceVignettes/Missingness_inspection/Missingness_inspection.html)
4. [Passive imputation and post-processing](https://gerkovink.github.io/miceVignettes/Passive_Post_processing/Passive_imputation_post_processing.html)
5. [Imputing multilevel data](https://gerkovink.github.io/miceVignettes/Multi_level/Multi_level_data.html)
6. [Sensitivity analysis with `mice`](https://gerkovink.github.io/miceVignettes/Sensitivity_analysis/Sensitivity_analysis.html)
7. [Generate missing values with `ampute`](https://rianneschouten.github.io/mice_ampute/vignette/ampute.html)
### Code from publications
1. [Flexible Imputation of Missing Data. Second edition.](https://github.com/stefvanbuuren/FIMD/tree/master/R)