% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/predict.R, R/predict_raw.R
\name{predict.model_fit}
\alias{predict.model_fit}
\alias{predict_raw.model_fit}
\alias{predict_raw}
\title{Model predictions}
\usage{
\method{predict}{model_fit}(object, new_data, type = NULL, opts = list(), ...)
\method{predict_raw}{model_fit}(object, new_data, opts = list(), ...)
predict_raw(object, ...)
}
\arguments{
\item{object}{A \link[=model_fit]{model fit}.}
\item{new_data}{A rectangular data object, such as a data frame.}
\item{type}{A single character value or \code{NULL}. Possible values
are \code{"numeric"}, \code{"class"}, \code{"prob"}, \code{"conf_int"}, \code{"pred_int"},
\code{"quantile"}, \code{"time"}, \code{"hazard"}, \code{"survival"}, \code{"linear_pred"}, or \code{"raw"}. When \code{NULL},
\code{predict()} will choose an appropriate value based on the model's mode.}
\item{opts}{A list of optional arguments to the underlying
predict function that will be used when \code{type = "raw"}. The
list should not include options for the model object or the
new data being predicted.}
\item{...}{Additional \code{parsnip}-related options, depending on the
value of \code{type}. Arguments to the underlying model's prediction
function cannot be passed here (use the \code{opts} argument instead).
Possible arguments are:
\itemize{
\item \code{interval}: for \code{type} equal to \code{"survival"} or \code{"quantile"}, should
interval estimates be added, if available? Options are \code{"none"}
and \code{"confidence"}.
\item \code{level}: for \code{type} equal to \code{"conf_int"}, \code{"pred_int"}, or \code{"survival"},
the level of the intervals (e.g., the confidence level for confidence
intervals). Default value is \code{0.95}.
\item \code{std_error}: for \code{type} equal to \code{"conf_int"} or \code{"pred_int"}, add
the standard error of fit or prediction (on the scale of the
linear predictors). Default value is \code{FALSE}.
\item \code{quantile}: for \code{type} equal to \code{"quantile"}, the quantiles of the
distribution. Default is \code{(1:9)/10}.
\item \code{eval_time}: for \code{type} equal to \code{"survival"} or \code{"hazard"}, the
time points at which the survival probability or hazard is estimated.
}}
}
\value{
With the exception of \code{type = "raw"}, the result of
\code{predict.model_fit()}
\itemize{
\item is a tibble
\item has as many rows as there are rows in \code{new_data}
\item has standardized column names, see below:
}
For \code{type = "numeric"}, the tibble has a \code{.pred} column for a single
outcome and \code{.pred_Yname} columns for a multivariate outcome.

For \code{type = "class"}, the tibble has a \code{.pred_class} column.

For \code{type = "prob"}, the tibble has \code{.pred_classlevel} columns.

For \code{type = "conf_int"} and \code{type = "pred_int"}, the tibble has
\code{.pred_lower} and \code{.pred_upper} columns with an attribute for
the confidence level. In the case where intervals can be
produced for class probabilities (or other non-scalar outputs),
the columns are named \code{.pred_lower_classlevel} and so on.

For \code{type = "quantile"}, the tibble has a \code{.pred} column, which is
a list-column. Each list element contains a tibble with columns
\code{.pred} and \code{.quantile} (and perhaps other columns).

For \code{type = "time"}, the tibble has a \code{.pred_time} column.

For \code{type = "survival"}, the tibble has a \code{.pred} column, which is
a list-column. Each list element contains a tibble with columns
\code{.eval_time} and \code{.pred_survival} (and perhaps other columns).

For \code{type = "hazard"}, the tibble has a \code{.pred} column, which is
a list-column. Each list element contains a tibble with columns
\code{.eval_time} and \code{.pred_hazard} (and perhaps other columns).

Using \code{type = "raw"} with \code{predict.model_fit()} will return
the unadulterated results of the prediction function.

In the case of Spark-based models, since table columns cannot
contain dots, the same convention is used except 1) no dots
appear in names and 2) vectors are never returned, not even by the
type-specific prediction functions.

When the model fit failed and the error was captured, the
\code{predict()} function will return the same structure as above but
filled with missing values. This does not currently work for
multivariate models.
}
\description{
Apply a model to create different types of predictions.
\code{predict()} can be used for all types of models; the
\code{type} argument selects the kind of prediction to return.
}
\details{
For \code{type = NULL}, \code{predict()} uses
\itemize{
\item \code{type = "numeric"} for regression models,
\item \code{type = "class"} for classification, and
\item \code{type = "time"} for censored regression.
}
\subsection{Interval predictions}{
When using \code{type = "conf_int"} or \code{type = "pred_int"}, the options
\code{level} and \code{std_error} can be used. The latter is a logical that adds an
extra column of standard error values (if available).
}
\subsection{Censored regression predictions}{
For censored regression, a numeric vector for \code{eval_time} is required when
survival probabilities or hazards are requested. The time values are required
to be unique, finite, non-missing, and non-negative. The \code{predict()}
functions will adjust the values to fit this specification by removing
offending points (with a warning).

\code{predict.model_fit()} does not require the outcome to be present. For
performance metrics on the predicted survival probability, inverse probability
of censoring weights (IPCW) are required (see the \code{tidymodels.org} reference
below). Those require the outcome and are thus not returned by \code{predict()}.
They can be added via \code{\link[=augment.model_fit]{augment.model_fit()}} if \code{new_data} contains a column
with the outcome as a \code{Surv} object.

Also, when \code{type = "linear_pred"}, censored regression models will by default
be formatted such that the linear predictor \emph{increases} with time. This may
have the opposite sign from what the underlying model's \code{predict()} method
produces. Set \code{increasing = FALSE} to suppress this behavior.
}
}
\examples{
\dontshow{if (!parsnip:::is_cran_check()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
library(dplyr)

lm_model <-
  linear_reg() \%>\%
  set_engine("lm") \%>\%
  fit(mpg ~ ., data = mtcars \%>\% dplyr::slice(11:32))

pred_cars <-
  mtcars \%>\%
  dplyr::slice(1:10) \%>\%
  dplyr::select(-mpg)

predict(lm_model, pred_cars)

predict(
  lm_model,
  pred_cars,
  type = "conf_int",
  level = 0.90
)

predict(
  lm_model,
  pred_cars,
  type = "raw",
  opts = list(type = "terms")
)
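
# A minimal sketch, not part of the original examples, of the standardized
# output described in the Value section. The object names `small_fit`,
# `small_new`, `small_pred`, and `small_int` are illustrative only.
small_fit <- fit(set_engine(linear_reg(), "lm"), mpg ~ wt + hp, data = mtcars)
small_new <- mtcars[1:5, c("wt", "hp")]

# The default type for a regression model gives one row per row of
# `new_data` and a single `.pred` column.
small_pred <- predict(small_fit, small_new)
nrow(small_pred) == nrow(small_new)
names(small_pred)

# Interval types use `.pred_lower` and `.pred_upper` columns instead.
small_int <- predict(small_fit, small_new, type = "pred_int", level = 0.90)
names(small_int)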
\dontshow{\}) # examplesIf}
}
\references{
\url{https://www.tidymodels.org/learn/statistics/survival-metrics/}
}
\keyword{internal}