% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/predict.R, R/predict_raw.R
\name{predict.model_fit}
\alias{predict.model_fit}
\alias{predict_raw.model_fit}
\alias{predict_raw}
\title{Model predictions}
\usage{
\method{predict}{model_fit}(object, new_data, type = NULL, opts = list(), ...)

\method{predict_raw}{model_fit}(object, new_data, opts = list(), ...)

predict_raw(object, ...)
}
\arguments{
\item{object}{A \link[=model_fit]{model fit}.}

\item{new_data}{A rectangular data object, such as a data frame.}

\item{type}{A single character value or \code{NULL}. Possible values
are \code{"numeric"}, \code{"class"}, \code{"prob"}, \code{"conf_int"}, \code{"pred_int"},
\code{"quantile"}, \code{"time"}, \code{"hazard"}, \code{"survival"}, \code{"linear_pred"}, or \code{"raw"}. When \code{NULL},
\code{predict()} will choose an appropriate value based on the model's mode.}

\item{opts}{A list of optional arguments to the underlying
predict function that will be used when \code{type = "raw"}. The
list should not include options for the model object or the
new data being predicted.}

\item{...}{Additional \code{parsnip}-related options, depending on the
value of \code{type}. Arguments to the underlying model's prediction
function cannot be passed here (use the \code{opts} argument instead).
Possible arguments are:
\itemize{
\item \code{interval}: for \code{type} equal to \code{"survival"} or \code{"quantile"}, should
interval estimates be added, if available? Options are \code{"none"}
and \code{"confidence"}.
\item \code{level}: for \code{type} equal to \code{"conf_int"}, \code{"pred_int"}, or \code{"survival"},
the coverage probability of the intervals
(e.g., the confidence level for confidence intervals).
Default value is \code{0.95}.
\item \code{std_error}: for \code{type} equal to \code{"conf_int"} or \code{"pred_int"}, add
the standard error of fit or prediction (on the scale of the
linear predictors). Default value is \code{FALSE}.
\item \code{quantile}: for \code{type} equal to \code{"quantile"}, the quantiles of the
distribution. Default is \code{(1:9)/10}.
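\item \code{eval_time}: for \code{type} equal to \code{"survival"} or \code{"hazard"}, the
time points at which the survival probability or hazard is estimated.
\item \code{increasing}: for \code{type} equal to \code{"linear_pred"} with censored
regression models, a logical for whether the linear predictor should be
formatted to increase with time. Default value is \code{TRUE}.
}}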
}
\value{
With the exception of \code{type = "raw"}, the result of
\code{predict.model_fit()}
\itemize{
\item is a tibble
\item has as many rows as there are rows in \code{new_data}
\item has standardized column names, see below:
}

For \code{type = "numeric"}, the tibble has a \code{.pred} column for a single
outcome and \code{.pred_Yname} columns for a multivariate outcome.

For \code{type = "class"}, the tibble has a \code{.pred_class} column.

For \code{type = "prob"}, the tibble has \code{.pred_classlevel} columns.

For \code{type = "conf_int"} and \code{type = "pred_int"}, the tibble has
\code{.pred_lower} and \code{.pred_upper} columns with an attribute for
the confidence level. In the case where intervals can be
produced for class probabilities (or other non-scalar outputs),
the columns are named \code{.pred_lower_classlevel} and so on.

For \code{type = "quantile"}, the tibble has a \code{.pred} column, which is
a list-column. Each list element contains a tibble with columns
\code{.pred} and \code{.quantile} (and perhaps other columns).

For \code{type = "time"}, the tibble has a \code{.pred_time} column.

For \code{type = "survival"}, the tibble has a \code{.pred} column, which is
a list-column. Each list element contains a tibble with columns
\code{.eval_time} and \code{.pred_survival} (and perhaps other columns).

For \code{type = "hazard"}, the tibble has a \code{.pred} column, which is
a list-column. Each list element contains a tibble with columns
\code{.eval_time} and \code{.pred_hazard} (and perhaps other columns).

Using \code{type = "raw"} with \code{predict.model_fit()} will return
the unadulterated results of the prediction function.

In the case of Spark-based models, since table columns cannot
contain dots, the same convention is used except that 1) no dots
appear in names and 2) the type-specific prediction functions never
return vectors.

When the model fit failed and the error was captured, the
\code{predict()} function will return the same structure as above but
filled with missing values. This does not currently work for
multivariate models.
}
\description{
Apply a model to create different types of predictions.
\code{predict()} can be used for all types of models and uses the
\code{type} argument to select the kind of prediction returned.
}
\details{
For \code{type = NULL}, \code{predict()} uses
\itemize{
\item \code{type = "numeric"} for regression models,
\item \code{type = "class"} for classification, and
\item \code{type = "time"} for censored regression.
}
\subsection{Interval predictions}{

When using \code{type = "conf_int"} or \code{type = "pred_int"}, the options
\code{level} and \code{std_error} can be used. The latter is a logical for
whether to add an extra column of standard error values (if available).
}

\subsection{Censored regression predictions}{

For censored regression, a numeric vector for \code{eval_time} is required when
survival or hazard probabilities are requested. The time values are required
to be unique, finite, non-missing, and non-negative. If they are not, the
\code{predict()} functions remove the offending points (with a warning).

\code{predict.model_fit()} does not require the outcome to be present. For
performance metrics on the predicted survival probability, inverse probability
of censoring weights (IPCW) are required (see the \code{tidymodels.org} reference
below). Those require the outcome and are thus not returned by \code{predict()}.
They can be added via \code{\link[=augment.model_fit]{augment.model_fit()}} if \code{new_data} contains a column
with the outcome as a \code{Surv} object.

Also, when \code{type = "linear_pred"}, censored regression models will by default
be formatted such that the linear predictor \emph{increases} with time. This may
have the opposite sign from what the underlying model's \code{predict()} method
produces. Set \code{increasing = FALSE} to suppress this behavior.
}
}
\examples{
\dontshow{if (!parsnip:::is_cran_check()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
library(dplyr)

lm_model <-
  linear_reg() \%>\%
  set_engine("lm") \%>\%
  fit(mpg ~ ., data = mtcars \%>\% dplyr::slice(11:32))

pred_cars <-
  mtcars \%>\%
  dplyr::slice(1:10) \%>\%
  dplyr::select(-mpg)

predict(lm_model, pred_cars)
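
# A minimal sketch (the object names reg_fit, default_pred, and
# numeric_pred are illustrative): for a regression fit, leaving
# type = NULL should give the same result as type = "numeric", i.e. a
# tibble with a single .pred column and one row per row of new_data.
library(parsnip)
reg_fit <- fit(set_engine(linear_reg(), "lm"), mpg ~ wt + hp, data = mtcars)
default_pred <- predict(reg_fit, new_data = mtcars[1:5, ])
numeric_pred <- predict(reg_fit, new_data = mtcars[1:5, ], type = "numeric")
# Compare the default-type and explicit-type predictions
all.equal(default_pred, numeric_pred)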

predict(
  lm_model,
  pred_cars,
  type = "conf_int",
  level = 0.90
)
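
# A minimal sketch (the object names int_fit, conf_iv, and pred_iv are
# illustrative): for an "lm" fit, prediction intervals also include the
# residual variance, so at the same level they should be wider than the
# confidence intervals for the mean.
library(parsnip)
int_fit <- fit(set_engine(linear_reg(), "lm"), mpg ~ wt, data = mtcars)
conf_iv <- predict(int_fit, mtcars[1:5, ], type = "conf_int", level = 0.90)
pred_iv <- predict(int_fit, mtcars[1:5, ], type = "pred_int", level = 0.90)
# Interval widths, computed from the .pred_lower and .pred_upper columns
ci_width <- conf_iv$.pred_upper - conf_iv$.pred_lower
pi_width <- pred_iv$.pred_upper - pred_iv$.pred_lower
all(pi_width > ci_width)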

predict(
  lm_model,
  pred_cars,
  type = "raw",
  opts = list(type = "terms")
)
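
# A minimal sketch for a classification model (the recoded outcome and
# the object names cars, logit_fit, class_pred, and prob_pred are
# illustrative). With type = NULL, predict() should return a .pred_class
# factor column; type = "prob" should return one .pred_<level> column
# per outcome level, with each row summing to one.
library(parsnip)
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))
logit_fit <- fit(set_engine(logistic_reg(), "glm"), am ~ mpg, data = cars)
class_pred <- predict(logit_fit, cars[1:5, ])
prob_pred <- predict(logit_fit, cars[1:5, ], type = "prob")
class_pred
prob_pred
# Class probabilities per row
rowSums(prob_pred)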
\dontshow{\}) # examplesIf}
}
\references{
\url{https://www.tidymodels.org/learn/statistics/survival-metrics/}
}
\keyword{internal}