% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cubist_rules.R
\name{cubist_rules}
\alias{cubist_rules}
\title{Cubist rule-based regression models}
\usage{
cubist_rules(
mode = "regression",
committees = NULL,
neighbors = NULL,
max_rules = NULL,
engine = "Cubist"
)
}
\arguments{
\item{mode}{A single character string for the type of model.
The only possible value for this model is "regression".}
\item{committees}{A non-negative integer (no greater than 100) for the number
of members of the ensemble.}
\item{neighbors}{An integer between zero and nine for the number of training
set instances that are used to adjust the model-based prediction.}
\item{max_rules}{The largest number of rules.}
\item{engine}{A single character string specifying what computational engine
to use for fitting.}
}
\description{
\code{cubist_rules()} defines a model that derives simple feature rules from a tree
ensemble and creates regression models within each rule. This function can fit
regression models.
\Sexpr[stage=render,results=rd]{parsnip:::make_engine_list("cubist_rules")}
More information on how \pkg{parsnip} is used for modeling is at
\url{https://www.tidymodels.org/}.
}
\details{
Cubist is a rule-based ensemble regression model. A basic model tree
(Quinlan, 1992) is created that has a separate linear regression model
for each terminal node. The paths along the model tree are
flattened into rules and these rules are simplified and pruned. The parameter
\code{min_n} is the primary method for controlling the size of each tree while
\code{max_rules} controls the number of rules.
Cubist ensembles are created using \emph{committees}, which are similar to
boosting. After the first model in the committee is created, the second
model uses a modified version of the outcome data based on whether the
previous model under- or over-predicted the outcome. For iteration \emph{m}, the
new outcome \verb{y*} is computed using
\figure{comittees.png}
If a sample is under-predicted on the previous iteration, the outcome is
adjusted so that the next time it is more likely to be over-predicted to
compensate. This adjustment continues for each ensemble iteration. See
Kuhn and Johnson (2013) for details.
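
As a small worked example, assuming the adjustment shown in the figure takes the form given in Kuhn and Johnson (2013), with \eqn{\hat{y}_{(m-1)}}{yhat(m-1)} denoting the prediction from the previous committee member:

\deqn{y^{*}_{(m)} = y - (\hat{y}_{(m-1)} - y)}{y*(m) = y - (yhat(m-1) - y)}

Suppose a sample has an observed outcome of 10 and the previous model predicted 8, an under-prediction of 2 (the numbers are illustrative only). The modified outcome is then

\deqn{y^{*}_{(m)} = 10 - (8 - 10) = 12}{y*(m) = 10 - (8 - 10) = 12}

so the next model is trained toward a larger value for this sample.
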
After the model is created, there is also an option for a post-hoc
adjustment that uses the training set (Quinlan, 1993). When a new sample is
predicted by the model, it can be modified by its nearest neighbors in the
original training set. For \emph{K} neighbors, the model-based predicted value is
adjusted by the neighbors using:
\figure{adjust.png}
where \code{t} is the training set prediction and \code{w} is a weight that is inversely
related to the distance to the neighbor.
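
As an illustration, assume the adjustment in the figure has the form reported by Kuhn and Johnson (2013), where \eqn{\hat{y}}{yhat} is the model-based prediction for the new sample, \eqn{t_\ell}{t_l} is the value of \code{t} for neighbor \eqn{\ell}{l}, and \eqn{\hat{t}_\ell}{that_l} is the model's prediction for that neighbor:

\deqn{\frac{1}{K}\sum_{\ell=1}^{K} w_\ell \left[t_\ell + (\hat{y} - \hat{t}_\ell)\right]}{(1/K) * sum_l w_l * [t_l + (yhat - that_l)]}

Take \eqn{K = 2}, \eqn{\hat{y} = 10}{yhat = 10}, and unit weights (a simplifying assumption; the values are illustrative only), with \eqn{t_1 = 12}, \eqn{\hat{t}_1 = 11}{that_1 = 11}, \eqn{t_2 = 10}, and \eqn{\hat{t}_2 = 9}{that_2 = 9}. Then

\deqn{\frac{1}{2}\left[(12 + (10 - 11)) + (10 + (10 - 9))\right] = \frac{1}{2}(11 + 11) = 11}{(1/2) * [(12 + (10 - 11)) + (10 + (10 - 9))] = (1/2) * (11 + 11) = 11}

Both neighbors were under-predicted by 1 in this sketch, so the prediction for the new sample moves up from 10 to 11.
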
This function only defines what \emph{type} of model is being fit. Once an engine
is specified, the \emph{method} to fit the model is also defined. See
\code{\link[=set_engine]{set_engine()}} for more on setting the engine, including how to set engine
arguments.
The model is not trained or fit until the \code{\link[=fit.model_spec]{fit()}} function is used
with the data.
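
As a minimal sketch of this workflow, a specification can be defined, given an engine, and then fit. The sketch assumes the \pkg{rules} and \pkg{Cubist} packages are installed; the built-in \code{mtcars} data and the argument values are arbitrary, untuned choices used purely for illustration:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(parsnip)
library(rules)  # assumed to supply the Cubist engine for cubist_rules()

# Define the model type; nothing is estimated at this point
spec <- cubist_rules(committees = 5, neighbors = 3)
spec <- set_engine(spec, "Cubist")

# Training happens only when fit() is called with data
cubist_fit <- fit(spec, mpg ~ ., data = mtcars)

# Predictions are returned as a tibble with a .pred column
preds <- predict(cubist_fit, new_data = head(mtcars))
}\if{html}{\out{</div>}}

Keeping the specification separate from the fit means the same \code{spec} object can be reused with other data sets or resamples.
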
Each of the arguments in this function other than \code{mode} and \code{engine} is
captured as \link[rlang:topic-quosure]{quosures}. To pass values
programmatically, use the \link[rlang:injection-operator]{injection operator} like so:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{value <- 1
cubist_rules(argument = !!value)
}\if{html}{\out{</div>}}
}
\references{
\url{https://www.tidymodels.org}, \href{https://www.tmwr.org/}{\emph{Tidy Modeling with R}}, \href{https://www.tidymodels.org/find/parsnip/}{searchable table of parsnip models}
Quinlan R (1992). "Learning with Continuous Classes." Proceedings
of the 5th Australian Joint Conference On Artificial Intelligence, pp.
343-348.
Quinlan R (1993). "Combining Instance-Based and Model-Based Learning."
Proceedings of the Tenth International Conference on Machine Learning, pp.
236-243.
Kuhn M and Johnson K (2013). \emph{Applied Predictive Modeling}. Springer.
}
\seealso{
\code{\link[Cubist:cubist.default]{Cubist::cubist()}}, \code{\link[Cubist:cubistControl]{Cubist::cubistControl()}}, \Sexpr[stage=render,results=rd]{parsnip:::make_seealso_list("cubist_rules")}
}