% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rule_fit_h2o.R
\name{details_rule_fit_h2o}
\alias{details_rule_fit_h2o}
\title{RuleFit models via h2o}
\description{
\code{\link[h2o:h2o.rulefit]{h2o::h2o.rulefit()}} fits a model that derives simple feature rules from a tree
ensemble and uses them as features in a regularized (LASSO) model. \code{\link[agua:h2o_train]{agua::h2o_train_rule()}}
is a wrapper around this function.
}
\details{
For this engine, there are multiple modes: classification and regression.
\subsection{Tuning Parameters}{
This model has 3 tuning parameters:
\itemize{
\item \code{trees}: # Trees (type: integer, default: 50L)
\item \code{tree_depth}: Tree Depth (type: integer, default: 3L)
\item \code{penalty}: Amount of Regularization (type: double, default: 0). Note
that \code{penalty} for the h2o engine in \code{rule_fit()} corresponds to
the L1 penalty (LASSO).
}
Other engine arguments of interest (a sketch of passing these via
\code{set_engine()} follows this list):
\itemize{
\item \code{algorithm}: The algorithm used to generate rules; one of
“AUTO”, “DRF”, or “GBM”. Defaults to “AUTO”.
\item \code{min_rule_length}: Minimum length of tree depth (the counterpart of
\code{tree_depth}); defaults to 3.
\item \code{max_num_rules}: The maximum number of rules to return. The default
value of -1 means the number of rules is selected by diminishing
returns in model deviance.
\item \code{model_type}: The type of base learners in the ensemble; one of
“rules_and_linear”, “rules”, or “linear”. Defaults to
“rules_and_linear”.
}
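As a sketch (not from the package documentation itself), these engine
arguments can be supplied through \code{set_engine()}; the values shown below
are illustrative only:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(rules)

# Illustrative values only; `max_num_rules` and `model_type` are passed
# along to h2o::h2o.rulefit() by the engine fit function.
rule_fit(trees = 100, tree_depth = 5, penalty = 0.01) \%>\%
  set_engine("h2o", max_num_rules = 50, model_type = "rules_and_linear") \%>\%
  set_mode("regression")
}\if{html}{\out{</div>}}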
}
\subsection{Translation from parsnip to the underlying model call (regression)}{
\code{\link[agua:h2o_train]{agua::h2o_train_rule()}} is a wrapper around
\code{\link[h2o:h2o.rulefit]{h2o::h2o.rulefit()}}.
The \strong{agua} extension package is required to fit this model.
\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(rules)
rule_fit(
trees = integer(1),
tree_depth = integer(1),
penalty = numeric(1)
) \%>\%
set_engine("h2o") \%>\%
set_mode("regression") \%>\%
translate()
}\if{html}{\out{</div>}}
\if{html}{\out{<div class="sourceCode">}}\preformatted{## RuleFit Model Specification (regression)
##
## Main Arguments:
## trees = integer(1)
## tree_depth = integer(1)
## penalty = numeric(1)
##
## Computational engine: h2o
##
## Model fit template:
## agua::h2o_train_rule(x = missing_arg(), y = missing_arg(), weights = missing_arg(),
## validation_frame = missing_arg(), rule_generation_ntrees = integer(1),
## max_rule_length = integer(1), lambda = numeric(1))
}\if{html}{\out{</div>}}
}
\subsection{Translation from parsnip to the underlying model call (classification)}{
\code{\link[agua:h2o_train]{agua::h2o_train_rule()}} for \code{rule_fit()} is a
wrapper around \code{\link[h2o:h2o.rulefit]{h2o::h2o.rulefit()}}.
The \strong{agua} extension package is required to fit this model.
\if{html}{\out{<div class="sourceCode r">}}\preformatted{rule_fit(
trees = integer(1),
tree_depth = integer(1),
penalty = numeric(1)
) \%>\%
set_engine("h2o") \%>\%
set_mode("classification") \%>\%
translate()
}\if{html}{\out{</div>}}
\if{html}{\out{<div class="sourceCode">}}\preformatted{## RuleFit Model Specification (classification)
##
## Main Arguments:
## trees = integer(1)
## tree_depth = integer(1)
## penalty = numeric(1)
##
## Computational engine: h2o
##
## Model fit template:
## agua::h2o_train_rule(x = missing_arg(), y = missing_arg(), weights = missing_arg(),
## validation_frame = missing_arg(), rule_generation_ntrees = integer(1),
## max_rule_length = integer(1), lambda = numeric(1))
}\if{html}{\out{</div>}}
}
\subsection{Preprocessing requirements}{
Factor/categorical predictors need to be converted to numeric values
(e.g., dummy or indicator variables) for this engine. When using the
formula method via \code{\link[=fit.model_spec]{fit()}}, parsnip will
convert factor columns to indicators.
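As an additional sketch (not part of the original documentation), factor
handling can also be made explicit with a recipe; \code{example_data} and the
\code{outcome} column are hypothetical:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(recipes)

# Hypothetical data and column names; step_dummy() turns factor predictors
# into indicator columns before the model sees them.
rec <- recipe(outcome ~ ., data = example_data) \%>\%
  step_dummy(all_nominal_predictors())
}\if{html}{\out{</div>}}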
}
\subsection{Other details}{
To use the h2o engine with tidymodels, please run \code{h2o::h2o.init()}
first. By default, this connects R to the local h2o server; this needs
to be done in every new R session. You can also connect to a remote h2o
server with an IP address; for more details, see
\code{\link[h2o:h2o.init]{h2o::h2o.init()}}.
You can control the number of threads in the thread pool used by h2o
with the \code{nthreads} argument. By default, it uses all CPUs on the host.
This is different from the usual parallel processing mechanism in
tidymodels for tuning: tidymodels parallelizes over resamples, while h2o
parallelizes over hyperparameter combinations for a given resample.
h2o will automatically shut down the local h2o instance started by R
when R is terminated. To manually stop the h2o server, run
\code{h2o::h2o.shutdown()}.
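A minimal sketch of this session lifecycle (the \code{nthreads} value is
illustrative only):
\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(h2o)

# Start (or connect to) a local h2o server; `nthreads = 2` is illustrative,
# the default uses all available CPUs.
h2o::h2o.init(nthreads = 2)

# ... fit or tune tidymodels/agua models here ...

# Manually stop the local server (otherwise it stops when R exits).
h2o::h2o.shutdown(prompt = FALSE)
}\if{html}{\out{</div>}}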
}
\subsection{Saving fitted model objects}{
Models fitted with this engine may require native serialization methods
to be properly saved and/or passed between R sessions. To learn more
about preparing fitted models for serialization, see the bundle package.
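A minimal sketch with the bundle package, assuming \code{fitted_wflow} is a
hypothetical workflow or model fit produced with this engine:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(bundle)

# `fitted_wflow` is a hypothetical fitted object from this engine.
bundled <- bundle(fitted_wflow)
saveRDS(bundled, "rule_fit_h2o.rds")

# In a fresh R session (after h2o::h2o.init()), restore the fit:
restored <- unbundle(readRDS("rule_fit_h2o.rds"))
}\if{html}{\out{</div>}}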
}
}
\keyword{internal}