Regression
Ondřej Moravčík edited this page Mar 25, 2015 · 15 revisions
Regression is a statistical technique for estimating the relationships among variables.
Linear regression models the relationship between a scalar dependent variable y and one or more explanatory (independent) variables, denoted X.
- method: LinearRegressionWithSGD
- model: LinearRegressionModel
- ruby: regression/linear.rb
data = [
LabeledPoint.new(0.0, [0.0]),
LabeledPoint.new(1.0, [1.0]),
LabeledPoint.new(3.0, [2.0]),
LabeledPoint.new(2.0, [3.0])
]
lrm = LinearRegressionWithSGD.train($sc.parallelize(data), initial_weights: [1.0])
lrm.intercept # => 0.0
lrm.weights # => [0.9285714285714286]
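# Each prediction should lie within 0.5 of its label, so compare the absolute error.
(lrm.predict([0.0]) - 0).abs < 0.5
# => true
(lrm.predict([1.0]) - 1).abs < 0.5
# => true
(lrm.predict(SparseVector.new(1, {0 => 1.0})) - 1).abs < 0.5
# => true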
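The weight reported above can be checked by hand. The sketch below is plain Ruby with no Spark dependency; it assumes, as the reported intercept of 0.0 suggests, that the line is fit through the origin, in which case the least-squares weight is sum(x*y) / sum(x*x). The names `points`, `weight` and `predict` are illustrative and are not part of the ruby-spark API.

```ruby
# Same data as above, written as [feature, label] pairs.
points = [[0.0, 0.0], [1.0, 1.0], [2.0, 3.0], [3.0, 2.0]]

# Closed-form least squares for y = w * x (no intercept).
sum_xy = points.sum { |x, y| x * y }   # 0 + 1 + 6 + 6 = 13
sum_xx = points.sum { |x, _| x * x }   # 0 + 1 + 4 + 9 = 14
weight = sum_xy / sum_xx               # 13.0 / 14.0

# Hypothetical stand-in for the model's predict method.
predict = ->(x) { weight * x }

puts weight
puts (predict.call(1.0) - 1).abs < 0.5
```

The value 13/14 agrees with the weight shown in the example, which suggests the SGD run converged to the least-squares solution for this small data set.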
Lasso is a regularized version of least squares that adds an L1 penalty on the weights.
- method: LassoWithSGD
- model: LassoModel
- ruby: regression/lasso.rb
data = [
LabeledPoint.new(0.0, [0.0]),
LabeledPoint.new(1.0, [1.0]),
LabeledPoint.new(3.0, [2.0]),
LabeledPoint.new(2.0, [3.0])
]
lrm = LassoWithSGD.train($sc.parallelize(data), initial_weights: [1.0])
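# Each prediction should lie within 0.5 of its label, so compare the absolute error.
(lrm.predict([0.0]) - 0).abs < 0.5
# => true
(lrm.predict([1.0]) - 1).abs < 0.5
# => true
(lrm.predict(SparseVector.new(1, {0 => 1.0})) - 1).abs < 0.5
# => true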
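To illustrate what an L1 penalty does, here is a minimal plain-Ruby sketch for the single-feature, no-intercept case. It assumes the objective (1 / 2n) * sum((w*x - y)^2) + reg * |w|, whose minimizer is a soft-thresholded version of the least-squares weight. This is not how LassoWithSGD is implemented (it uses stochastic gradient descent); the helper names and the `reg` values are hypothetical and chosen only for illustration.

```ruby
# Shrink z toward zero by reg, clamping to exactly 0.0 inside [-reg, reg].
def soft_threshold(z, reg)
  return z - reg if z > reg
  return z + reg if z < -reg
  0.0
end

# Closed-form L1-penalized weight for y = w * x under the assumed objective.
def lasso_weight(points, reg)
  n = points.size.to_f
  mean_xy = points.sum { |x, y| x * y } / n
  mean_xx = points.sum { |x, _| x * x } / n
  soft_threshold(mean_xy, reg) / mean_xx
end

# Same data as above, written as [feature, label] pairs.
points = [[0.0, 0.0], [1.0, 1.0], [2.0, 3.0], [3.0, 2.0]]

puts lasso_weight(points, 0.0)   # no penalty: plain least squares
puts lasso_weight(points, 0.01)  # small penalty: slightly smaller weight
puts lasso_weight(points, 5.0)   # large penalty: weight driven to 0.0
```

A large enough penalty sets the weight to exactly zero, which is why L1 regularization is often used for feature selection.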
Ridge regression is a regularized version of least squares that adds an L2 penalty on the weights.
- method: RidgeRegressionWithSGD
- model: RidgeRegressionModel
- ruby: regression/ridge.rb
data = [
LabeledPoint.new(0.0, [0.0]),
LabeledPoint.new(1.0, [1.0]),
LabeledPoint.new(3.0, [2.0]),
LabeledPoint.new(2.0, [3.0])
]
lrm = RidgeRegressionWithSGD.train($sc.parallelize(data), initial_weights: [1.0])
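# Each prediction should lie within 0.5 of its label, so compare the absolute error.
(lrm.predict([0.0]) - 0).abs < 0.5
# => true
(lrm.predict([1.0]) - 1).abs < 0.5
# => true
(lrm.predict(SparseVector.new(1, {0 => 1.0})) - 1).abs < 0.5
# => true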
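For intuition about the L2 penalty, here is a minimal plain-Ruby sketch for the single-feature, no-intercept case. It assumes the objective (1 / 2n) * sum((w*x - y)^2) + (reg / 2) * w^2, whose minimizer is mean(x*y) / (mean(x*x) + reg). This is not how RidgeRegressionWithSGD is implemented (it uses stochastic gradient descent); `ridge_weight` and the `reg` values are hypothetical and chosen only for illustration.

```ruby
# Closed-form L2-penalized weight for y = w * x under the assumed objective.
def ridge_weight(points, reg)
  n = points.size.to_f
  mean_xy = points.sum { |x, y| x * y } / n
  mean_xx = points.sum { |x, _| x * x } / n
  mean_xy / (mean_xx + reg)
end

# Same data as above, written as [feature, label] pairs.
points = [[0.0, 0.0], [1.0, 1.0], [2.0, 3.0], [3.0, 2.0]]

puts ridge_weight(points, 0.0)    # no penalty: plain least squares
puts ridge_weight(points, 0.01)   # small penalty: slightly smaller weight
puts ridge_weight(points, 100.0)  # large penalty: small but still nonzero
```

Under this objective the penalty shrinks the weight smoothly toward zero as `reg` grows, but the weight stays nonzero for any finite `reg` on this data.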