Clustering
Ondřej Moravčík edited this page Mar 25, 2015 · 4 revisions
Clustering is the task of grouping a set of objects so that objects in the same group (called a cluster) are more similar to each other, by some chosen measure, than to objects in other clusters.
K-Means clustering partitions n observations into k clusters so that each observation belongs to the cluster with the nearest mean; that mean serves as the prototype of the cluster.
- method: KMeans
- model: KMeansModel
- ruby: clustering/kmeans.rb
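# Two well-separated groups of 2-D points.
# $sc is assumed to be an already started Spark context.
data = [
  DenseVector.new([0.0, 0.0]),
  DenseVector.new([1.0, 1.0]),
  DenseVector.new([9.0, 8.0]),
  DenseVector.new([8.0, 9.0])
]
# Train a model with k = 2 clusters on the distributed data.
model = KMeans.train($sc.parallelize(data), 2, max_iterations: 10,
                     runs: 30, initialization_mode: "random")
# Points from the same group are assigned to the same cluster.
model.predict([0.0, 0.0]) == model.predict([1.0, 1.0])
# => true
model.predict([8.0, 9.0]) == model.predict([9.0, 8.0])
# => true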
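The "nearest mean" rule described above can be sketched in plain Ruby without Spark. This is a minimal illustration of the classic assign-then-update loop (Lloyd's algorithm), not the ruby-spark implementation; the helper names `squared_distance`, `nearest_centroid`, `mean` and `simple_kmeans` are hypothetical, and the initial centroids are picked by hand instead of by an initialization mode.

```ruby
# Plain-Ruby sketch of the K-Means loop; all helper names are illustrative.

# Squared Euclidean distance between two points given as arrays of floats.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Index of the centroid closest to the given point.
def nearest_centroid(point, centroids)
  centroids.each_index.min_by { |i| squared_distance(point, centroids[i]) }
end

# Component-wise mean of a non-empty list of points.
def mean(points)
  points.transpose.map { |coords| coords.sum / coords.size }
end

def simple_kmeans(points, initial_centroids, max_iterations: 10)
  centroids = initial_centroids
  max_iterations.times do
    # Assignment step: group points by the index of their nearest centroid.
    groups = points.group_by { |p| nearest_centroid(p, centroids) }
    # Update step: move each centroid to the mean of its group;
    # a centroid with no assigned points stays where it is.
    updated = centroids.each_with_index.map do |c, i|
      groups.key?(i) ? mean(groups[i]) : c
    end
    break if updated == centroids # converged
    centroids = updated
  end
  centroids
end

points = [[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]]
centroids = simple_kmeans(points, [points[0], points[2]])
# => [[0.5, 0.5], [8.5, 8.5]]
nearest_centroid([0.0, 0.0], centroids) == nearest_centroid([1.0, 1.0], centroids)
# => true
```

Each centroid ends up at the mean of its two points, so the two nearby pairs fall into separate clusters, mirroring the `KMeans.train` example.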
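A Gaussian mixture model represents the data as a weighted combination of k Gaussian distributions; each point is assigned to the component most likely to have generated it.
- method: GaussianMixture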
- model: GaussianMixtureModel
- ruby: clustering/gaussian_mixture.rb
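# Three loose groups of 2-D points.
# $sc is assumed to be an already started Spark context.
data = [
  DenseVector.new([-0.1, -0.05]),
  DenseVector.new([-0.01, -0.1]),
  DenseVector.new([0.9, 0.8]),
  DenseVector.new([0.75, 0.935]),
  DenseVector.new([-0.83, -0.68]),
  DenseVector.new([-0.91, -0.76])
]
# Fit a mixture of 3 Gaussians; the fixed seed makes the run repeatable.
model = GaussianMixture.train($sc.parallelize(data), 3,
                              convergence_tol: 0.0001,
                              max_iterations: 50, seed: 10)
# Predict a component label for every point and collect the results locally.
labels = model.predict($sc.parallelize(data)).collect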
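To see which points share a component, the collected labels can be paired back with the input points. The following is a minimal plain-Ruby sketch, assuming `labels` is an array of integer component indices in the same order as the input; the label values below are hypothetical placeholders rather than actual model output, and `group_by_label` is an illustrative helper, not part of ruby-spark.

```ruby
# Input points in the same order they were passed to the model.
points = [
  [-0.1, -0.05], [-0.01, -0.1],
  [0.9, 0.8], [0.75, 0.935],
  [-0.83, -0.68], [-0.91, -0.76]
]

# Hypothetical labels, one per point (NOT real output of GaussianMixture).
labels = [0, 0, 1, 1, 2, 2]

# Build a hash that maps each label to the list of points carrying it.
def group_by_label(points, labels)
  points.zip(labels)
        .group_by { |_point, label| label }
        .transform_values { |pairs| pairs.map(&:first) }
end

clusters = group_by_label(points, labels)
clusters.each do |label, members|
  puts "component #{label}: #{members.inspect}"
end
```

With real model output, the label numbering is arbitrary; only whether two points share a label is meaningful.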