K-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (the cluster center, or centroid), which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. It is popular for cluster analysis in data mining. K-means clustering minimizes within-cluster variances (squared Euclidean distances), not plain Euclidean distances; minimizing the latter is the more difficult Weber problem. The mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances. For instance, better Euclidean solutions can be found using k-medians and k-medoids.
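The difference between the mean and the median can be checked numerically. The following is a minimal sketch, assuming a made-up one-dimensional data set (in one dimension the geometric median coincides with the ordinary median); the helper names `sum_squared` and `sum_absolute` are illustrative and do not come from any library.

```python
import numpy as np

# Illustrative 1-D data with one outlying point (assumed, not from a real data set).
points = np.array([0.0, 0.0, 10.0])

def sum_squared(center):
    """Sum of squared distances from every point to `center`."""
    return float(((points - center) ** 2).sum())

def sum_absolute(center):
    """Sum of plain (unsquared) distances from every point to `center`."""
    return float(np.abs(points - center).sum())

mean = float(points.mean())
median = float(np.median(points))

# The mean gives the smaller squared-error total,
# while the median gives the smaller total of plain distances.
print("squared error:  mean =", sum_squared(mean), " median =", sum_squared(median))
print("plain distance: mean =", sum_absolute(mean), " median =", sum_absolute(median))
```

The outlier pulls the mean toward it, which is why k-means centroids are sensitive to extreme values while median-based variants are less so.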
Figure 1. K-means clustering service.
Step | Description |
---|---|
Step 1 | k initial "means" are randomly generated in the data domain. |
Step 2 | k clusters are created by associating each observation with the nearest mean. The partition here represents the Voronoi diagram generated by the means. |
Step 3 | The centroid of each of the k clusters becomes the new mean. |
Step 4 | Steps 2 and 3 are repeated until convergence is reached. |
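The four steps above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the implementation used by the service: the function name `kmeans`, the parameters `max_iter` and `seed`, drawing the initial means uniformly from the data's bounding box, and leaving an empty cluster's mean unchanged are all choices made here for the example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Basic k-means iteration (hypothetical helper for illustration)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Step 1: draw k initial means at random inside the data's bounding box.
    lo, hi = X.min(axis=0), X.max(axis=0)
    means = rng.uniform(lo, hi, size=(k, X.shape[1]))

    labels = None
    for _ in range(max_iter):
        # Step 2: assign each observation to the nearest mean
        # (squared Euclidean distance, shape: n_points x k).
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)

        # Step 4: converged once the assignments stop changing.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels

        # Step 3: the centroid of each cluster becomes its new mean.
        # Assumption: a cluster with no members keeps its previous mean.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                means[j] = members.mean(axis=0)

    return means, labels

# Usage on a small made-up data set with two visible groups.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [10.0, 10.0], [10.1, 9.9], [9.9, 10.2]])
means, labels = kmeans(X, k=2)
print(means)
print(labels)
```

Because the starting means are random, different seeds can converge to different partitions, and a poor start may leave one cluster empty. In practice the procedure is often run several times and the result with the lowest within-cluster variance is kept.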
Project name | K-means clustering service |
---|---|
Sample requirements | K-means clustering can be performed with either a distance matrix or raw data. |
Timeline | 3-5 days. |
Deliverables | We provide the raw data together with an analysis of the calculation results. |
Price | Inquiry |
CD ComputaBio provides K-means clustering analysis services. The goal of cluster analysis is to group data so that similar items fall into the same class. Clustering methods have been developed in many fields, including mathematics, computer science, statistics, biology, and economics, and each application area has produced its own techniques. These methods are used to describe data, measure the similarity between different data sources, and assign data sources to clusters. If you have needs in this area, please feel free to contact us.