K-Means Clustering Algorithm

Ayse Yaman
Apr 26, 2022
[Image: example with K = 3]

In this article, I’ll introduce the K-means algorithm, one of the most commonly used unsupervised learning algorithms. It is a method for finding clusters and cluster centers in a set of unlabeled data.

The aim is for the resulting clusters to have maximum similarity within each cluster and minimum similarity between clusters. In other words, a K-means clustering is considered good when points in the same cluster are highly similar to each other and points in different clusters are dissimilar.

K objects are randomly selected to represent the initial center point (mean) of each cluster. Each remaining object is then assigned to the cluster it is most similar to, based on its distance from the cluster means.
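The initialization and assignment described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names (`init_centers`, `assign_to_nearest`), the sample points, and the fixed `seed` are assumptions made for the example, and squared Euclidean distance is assumed as the (dis)similarity measure.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_centers(points, k, seed=0):
    """Pick k distinct data points at random to serve as the initial centers."""
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    return rng.sample(points, k)

def assign_to_nearest(points, centers):
    """Return, for every point, the index of its closest center."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]

# Hypothetical 2-D data: two loose groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers = init_centers(points, k=2)
labels = assign_to_nearest(points, centers)
print(centers, labels)
```

Because the initial centers are chosen at random, a different seed may produce a different first assignment.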

How the K-Means Algorithm Works

K: the number of clusters, which must be chosen before the algorithm starts

  1. Determine the initial cluster centers
  2. Assign each remaining data point to a cluster according to its distance from the centers
  3. Determine new centers from the resulting clusters (that is, shift each old center to the mean of its cluster)
  4. Repeat steps 2 and 3 until a stable state is reached.
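The four steps above can be sketched as a short loop in plain Python. This is a sketch under stated assumptions, not a production implementation: the function name `k_means`, the `max_iter` and `seed` parameters, the sample data, and the choice to leave an empty cluster's center unchanged are all assumptions made for illustration. "Stable state" is interpreted here as the assignments no longer changing.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: choose k distinct data points as the initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = mean_point(members)
    return centers, labels

# Hypothetical 2-D data used only for this example.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers, labels = k_means(points, k=2)
print(centers, labels)
```

The result depends on the randomly chosen initial centers. The loop always stops at a stable state, but that state can be a local optimum rather than the best possible clustering, which is why practical implementations often rerun the algorithm from several random starts.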
[Image: example with K = 2]

The K-means algorithm tries to find the K clusters that minimize the total squared error.

Given a set of observations (x1, x2, …, xn)

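Goal: Find C = {µ1, …, µk}, a set of k cluster centers, that minimizes the total squared distance from each observation to its nearest center.

Equivalently, k-means clustering aims to partition the n observations into k (≤ n) sets S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares:

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$

where µi is the mean of the points in Si.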
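To make the objective concrete, the within-cluster sum of squares can be computed directly for any given partition. The snippet below is a small sketch: the function name `within_cluster_sum_of_squares` and the two hand-made partitions are assumptions for illustration, and every cluster is assumed to be non-empty.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def within_cluster_sum_of_squares(clusters):
    """Sum over all clusters of squared distances from each point to its cluster mean."""
    total = 0.0
    for cluster in clusters:  # each cluster is a non-empty list of points
        n = len(cluster)
        mean = tuple(sum(coords) / n for coords in zip(*cluster))
        total += sum(squared_distance(p, mean) for p in cluster)
    return total

# Two hypothetical ways of splitting the same four points into k = 2 sets.
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

wcss_good = within_cluster_sum_of_squares(good)
wcss_bad = within_cluster_sum_of_squares(bad)
print(wcss_good, wcss_bad)
```

Here the partition that groups nearby points scores 4.0, while the one that pairs distant points scores 100.0, so the first is the better clustering under this criterion.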

The k-means algorithm attempts to find discrete groupings within data, where members of a group are as similar as possible to one another and as different as possible from members of other groups.

K-Means Clustering Algorithm Applications

  • Behavioral segmentation
  • Market segmentation
  • Document clustering
  • Image segmentation
  • Image compression
  • Customer segmentation

Thank you!

References:

https://hastie.su.domains/Papers/ESLII.pdf

https://en.wikipedia.org/wiki/K-means_clustering

https://aws.amazon.com/tr/blogs/machine-learning/k-means-clustering-with-amazon-sagemaker/
