Compressing Images with K-Means Clustering
May 24, 2026 (about 15 hours ago)
The internet is flooded with content every day. As of today, approximately 402 million terabytes of data are created each day[]. That number is almost impossible to picture so imagine, that every second, the world produces equivalent to over 70 years of 4K video.
That is why compression matters - instead of storing every detail exactly as it was captured, we look for ways to represent the same information using less space. Images are one of the media types that can be compressed, and in this post I'll explain one simple method to do so.
Compression by reducing the palette
There are many ways to compress images and they are categorized into two main types - lossless and lossy.
The first one preserves exact original quality but results in smaller file-size reductions. It detects repeating patterns and optimizes the data structures to code the images better. The second one, lossy, allows us to reduce the size significantly by discarding parts of the image while trying to maintain the representation of it.
We will focus on the lossy one that shrinks the color palette of an image, replacing multiple similar colors with a single representative one - a technique called color quantization.

Pixels are just numbers
Images are basically 2D arrays of pixels. Each pixel is one color, and each color is stored as three numbers: red, green, and blue.

Every square on the grid is one sampled pixel. Typical images use RGB, with 8 bits per channel.
R = 8 bits, G = 8 bits, B = 8 bits
So one pixel is
3 × 8 = 24 bits = 3 bytes
That is often called 24-bit color, true color, or sometimes PNG-24. If the image has transparency, we would need to add another 8-bit channel, turning it into PNG-32 with RGBA (RGB-alpha) color space.
If we wanted to save space, we could lower the data needed to represent one pixel. If we used only 256 colors, then every pixel would take 8 bits and store an index into that palette. So instead of pixel = (124, 81, 203), we would store pixel = 17 that translates to 17 -> (124, 81, 203). That format is called PNG-8 or indexed color.
As you can see, now instead of using 3 bytes, we only use 1, thus cutting our raw pixel data to one-third. We could go even further and use a 16-color palette, reducing the original size by a factor of 6, or a 4-color one that would reduce it by a factor of 12. You can do the math!
The K-means algorithm
Before we get our hands dirty, we need to get some knowledge first!
K-means is an unsupervised learning algorithm used for data clustering, which groups unlabeled data points into groups or clusters. It is one of the most popular clustering methods used in machine learning to group data points that are close to each other in attributes into some k-number of clusters.
Let's first ignore images for a second and imagine that we have a set of 2D data points (x, y). The demo below shows the standard K-means loop on those points.
Let's imagine that we have a set of 2D data points. Each point has an x and y value.
From 2D points to colors
The 2D example used points like [x, y], but K-means does not really care what those numbers mean. A color can also be a point, just with three values instead of two: [red, green, blue].
Instead of looking at every pixel at once, imagine we sampled a few colors from the image.
In code, I represent one RGB color as a tuple with three numbers:
type Rgb = [number, number, number];
const pixel: Rgb = [231, 86, 42];
How nearest color is calculated
When assigning a pixel to a centroid, we need to know which centroid is the closest one. For that I used squared Euclidean distance. I skipped the square root because it does not change which distance is the smallest.
function squaredRgbDistance(a: Rgb, b: Rgb) {
return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2;
}
Then for each pixel we check every centroid and remember the index of the closest one:
for (let pixelIndex = 0; pixelIndex < pixels.length; pixelIndex++) {
const pixel = pixels[pixelIndex];
let bestCentroidIndex = 0;
let bestDistance = Infinity;
for (let centroidIndex = 0; centroidIndex < centroids.length; centroidIndex++) {
const distance = squaredRgbDistance(pixel, centroids[centroidIndex]);
if (distance < bestDistance) {
bestDistance = distance;
bestCentroidIndex = centroidIndex;
}
}
assignments[pixelIndex] = bestCentroidIndex;
}
Moving centroids
After every pixel has its closest centroid, we need to move each centroid to the average color of all pixels assigned to it. I use running totals for this, because every pixel already knows which centroid it belongs to.
const sums: Rgb[] = Array.from({ length: k }, () => [0, 0, 0]);
const counts: number[] = Array.from({ length: k }, () => 0);
for (let pixelIndex = 0; pixelIndex < pixels.length; pixelIndex++) {
const pixel = pixels[pixelIndex];
const centroidIndex = assignments[pixelIndex];
sums[centroidIndex][0] += pixel[0];
sums[centroidIndex][1] += pixel[1];
sums[centroidIndex][2] += pixel[2];
counts[centroidIndex]++;
}
Applying it to images
As mentioned before, every pixel color in an image is a point [red, green, blue].
So instead of clustering 2D points like [x, y], we cluster colors like [231, 86, 42]. The final centroids become the palette. K-means does not know whether that pixel belongs to an apple, a banana, or a shadow. It only sees values like [231, 86, 42].

Limitations
K-means is simple but it is not perfect.
First, you need to pick k yourself. There is no automatic way to know how many colors an image needs. Too few and it looks posterized. Too many and you barely save any space.
Second, the algorithm starts with random centroids. Run it twice on the same image and you may get slightly different palettes. The demo above picks random pixels from the image itself, which helps, but it is still luck of the draw.
Third, K-means can get stuck in local optima. It greedily improves the clusters step by step, but it cannot backtrack. If the random start was bad, the final palette might miss an important color that was only represented by a few pixels.
Fourth, it treats RGB space as Euclidean geometry. In reality, human eyes do not perceive color differences equally in all directions. Two blues that look almost identical to us might be far apart in RGB distance, while a red-green pair that looks very different might score as closer. For serious compression, people convert to LAB color space first, which is designed to match human vision better.
Fifth, K-means has no idea about spatial structure. It looks at every pixel in isolation. That means two neighboring pixels that are almost the same color might end up assigned to different centroids, creating noise-like artifacts at edges. Algorithms that consider pixel neighborhoods, or add dithering after palette reduction, usually look smoother.
Sixth, some centroids can become empty clusters if no pixel is close enough to them. The implementation above handles this by restarting them from random pixels, but it wastes iterations.
For real-world image compression, formats like PNG-8 and GIF use more sophisticated palette selection (often based on median cut or octree quantization) rather than K-means, partly because those methods are faster and deterministic. K-means is still a great way to understand the core idea though.
Final implementation
Here is the complete algorithm used by the demo. The React component around it only handles loading the image into canvas and rendering the output. This is the actual K-means part.
type Rgb = [number, number, number];
interface KMeansImageInput {
pixels: Rgb[];
k: number;
maxIterations: number;
}
interface KMeansImageResult {
palette: Rgb[];
indexedPixels: number[];
iterations: number;
}
function squaredRgbDistance(a: Rgb, b: Rgb) {
return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2;
}
function randomPixel(pixels: Rgb[]) {
return [...pixels[Math.floor(Math.random() * pixels.length)]] as Rgb;
}
function compressImageColors({ pixels, k, maxIterations }: KMeansImageInput): KMeansImageResult {
let totalIters = 0;
const centroids: Rgb[] = [];
const assignments: number[] = new Array(pixels.length).fill(0);
// 1. Initialize centroids from random pixels in the image.
for (let i = 0; i < k; i++) {
centroids[i] = randomPixel(pixels);
}
for (let iters = 0; iters < maxIterations; iters++) {
totalIters++;
const previousCentroids = centroids.map((centroid) => [...centroid] as Rgb);
// 2. Assign each pixel to the nearest centroid.
for (let pixelIndex = 0; pixelIndex < pixels.length; pixelIndex++) {
const pixel = pixels[pixelIndex];
let bestCentroidIndex = 0;
let bestDistance = Infinity;
for (let centroidIndex = 0; centroidIndex < centroids.length; centroidIndex++) {
const centroid = centroids[centroidIndex];
const distance = squaredRgbDistance(pixel, centroid);
if (distance < bestDistance) {
bestDistance = distance;
bestCentroidIndex = centroidIndex;
}
}
assignments[pixelIndex] = bestCentroidIndex;
}
// 3. Move each centroid to the average RGB value of assigned pixels.
const sums: Rgb[] = Array.from({ length: k }, () => [0, 0, 0]);
const counts: number[] = Array.from({ length: k }, () => 0);
for (let pixelIndex = 0; pixelIndex < pixels.length; pixelIndex++) {
const pixel = pixels[pixelIndex];
const centroidIndex = assignments[pixelIndex];
sums[centroidIndex][0] += pixel[0];
sums[centroidIndex][1] += pixel[1];
sums[centroidIndex][2] += pixel[2];
counts[centroidIndex]++;
}
for (let centroidIndex = 0; centroidIndex < k; centroidIndex++) {
if (counts[centroidIndex] === 0) {
centroids[centroidIndex] = randomPixel(pixels);
continue;
}
centroids[centroidIndex] = [
sums[centroidIndex][0] / counts[centroidIndex],
sums[centroidIndex][1] / counts[centroidIndex],
sums[centroidIndex][2] / counts[centroidIndex],
];
}
// 4. Stop early if centroids barely moved.
const movedDistance = centroids.reduce((sum, centroid, index) => {
return sum + squaredRgbDistance(centroid, previousCentroids[index]);
}, 0);
if (movedDistance < 1) {
break;
}
}
return {
palette: centroids.map(([r, g, b]) => [Math.round(r), Math.round(g), Math.round(b)]),
indexedPixels: assignments,
iterations: totalIters,
};
}
Acknowledgements
Photo by Jonas Kakaroto on Unsplash.
The idea for this post was based on the K-means & Image Segmentation - Computerphile video.
References
If this resonated with you, hit that heart! It makes my day