Compressing images with singular value decomposition (SVD)

The singular value decomposition plays a major role in linear algebra and has many applications, including lossy image compression. In this post we will discuss it in the context of image compression, focusing on the intuition behind the algorithm rather than on the underlying theory.

SVD and the idea behind it

Any matrix $A \in \mathbb{R}^{m\times n}$ describes a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ by specifying in its columns where the basis vectors land relative to the initial basis. Linear transformations can stretch, rotate and flip space, and of course combine these actions. So we might be interested in a normal form for linear transformations that consists only of geometrically simple operations, together with a way to convert any linear transformation into this form. This is the key idea behind SVD: it is a normal form that consists of 2 rotations and 1 scaling.

Formally, any matrix $A \in \mathbb{R}^{m\times n}$ can be written as $A = U \Sigma V^\top$

where:

  • $U \in \mathbb{R}^{m\times m}$ is an orthogonal matrix that rotates space; the column vectors of this matrix are often referred to as left singular vectors.
  • $\Sigma \in \mathbb{R}^{m\times n}$ is a diagonal matrix (all elements that are not on the diagonal are zero; the diagonal elements are called singular values), which scales space.
  • $V^\top \in \mathbb{R}^{n\times n}$ is an orthogonal matrix, which also only rotates space. The row vectors of $V^\top$ are known as right singular vectors.
  • The singular values on the diagonal of $\Sigma$ are sorted in descending order. We will denote them by $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$, where $r$ is the number of non-zero entries on the diagonal (it also follows that $r = \operatorname{rank}(\Sigma) = \operatorname{rank}(A)$).

So, any linear transformation $A$ can be decomposed into a rotation $V^\top$, followed by a scaling by $\Sigma$, followed by another rotation $U$.
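
These properties are easy to check numerically. Here is a minimal sketch with numpy (the example matrix is arbitrary and only serves as an illustration):

import numpy as np

# An arbitrary example matrix.
A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# numpy returns the singular values as a vector S (more on this below).
U, S, VT = np.linalg.svd(A)

# U and V^T are orthogonal, i.e. their transposes are their inverses.
assert np.allclose(U.T @ U, np.eye(2))
assert np.allclose(VT @ VT.T, np.eye(2))

# The singular values come back sorted in descending order,
# and the product of the three factors reproduces A.
assert np.all(S[:-1] >= S[1:])
assert np.allclose(U @ np.diag(S) @ VT, A)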

Visualizing SVD

Consider the following simple shear matrix: $A := \left( \begin{array}{cc} 1 & 1 \\ 0 & 1 \end{array} \right)$

We can use the open-source manim Python library to visualize $A$ as well as its decomposition $A = U \Sigma V^\top$:

# file: "svd_2d.py"
from manimlib.imports import *


class SVD_2D(LinearTransformationScene):
    CONFIG = {
        "include_background_plane": True,
        "include_foreground_plane": True,
        "foreground_plane_kwargs": {
            "x_radius": FRAME_WIDTH,
            "y_radius": FRAME_HEIGHT,
            "secondary_line_ratio": 0
        },
        "background_plane_kwargs": {
            "color": GREY,
            "secondary_color": DARK_GREY,
            "axes_color": GREY,
            "stroke_width": 2,
        },
        "show_coordinates": True,
        "show_basis_vectors": True,
        "basis_vector_stroke_width": 6,
        "i_hat_color": X_COLOR,
        "j_hat_color": Y_COLOR,
        "leave_ghost_vectors": False
    }

    def construct(self):

        circle = Circle()
        circle.move_to(RIGHT + UP)

        self.add_transformable_mobject(circle)

        text_applying = TextMobject("Applying $ A $ (overall transformation)")
        text_applying.to_edge(UP + LEFT)

        text_applying_inverse = TextMobject("Applying $ A^{-1} $ (resetting)")
        text_applying_inverse.to_edge(UP + LEFT)

        text_applying_vt = TextMobject(r"Applying $ V^\top $ (rotating)")
        text_applying_vt.to_edge(UP + LEFT)

        text_applying_s = TextMobject(r"Applying $ \Sigma $ (stretching)")
        text_applying_s.to_edge(UP + LEFT)

        text_applying_u = TextMobject("Applying $ U $ (rotating)")
        text_applying_u.to_edge(UP + LEFT)
        
        A = np.array([
            [1, 1],
            [0, 1]
        ])

        self.play(Write(text_applying), run_time=0.5)
        self.apply_matrix(A)

        self.wait()
        self.wait(0.5)

        self.leave_ghost_vectors = True

        self.play(ReplacementTransform(text_applying, text_applying_inverse), run_time=0.5)
        self.apply_inverse(A)

        self.wait()
        self.wait(0.5)

        self.leave_ghost_vectors = False

        U, S, VT = np.linalg.svd(A)
        
        self.play(ReplacementTransform(text_applying_inverse, text_applying_vt), run_time=0.5)
        self.apply_matrix(VT)

        self.wait()
        self.wait(0.5)

        self.play(ReplacementTransform(text_applying_vt, text_applying_s), run_time=0.5)
        self.apply_matrix(np.diag(S))

        self.wait()
        self.wait(0.5)

        self.play(ReplacementTransform(text_applying_s, text_applying_u), run_time=0.5)
        self.apply_matrix(U)

        self.wait()
        self.wait(0.5)

In order to compute the SVD decomposition, I used the np.linalg.svd function, which returns a tuple with the $U$, $\Sigma$ and $V^\top$ matrices as expected. However, for efficiency reasons numpy returns $\Sigma$ as a vector (it doesn't make sense to store the zeroes that are off the diagonal). That is why I later use np.diag to convert this vector into a matrix with the vector on the diagonal.
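
A side note on shapes, since the image channels later on are not square: with the default full_matrices=True the singular value vector has to be embedded into an $m \times n$ matrix before the factors can be multiplied back together, while np.linalg.svd(..., full_matrices=False) returns factors whose shapes already match np.diag. A small sketch (the 3×2 example matrix is arbitrary):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])  # a 3x2 matrix, so m > n

# Default full_matrices=True: U is 3x3, S has 2 entries, VT is 2x2,
# so the singular values have to be placed into a 3x2 diagonal matrix.
U, S, VT = np.linalg.svd(A)
Sigma = np.zeros(A.shape)
Sigma[:S.size, :S.size] = np.diag(S)
assert np.allclose(U @ Sigma @ VT, A)

# With full_matrices=False: U is 3x2, so np.diag(S) already fits.
U, S, VT = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U @ np.diag(S) @ VT, A)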

By running manim with the command manim svd_2d.py SVD_2D -m, we get:

As you can see, the geometric properties of the matrices in the decomposition described above hold: $U$ and $V^\top$ rotate space, while $\Sigma$ only stretches it.

Compressing images with SVD

Any image can be represented as a matrix of pixels, where each pixel (typically) consists of 3 bytes: one each for the red, green and blue components of the color. So, if we want to store the image efficiently, we need to somehow efficiently encode the 3 matrices $R$, $G$ and $B$, one for each color component.

Let’s focus on compressing just one matrix $A \in \mathbb{R}^{m\times n}$. As stated above, applying the SVD decomposition gives us:

$$A = U \Sigma V^\top = \begin{pmatrix} | & & | \\ u_1 & \cdots & u_m \\ | & & | \end{pmatrix} \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_r & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} \text{---} & v_1^\top & \text{---} \\ & \vdots & \\ \text{---} & v_n^\top & \text{---} \end{pmatrix}$$

We can re-write this decomposition in a slightly different way:

$$\begin{aligned} A &= \sigma_1 \begin{pmatrix} | \\ u_1 \\ | \end{pmatrix} \begin{pmatrix} \text{---} & v_1^\top & \text{---} \end{pmatrix} + \cdots + \sigma_r \begin{pmatrix} | \\ u_r \\ | \end{pmatrix} \begin{pmatrix} \text{---} & v_r^\top & \text{---} \end{pmatrix} \\ &= \sum_{i=1}^r \sigma_i \begin{pmatrix} | \\ u_i \\ | \end{pmatrix} \begin{pmatrix} \text{---} & v_i^\top & \text{---} \end{pmatrix} \end{aligned}$$

With this notation, we can think of SVD from a slightly different perspective: SVD allows us to take an arbitrary matrix and write it as a sum of rank-1 matrices.
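
This rank-1 view can be verified directly with numpy. A small sketch using outer products (the random matrix is just an example):

import numpy as np

A = np.random.rand(5, 4)
U, S, VT = np.linalg.svd(A, full_matrices=False)

# Sum of the rank-1 matrices sigma_i * u_i * v_i^T.
A_sum = sum(S[i] * np.outer(U[:, i], VT[i]) for i in range(S.size))
assert np.allclose(A_sum, A)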

If we take a random matrix $A$, it will almost certainly be full-rank. So for a random image of width $w$ and height $h$, the matrix will almost certainly have rank $\min\{w, h\}$. We can approximate $A$ with the first $k < r$ summands:

$$A \approx A' := \sum_{i=1}^k \sigma_i \begin{pmatrix} | \\ u_i \\ | \end{pmatrix} \begin{pmatrix} \text{---} & v_i^\top & \text{---} \end{pmatrix}$$

This approximation is good because $\sigma_1 \geq \dots \geq \sigma_r > 0$: the first singular values contribute more than the following ones. Moreover, the Eckart-Young theorem states that such a truncation is indeed the best possible rank-$k$ approximation (with respect to the Frobenius norm), so, within the model in which we approximate a matrix by a sum of $k$ rank-1 matrices, we get the best possible compression.
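
To make this concrete: the Frobenius-norm error of the rank-$k$ truncation equals $\sqrt{\sigma_{k+1}^2 + \dots + \sigma_r^2}$, so the error only depends on the singular values we drop. A quick numerical check (a sketch with a random matrix):

import numpy as np

A = np.random.rand(100, 80)
U, S, VT = np.linalg.svd(A, full_matrices=False)

k = 20
A_k = U[:, :k] @ np.diag(S[:k]) @ VT[:k]  # rank-k truncation

# The Frobenius error equals the l2 norm of the dropped singular values.
error = np.linalg.norm(A - A_k, "fro")
assert np.isclose(error, np.linalg.norm(S[k:]))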

With this approximation idea we are now ready to implement image compression. Let's start with some image, for example this photo I recently took in Kaiserslautern, Germany:

Photograph in Kaiserslautern, Germany

If the image is saved as img.jpg, we can read it with:

import numpy as np
import matplotlib.image as image

# Read the image into an array of shape (height, width, 3) with 8-bit RGB values.
A = image.imread("img.jpg")

We can extract the 3 color component matrices mentioned above as follows:

# Normalize each 8-bit color channel to floats in [0, 1].
R = A[:,:,0] / 0xff
G = A[:,:,1] / 0xff
B = A[:,:,2] / 0xff

Now, we compute the SVD decomposition:

R_U, R_S, R_VT = np.linalg.svd(R)
G_U, G_S, G_VT = np.linalg.svd(G)
B_U, B_S, B_VT = np.linalg.svd(B)

Now we need to determine the $k$ with which we will approximate the 3 matrices. There are different ways of doing this; I will simply fix the ratio between the maximum rank of the compressed image and the maximum possible rank of the original matrix (the smaller of the image's width and height, which for this photo is the width) and call this ratio the relative rank.

relative_rank = 0.2
max_rank = int(relative_rank * min(R.shape[0], R.shape[1]))
print("max rank = %d" % max_rank)  # 144

Now, we can implement a function that uses only the first $k$ columns of the $U$ matrix and the first $k$ rows of the $V^\top$ matrix, as well as $\sigma_1, \dots, \sigma_k$. By doing this, we essentially emulate what a decompression algorithm would do.

def read_as_compressed(U, S, VT, k):
    # Sum up the first k rank-1 matrices sigma_i * u_i * v_i^T.
    A = np.zeros((U.shape[0], VT.shape[1]))
    for i in range(k):
        U_i = U[:,[i]]            # i-th column of U as an (m, 1) matrix
        VT_i = np.array([VT[i]])  # i-th row of V^T as a (1, n) matrix
        A += S[i] * (U_i @ VT_i)
    return A

Actually, it is easier and more efficient to compute the same result with a single truncated matrix product.

def read_as_compressed(U, S, VT, k):
    # Same result as the loop above, expressed as one truncated matrix product.
    return (U[:,:k] @ np.diag(S[:k])) @ VT[:k]

So we can compute the color matrices of the approximated image with:

R_compressed = read_as_compressed(R_U, R_S, R_VT, max_rank)
G_compressed = read_as_compressed(G_U, G_S, G_VT, max_rank)
B_compressed = read_as_compressed(B_U, B_S, B_VT, max_rank)

compressed_float = np.dstack((R_compressed, G_compressed, B_compressed))
# Clip to [0, 1] before converting back to 8-bit values, since the rank-k
# approximation can slightly over- or undershoot the valid range.
compressed = (np.clip(compressed_float, 0.0, 1.0) * 0xff).astype(np.uint8)
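
As a rough sanity check of why this counts as compression at all (a back-of-the-envelope sketch, ignoring the entropy coding a real image format would additionally apply): storing the truncated factors requires about $k(m + n + 1)$ numbers per channel instead of $m \cdot n$:

m, n = R.shape
original_values = m * n                     # one value per pixel and channel
compressed_values = max_rank * (m + n + 1)  # k columns of U, k rows of V^T, k singular values

print("values per channel: %d -> %d (ratio %.2f)"
      % (original_values, compressed_values, compressed_values / original_values))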

With matplotlib, we can also easily plot both the original and the compressed image by simply calling:

import matplotlib.pyplot as plt

plt.figure(figsize = (16, 9))
plt.imshow(A)

plt.figure(figsize = (16, 9))
plt.imshow(compressed)

image.imsave("compressed.png", compressed)

By approximating the image with a matrix of maximum rank equal to $20\%$ of the maximum rank of the original image, we get:

Compressed image

We can also run the algorithm for other relative rank values:

Relative rank = $100\%$, Rank: $k = 720 = r = n < m$, and we get the original image:

Photograph in Kaiserslautern, Germany

Relative rank = $90\%$, Rank: $k = 648$:

Compressed image

Relative rank = $80\%$, Rank: $k = 576$:

Compressed image

Relative rank = $70\%$, Rank: $k = 503$:

Compressed image

Relative rank = $60\%$, Rank: $k = 432$:

Compressed image

Relative rank = $50\%$, Rank: $k = 360$:

Compressed image

Relative rank = $40\%$, Rank: $k = 288$:

Compressed image

Relative rank = $30\%$, Rank: $k = 216$:

Compressed image

Relative rank = $20\%$, Rank: $k = 144$:

Compressed image

Relative rank = $10\%$, Rank: $k = 72$:

Compressed image

Relative rank = $5\%$, Rank: $k = 36$:

Compressed image

Relative rank = $2\%$, Rank: $k = 14$:

Compressed image

Relative rank = $1\%$, Rank: $k = 7$:

Compressed image

Relative rank = $0.4\%$, Rank: $k = 2$:

Compressed image

Relative rank = $0.2\%$, Rank: $k = 1$:

Compressed image

Relative rank = $0\%$, Rank: $k = 0$:

In this case we don't approximate at all and thus get the zero matrix, which means all pixels are black:

Compressed image

