Singular Value Decomposition and its various interpretations and applications (with interactive examples)

Published: 23 Mar 2024

💡 This is a long read, but don't miss the interactive examples I have sprinkled throughout that might help you with learning!

I love working on side projects. I always have to be coding and learning something, even if my work does not require me to do that anymore. However, what if the side project this time does not have to involve the act of creating, but is purely about learning? I decided to choose a learning topic that is practical in both my work as a software engineer and game developer - "Linear Algebra".

I use Linear Algebra pretty frequently. I use basic vectors and matrix operations for game physics, camera projection and other game dev related work. But they often don't go beyond high-school level math. What if I want to create intelligent systems involving recommendation engines, 3D reconstruction, face recognition, image compression, large language models and...? It looks like there is more to learn about Linear Algebra!

I decided to approach my learning in a non-linear fashion. I decided to start with the topic of Singular Value Decomposition (SVD) first, and then return to revise the "intermediate" concepts (e.g., "ranks", "subspaces", "linear independence", "orthogonality", "eigenvectors") as and when I need them. The result? I became much more intimately familiar with Linear Algebra as compared to when I first learnt it in college, because now I have a better idea about their applications and what they are leading towards!

This article is written in the same way I revised Linear Algebra, by starting with Singular Value Decomposition first. I will add side-notes along the way that fill in the gaps in understanding important "intermediate concepts", as well as interactive examples to solidify our main concepts. Let's see if this learning method works for you!

Also, this article only serves as an introduction to this big and important topic in Linear Algebra. I hope to use this article as a starting point for more complex topics later on.

Introducing SVD using linear transformations

Most of the time when I am about to learn a new technical topic, I would visit Wikipedia for an introduction first. And so... here is what Wikipedia has to say about Singular Value Decomposition.

📖 "In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix into a rotation, followed by a rescaling, followed by another rotation." - Wikipedia

In other words, we only need 3 matrices to form any matrix, no matter how complicated it is. If we visualize this matrix as a linear transformation, we can see it as:

$$A = R_2 S R_1$$

To put it simply but not 100% accurately, a "linear transformation" happens when we transform a vector into another vector by multiplying it with a matrix. For example, let us transform a vector (x, y, z) by multiplying it with a matrix.

$$\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2x \\ 2y \\ 0 \end{bmatrix}$$

What we see here is that we have a new vector that is both stretched and has lost its z-dimension. All vectors multiplied by this matrix will be "transformed" to the same effect.

To put it more accurately, a matrix multiplication actually "transforms" the space that the vector resides in. In this example, we are transforming the original 3D "vector space" to become 2D instead, and scaled 2 times along the x and y axes.
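As a quick sanity check, here is a minimal sketch in plain JavaScript (matrices stored as arrays of rows; the helper name is my own) that applies the matrix above to a vector:

// Multiply a matrix (array of rows) with a column vector
function multiplyMatrixVector(M, v) {
    return M.map(row => row.reduce((sum, value, i) => sum + value * v[i], 0));
}

const M = [
    [2, 0, 0],
    [0, 2, 0],
    [0, 0, 0],
];

console.log(multiplyMatrixVector(M, [1, 2, 3])); // [2, 4, 0] - stretched, and the z-dimension is gone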

To quote Prof Gilbert Strang:

"Matrices act. They don't just sit there."

Note that in the example that is coming up, when we "scale" and "rotate" a shape, we don't exactly change the shape directly. It is more akin to us changing / transforming the space that the shape sits on, in order to change the shape.

Let us see this in action - step through the interactive visualization to transform the black square to fit the skewed grey shape with just the rotation > scale > rotation operations.

As illustrated above, we rotated the shape by R1 = 45°, then we scaled it by S = (1.5, 0.1, 1.0), then we rotated it again by R2 = 45°. This is represented by the following expression:

$$A = R_2 S R_1 = \begin{bmatrix} \cos(45^\circ) & -\sin(45^\circ) \\ \sin(45^\circ) & \cos(45^\circ) \end{bmatrix} \begin{bmatrix} 1.5 & 0 \\ 0 & 0.1 \end{bmatrix} \begin{bmatrix} \cos(45^\circ) & -\sin(45^\circ) \\ \sin(45^\circ) & \cos(45^\circ) \end{bmatrix}$$
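To double-check that composition numerically, here is a small sketch in plain JavaScript that multiplies the three 2x2 matrices together (the angles and scale factors are taken from the example above; multiplyMatrices is a throwaway helper of mine, not from any library):

// Multiply two matrices stored as arrays of rows
function multiplyMatrices(A, B) {
    return A.map(row =>
        B[0].map((_, j) => row.reduce((sum, value, k) => sum + value * B[k][j], 0))
    );
}

const angle = Math.PI / 4; // 45°
const R1 = [[Math.cos(angle), -Math.sin(angle)], [Math.sin(angle), Math.cos(angle)]];
const R2 = [[Math.cos(angle), -Math.sin(angle)], [Math.sin(angle), Math.cos(angle)]];
const S  = [[1.5, 0], [0, 0.1]];

// A = R2 * S * R1: R1 is applied to a vector first, then S, then R2
const A = multiplyMatrices(R2, multiplyMatrices(S, R1));
console.log(A); // the single matrix that produces the skewed parallelogram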

Let us rewrite this as the key expression that is commonly used to describe the SVD of a matrix:

$$A = U\Sigma V^T = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{bmatrix}$$

When we compare the expression $U\Sigma V^T$ with the expression $R_2 S R_1$, where $U = R_2$, $V^T = R_1$, and $\Sigma = S$, we see that $U$ and $V^T$ play the role of the rotations (they are orthonormal matrices), while $\Sigma$ plays the role of the scaling (it is a diagonal matrix). The side-notes below unpack the concepts behind this correspondence.

Matrix transposition is the swapping of the rows and columns of the matrix.

$$A = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \quad \text{and} \quad A^T = \begin{bmatrix} a & d \\ b & e \\ c & f \end{bmatrix}$$

This is helpful because of the rules behind certain matrix operations such as multiplication. For example, if we want to do a dot product between 2 vectors A and B, we can present it as:

$$A^T B = \begin{bmatrix} x & y & z \end{bmatrix} \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = xx' + yy' + zz'$$
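In code, transposition and the "dot product as a matrix multiplication" idea look something like this (a plain JavaScript sketch; the helper names are my own):

// Swap the rows and columns of a matrix (stored as an array of rows)
function transpose(M) {
    return M[0].map((_, j) => M.map(row => row[j]));
}

// Dot product of two vectors, i.e. the single entry of AᵀB
function dot(a, b) {
    return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

console.log(transpose([[1, 2, 3], [4, 5, 6]])); // [[1, 4], [2, 5], [3, 6]]
console.log(dot([1, 2, 3], [4, 5, 6]));         // 1*4 + 2*5 + 3*6 = 32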

A set of vectors is linearly independent if no vector in the set can be formed as a linear combination of the other vectors in the set. For example, the following matrix does not contain linearly independent vectors:

$$A = \begin{bmatrix} 1 & 2 & 0 \\ 2 & 4 & 1 \\ 3 & 6 & 0 \end{bmatrix}$$

This is because the second column vector is a multiple of the first column vector. Compare this to the next matrix, which contains linearly independent vectors:

$$B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Now, let's imagine that each column vector of a matrix represents a potential axis of the space it defines, i.e. looking at B, we have 3 vectors pointing in the traditional x, y, z axes of a 3D vector space. Whereas in A, only the first and last column vectors identify the unique axes of its space, as the second column vector is simply a scaled version of the first column vector.

Why is this important? Because when we multiply by a matrix, we are transforming into the space defined by that matrix. And the number of linearly independent vectors tells us the number of dimensions and the axes of the space we are transforming into.

Two vectors are orthogonal to each other if they are perpendicular; an orthogonal matrix is one in which all column vectors are perpendicular to each other.

It follows that the column vectors of an orthogonal matrix are all linearly independent, as the vectors are all perpendicular!

An orthonormal matrix is one in which all column vectors are orthogonal to each other while also being unit vectors. Therefore it defines a linear transformation that does not have a scaling factor.

A rotation matrix should be orthonormal because it should purely rotate the subject without changing the volume or scale of the vector space, and hence of the subject.
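To make this concrete, here is a tiny sketch that checks both properties for the columns of a 2D rotation matrix - each column is a unit vector, and the columns are perpendicular to each other:

// Columns of a 2D rotation matrix for some arbitrary angle
const angle = Math.PI / 3;
const col1 = [Math.cos(angle), Math.sin(angle)];
const col2 = [-Math.sin(angle), Math.cos(angle)];

const dot = (a, b) => a[0] * b[0] + a[1] * b[1];
const length = v => Math.sqrt(dot(v, v));

console.log(length(col1), length(col2)); // ~1 and ~1 - unit vectors
console.log(dot(col1, col2));            // 0 - perpendicular, hence orthonormal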

The determinant of a matrix describes how much the matrix changes the "scale/area/volume" of the original space. An orthonormal matrix has a determinant of ±1, and a rotation matrix (which involves no reflection) has a determinant of 1.

There are many resources online on how to calculate the determinant of a matrix. Hence we shall not cover it here.

Here's another way to look at this - let v be a vector that we want to transform using the linear transformation A. This is simply shown as:

$$Av = x$$

If we split the resulting vector x into a scalar value σ multiplied by a unit vector u, we shall get:

$$Av = \sigma u$$

Imagine that now instead of just transforming 1 vector, we are going to transform a series of vectors using this expression:

$$A \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix}$$

which, using the fact that V is orthonormal (so $V^{-1} = V^T$), can be re-arranged as

$$AV = U\Sigma$$

$$A = U\Sigma V^T$$

But why does SVD matter? Because it exposes some interesting properties that can be used to solve a variety of problems!

Dimensionality Reduction and Data Compression

Consider this rank 1 matrix

$$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ -1 & -2 & -3 & -4 \\ 2 & 4 & 6 & 8 \\ 10 & 20 & 30 & 40 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 2 \\ 10 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix} = uv^T$$

The rank of a matrix is another way to describe the number of dimensions of the space defined by the matrix. We have to use another word (i.e. "rank") to describe "dimensions in space", because the number of rows of the matrix also describes "dimensions" in terms of the number of variables in the system.

For example, a matrix with 3 rows may describe 3 variables (x, y, z), i.e. 3 dimensions! But the vector space described by the same matrix does not need to be 3-dimensional. For example, you can have a 3D vector lying on a 2D plane.

A matrix with n number of linearly independent vectors, and hence describing a vector space of n dimensions, will be of "rank n".

A matrix is said to be of "full rank" if the number of linearly independent vectors = smallest dimension of the matrix (either the number of rows or number of columns), and it is said to be "rank deficient" if otherwise.

This matrix is of rank = 1 because it only has 1 linearly independent column vector. By decomposing this bigger matrix into 2 smaller ones, u and vᵀ, we represent the same matrix with less data (16 numbers vs just 8!). This becomes significant when we have a bigger, more complex matrix.

Hence the idea is - when we factorize a complex matrix, we can compress the amount of data required to represent it! In this case, we have achieved lossless compression.
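Here is a quick sketch of that lossless compression in code - we store only u and vᵀ (8 numbers) and rebuild the full 4x4 matrix (16 numbers) with an outer product:

// Outer product: rebuild a rank-1 matrix from a column vector u and a row vector v
function outerProduct(u, v) {
    return u.map(ui => v.map(vj => ui * vj));
}

const u = [1, -1, 2, 10];
const v = [1, 2, 3, 4];

console.log(outerProduct(u, v));
// [[1, 2, 3, 4], [-1, -2, -3, -4], [2, 4, 6, 8], [10, 20, 30, 40]] - the original matrix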

Now imagine we have a higher ranked matrix A. We cannot simply reduce it to a single pair of column and row vectors. But what we can do is approximate A by using a sum of rank-1 matrices like so:

$$A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \cdots + \sigma_n u_n v_n^T$$

$$= \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{bmatrix}$$

$$= U\Sigma V^T$$

Again, we arrive at the standard SVD expression.

We see that Σ is a diagonal matrix consisting of scalar values "σ". These are known as singular values. Each of them represents the "scale of influence" of its respective pair of left and right singular vectors u and v (i.e. a "singular vector pair").

In Σ, these singular values σ are arranged in descending order of magnitude. This means that the singular vector pairs earlier in the sequence have a higher influence in approximating the complex matrix A.

This is a helpful property of SVD, as it means we can remove the singular vector pairs that have the least influence in approximating A by turning their corresponding singular values σ to 0. This helps us achieve lossy compression with minimal information loss!
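In code, that truncation is just summing the first k rank-1 pieces and ignoring the rest. Here is a sketch, assuming U and V are given as arrays of column vectors and sigma as an array of singular values already sorted in descending order (how you obtain them, e.g. from a numerical library, is up to you):

// Approximate A using only the k most influential singular vector pairs:
// A ≈ σ1*u1*v1ᵀ + ... + σk*uk*vkᵀ
function rankKApproximation(U, sigma, V, k) {
    const rows = U[0].length;
    const cols = V[0].length;
    const A = Array.from({ length: rows }, () => new Array(cols).fill(0));
    for (let i = 0; i < k; i++) {
        for (let r = 0; r < rows; r++) {
            for (let c = 0; c < cols; c++) {
                A[r][c] += sigma[i] * U[i][r] * V[i][c];
            }
        }
    }
    return A;
}

// rankKApproximation(U, sigma, V, 1) keeps only the single strongest pattern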

In order to solidify this concept further, let us look at this interactive example that uses the same skewed parallelogram we had. Use the slider to play with different values of σ2 and see how it changes the shape.

$$A = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix}$$

$$= \begin{bmatrix} \cos(45^\circ) & -\sin(45^\circ) \\ \sin(45^\circ) & \cos(45^\circ) \end{bmatrix} \begin{bmatrix} {{s3Sigma[0][0]}} & 0.00 \\ 0.00 & {{s3Sigma[1][1]}} \end{bmatrix} \begin{bmatrix} \cos(45^\circ) & -\sin(45^\circ) \\ \sin(45^\circ) & \cos(45^\circ) \end{bmatrix}$$

$$= \begin{bmatrix} {{s3A[0][0]}} & {{s3A[0][1]}} \\ {{s3A[1][0]}} & {{s3A[1][1]}} \end{bmatrix}$$

We can see that when we adjust the values for σ1 and σ2, we change the scaling factor of their respective dimensions. As the parallelogram has the least variance along the dimension represented by σ2, when σ2=0, we compress the parallelogram into a line that is relatively close to the original shape (i.e. having the least information loss).

The fact that the 2D parallelogram has been turned into a 1D line is also significant! This shows the idea of dimensionality reduction - by setting σ2=0, we have reduced Σ from rank 2 to rank 1, which in turn reduces the rank of the linear transformation A from 2 to 1 as well (i.e. both matrices are now made up of only 1 linearly independent column vector)! This shows that the rank of Σ actually exposes the rank of A. Cool!

Let us now visualize this concept of lossy compression via dimensionality reduction using another example. Adjust the "quality" slider to see how it compresses the image!

σ1={{s4s[0][0]}}
σ2={{s4s[1][1]}}
σ3={{s4s[2][2]}}
σ4={{s4s[3][3]}}
σ5={{s4s[4][4]}}
σ6...σ10=0

In the image matrix, we can tell that out of 10 column vectors, there are only 5 that are linearly independent. This is because the left half of the image is the same as the right half.

Hence Σ and the image matrix are both of rank = 5 (the last 5 singular values σ are 0), i.e. the last 5 singular vector pairs do not tell us any new patterns or information about the image, and so there is no change to the image when we adjust the "quality" slider between 50% - 100%. But the change to the image quality gets increasingly visible as the affected singular value gets bigger. The first 5 singular vector pairs have a higher influence in approximating our image matrix.

Finding hidden correlations and winning the Netflix Prize


On 2 October 2006, Netflix held a competition with a grand prize of US$1,000,000 for anyone able to beat Netflix's own algorithm for predicting user ratings for films, based on previous user ratings. Many of the teams that participated realized that SVD formed the basis of the winning algorithms.

Not only was dimensionality reduction useful (given the very large data matrices the teams were working with), SVD could also be used to find hidden patterns or correlations between the rows and columns of a data matrix!

💡 This kind of statistical inference from large data sets is one of the methods for machine learning! I believe this is one of the methods used by various online shopping and social media platforms for their recommendation engines.

As you might be able to tell based on intuition from the previous example, each singular vector pair describes the relationship of some unique "characteristics" of the image, and the strength of each relationship is shown in its corresponding singular value.

Another way to put this is that in $A = U\Sigma V^T$, the columns of U capture the row-wise characteristics of the data matrix, the columns of V capture the column-wise characteristics, and the singular values in Σ tell us the strength of each of those characteristics.

Let us look at how this is reflected mathematically:

Let $A = U\Sigma V^T$, and hence $A^T = V\Sigma^T U^T$.

$$A^T A = V\Sigma^T U^T U\Sigma V^T$$

Since U is orthonormal, $U^T U = I$, and since Σ is diagonal, $\Sigma^T \Sigma = \Sigma^2$, so:

$$A^T A = V\Sigma^2 V^T$$

Multiplying both sides by V on the right (and using $V^T V = I$):

$$(A^T A)V = V\Sigma^2$$

Here $A^T A$ is what is known as a covariance matrix: it describes the dot product, and hence the directional relationship, of each column in A with every other column in A.

Also, if you haven't already noticed, $(A^T A)V = V\Sigma^2$ actually follows the equation for finding eigenvectors and eigenvalues. You should see that the eigenvectors are the column vectors of V, and the eigenvalues are the squares of the singular values in Σ!

This means that each column vector v describes the column-wise characteristics of the data matrix, while the square of each singular value σ2 describes the strength of that characteristic.

An eigenvector v of the matrix A is a vector that does not change direction when multiplied by A. v simply gets scaled by a certain amount λ, which is known as the eigenvalue. In other words:

$$Av = \lambda v$$
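For a tiny concrete example (my own, not from the interactive demos above):

$$\begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

so $v = (1, 0)$ is an eigenvector of this matrix with eigenvalue $\lambda = 3$ - multiplying by the matrix does not change its direction, only scales it by 3.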

In Linear Algebra, we often find relationships/patterns/characteristics in data by looking at directional vectors. For example, sometimes we look at whether two data properties are positively or negatively correlated; Sometimes, we also check for "similarity" between 2 sets of data by their dot-product.

Eigenvectors are great because they extract the directional vectors that are inherently found within the matrix, telling us hidden characteristics / relationships between the data points. This importance is reflected in the German word eigen, which means "own" or "characteristic of".

And conveniently, the corresponding eigenvalues inform us about the strength of the relationships / characteristics.

We can use the same steps to derive that $(AA^T)U = U\Sigma^2$, and hence U describes the row-wise characteristics of the data matrix.


Returning to the "Netflix Prize", remember all we had in our data matrix are each user's ratings for each film. We do not know anything else about the users or about the films (e.g. genre). But with SVD, we can sniff out the "eigen-concept" that describes the relationship between the user and the film, and the strength of that relationship.

Imagine the following data matrix of user ratings, in which you have not rated any shows yet. Please give a rating of 1-10 for the shows "Star Wars" and "Twilight", and we will recommend your next watch, be it "Star Trek" or "Harry Potter", even though we do not know anything else about the shows and our users beyond just the ratings they have given.

Go ahead, give it a try!

Users \ Movies Star Wars Star Trek Twilight Harry Potter
FantasyLove {{movieRatings[0][0]}} {{movieRatings[0][1]}} {{movieRatings[0][2]}} {{movieRatings[0][3]}}
LoveSickBoy {{movieRatings[1][0]}} {{movieRatings[1][1]}} {{movieRatings[1][2]}} {{movieRatings[1][3]}}
GeekGal92 {{movieRatings[2][0]}} {{movieRatings[2][1]}} {{movieRatings[2][2]}} {{movieRatings[2][3]}}
You {{movieRatings[3][1]}} {{movieRatings[3][3]}}

Your next recommended watch: {{userRecommendation}}

💡 Note that we can get more accurate results with more data.

The recommendation for your next watch is discovered via SVD. We factorize the data matrix into 3 separate matrices represented by UΣVT. Based on your inputs, we have shown in the tables below the matrices showing the relationships between users, movies and concepts.

Σ - "concept strength" matrix

Concept 1 Concept 2 Concept 3 Concept 4
{{movieS[0]}} 0.0 0.0 0.0
0.0 {{movieS[1]}} 0.0 0.0
0.0 0.0 {{movieS[2]}} 0.0
0.0 0.0 0.0 {{movieS[3]}}

This is the "concept strength" matrix that is represented by Σ. Right now, we do not know what each "concept" might mean, except that it refers to some "property" of the movie (e.g. perhaps "genre")? We can get a better guess when we compare this with the U and V matrices.

For now, let us apply dimensionality reduction by removing concepts that have the least significance (i.e. lowest singular values). In our case, as shown in the table, we shall only keep concepts 1 and 2.

VT - concept to movie matrix

Concept \ Movies Star Wars Star Trek Twilight Harry Potter
Concept 1 {{movieV[0][0]}} {{movieV[1][0]}} {{movieV[2][0]}} {{movieV[3][0]}}
Concept 2 {{movieV[0][1]}} {{movieV[1][1]}} {{movieV[2][1]}} {{movieV[3][1]}}

This is the "concept to movie" matrix that is represented by VT. We can see that Twilight and Harry Potter weighs heavily on Concept 1, while Star Wars and Star Trek weighs heavily on Concept 2.

Note that we do not know what Concept 1 or Concept 2 represents. It could represent genre or popularity, or even both!

U - user to concept matrix

Users \ Concept Concept 1 Concept 2
FantasyLove {{movieU[0][0]}} {{movieU[0][1]}}
LoveSickBoy {{movieU[1][0]}} {{movieU[1][1]}}
GeekGal92 {{movieU[2][0]}} {{movieU[2][1]}}
You {{movieU[3][0]}} {{movieU[3][1]}}

This is the "user to concept" matrix that is represented by U. Interestingly, we can see that FantasyLove and LoveSickBoy share the same taste for movies belonging to Concept 1, whereas GeekGal likes movies belonging to Concept 2.

And you? {{userConcept}}
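If you are wondering how your two ratings were mapped into the concept space in the first place, one common approach (a sketch of the general idea, not necessarily exactly what the demo above does) is to project the new rating vector onto each kept movie-to-concept vector and divide by the concept strength:

// Project a new user's rating vector into concept space:
// userConcepts[j] = (ratings · vj) / σj for each kept concept j
function projectUserToConcepts(ratings, V, sigma) {
    return V.map((vj, j) =>
        ratings.reduce((sum, rating, i) => sum + rating * vj[i], 0) / sigma[j]
    );
}

// Hypothetical usage: V holds the kept right singular vectors (one per concept,
// with one entry per movie) and sigma holds the kept singular values.
// const yourConcepts = projectUserToConcepts([9, 0, 2, 0], V, sigma);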

Solving linear regression with SVD

You have probably learnt about drawing "best-fit lines" (i.e. linear regression) in Math/Science class in Secondary/middle school. This is an important concept as it helps us build a predictive model that describes the relationship between a dependent variable and its independent variables (e.g., predicting housing prices based on proximity distances and land area, or stock prices from past prices and trading volume, etc.). As you can see, what we are about to cover has applications in Machine Learning.

Finding the best-fit line by eye was easy, as we could simply estimate a line that has the least average distance from every data point. However, it is impossible for a computer to do that (unless you make this into a Computer Vision problem). The computer can only calculate the best-fit line numerically (this is also known as solving the "linear least squares problem"). And as it turns out, we can solve this problem with, you guessed it, SVD!

The least-squares solution is an estimated solution of an overdetermined system of linear equations (i.e. the number of equations is more than the number of variables, and so A is "tall"). In this estimated solution, the sum of squared errors is minimized.

Here is an example of SVD being used to solve a linear least squares problem. Click/tap on the graph to add new data points and watch the program find the best-fit line!

How is this done? First, let's get a hint from looking at the points you've just drawn in the interactive example - have you noticed that we are actually drawing a line along the direction with the biggest variance among the data points (i.e. the direction with the biggest spread)?

Taking a look at what we learnt from "Dimensionality Reduction", we know that each singular value calculated from SVD reflects the variance of the points along its principal direction. So, let's form our data matrix (let's call it A) from the data points you drew, and then work towards applying SVD on this data matrix.

$$A = \begin{bmatrix} x_1 & y_1 \\ x_2 & y_2 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix}$$

Remember that we are able to use SVD to analyze the rotational and scaling components of a data matrix, but it does not account for "translation". So we need to remove the translational component from our data by centering its data points around their mean. This will give us a new data matrix A'.

$$A' = \begin{bmatrix} x_1 - \bar{x} & y_1 - \bar{y} \\ x_2 - \bar{x} & y_2 - \bar{y} \\ \vdots & \vdots \\ x_n - \bar{x} & y_n - \bar{y} \end{bmatrix}$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

For clarity, here is a code snippet:


// Finding the mean data point
let meanX = 0, meanY = 0;
for(let i = 0; i < data.length ; i++) {
    meanX += data[i][0];
    meanY += data[i][1];
}
meanX /= data.length;
meanY /= data.length;

// Building the new data matrix A' where points are centered around the mean data point
const centeredData = [];
for (let i = 0; i < data.length; i++) {
    centeredData.push([data[i][0] - meanX, data[i][1] - meanY]);
}
            

Following that, we decompose this data matrix using SVD to get UΣVT.

$$A' = U\Sigma V^T$$
= you have not inserted enough points in the interactive example

We want to select the right singular vector in V that has the highest corresponding singular value in Σ (highlighted in red), because it is the direction vector / eigenvector along which the data points have the biggest variance!

If you remember our discussion on finding correlations using SVD, the right singular vectors (column vectors in V) represent the column-wise characteristics in the data matrix, while the left singular vectors (column vectors in U) represent the row-wise characteristics.

We want to look at "column-wise characteristics" because each column in the data matrix represents a unique feature (in our case, "x" or "y"), and we are looking for the feature along which our data points have the biggest variance.

The chosen right singular vector is a direction vector / eigenvector that describes the relationship between the data points with respect to the chosen unique feature. Thus, it informs us of the gradient of the best-fit line.

P.S. The column space of a data matrix is also known as its "feature space".

Now, we can simply define the best-fit line using the equation:

$$y - \bar{y} = \frac{v_y}{v_x}(x - \bar{x})$$
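Continuing the earlier code snippet, the last step might look something like this. Note that svd here is a placeholder for whichever SVD routine / numerical library you choose; I am assuming it returns the right singular vectors as the columns of V, sorted by descending singular value:

// Placeholder: obtain V (right singular vectors as columns, sorted by
// descending singular value) from your SVD routine of choice
const { V } = svd(centeredData);

// The first right singular vector points along the direction of biggest variance
const vx = V[0][0];
const vy = V[1][0];

// Best-fit line passing through the mean point:
// y - meanY = (vy / vx) * (x - meanX)
const gradient = vy / vx;
function bestFitY(x) {
    return meanY + gradient * (x - meanX);
}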

The steps we have described are also known as Principal Component Analysis (PCA).

Solving linear equations with SVD

Let's look at solving the following system of linear equations:

$$\begin{cases} a + 7b + 3c = 0 \\ 2a + 4b + c = 0 \\ 4a + 8b + 6c = 0 \end{cases}$$

This can be formulated in the following form:

$$Ax = 0 \quad\Rightarrow\quad \begin{bmatrix} 1 & 7 & 3 \\ 2 & 4 & 1 \\ 4 & 8 & 6 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

This is equivalent to saying that we are looking for the nullspace of A.

The span of a set of vectors is defined as the collection of all possible linear combinations of those vectors.

The "nullspace" of a matrix is a set of vectors (i.e. a span) that describes all possible linear combinations of those vectors that, when transformed by the matrix, would give the zero vector.

Since A is full rank and square, and hence invertible:

$$x = A^{-1}\mathbf{0}$$
$$x = \mathbf{0}$$

That is easy! We see that our linear system only has a trivial solution, i.e., the zero vector is the only vector in the nullspace of A.

However, what if A is not full rank and hence not invertible? Consider the following linear system:

$$\begin{cases} a + b + 0.25c = 0 \\ 2a + 4b + c = 0 \\ 4a + 8b + 2c = 0 \end{cases}$$

where

$$A = \begin{bmatrix} 1 & 1 & 0.25 \\ 2 & 4 & 1 \\ 4 & 8 & 2 \end{bmatrix}$$

Since A is not full rank (the third column is 0.25 times the second, so the last 2 columns are linearly dependent), A is not invertible. How then do we calculate the nullspace of A?

The answer - we shall use SVD on A, and the nullspace of A shall be the span of the right singular vectors that have zero as their singular values!

But how did we arrive at this conclusion? Let's break this down together.

$$A = U\Sigma V^T$$

$$A = \begin{bmatrix} {{ffsNullU[0][0]}} & {{ffsNullU[0][1]}} & {{ffsNullU[0][2]}} \\ {{ffsNullU[1][0]}} & {{ffsNullU[1][1]}} & {{ffsNullU[1][2]}} \\ {{ffsNullU[2][0]}} & {{ffsNullU[2][1]}} & {{ffsNullU[2][2]}} \end{bmatrix} \begin{bmatrix} {{ffsNullQ[0][0]}} & 0 & 0 \\ 0 & {{ffsNullQ[0][1]}} & 0 \\ 0 & 0 & {{ffsNullQ[0][2]}} \end{bmatrix} \begin{bmatrix} {{ffsNullV[0][0]}} & {{ffsNullV[1][0]}} & {{ffsNullV[2][0]}} \\ {{ffsNullV[0][1]}} & {{ffsNullV[1][1]}} & {{ffsNullV[2][1]}} \\ {{ffsNullV[0][2]}} & {{ffsNullV[1][2]}} & {{ffsNullV[2][2]}} \end{bmatrix}$$

$$A = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & 0 \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \\ v_3^T \end{bmatrix}$$

$$A \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & 0 \end{bmatrix}$$

$$A \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 & 0 \cdot u_3 \end{bmatrix}$$

$$Av_3 = 0$$

We see that the nullspace of A = span { v3 }, because v3 is the only vector in the SVD that can become the zero vector when transformed by A. This shows that the right singular vectors, with corresponding singular values that are 0, form the nullspace of A.
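As a small sketch, picking out a nullspace basis from an SVD result could look like this in code (assuming V is given as an array of right singular (column) vectors and sigma as the matching singular values; a tolerance is needed because the "zero" singular values usually come out as tiny non-zero numbers in floating point):

// Right singular vectors whose singular values are (numerically) zero
// form a basis for the nullspace of A
function nullspaceBasis(V, sigma, tolerance = 1e-10) {
    return V.filter((_, i) => Math.abs(sigma[i]) < tolerance);
}

// e.g. for the 3x3 example above, this would return just [v3]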

Using the same example matrix A:

$$Ax = 0$$

$$U\Sigma V^T x = 0$$

$$\Sigma V^T x = 0 \quad \text{(multiplying both sides by } U^T\text{, since } U^T U = I\text{)}$$

$$\begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & 0 \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \\ v_3^T \end{bmatrix} x = 0$$

$$\begin{bmatrix} \sigma_1 (v_1 \cdot x) \\ \sigma_2 (v_2 \cdot x) \\ 0 \cdot (v_3 \cdot x) \end{bmatrix} = 0$$

As $\sigma_1 \neq 0$ and $\sigma_2 \neq 0$, that means $v_1 \cdot x = 0$ and $v_2 \cdot x = 0$, which implies that x has to be orthogonal to both $v_1$ and $v_2$.

And as we know $v_3$ is the only remaining direction orthogonal to both $v_1$ and $v_2$, that means $x \in \text{span}\{v_3\}$.


Alright, now let us look at another similar problem.

Solve $Ax = c$, where $c \neq 0$.

We can solve for x using SVD.

$$Ax = U\Sigma V^T x$$

$$Ax = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & 0 \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \\ v_3^T \end{bmatrix} x$$

$$Ax = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix} \begin{bmatrix} \sigma_1 (v_1 \cdot x) \\ \sigma_2 (v_2 \cdot x) \\ 0 \cdot (v_3 \cdot x) \end{bmatrix}$$

$$Ax = au_1 + bu_2, \quad \text{where } a, b \in \mathbb{R}$$

This shows that Ax always lies in $\text{span}\{u_1, u_2\}$.

And since we knew that

$$A \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 & 0 \cdot u_3 \end{bmatrix}$$

...which means...

$$Av_1 = \sigma_1 u_1 \quad \text{and} \quad Av_2 = \sigma_2 u_2$$

From this, we can infer that a solution x can be found in $\text{span}\{v_1, v_2\}$ (i.e., the span of the right singular vectors whose singular values are not zero), because any linear combination of $v_1$ and $v_2$, when transformed by A, will land on the plane whose basis is $\text{span}\{u_1, u_2\}$ - which is exactly where c must lie for a solution to exist.

The Moore-Penrose Pseudoinverse

Lastly, there is another way we can use SVD to solve linear systems, by deriving what we call the Moore-Penrose Pseudoinverse - A+.

Again, let us consider a linear system of the form:

$$Ax = b$$

We can solve for x using the form:

$$x = A^+ b = V\Sigma^+ U^T b$$

where $A^+ = V\Sigma^+ U^T$ is the pseudoinverse.

To calculate the pseudoinverse A⁺, we can obtain U and V via SVD, and obtain Σ⁺ by replacing each non-zero singular value σ in Σ with its reciprocal (i.e. 1/σ), then transposing the resulting matrix (the transpose only matters when Σ is not square).
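Here is a minimal sketch of that recipe in code, assuming U and V are given as arrays of column vectors and sigma as the array of singular values (how you obtain them from your SVD library of choice is up to you):

// A⁺ = V Σ⁺ Uᵀ, built as a sum of (1/σi) * vi * uiᵀ over the non-zero singular values
function pseudoinverse(U, sigma, V, tolerance = 1e-10) {
    const m = U[0].length; // number of rows of A
    const n = V[0].length; // number of columns of A
    const Aplus = Array.from({ length: n }, () => new Array(m).fill(0));
    for (let i = 0; i < sigma.length; i++) {
        if (Math.abs(sigma[i]) < tolerance) continue; // zero singular values stay zero in Σ⁺
        for (let r = 0; r < n; r++) {
            for (let c = 0; c < m; c++) {
                Aplus[r][c] += (V[i][r] * U[i][c]) / sigma[i];
            }
        }
    }
    return Aplus;
}

Solving for x is then just the matrix-vector product of Aplus with b.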

If it happens that A is invertible, you will find that $A^{-1} = A^+$.

However, if A is overdetermined (i.e., height > width), you'll get a least-squares solution, and if A is underdetermined (i.e., width > height) you'll get a minimum-norm solution, both helpful in estimating a solution for x.

When a system of linear equations is underdetermined (i.e., A's width is bigger than its height, meaning there are more variables than there are equations), we will find that there are infinitely many solutions.

The minimum-norm solution just refers to the solution that has the smallest Euclidean norm, i.e. the solution that is nearest to the origin.

Calculating the SVD of a matrix

After all this talk about the various interpretations and applications of SVD, we have not even considered how we might calculate the SVD of a matrix! Of course, if you are writing software, there are already math libraries that do this in whatever language you choose.

We will not be going into the details of the algorithms for computing the SVD of a given matrix. Perhaps that can be a separate blog post of its own.


Thank you for reading! It takes time to create content such as this. If you'd like to support free and open education, consider dropping a tip!