to notebooks

This commit is contained in:
Pawel Sarkowicz
2026-03-30 18:45:32 -04:00
parent 767673eb7a
commit 9093ea2c13
38 changed files with 41415 additions and 1975 deletions
File diff suppressed because one or more lines are too long
+914
View File
@@ -0,0 +1,914 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# QR Decompositions\n",
"\n",
"QR decompositions are a powerful tool in linear algebra and data science for several reasons. They provide a way to decompose a matrix into an orthogonal matrix $Q$ aand an upper triangular matrix $R$, which can simplify many computations and analyses.\n",
"\n",
"> **Theorem**: Let $A$ is an $m \\times n$ matrix with linearly independent columns ($m \\geq n$ in this case), then $A$ can be decomposed as $A = QR$ where $Q$ is an $m \\times n$ matrix whose columns form an orthonormal basis for Col($A$) and $R$ is an $n \\times n$ upper-triangular invertible matrix with positive entries on the diagonal.\n",
"\n",
"In the literature, sometimes the QR decomposition is phrased as follows: any $m \\times n$ matrix $A$ can also be written as $A = QR$ where $Q$ is an $m \\times m$ orthogonal matrix ($Q^T = Q^{-1}$), and $R$ is an $m \\times n$ upper-triangular matrix. One follows from the other by playing around with some matrix equations. Indeed, suppose that $A = Q_1R_1$ is a decomposition as above (that is, $Q_1$ is $m \\times n$ and $R_1$ is $n \\times n$). Use can use the Gram-Schmidt procedure to extend the columns of $Q_1$ to an orthonormal basis for all of $\\mathbb{R}^m$, and put the remaining vectors in a $(m - n) \\times n$ matrix $Q_2$. Then\n",
"\n",
"$$ A = Q_1R_1 = \\begin{bmatrix} Q_1 & Q_2 \\end{bmatrix}\\begin{bmatrix} R_1 \\\\ 0 \\end{bmatrix}. $$\n",
"\n",
"The left matrix is an $m \\times m$ orthogonal matrix and the right matrix is $m \\times n$ upper triangular. Moreover, the decomposition provides orthonormal bases for both the column space of $A$ and the perp of the column space of $A$; $Q_1$ will consist of an orthonormal basis for the column space of $A$ and $Q_2$ will consist of an orthonormal basis for the perp of the column space of $A$. \n",
"\n",
"However, we will often want to use the decomposition when $Q$ is $m \\times n$, $R$ is $n \\times n$, and the columns of $Q$ form an orthonormal basis for the column space of $A$. For example, the python function `numpy.linalg.qr` give QR decompositions this way (again, assuming that the columns of $A$ are linearly independent, so $m \\geq n$).\n",
"\n",
"> **Key take-away**. The QR decomposition provides an orthonormal basis for the column space of $A$. If $A$ has rank $k$, then the first $k$ columns of $Q$ will form a basis for the column space of $A$. \n",
"\n",
"For small matrices, one can find $Q$ and $R$ by hand, assuming that $A = [ a_1\\ \\cdots\\ a_n ]$ has full column rank. Let $e_1,\\dots,e_n$ be the unnormalized vectors we get when we apply Gram-Schmidt to $c_1,\\dots,c_n$, and let $u_1,\\dots,u_n$ be their normalizations. Let\n",
"$$ r_j = \\begin{bmatrix} \\langle e_1,c_j \\rangle \\\\ \\vdots \\\\ \\langle e_n, c_j \\rangle \\end{bmatrix}, $$\n",
"and note that $\\langle e_i,c_j \\rangle = 0$ whenever $i > j$. Thus\n",
"$$ Q = \\begin{bmatrix} u_1 & \\cdots & u_n \\end{bmatrix} \\text{ and } R = \\begin{bmatrix} r_1 & \\cdots & r_n \\end{bmatrix} $$\n",
"give rise to a $A = QR$, where the columns of $Q$ form an orthonormal basis for $\\text{Col}(A)$ and $R$ is upper-triangular. We can also compute $R$ directly from $Q$ and $Q$. Indeed, note that $Q^TQ = I$, so\n",
"$$ Q^TA = Q^T(QR) = IR = R. $$\n",
"\n",
"> **Example**. Find a QR decomposition for the matrix\n",
"> $$ A = \\begin{bmatrix} 1 & 1 & 1 \\\\ 0 & 1 & 1 \\\\ 0 & 0 & 1 \\\\ 0 & 0 & 0 \\end{bmatrix}. $$\n",
"> Note that one trivially see (or by applying the Gram-Schmidt procedure) that\n",
"> $$ \\begin{bmatrix} 1 \\\\ 0 \\\\ 0 \\\\ 0 \\end{bmatrix}, \\begin{bmatrix} 0 \\\\ 1 \\\\ 0 \\\\ 0 \\end{bmatrix}, \\begin{bmatrix} 0 \\\\ 0 \\\\ 1 \\\\ 0 \\end{bmatrix} $$\n",
"> forms an orthonormal basis for the column space of $A$. So with\n",
"> $$ Q = \\begin{bmatrix} 1 & 0 & 0 \\\\ 0 & 1 & 0 \\\\ 0 & 0 & 1 \\\\ 0 & 0 & 0 \\end{bmatrix} \\text{ and }R = \\begin{bmatrix} 1 & 1 & 1\\\\ 0 & 1 & 1 \\\\ 0 & 0 & 1 \\end{bmatrix}, $$\n",
"> we have $A = QR$.\n",
"\n",
"Let's do a more involved example.\n",
"> **Example**. Consider the matrix\n",
"> $$ A = \\begin{bmatrix} 1 & 0 & 0 \\\\ 1 & 1 & 0 \\\\ 1 & 1 & 1 \\\\ 1 & 1 & 1 \\end{bmatrix}. $$\n",
"> One can apply the Gram-Schmidt procedure to the columns of $A$ to find that\n",
"> $$ \\begin{bmatrix} 1 \\\\ 1 \\\\ 1 \\\\ 1 \\end{bmatrix}, \\begin{bmatrix} -3 \\\\ 1 \\\\ 1 \\\\ 1 \\end{bmatrix}, \\begin{bmatrix} 0 \\\\ -\\frac{2}{3} \\\\ \\frac{1}{3} \\\\ \\frac{1}{3}\\end{bmatrix} $$\n",
"> forms an orthogonal basis for the column space of $A$. Normalizing, we get that\n",
"> $$ Q = \\begin{bmatrix} \\frac{1}{2} & -\\frac{3}{\\sqrt{12}} & 0 \\\\ \\frac{1}{2} & \\frac{1}{\\sqrt{12}} & -\\frac{2}{\\sqrt{6}} \\\\ \\frac{1}{2} & \\frac{1}{\\sqrt{12}} & \\frac{1}{\\sqrt{6}} \\\\ \\frac{1}{2} & \\frac{1}{\\sqrt{12}} & \\frac{1}{\\sqrt{6}} \\end{bmatrix} $$\n",
"> is an appropriate $Q$. Thus\n",
"> $$ \\begin{split} R = Q^TA &= \\begin{bmatrix} \\frac{1}{2} & \\frac{1}{2} & \\frac{1}{2} & \\frac{1}{2} \\\\ -\\frac{3}{\\sqrt{12}} & \\frac{1}{\\sqrt{12}} & \\frac{1}{\\sqrt{12}} & \\frac{1}{\\sqrt{12}} \\\\ 0 & -\\frac{2}{\\sqrt{6}} & \\frac{1}{\\sqrt{6}} & \\frac{1}{\\sqrt{6}} \\end{bmatrix}\\begin{bmatrix} 1 & 0 & 0 \\\\ 1 & 1 & 0 \\\\ 1 & 1 & 1 \\\\ 1 & 1 & 1 \\end{bmatrix} \\\\ &= \\begin{bmatrix} 2 & \\frac{3}{2} & 1 \\\\ 0 & \\frac{3}{\\sqrt{12}} & \\frac{2}{\\sqrt{12}} \\\\ 0 & 0 & \\frac{2}{\\sqrt{6}} \\end{bmatrix}. \\end{split} $$\n",
"> So all together,\n",
"> $$A = \\begin{bmatrix} \\frac{1}{2} & -\\frac{3}{\\sqrt{12}} & 0 \\\\ \\frac{1}{2} & \\frac{1}{\\sqrt{12}} & -\\frac{2}{\\sqrt{6}} \\\\ \\frac{1}{2} & \\frac{1}{\\sqrt{12}} & \\frac{1}{\\sqrt{6}} \\\\ \\frac{1}{2} & \\frac{1}{\\sqrt{12}} & \\frac{1}{\\sqrt{6}} \\end{bmatrix}\\begin{bmatrix} 2 & \\frac{3}{2} & 1 \\\\ 0 & \\frac{3}{\\sqrt{12}} & \\frac{2}{\\sqrt{12}} \\\\ 0 & 0 & \\frac{2}{\\sqrt{6}} \\end{bmatrix}. $$\n",
"\n",
"To do this numerically, we can use `numpy.linalg.qr`.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Define our matrices\n",
"A = np.array([[1,1,1],[0,1,1],[0,0,1],[0,0,0]])\n",
"B = np.array([[1,0,0],[1,1,0],[1,1,1],[1,1,1]])\n",
"\n",
"# Take QR decompositions\n",
"QA, RA = np.linalg.qr(A)\n",
"QB, RB = np.linalg.qr(B)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Our resulting matrices are:\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "b48e6d97",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"QA = [[ 1. 0. 0.]\n",
" [-0. 1. 0.]\n",
" [-0. -0. 1.]\n",
" [-0. -0. -0.]]\n",
"\n",
"RA = [[1. 1. 1.]\n",
" [0. 1. 1.]\n",
" [0. 0. 1.]]\n",
"\n",
"QB = [[-0.5 0.8660254 0. ]\n",
" [-0.5 -0.28867513 0.81649658]\n",
" [-0.5 -0.28867513 -0.40824829]\n",
" [-0.5 -0.28867513 -0.40824829]]\n",
"\n",
"RB = [[-2. -1.5 -1. ]\n",
" [ 0. -0.8660254 -0.57735027]\n",
" [ 0. 0. -0.81649658]]\n"
]
}
],
"source": [
"print(f\"QA = {QA}\\n\")\n",
"print(f\"RA = {RA}\\n\")\n",
"print(f\"QB = {QB}\\n\")\n",
"print(f\"RB = {RB}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to use QR decompositions\n",
"\n",
"One of the primary uses of QR decompositions is to solve least squares problems, as introduced above. Assuming that $A$ has full column rank, we can write $A = QR$ as a QR decomposition, and then we can find a least-squares solution to $Ax = b$ by solving the upper-triangular system.\n",
"\n",
"> **Theorem**. Let $A$ be an $m \\times n$ matrix with full column rank, and let $A = QR$ be a QR factorization of $A$. Then, for each $b \\in \\mathbb{R}^m$, the equation $Ax = b$ has a unique least-squares solution, arising from the system\n",
"> $$ Rx = Q^Tb. $$\n",
"\n",
"Normal equations can be *ill-conditioned*, i.e., small errors in calculating $A^TA$ give large errors when trying to solve the least-squares problem. When $A$ has full column rank, a QR factorization will allow one to compute a solution to the least-squares problem more reliably. \n",
"\n",
"> **Example**. Let\n",
"> $$ A = \\begin{bmatrix} 1 & 0 & 0 \\\\ 1 & 1 & 0 \\\\ 1 & 1 & 1 \\\\ 1 & 1 & 1 \\end{bmatrix} \\text{ and } b = \\begin{bmatrix} 1 \\\\ 1 \\\\ 1 \\\\ 0 \\end{bmatrix}. $$\n",
"> We can find the least-squares solution $Ax = b$ by using the QR decomposition. Let us use the QR decomposition from above, and solve the system\n",
"> $$ Rx = Q^Tb. $$\n",
"> As\n",
"> $$ \\begin{bmatrix} \\frac{1}{2} & -\\frac{3}{\\sqrt{12}} & 0 \\\\ \\frac{1}{2} & \\frac{1}{\\sqrt{12}} & -\\frac{2}{\\sqrt{6}} \\\\ \\frac{1}{2} & \\frac{1}{\\sqrt{12}} & \\frac{1}{\\sqrt{6}} \\\\ \\frac{1}{2} & \\frac{1}{\\sqrt{12}} & \\frac{1}{\\sqrt{6}} \\end{bmatrix}^T\\begin{bmatrix} 1 \\\\ 1 \\\\ 1 \\\\ 0 \\end{bmatrix} = \\begin{bmatrix} \\frac{3}{2} \\\\ -\\frac{1}{2\\sqrt{3}} \\\\ -\\frac{1}{\\sqrt{6}}, \\end{bmatrix} $$\n",
"> we are looking at the system\n",
"> $$ \\begin{bmatrix} 2 & \\frac{3}{2} & 1 \\\\ 0 & \\frac{3}{\\sqrt{12}} & \\frac{2}{\\sqrt{12}} \\\\ 0 & 0 & \\frac{2}{\\sqrt{6}} \\end{bmatrix}x =\\begin{bmatrix} \\frac{3}{2} \\\\ -\\frac{1}{2\\sqrt{3}} \\\\ -\\frac{1}{\\sqrt{6}} \\end{bmatrix}. $$\n",
"> Solving this system yields that\n",
"> $$ x_0 = \\begin{bmatrix} 1 \\\\ 0 \\\\ -\\frac{1}{2} \\end{bmatrix} $$\n",
"> is a least-squares solution to $Ax = b$.\n",
"\n",
"Let us set this system up in python and use `numpy.linalg.solve`. \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Define matrix and vector\n",
"A = np.array([[1,0,0],[1,1,0],[1,1,1],[1,1,1]])\n",
"b = np.array([[1],[1],[1],[0]])\n",
"\n",
"# Take the QR decomposition of A\n",
"Q, R = np.linalg.qr(A)\n",
"\n",
"# Solve the linear system Rx = Q.T b\n",
"beta = np.linalg.solve(R,Q.T @ b)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"This yields\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "3f71de8a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1.00000000e+00],\n",
" [ 6.40987562e-17],\n",
" [-5.00000000e-01]])"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"beta"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"which (basically) agrees with our exact least-squares solution.\n",
"Note that `numpy.linalg.lstsq` still gives a **ever so slightly** different result. \n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "dcda7f8d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1.00000000e+00],\n",
" [ 2.22044605e-16],\n",
" [-5.00000000e-01]])"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.linalg.lstsq(A,b)[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"---\n",
"\n",
"Let's go back to the house example. While we're at it, let's get used to using pandas to make a dataframe. \n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# First let us make a dictionary incorporating our data.\n",
"# Each entry corresponds to a column (feature of our data)\n",
"data = {\n",
" 'Square ft': [1600, 2100, 1550, 1600, 2000],\n",
" 'Bedrooms': [3, 4, 2, 3, 4],\n",
" 'Price': [500, 650, 475, 490, 620]\n",
"}\n",
"\n",
"# Create a pandas DataFrame\n",
"df = pd.DataFrame(data)\n",
"\n",
"# Create our matrix X and our target y\n",
"X = df[[\"Square ft\", \"Bedrooms\"]].to_numpy()\n",
"y = df[[\"Price\"]].to_numpy()\n",
"\n",
"# Augment X with a column of 1's (intercept)\n",
"X_aug = np.hstack((np.ones((X.shape[0], 1)), X))\n",
"\n",
"# Perform QR decomposition\n",
"Q, R = np.linalg.qr(X_aug)\n",
"\n",
"# Solve the upper triangular system Rx = Q^Ty\n",
"beta = np.linalg.solve(R, Q.T @ y)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Let's look at the output.\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "3d1e5bab",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Q = [[-0.4472136 0.32838365 0.40496317]\n",
" [-0.4472136 -0.63745061 -0.22042299]\n",
" [-0.4472136 0.42496708 -0.7689174 ]\n",
" [-0.4472136 0.32838365 0.40496317]\n",
" [-0.4472136 -0.44428376 0.17941406]] \n",
"\n",
"R = [[-2.23606798e+00 -3.95784032e+03 -7.15541753e+00]\n",
" [ 0.00000000e+00 -5.17687164e+02 -1.50670145e+00]\n",
" [ 0.00000000e+00 0.00000000e+00 7.27908474e-01]] \n",
"\n",
"beta = [[-3.05053797e-13]\n",
" [ 3.00000000e-01]\n",
" [ 5.00000000e+00]]\n"
]
}
],
"source": [
"print(f\"Q = {Q} \\n\\nR = {R} \\n\\nbeta = {beta}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"As we can see, the least-squares solution agrees with what we got by hand and by other python methods (if we agree that the tiny first component is essentially zero).\n",
"\n",
"---\n",
"\n",
"The QR decomposition of a matrix is also useful for computing orthogonal projections.\n",
"> **Theorem**. Let $A$ be an $m \\times n$ matrix with full column rank. If $A = QR$ is a QR decomposition, then $QQ^T$ is the projection onto the column space of $A$, i.e., $QQ^Tb = \\text{Proj}_{\\text{Col}(A)}b$ for all $b \\in \\mathbb{R}^m$.\n",
"\n",
"Let's see what our range projections are for the matrices above. Note that the first example above will have the orthogonal projection just being\n",
"$$ \\begin{bmatrix} 1 \\\\ & 1 \\\\ & & 1\\\\ & & & 0 \\end{bmatrix}. $$\n",
"Let's look at the other matrix. \n",
"\n",
"> **Example**. Working with the matrix\n",
"> $$ A = \\begin{bmatrix} 1 & 0 & 0 \\\\ 1 & 1 & 0 \\\\ 1 & 1 & 1 \\\\ 1 & 1 & 1 \\end{bmatrix}, $$\n",
"> the projection onto the column space if given by\n",
"> $$ QQ^T = \\begin{bmatrix} 1 \\\\ & 1 \\\\ & & \\frac{1}{2} & \\frac{1}{2} \\\\ & & \\frac{1}{2} & \\frac{1}{2} \\end{bmatrix}. $$\n",
"> This is a well-understood projection: it is the direct sum of the identity on $\\mathbb{R}^2$ and the projection onto the line $y = x$ in $\\mathbb{R}^2$.\n",
"\n",
"Now let's use python to implement the projection.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Create our matrix A\n",
"A = np.array([[1,0,0],[1,1,0],[1,1,1],[1,1,1]])\n",
"\n",
"# Take the QR decomposition\n",
"Q, R = np.linalg.qr(A)\n",
"\n",
"# Create the range projection\n",
"P = Q @ Q.T\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "5bfd7362",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1.00000000e+00, 2.89687929e-17, 2.89687929e-17, 2.89687929e-17],\n",
" [2.89687929e-17, 1.00000000e+00, 7.07349921e-17, 7.07349921e-17],\n",
" [2.89687929e-17, 7.07349921e-17, 5.00000000e-01, 5.00000000e-01],\n",
" [2.89687929e-17, 7.07349921e-17, 5.00000000e-01, 5.00000000e-01]])"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"P"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"The output gives\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d26b49a6",
"metadata": {},
"outputs": [],
"source": [
"array([[1.00000000e+00, 2.89687929e-17, 2.89687929e-17, 2.89687929e-17],\n",
" [2.89687929e-17, 1.00000000e+00, 7.07349921e-17, 7.07349921e-17],\n",
" [2.89687929e-17, 7.07349921e-17, 5.00000000e-01, 5.00000000e-01],\n",
" [2.89687929e-17, 7.07349921e-17, 5.00000000e-01, 5.00000000e-01]])\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"As we can see, the two off-diagonal blocks are all tiny, hence we treat them as zero. Note that if they were not actually zero, then this wouldn't actually be a projection. This can cause some problems.\n",
"\n",
"Let's write a function to implement this, assuming that columns of A are linearly independent. \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def proj_onto_col_space(A):\n",
"\t# Take the QR decomposition\n",
"\tQ,R = np.linalg.qr(A)\n",
"\t# The projection is just Q @ Q.T\n",
"\tP = Q @ Q.T\n",
"\n",
"\treturn P\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"We'll come back to this later. We should really be incorporating some sort of error tolerance so that things are **super super** tiny can actually just be sent to zero. \n",
"\n",
"> **Remark**. Another way to get the projection onto the column space of an $n \\times p$ matrix $A$ of full column rank is to take\n",
"> $$ P = A(A^TA)^{-1}A^T. $$\n",
"> Indeed, let $b \\in \\mathbb{R}^n$ and let $x_0 \\in \\mathbb{R}^p$ be a solution to the normal equations\n",
"> $$ A^TAx_0 = A^Tb. $$\n",
"> Then $x_0 = (A^TA)^{-1}A^Tb$ and so $Ax_0 = A(A^TA^{-1})A^Tb$ is the (unique!) vector in the column space of $A$ which is closest to $b$, i.e., the projection of $b$ onto the column space of $A$.\n",
"> However, taking transposes, multiplying, and inverting is not what we would like to do numerically. "
]
},
{
"cell_type": "markdown",
"id": "4ae21758",
"metadata": {},
"source": [
"# Singular Value Decomposition\n",
"\n",
"The SVD is a very important matrix decomposition in both data science and linear algebra.\n",
"\n",
"> **Theorem**. For any matrix $n \\times p$ matrix $X$, there exist an orthogonal $n \\times n$ matrix $U$, an orthogonal $p \\times p$ matrix $V$, and a diagonal $n \\times p$ matrix $\\Sigma$ with non-negative entries such that\n",
"> $$ X = U\\Sigma V^T. $$\n",
"> - The columns of $U$ are left **left singular vectors**.\n",
"> - The columns of $V$ are the **right singular vectors**.\n",
"> - $\\Sigma$ has **singular values** $\\sigma_1 \\geq \\sigma_2 \\geq \\cdots \\geq \\sigma_r > 0$ on its diagonal, where $r$ is the rank of $X$.\n",
"\n",
"> **Remark**. The SVD is clearly a generalization of matrix diagonalization, but it also generalizes the **polar decomposition** of a matrix. Recall that every $n \\times n$ matrix $A$ can be written as $A = UP$ where $U$ is orthogonal (or unitary) and $P$ is a positive matrix. This is because if\n",
"> $$ A = U_0\\Sigma V^T $$\n",
"> is the SVD for $A$, then $\\Sigma$ is an $n \\times n$ diagonal matrix with non-negative entries, hence any orthogonal conjugate of it is positive, and so\n",
"> $$ A = (U_0V^T)(V\\Sigma V^T). $$\n",
"> Take $U = U_0V^T$ and $P = V\\Sigma V^T$. \n",
"\n",
"By hand, the algorithm for computing an SVD is as follows.\n",
"1. Both $AA^T$ and $A^TA$ are symmetric (they are positive in fact), and so they can be orthogonally diagonalized; one can form an orthogonal basis of eigenvectors. Let $v_1,\\dots,v_p$ be an orthonormal basis of eigenvectors for $\\mathbb{R}^p$ which correspond to eigenvectors of $A^TA$ in decreasing order. Suppose that $A^TA$ has $r$ non-zero eigenvalues. Let $V$ be the matrix whose columns contain the $v_i$'s. This gives our right singular vectors and our singular values. \n",
"2. Let $u_i = \\frac{1}{\\sigma_i}Av_i$ for $i = 1,\\dots,r$, and extend this collection of vectors to an orthonormal basis for $\\mathbb{R}^n$ if necessary. Let $U$ be the corresponding matrix.\n",
"3. Let $\\Sigma$ be the $n \\times p$ matrix whose diagonal entries are $\\sigma_1 \\geq \\sigma_2 \\geq \\cdots \\geq \\sigma_r$, and then zeroes if necessary. \n",
"\n",
"> **Example**. Let us compute the SVD of\n",
"> $$ A = \\begin{bmatrix} 3 & 2 & 2 \\\\ 2 & 3 & -2 \\end{bmatrix}. $$\n",
"> First we note that\n",
"> $$ A^TA = \\begin{bmatrix} 13 & 12 & 2 \\\\ 12 & 13 & -2 \\\\ 2 & -2 & 8 \\end{bmatrix}, $$\n",
"> which has eigenvalues $25,9,0$ with corresponding eigenvectors\n",
"> $$ \\begin{bmatrix} 1 \\\\ 1 \\\\ 0 \\end{bmatrix}, \\begin{bmatrix} 1 \\\\ -1 \\\\ 4 \\end{bmatrix}, \\begin{bmatrix} -2 \\\\ 2 \\\\ 1 \\end{bmatrix}. $$\n",
"> Normalizing, we get\n",
"> $$ V = \\begin{bmatrix} \\frac{1}{\\sqrt{2}} & \\frac{1}{3\\sqrt{2}} & -\\frac{2}{3} \\\\ \\frac{1}{\\sqrt{2}} & -\\frac{1}{3\\sqrt{2}} & \\frac{2}{3} \\\\ 0 & \\frac{4}{3\\sqrt{2}} & \\frac{1}{3} \\end{bmatrix}. $$\n",
"> Now we set $u_1 = \\frac{1}{5}Av_1$ and $u_2 = \\frac{1}{3}Av_2$ to get\n",
"> $$ U = \\begin{bmatrix} \\frac{1}{\\sqrt{2}} & \\frac{1}{\\sqrt{2}} \\\\ \\frac{1}{\\sqrt{2}} & -\\frac{1}{\\sqrt{2}} \\end{bmatrix}. $$\n",
"> So\n",
"> $$ A = \\begin{bmatrix} \\frac{1}{\\sqrt{2}} & \\frac{1}{\\sqrt{2}} \\\\ \\frac{1}{\\sqrt{2}} & -\\frac{1}{\\sqrt{2}} \\end{bmatrix} \\begin{bmatrix} 5 & 0 & 0 \\\\ 0 & 3 & 0 \\end{bmatrix} \\begin{bmatrix} \\frac{1}{\\sqrt{2}} & \\frac{1}{3\\sqrt{2}} & -\\frac{2}{3} \\\\ \\frac{1}{\\sqrt{2}} & -\\frac{1}{3\\sqrt{2}} & \\frac{2}{3} \\\\ 0 & \\frac{4}{3\\sqrt{2}} & \\frac{1}{3} \\end{bmatrix}^T $$\n",
"> is our SVD decomposition.\n",
"\n",
"We note that in practice, we avoid the computation of $X^TX$ because if the entries of $X$ have errors, then these errors will be squared in $X^TX$. There are better computational tools to get singular values and singular vectors which are more accurate. This is what our python tools will use. \n",
"\n",
"Let's use `numpy.linalg.svd` for the above matrix."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"#Define our matrix\n",
"A = np.array([[3,2,2],[2,3,-2]])\n",
"\n",
"# Take the SVD\n",
"U, S, Vh = np.linalg.svd(A)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Our SVD matrices are\n"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "5336313f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"U = [[-0.70710678 -0.70710678]\n",
" [-0.70710678 0.70710678]]\n",
"\n",
"S = [5. 3.]\n",
"\n",
"Vh.T = [[-7.07106781e-01 -2.35702260e-01 -6.66666667e-01]\n",
" [-7.07106781e-01 2.35702260e-01 6.66666667e-01]\n",
" [-6.47932334e-17 -9.42809042e-01 3.33333333e-01]]\n"
]
}
],
"source": [
"print(f\"U = {U}\\n\\nS = {S}\\n\\nVh.T = {Vh.T}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"\n",
"Because the eigenvalues of the hermitian squares of\n",
"$$ \\begin{bmatrix} 1 & 1 & 1\\\\ 0 & 1 & 1 \\\\ 0 & 0 & 1 \\\\ 0 & 0 & 0 \\end{bmatrix} \\text{ and } \\begin{bmatrix} 1 & 0 & 0 \\\\ 1 & 1 & 0 \\\\ 1 & 1 & 1 \\\\ 1 & 1 & 1 \\end{bmatrix} $$\n",
"are quite atrocious, an exact SVD decomposition is difficult to compute by hand. However, we can of course use python.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Define our matrices\n",
"A = np.array([[1,1,1],[0,1,1],[0,0,1],[0,0,0]])\n",
"B = np.array([[1,0,0],[1,1,0],[1,1,1],[1,1,1]])\n",
"\n",
"# SVD decomposition\n",
"U_A, S_A, Vh_A = np.linalg.svd(A)\n",
"U_B, S_B, Vh_B = np.linalg.svd(B)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"The resulting matrices are\n"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "a13a3391",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"U_A = [[ 0.73697623 0.59100905 0.32798528 0. ]\n",
" [ 0.59100905 -0.32798528 -0.73697623 0. ]\n",
" [ 0.32798528 -0.73697623 0.59100905 0. ]\n",
" [ 0. 0. 0. 1. ]]\n",
"\n",
"S_A = [2.2469796 0.80193774 0.55495813]\n",
"\n",
"Vh_A.T = [[ 0.32798528 0.73697623 0.59100905]\n",
" [ 0.59100905 0.32798528 -0.73697623]\n",
" [ 0.73697623 -0.59100905 0.32798528]]\n",
"\n",
"U_B = [[-2.41816250e-01 7.12015746e-01 -6.59210496e-01 0.00000000e+00]\n",
" [-4.52990541e-01 5.17957311e-01 7.25616837e-01 6.71536163e-17]\n",
" [-6.06763739e-01 -3.35226641e-01 -1.39502200e-01 -7.07106781e-01]\n",
" [-6.06763739e-01 -3.35226641e-01 -1.39502200e-01 7.07106781e-01]]\n",
"\n",
"S_B = [2.8092118 0.88646771 0.56789441]\n",
"\n",
"Vh_B.T = [[-0.67931306 0.63117897 -0.37436195]\n",
" [-0.59323331 -0.17202654 0.7864357 ]\n",
" [-0.43198148 -0.75632002 -0.49129626]]\n"
]
}
],
"source": [
"print(f\"U_A = {U_A}\\n\\nS_A = {S_A}\\n\\nVh_A.T = {Vh_A.T}\\n\\nU_B = {U_B}\\n\\nS_B = {S_B}\\n\\nVh_B.T = {Vh_B.T}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another final note is that the **operator norm** of a matrix $A$ agrees with its largest singular value. "
]
},
{
"cell_type": "markdown",
"id": "7c8396ff",
"metadata": {},
"source": [
"## Pseudoinverses and using the SVD\n",
"The SVD can be used to determine a least-squares solution for a given system. Recall that if $v_1,\\dots,v_p$ is an orthonormal basis for $\\mathbb{R}^p$ consisting of eigenvectors of $A^TA$, arranged so that they correspond to eigenvalues $\\sigma_1 \\geq \\sigma_2 \\geq \\cdots \\geq \\sigma_r$, then $\\{Av_1,\\dots,Av_r\\}$ is an orthogonal basis for the column space of $A$. In essence, this means that when we have our left singular vectors $u_1,\\dots,u_n$ (constructed based on our algorithm as above), we have that the first $r$ vectors form an orthonormal basis for the column space of $A$, and that the remaining $n - r$ vectors form an orthonormal basis for the perp of the column space of $A$ (which is also equal to the nullspace of $A^T$). \n",
"\n",
"> **Definition**. Let $A$ be an $n \\times p$ matrix and suppose that the rank of $A$ is $r \\leq \\min\\{n,p\\}$. Suppose that $A = U\\Sigma V^T$ is the SVD, where the singular values are decreasing. Partition\n",
"> $$ U = \\begin{bmatrix} U_r & U_{n-r} \\end{bmatrix} \\text{ and } V = \\begin{bmatrix} V_r & V_{p-r} \\end{bmatrix} $$\n",
"> into submatrices, where $U_r$ and $V_r$ are the matrices whose columns are the first $r$ columns of $U$ and $V$ respectively. So $U_r$ is $n \\times r$ and $V_r$ is $p \\times r$. Let $D$ be the diagonal $r \\times r$ matrices whose diagonal entries are $\\sigma_1,\\dots, \\sigma_r$, so that\n",
">$$ \\Sigma = \\begin{bmatrix} D & 0 \\\\ 0 & 0 \\end{bmatrix} $$\n",
"> and note that\n",
"> $$ A = U_rDV_r^T. $$\n",
"> We call this the reduced singular value decomposition of $A$. Note that $D$ is invertible, and its inverse is simply\n",
"> $$ D = \\begin{bmatrix} \\sigma_1^{-1} \\\\ & \\sigma_2^{-1} \\\\ & & \\ddots \\\\ & & & \\sigma_r^{-1} \\end{bmatrix}. $$\n",
"> The **pseudoinverse** (or **Moore-Penrose inverse**) of $A$ is the matrix\n",
"> $$ A^+ = V_rD^{-1}U_r^T. $$\n",
"\n",
"We note that the pseudoinverse $A^+$ is a $p \\times n$ matrix. \n",
"\n",
"With the pseudoinverse, we can actually find least-squares solutions quite easily. Indeed, if we are looking for the least-squares solution to the system $Ax = b$, define\n",
"$$ x_0 = A^+b. $$\n",
"Then \n",
"$$ \\begin{split} Ax_0 &= (U_rDV_r^T)(V_rD^{-1}U_r^Tb) \\\\ &= U_rDD^{-1}U_r^Tb \\\\ &= U_rU_r^Tb \\end{split} $$\n",
"As mentioned before, the columns of $U_r$ form an orthonormal basis for the column space of $A$ and so $U_rU_r^T$ is the orthogonal projection onto the range of $A$. That is, $Ax_0$ is precisely the projection of $b$ onto the column space of $A$, meaning that this yields a least-squares solution. This gives the following.\n",
"\n",
"> **Theorem**. Let $A$ be an $n \\times p$ matrix and $b \\in \\mathbb{R}^n$. Then\n",
"> $$ x_0 = A^+b$$\n",
"> is a least-squares solution to $Ax = b$. \n",
"\n",
"Taking pseudoinverses is quite involved. We'll do one example by hand, and then use python -- and we'll see something go wrong! There is a function `numpy.linalg.pinv` in numpy that will take a pseudoinverse. We can also just use `numpy.linalg.svd` and do the process above.\n",
"\n",
"> **Example**. We have the following SVD $A = U\\Sigma V^T$. \n",
"> $$ \\begin{bmatrix} 1 & 1 & 2\\\\ 0 & 1 & 1 \\\\ 1 & 0 & 1 \\\\ 0 & 0 & 0 \\end{bmatrix} = \\begin{bmatrix} \\sqrt{\\frac{2}{3}} & 0 & 0 & -\\frac{1}{\\sqrt{3}} \\\\ \\frac{1}{\\sqrt{6}} & \\frac{1}{\\sqrt{2}} & 0 & \\frac{1}{\\sqrt{3}} \\\\ \\frac{1}{\\sqrt{6}} & -\\frac{1}{\\sqrt{2}} & 0 & \\frac{1}{\\sqrt{3}} \\\\ 0 & 0 & 1 & 0 \\end{bmatrix} \\begin{bmatrix} 3 & 0 & 0 \\\\ 0 & 1 & 0 \\\\ 0 & 0 & 0 \\\\ 0 & 0 & 0 \\end{bmatrix}\\begin{bmatrix} \\frac{1}{\\sqrt{6}} & -\\frac{1}{\\sqrt{2}} & -\\frac{1}{\\sqrt{3}} \\\\ \\frac{1}{\\sqrt{6}} & \\frac{1}{\\sqrt{2}} & -\\frac{1}{\\sqrt{3}} \\\\ \\sqrt{\\frac{2}{3}} & 0 & \\frac{1}{\\sqrt{3}} \\end{bmatrix}^T. $$\n",
"> Can we find a least-squares solution to $Ax = b$, where\n",
"> $$ b = \\begin{bmatrix} 1 \\\\ 1 \\\\ 1 \\\\ 1 \\end{bmatrix}? $$\n",
"> The pseudoinverse of $A$ is\n",
"> $$ \\begin{split} A^+ &= \\begin{bmatrix} \\frac{1}{\\sqrt{6}} & -\\frac{1}{\\sqrt{2}} \\\\ \\frac{1}{\\sqrt{6}} & \\frac{1}{\\sqrt{2}} \\\\ \\sqrt{\\frac{2}{3}} & 0 \\end{bmatrix} \\begin{bmatrix} 3 \\\\ & 1 \\end{bmatrix} \\begin{bmatrix} \\sqrt{\\frac{2}{3}} & 0 \\\\ \\frac{1}{\\sqrt{6}} & \\frac{1}{\\sqrt{2}} \\\\ \\frac{1}{\\sqrt{6}} & -\\frac{1}{\\sqrt{2}} \\\\ 0 & 0 \\end{bmatrix}^T \\\\ &= \\begin{bmatrix} \\frac{1}{9} & -\\frac{4}{9} & \\frac{5}{9} & 0 \\\\ \\frac{1}{9} & \\frac{5}{9} & -\\frac{4}{9} & 0 \\\\ \\frac{2}{9} & \\frac{1}{9} & \\frac{1}{9} & 0\\end{bmatrix}, \\end{split} $$\n",
"> and so a least-squares solution is given by\n",
"> $$ \\begin{split} x_0 &= A^+b \\\\ &= \\begin{bmatrix} \\frac{1}{9} & -\\frac{4}{9} & \\frac{5}{9} & 0 \\\\ \\frac{1}{9} & \\frac{5}{9} & -\\frac{4}{9} & 0 \\\\ \\frac{2}{9} & \\frac{1}{9} & \\frac{1}{9} & 0\\end{bmatrix}\\begin{bmatrix} 1 \\\\ 1 \\\\ 1 \\\\ 1 \\end{bmatrix} \\\\ &= \\begin{bmatrix} \\frac{2}{9} \\\\ \\frac{2}{9} \\\\ \\frac{4}{9} \\end{bmatrix}. \\end{split} $$\n",
"\n",
"Now let's do this with python, and see an example of how things can go wrong. We'll try to take the pseudoinverse manually first.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Create our matrix A and our target b\n",
"A = np.array([[1,1,2],[0,1,1],[1,0,1],[0,0,0]])\n",
"b = np.array([[1],[1],[1],[1]])\n",
"\n",
"# Take the SVD decomposition\n",
"U, S, Vh = np.linalg.svd(A)\n",
"\n",
"# Prepare the pseudoinverse\n",
"# Recall that we invert the non-zero diagonal entries of the diagonal matrix.\n",
"# So we first build S_inv to be the appropriate size\n",
"S_inv = np.zeros((Vh.shape[0], U.shape[0])) \n",
"# We then fill in the appropriate values on the diagonal\n",
"S_inv[:len(S), :len(S)] = np.diag(1/S)\n",
"\n",
"# Build the pseudoinverse\n",
"A_pinv = Vh.T @ S_inv @ U.T\n",
"\n",
"# Compute the least-squares solution\n",
"beta = A_pinv @ b\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"What is the result?\t\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "862ed810",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 2.74080345e+15],\n",
" [ 2.74080345e+15],\n",
" [-2.74080345e+15]])"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"beta"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"This is **WAY** off the mark. So what happened? Well, when we look at our singular values, we have\n"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "2d3df55d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3.00000000e+00, 1.00000000e+00, 1.21618839e-16])"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"S"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"As we got this matrix numerically, the last entry is actually non-zero, but tiny. This isn't exactly what's going on since we know that the rank of A is 2. So when we invert the singular values and throw them on the diagonal, have `1/1.21618839e-16` which is a very large value. This value then messes up the rest of the computation. So how do we fix this? One can set tolerances in numpy, but we'll get to that later. Let's just note that `numpy.linalg.pinv` will already incorporate this. Let's see what we get.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Create our matrix A and our target b\n",
"A = np.array([[1,1,2],[0,1,1],[1,0,1],[0,0,0]])\n",
"b = np.array([[1],[1],[1],[1]])\n",
"\n",
"# Build the pseudoinverse\n",
"A_pinv = np.linalg.pinv(A)\n",
"\n",
"# Compute the least-squares solution\n",
"beta = A_pinv @ b\n"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "2657ea4b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"A_pinv=[[ 0.11111111 -0.44444444 0.55555556 0. ]\n",
" [ 0.11111111 0.55555556 -0.44444444 0. ]\n",
" [ 0.22222222 0.11111111 0.11111111 0. ]]\n",
"\n",
"beta=[[0.22222222]\n",
" [0.22222222]\n",
" [0.44444444]]\n"
]
}
],
"source": [
"print(f\"A_pinv={A_pinv}\\n\\nbeta={beta}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Condition Number\n",
"Numerical calculations involving matrix equations are quite reliable if we use the SVD. This is because the orthogonal matrices $U$ and $V$ preserve lengths and angles, leaving the stability of the problem to be governed by the singular values of the matrix $X$. Recall that if $X = U\\Sigma V^T$, then solving the least-squares problem involves dividing by the non-zero singular values $\\sigma_i$ of $X$. If these values are very small, their inverses become very large, and this will amplify any numerical errors.\n",
"\n",
"> **Definition**. Let $X$ be an $n \\times p$ matrix and let $\\sigma_1 \\geq \\cdots \\geq \\sigma_r$ be the non-zero singular values of $X$. The **condition number** of $X$ is the quotient\n",
"> $$ \\kappa(X) = \\frac{\\sigma_1}{\\sigma_r} $$\n",
"> of the largest and smallest non-zero singular values.\n",
"\n",
"A condition number close to 1 indicates a well-conditioned problem, while a large condition number indicates that small perturbations in data may lead to large changes in computation. Geometrically, $\\kappa(X)$ measures how much $X$ distorts space. \n",
"\n",
"> **Example**. Consider the matrices\n",
"> $$ A = \\begin{bmatrix} 1 \\\\ & 1 \\end{bmatrix} \\text{ and } B = \\begin{bmatrix} 1 \\\\ & \\frac{1}{10^6} \\end{bmatrix}. $$\n",
"> The condition numbers are\n",
"> $$ \\kappa(A) = 1 \\text{ and } \\kappa(B) = 10^6. $$\n",
"> Inverting $X_2$ includes dividing by $\\frac{1}{10^6}$, which will amplify errors by $10^6$.\n",
"\n",
"Let's look our main example in python by using `numpy.linalg.cond`. \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# First let us make a dictionary incorporating our data.\n",
"# Each entry corresponds to a column (feature of our data)\n",
"data = {\n",
" 'Square ft': [1600, 2100, 1550, 1600, 2000],\n",
" 'Bedrooms': [3, 4, 2, 3, 4],\n",
" 'Price': [500, 650, 475, 490, 620]\n",
"}\n",
"\n",
"# Create a pandas DataFrame\n",
"df = pd.DataFrame(data)\n",
"\n",
"# Create out matrix X\n",
"X = df[['Square ft', 'Bedrooms']].to_numpy()\n",
"\n",
"# Check the condition number\n",
"cond_X = np.linalg.cond(X)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Let's see what we got.\n"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "8aa6bca9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"np.float64(4329.082589067693)"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cond_X"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"so this is quite a high condition number! This should be unsurprising, as clearly the number of bedrooms is correlated to the size of a house (especially so in our small toy example). "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.14.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
File diff suppressed because one or more lines are too long
+751
View File
@@ -0,0 +1,751 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Principal Component Analysis\n",
"\n",
"Principal Component Analysis (PCA) addresses the issues of multicollinearity and dimensionality mentioned at the end of the previous section by transforming the data into a new coordinate system. The new axes -- called principal components -- are chosen to capture the maximum variance in the data. In linear algebra terms, we are finding a subspace of potentially smaller dimension that best approximates our data.\n",
"\n",
"> **Example**: Let us return to our house example. Suppose we decide to list the square footage in both square feet and square meters. Let's add this feature to our dataset.\n",
"> |House | Square ft | Square m | Bedrooms | Price (in $1000s) |\n",
"> | --- | --- | --- | --- | --- |\n",
"> | 0 | 1600 | 148 | 3 | 500 |\n",
"> | 1 | 2100 | 195 | 4 | 650 |\n",
"> | 2 | 1550 | 144 | 2 | 475 |\n",
"> | 3 | 1600 | 148 | 3 | 490 |\n",
"> | 4 | 2000 | 185 | 4 | 620 |\n",
"> \n",
"> In this case, our associated matrix is:\n",
"> $$ X = \\begin{bmatrix} 1600 & 148 & 3 & 500 \\\\ 2100 & 195 & 4 & 650 \\\\ 1550 & 144 & 2 & 475 \\\\ 1600 & 148 & 3 & 490 \\\\ 2000 & 185 & 4 & 620 \\end{bmatrix} $$\n",
"\n",
"There are a few problems with the above data and the associated matrix $X$ (this time, we're not looking to make predictions, so we don't omit the last column).\n",
"- **Redundancy**: Square feet and square meters give the same information. It's just a matter of if you're from a civilized country or from an uncivilized country.\n",
"- **Numerical instability**: The columns of $X$ are nearly linearly dependent. Indeed, the second column is almost a multiple of the first. Moreover, one can make a safe bet that the number of bedrooms increases as the square footage does, so that the first and third columns are correlated.\n",
"- **Interpretation difficulty**: We used the square footage and bedrooms *together* in the previous section to predict the price of a house. However, because of their correlation, this obfuscates the true relationship, say, between the square footage and the price of a house, or the number of bedrooms and the price of a house. \n",
"\n",
"So the question becomes: what do we do about this? We will try to get a smaller matrix (less columns) that contains the same, or a close enough, amount of information. The point is that the data is *effectively* lower-dimensional. \n",
"\n",
"Let's do a little analysis on our dataset before progressing. Let's use `pandas.DataFrame.describe`, `pandas.DataFrame.corr` and `numpy.linalg.cond`. First, let's set up our data.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# First let us make a dictionary incorporating our data.\n",
"# Each entry corresponds to a column (feature of our data)\n",
"data = {\n",
" 'Square ft': [1600, 2100, 1550, 1600, 2000],\n",
"\t'Square m': [148, 195, 144, 148, 185],\n",
" 'Bedrooms': [3, 4, 2, 3, 4],\n",
" 'Price': [500, 650, 475, 490, 620]\n",
"}\n",
"\n",
"# Create a pandas DataFrame\n",
"df = pd.DataFrame(data)\n",
"\n",
"# Create out matrix X\n",
"X = df.to_numpy()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"Now let's see what it has to offer. \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8514ed8b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Square ft</th>\n",
" <th>Square m</th>\n",
" <th>Bedrooms</th>\n",
" <th>Price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>5.000000</td>\n",
" <td>5.000000</td>\n",
" <td>5.00000</td>\n",
" <td>5.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>1770.000000</td>\n",
" <td>164.000000</td>\n",
" <td>3.20000</td>\n",
" <td>547.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>258.843582</td>\n",
" <td>24.052027</td>\n",
" <td>0.83666</td>\n",
" <td>81.516869</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1550.000000</td>\n",
" <td>144.000000</td>\n",
" <td>2.00000</td>\n",
" <td>475.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>1600.000000</td>\n",
" <td>148.000000</td>\n",
" <td>3.00000</td>\n",
" <td>490.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>1600.000000</td>\n",
" <td>148.000000</td>\n",
" <td>3.00000</td>\n",
" <td>500.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>2000.000000</td>\n",
" <td>185.000000</td>\n",
" <td>4.00000</td>\n",
" <td>620.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>2100.000000</td>\n",
" <td>195.000000</td>\n",
" <td>4.00000</td>\n",
" <td>650.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Square ft Square m Bedrooms Price\n",
"count 5.000000 5.000000 5.00000 5.000000\n",
"mean 1770.000000 164.000000 3.20000 547.000000\n",
"std 258.843582 24.052027 0.83666 81.516869\n",
"min 1550.000000 144.000000 2.00000 475.000000\n",
"25% 1600.000000 148.000000 3.00000 490.000000\n",
"50% 1600.000000 148.000000 3.00000 500.000000\n",
"75% 2000.000000 185.000000 4.00000 620.000000\n",
"max 2100.000000 195.000000 4.00000 650.000000"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0eb032aa",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Square ft</th>\n",
" <th>Square m</th>\n",
" <th>Bedrooms</th>\n",
" <th>Price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Square ft</th>\n",
" <td>1.000000</td>\n",
" <td>0.999886</td>\n",
" <td>0.900426</td>\n",
" <td>0.998810</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Square m</th>\n",
" <td>0.999886</td>\n",
" <td>1.000000</td>\n",
" <td>0.894482</td>\n",
" <td>0.998395</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bedrooms</th>\n",
" <td>0.900426</td>\n",
" <td>0.894482</td>\n",
" <td>1.000000</td>\n",
" <td>0.909066</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Price</th>\n",
" <td>0.998810</td>\n",
" <td>0.998395</td>\n",
" <td>0.909066</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Square ft Square m Bedrooms Price\n",
"Square ft 1.000000 0.999886 0.900426 0.998810\n",
"Square m 0.999886 1.000000 0.894482 0.998395\n",
"Bedrooms 0.900426 0.894482 1.000000 0.909066\n",
"Price 0.998810 0.998395 0.909066 1.000000"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.corr()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "6a166792",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"np.float64(8222.19067218415)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.linalg.cond(X)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"As we can see, everything is basically correlated, and we clearly have some redundancies. \n",
"\n",
"This section is structured as follows. \n",
"- [Low-rank approximation via SVD](#low-rank-approximation-via-svd)\n",
"- [Centering data](#centering-data)\n",
"\n",
"\n",
"## Low-rank approximation via SVD\n",
"\n",
"Let $A$ be an $n \\times p$ matrix and let $A = U\\Sigma V^T$ be a SVD. Let $u_1,\\dots,u_n$ be the columns of $U$, $v_1,\\dots,v_p$ be the column of $V$, and $\\sigma_1 \\geq \\cdots \\sigma_r > 0$ be the singular values, where $r \\leq \\min\\{n,p\\}$ is the rank of $A$. Then we have the **reduced singular value decomposition** (see [Pseudoinverses and using the svd](#pseudoinverses-and-using-the-svd))\n",
"$$ A = \\sum_{i=1}^r \\sigma_i u_iv_i^T $$\n",
"(note that $u_i$ is a $n \\times 1$ matrix and $v_i$ is a $p \\times 1$ matrix, so $u_iv_i^T$ is some $n \\times p$ matrix).\n",
"The key idea is that if the rank of $A$ is higher, say $s$, but the latter singular values are small, then we should still have an approximation like this. Say $\\sigma_{r+1},\\dots,\\sigma_{s}$ are tiny. Then\n",
"$$ \\begin{split} A &= \\sum_{i=1}^s \\sigma_i u_i v_i^T \\\\ &= \\sum_{i=1}^r \\sigma_i u_iv_i^T + \\sum_{i=r+1}^{s} \\sigma_i u_iv_i^T \\\\ &\\approx \\sum_{i=1}^r \\sigma_iu_i v_i^T \\end{split}. $$\n",
"So defining $A_r := \\sum_{i=1}^r \\sigma_i u_iv_i^T$, we are approximating $A$ by $A_r$.\n",
"\n",
"In what sense is this a good approximation though? Recall that the Frobenius norm of a matrix $A$ is defined as the sqrt root of the sum of the squares of all the entries:\n",
"$$ \\|A\\|_F = \\sqrt{\\sum_{i,j} a_{ij}^2}. $$\n",
"The Frobenius norm acts as a very nice generalization of the $L^2$ norm for vectors, and is an indispensable tool in both linear algebra and data science. The point is that this \"approximation\" above actually works in the Frobenius norm, and this reduced singular value decomposition in fact minimizes the error.\n",
"\n",
"> **Theorem** (EckartYoungMirsky). Let $A$ be an $n \\times p$ matrix of rank $r$. For $k \\leq r$,\n",
"> $$ \\min_{B \\text{ such that rank}(B) \\leq k} \\|A - B\\|_F = \\|A - A_k\\|_F. $$\n",
"> The (at most) rank $k$ matrix $A_k$ also realizes the minimum when optimizing for the operator norm.\n",
"\n",
"> **Example**. Recall that we have the following SVD:\n",
"> $$ \\begin{bmatrix} 3 & 2 & 2 \\\\ 2 & 3 & -2 \\end{bmatrix} = \\begin{bmatrix} \\frac{1}{\\sqrt{2}} & \\frac{1}{\\sqrt{2}} \\\\ \\frac{1}{\\sqrt{2}} & -\\frac{1}{\\sqrt{2}} \\end{bmatrix} \\begin{bmatrix} 5 & 0 & 0 \\\\ 0 & 3 & 0 \\end{bmatrix} \\begin{bmatrix} \\frac{1}{\\sqrt{2}} & \\frac{1}{3\\sqrt{2}} & -\\frac{2}{3} \\\\ \\frac{1}{\\sqrt{2}} & -\\frac{1}{3\\sqrt{2}} & \\frac{2}{3} \\\\ 0 & \\frac{4}{3\\sqrt{2}} & \\frac{1}{3} \\end{bmatrix}^T. $$\n",
"> So if we want a rank-one approximation for the matrix, we'll do the reduced SVD. We have\n",
"> $$ \\begin{split} A_1 &= \\sigma_1u_1v_1^T \\\\ &= 5\\begin{bmatrix} \\frac{1}{\\sqrt{2}} \\\\ \\frac{1}{\\sqrt{2}} \\end{bmatrix}\\begin{bmatrix} \\frac{1}{\\sqrt{2}} & \\frac{1}{\\sqrt{2}} & 0 \\end{bmatrix} \\\\ &= \\begin{bmatrix} \\frac{5}{2} & \\frac{5}{2} & 0 \\\\ \\frac{5}{2} & \\frac{5}{2} & 0 \\end{bmatrix} \\end{split}$$\n",
"> Now let's compute the (square of the) Frobenius norm of the difference $A - A_1$. We have\n",
"> $$ \\begin{split} \\|A - A_1\\|_F^2 &= \\left\\| \\begin{bmatrix} \\frac{1}{2} & -\\frac{1}{2} & 2 \\\\ -\\frac{1}{2} & \\frac{1}{2} & -2 \\end{bmatrix}\\right\\|_F^2 \\\\ &= 4(\\frac{1}{2})^2 + 2(2^2) = 9. \\end{split} $$\n",
"> So the Frobenius distance between $A$ and $A_1$ is 3, and we know by Eckart-Young-Mirsky that this is the smallest we can get when looking at the difference between $A$ and a (at most) rank one $2 \\times 3$ matrix. As mentioned, the operator norm $\\|A - A_1\\|$ also minimizes the distance (in operator norm). We know this to be the largest singular value. As $A - A_1$ has SVD\n",
"> $$ \\begin{bmatrix} \\frac{1}{2} & -\\frac{1}{2} & 2 \\\\ -\\frac{1}{2} & \\frac{1}{2} & -2 \\end{bmatrix} = \\begin{bmatrix} -\\frac{1}{\\sqrt{2}} & \\frac{1}{\\sqrt{2}} \\\\ \\frac{1}{\\sqrt{2}} & \\frac{1}{\\sqrt{2}} \\end{bmatrix}\\begin{bmatrix} 3 & 0 & 0 \\\\ 0 & 0 & 0 \\end{bmatrix} \\begin{bmatrix} -\\frac{1}{3\\sqrt{2}} & -\\frac{4}{\\sqrt{17}} & \\frac{1}{3\\sqrt{34}} \\\\ \\frac{1}{3\\sqrt{2}} & 0 & \\frac{1}{3}\\sqrt{\\frac{17}{2}} \\\\ -\\frac{2\\sqrt{2}}{3} & \\frac{1}{\\sqrt{17}} & \\frac{2}{3}\\sqrt{\\frac{2}{17}} \\end{bmatrix}, $$\n",
"> the operator norm is also 3. \n",
"\n",
"Now let's do this in python. We'll set up our matrix as usual, take the SVD, do the truncated construction of $A_1$, and use `numpy.linalg.norm` to look at the norms. \n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Create our matrix A\n",
"A = np.array([[3,2,2],[2,3,-2]])\n",
"\n",
"# Take the SVD\n",
"U, S, Vh = np.linalg.svd(A)\n",
"\n",
"# Create our rank-1 approximation\n",
"sigma1 = S[0]\n",
"u1 = U[:, [0]]\t\t#shape (2,2)\n",
"v1T = Vh[[0], :]\t\t#shape (3,3)\n",
"A1 = sigma1 * (u1 @ v1T)\n",
"\n",
"# Take norms and view errors\n",
"frobenius_error = np.linalg.norm(A - A1, ord=\"fro\")\t#Frobenius norm\n",
"operator_error = np.linalg.norm(A - A1, ord=2)\t\t#operator norm\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Let's see if we get what we expect.\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "799ea5da",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"np.float64(4.999999999999999)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sigma1"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "e17ad031",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[-0.70710678],\n",
" [-0.70710678]])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"u1"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b75d1b41",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[-7.07106781e-01, -7.07106781e-01, -6.47932334e-17]])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v1T"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "cda3bc1a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[2.50000000e+00, 2.50000000e+00, 2.29078674e-16],\n",
" [2.50000000e+00, 2.50000000e+00, 2.29078674e-16]])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A1"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "5741dc92",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"np.float64(3.0)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"frobenius_error"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "b1171244",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"np.float64(3.0)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"operator_error"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"So this numerically confirms the EYM theorem. \n",
"\n",
"## Centering data \n",
"In data science, we rarely apply low-rank approximation to raw values directly, because translation and units can dominate the geometry. Instead, we apply these methods to centered (and often standardized) data so that low-rank structure reflects relationships among features rather than the absolute location or measurement scale. Centering converts the problem from approximating an affine cloud to approximating a linear one, in direct analogy with including an intercept term in linear regression. Therefore, before we can analyze the variance structure, we must ensure our data is centered, i.e., that each feature has a mean of 0. We achieve this by subtracting the mean of each column from every entry in that column.\n",
"Suppose $X$ is our $n \\times p$ data matrix, and let\n",
"$$ \\mu = \\frac{1}{n}\\mathbb{1}^T X. $$\n",
"Then\n",
"$$ \\hat{X} = X - \\mu \\mathbb{1} $$\n",
"will be centered data matrix.\n",
"\n",
"> **Example**. Going back to our housing example, the means of the columns are 1770, 164, 3.2, and 547, respectively. So our centered matrix is\n",
"> $$ \\hat{X} = \\begin{bmatrix} -170 & -16 & -0.2 & -47 \\\\ 330 & 31 & 0.8 & 103 \\\\ -220 & -20 & -1.2 & -72 \\\\ -170 & -16 & -0.2 & -57 \\\\ 230 & 21 & 0.8 & 73 \\end{bmatrix}. $$\n",
"\n",
"Let's do this in python.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# First let us make a dictionary incorporating our data.\n",
"# Each entry corresponds to a column (feature of our data)\n",
"data = {\n",
" 'Square ft': [1600, 2100, 1550, 1600, 2000],\n",
"\t'Square m': [148, 195, 144, 148, 185],\n",
" 'Bedrooms': [3, 4, 2, 3, 4],\n",
" 'Price': [500, 650, 475, 490, 620]\n",
"}\n",
"\n",
"# Create a pandas DataFrame\n",
"df = pd.DataFrame(data)\n",
"\n",
"# Create out matrix X\n",
"X = df.to_numpy()\n",
"\n",
"# Get our vector of means\n",
"X_means = np.mean(X, axis=0)\n",
"\n",
"# Create our centered matrix\n",
"X_centered = X - X_means\n",
"\n",
"# Get the SVD for X_centered\n",
"U, S, Vh = np.linalg.svd(X_centered)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"This returns the following.\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "4288abb2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1770. , 164. , 3.2, 547. ])"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_means"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "31c2ebf2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[-1.70e+02, -1.60e+01, -2.00e-01, -4.70e+01],\n",
" [ 3.30e+02, 3.10e+01, 8.00e-01, 1.03e+02],\n",
" [-2.20e+02, -2.00e+01, -1.20e+00, -7.20e+01],\n",
" [-1.70e+02, -1.60e+01, -2.00e-01, -5.70e+01],\n",
" [ 2.30e+02, 2.10e+01, 8.00e-01, 7.30e+01]])"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_centered"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"We will apply the low-rank approximations from the previous sections. First let's see what our SVD looks like, and what the condition number is.\n"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "d944d257",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"U = [[-0.32486018 -0.81524197 -0.01735449 -0.17188722 0.4472136 ]\n",
" [ 0.63705869 0.10707263 -0.3450375 -0.51345964 0.4472136 ]\n",
" [-0.42643013 0.35553416 -0.61058318 0.34487822 0.4472136 ]\n",
" [-0.33034709 0.436448 0.61781883 -0.3445052 0.4472136 ]\n",
" [ 0.44457871 -0.08381281 0.35515633 0.68497384 0.4472136 ]]\n",
"\n",
"S = [5.44828440e+02 7.61035608e+00 8.91429037e-01 2.41987799e-01]\n",
"\n",
"Vh.T = [[ 0.95017495 0.29361033 0.08182661 0.06530651]\n",
" [ 0.08827897 0.06690917 -0.71081981 -0.69459714]\n",
" [ 0.00276797 -0.04366082 0.69629997 -0.71641638]\n",
" [ 0.29894268 -0.95258064 -0.05662119 0.00417714]]\n",
"\n",
"Condition number of X_centered = 2251.4707027583063\n"
]
}
],
"source": [
"print(f\"U = {U}\\n\\nS = {S}\\n\\nVh.T = {Vh.T}\\n\")\n",
"print(\"Condition number of X_centered = \", np.linalg.cond(X_centered))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Now let's approximate our centered matrix $\\hat{X}$ by some lower-rank matrices. First, we'll define a function which will give us a rank $k$ truncated SVD. \n"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"# Defining the truncated svd\n",
"def reduced_svd_matrix_k(U, S, Vh, k):\n",
"\tUk = U[:, :k]\n",
"\tSk = np.diag(S[:k])\n",
"\tVhk = Vh[:k, :]\n",
"\treturn Uk @ Sk @ Vhk\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Now, as $\\hat{X}$ has rank 4, we can do a reduced matrix of rank 1,2,3. We will do this in a loop.\n",
"\n",
"> **Remark**. We'll divide the error by the (Frobenius) norm so that we have a relative error. E.g., if two houses are within 10k of each other, they are similarly priced. The magnitude of error being large doesn't say much if our quantities are large.\n",
"> \n"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[-168.1743765 -15.62476472 -0.48991109 -52.91078079]\n",
" [ 329.79403078 30.64054254 0.96072753 103.7593243 ]\n",
" [-220.7553464 -20.50996365 -0.64308544 -69.45373002]\n",
" [-171.01485494 -15.88866823 -0.49818573 -53.80444804]\n",
" [ 230.15054706 21.38285405 0.67045472 72.40963456]] \n",
" k=1: relative Frobenius reconstruction error on centered data = 0.0141 \n",
"\n",
"[[-1.69996018e+02 -1.60398881e+01 -2.19027093e-01 -4.70007022e+01]\n",
" [ 3.30033282e+02 3.06950642e+01 9.25150039e-01 1.02983104e+02]\n",
" [-2.19960913e+02 -2.03289247e+01 -7.61220318e-01 -7.20311670e+01]\n",
" [-1.70039621e+02 -1.56664278e+01 -6.43206200e-01 -5.69684681e+01]\n",
" [ 2.29963269e+02 2.13401763e+01 6.98303572e-01 7.30172337e+01]] \n",
" k=2: relative Frobenius reconstruction error on centered data = 0.0017 \n",
"\n",
"[[-1.69997284e+02 -1.60288915e+01 -2.29799059e-01 -4.69998263e+01]\n",
" [ 3.30008114e+02 3.09136956e+01 7.10984571e-01 1.03000519e+02]\n",
" [-2.20005450e+02 -1.99420315e+01 -1.14021052e+00 -7.20003486e+01]\n",
" [-1.69994556e+02 -1.60579058e+01 -2.59724807e-01 -5.69996518e+01]\n",
" [ 2.29989175e+02 2.11151332e+01 9.18749820e-01 7.29993076e+01]] \n",
" k=3: relative Frobenius reconstruction error on centered data = 0.0004 \n",
"\n"
]
}
],
"source": [
"for k in [1, 2, 3]:\n",
"\t# Define our reduced matrix\n",
" Xck = reduced_svd_matrix_k(U, S, Vh, k)\n",
"\t# Compute the relative error\n",
" rel_err = np.linalg.norm(X_centered - Xck, ord=\"fro\") / np.linalg.norm(X_centered, ord=\"fro\")\n",
"\t# Print the information\n",
" print(Xck, \"\\n\", f\"k={k}: relative Frobenius reconstruction error on centered data = {rel_err:.4f}\", \"\\n\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"This seems to check out -- it says that one rank (or one feature) should be roughly enough to describe this data. This should make sense because clearly the square meterage, # of bedrooms, and price depend on the square footage. \n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.14.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
+473
View File
@@ -0,0 +1,473 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "6ae6c7f8",
"metadata": {},
"source": [
"# Spectral Image Denoising via Truncated SVD\n",
"\n",
"This notebook extracts the image denoising project into a standalone workflow and extends it from **grayscale images** to **actual color images**.\n",
"\n",
"The core idea is the same as in the original write-up: if an image matrix has singular value decomposition\n",
"$$\n",
"A = U \\Sigma V^T,\n",
"$$\n",
"then the best rank-$k$ approximation to $A$ in Frobenius norm is obtained by truncating the SVD. This is the **EckartYoungMirsky theorem**.\n",
"\n",
"For a grayscale image, the image is a single matrix. For an RGB image, we treat the image as **three matrices**, one for each channel, and apply truncated SVD to each channel separately."
]
},
{
"cell_type": "markdown",
"id": "31a665c9",
"metadata": {},
"source": [
"## Outline\n",
"\n",
"1. Load an image from disk\n",
"2. Convert it to grayscale or keep it in RGB\n",
"3. Add synthetic Gaussian noise\n",
"4. Compute a truncated SVD reconstruction\n",
"5. Compare the original, noisy, and denoised images\n",
"6. Measure quality using MSE and PSNR\n",
"\n",
"This notebook is written so that you can use **your own image files** directly."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "88584c56",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from PIL import Image\n",
"from pathlib import Path"
]
},
{
"cell_type": "markdown",
"id": "30e96441",
"metadata": {},
"source": [
"## A note on color images\n",
"\n",
"For a grayscale image, SVD applies directly to a single matrix. For a color image $A \\in \\mathbb{R}^{n \\times p \\times 3}$, we write\n",
"$$\n",
"A = (A_R, A_G, A_B),\n",
"$$\n",
"where each channel is an $n \\times p$ matrix. We then compute a rank-$k$ approximation for each channel:\n",
"$$\n",
"A_R \\approx (A_R)_k,\\qquad\n",
"A_G \\approx (A_G)_k,\\qquad\n",
"A_B \\approx (A_B)_k,\n",
"$$\n",
"and stack them back together.\n",
"\n",
"This is the most direct extension of the grayscale method, and it works well as a first linear-algebraic treatment of color denoising."
]
},
{
"cell_type": "markdown",
"id": "f275cbc9",
"metadata": {},
"source": [
"## Helper functions\n",
"\n",
"We begin with some utilities for:\n",
"- loading images,\n",
"- adding Gaussian noise,\n",
"- reconstructing rank-$k$ approximations,\n",
"- computing image-quality metrics."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21adfcaf",
"metadata": {},
"outputs": [],
"source": [
"def load_image(path, mode=\"rgb\"):\n",
" \"\"\"\n",
" Load an image from disk.\n",
"\n",
" Parameters\n",
" ----------\n",
" path : str or Path\n",
" Path to the image file.\n",
" mode : {\"rgb\", \"gray\"}\n",
" Whether to load the image as RGB or grayscale.\n",
"\n",
" Returns\n",
" -------\n",
" np.ndarray\n",
" Float image array scaled to [0, 255].\n",
" Shape is (H, W, 3) for RGB and (H, W) for grayscale.\n",
" \"\"\"\n",
" path = Path(path)\n",
" img = Image.open(path)\n",
"\n",
" if mode.lower() in {\"gray\", \"grayscale\", \"l\"}:\n",
" img = img.convert(\"L\")\n",
" else:\n",
" img = img.convert(\"RGB\")\n",
"\n",
" return np.asarray(img, dtype=np.float64)\n",
"\n",
"\n",
"def show_image(img, title=None):\n",
" \"\"\"Display a grayscale or RGB image.\"\"\"\n",
" plt.figure(figsize=(6, 6))\n",
" if img.ndim == 2:\n",
" plt.imshow(np.clip(img, 0, 255), cmap=\"gray\", vmin=0, vmax=255)\n",
" else:\n",
" plt.imshow(np.clip(img, 0, 255).astype(np.uint8))\n",
" if title is not None:\n",
" plt.title(title)\n",
" plt.axis(\"off\")\n",
" plt.show()\n",
"\n",
"\n",
"def add_gaussian_noise(img, sigma=25, seed=0):\n",
" \"\"\"\n",
" Add Gaussian noise to an image.\n",
"\n",
" Parameters\n",
" ----------\n",
" img : np.ndarray\n",
" Image array in [0, 255].\n",
" sigma : float\n",
" Standard deviation of the noise.\n",
" seed : int\n",
" Random seed for reproducibility.\n",
"\n",
" Returns\n",
" -------\n",
" np.ndarray\n",
" Noisy image clipped to [0, 255].\n",
" \"\"\"\n",
" rng = np.random.default_rng(seed)\n",
" noisy = img + rng.normal(loc=0.0, scale=sigma, size=img.shape)\n",
" return np.clip(noisy, 0, 255)\n",
"\n",
"\n",
"def truncated_svd_matrix(A, k):\n",
" \"\"\"\n",
" Return the rank-k truncated SVD approximation of a 2D matrix A.\n",
" \"\"\"\n",
" U, s, Vt = np.linalg.svd(A, full_matrices=False)\n",
" k = min(k, len(s))\n",
" return (U[:, :k] * s[:k]) @ Vt[:k, :]\n",
"\n",
"\n",
"def truncated_svd_image(img, k):\n",
" \"\"\"\n",
" Apply truncated SVD to a grayscale or RGB image.\n",
"\n",
" For RGB images, truncated SVD is applied channel-by-channel.\n",
"\n",
" Parameters\n",
" ----------\n",
" img : np.ndarray\n",
" Shape (H, W) for grayscale or (H, W, 3) for RGB.\n",
" k : int\n",
" Truncation rank.\n",
"\n",
" Returns\n",
" -------\n",
" np.ndarray\n",
" Reconstructed image clipped to [0, 255].\n",
" \"\"\"\n",
" if img.ndim == 2:\n",
" recon = truncated_svd_matrix(img, k)\n",
" return np.clip(recon, 0, 255)\n",
"\n",
" if img.ndim == 3:\n",
" channels = []\n",
" for c in range(img.shape[2]):\n",
" channel_recon = truncated_svd_matrix(img[:, :, c], k)\n",
" channels.append(channel_recon)\n",
" recon = np.stack(channels, axis=2)\n",
" return np.clip(recon, 0, 255)\n",
"\n",
" raise ValueError(\"Image must be either 2D (grayscale) or 3D (RGB).\")\n",
"\n",
"\n",
"def mse(A, B):\n",
" \"\"\"Mean squared error between two images.\"\"\"\n",
" return np.mean((A.astype(np.float64) - B.astype(np.float64)) ** 2)\n",
"\n",
"\n",
"def psnr(A, B, max_val=255.0):\n",
" \"\"\"Peak signal-to-noise ratio in decibels.\"\"\"\n",
" err = mse(A, B)\n",
" if err == 0:\n",
" return np.inf\n",
" return 10 * np.log10((max_val ** 2) / err)"
]
},
{
"cell_type": "markdown",
"id": "fe1d4932",
"metadata": {},
"source": [
"## Choose an image"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42bafca5",
"metadata": {},
"outputs": [],
"source": [
"IMAGE_PATH = \"../images/bella.jpg\" \n",
"MODE = \"rgb\" # use \"gray\" for grayscale, \"rgb\" for color\n",
"\n",
"img = load_image(IMAGE_PATH, mode=MODE)\n",
"print(\"Image shape:\", img.shape)\n",
"show_image(img, title=f\"Original image ({MODE})\")"
]
},
{
"cell_type": "markdown",
"id": "05e52222",
"metadata": {},
"source": [
"## Add synthetic Gaussian noise\n",
"\n",
"We add noise so that the denoising effect is visible and measurable."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "528e69b3",
"metadata": {},
"outputs": [],
"source": [
"sigma = 25\n",
"seed = 0\n",
"\n",
"img_noisy = add_gaussian_noise(img, sigma=sigma, seed=seed)\n",
"\n",
"img_noisy.save('../images/bella_noisy.jpg')\n",
"\n",
"show_image(img_noisy, title=f\"Noisy image (sigma={sigma})\")"
]
},
{
"cell_type": "markdown",
"id": "1bbcc1d8",
"metadata": {},
"source": [
"## Visualizing rank-$k$ reconstructions\n",
"\n",
"For small $k$, the reconstruction captures only coarse structure.\n",
"As $k$ increases, more detail returns. For denoising, there is often a useful middle ground:\n",
"enough singular values to preserve structure, but not so many that we reintroduce noise."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "563df53a",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import math\n",
"\n",
"ks = [5, 20, 50, 100]\n",
"\n",
"# Collect all images + titles\n",
"images = []\n",
"titles = []\n",
"\n",
"# Original\n",
"images.append(img)\n",
"titles.append(\"Original\")\n",
"\n",
"# Noisy\n",
"images.append(img_noisy)\n",
"titles.append(\"Noisy\")\n",
"\n",
"# Reconstructions\n",
"for k in ks:\n",
" recon = truncated_svd_image(img_noisy, k)\n",
" images.append(recon)\n",
" titles.append(f\"k = {k}\")\n",
"\n",
"# Grid setup\n",
"ncols = 2\n",
"nrows = math.ceil(len(images) / ncols)\n",
"\n",
"fig, axes = plt.subplots(nrows, ncols, figsize=(6 * ncols, 4 * nrows))\n",
"axes = axes.flatten() # easier indexing\n",
"\n",
"# Plot everything\n",
"for i, (ax, im, title) in enumerate(zip(axes, images, titles)):\n",
" if im.ndim == 2:\n",
" ax.imshow(im, cmap=\"gray\", vmin=0, vmax=255)\n",
" else:\n",
" ax.imshow(np.clip(im, 0, 255).astype(np.uint8))\n",
" \n",
" ax.set_title(title)\n",
" ax.axis(\"off\")\n",
"\n",
"# Hide any unused axes\n",
"for j in range(len(images), len(axes)):\n",
" axes[j].axis(\"off\")\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig('../images/bella_truncated_svd_multiplie_ks.png')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "309579fa",
"metadata": {},
"source": [
"## Quantitative evaluation\n",
"\n",
"We compare each reconstruction against the **clean original image**, not against the noisy one.\n",
"A good denoising rank should typically:\n",
"- reduce MSE relative to the noisy image,\n",
"- increase PSNR relative to the noisy image."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56ce07ee",
"metadata": {},
"outputs": [],
"source": [
"baseline_mse = mse(img, img_noisy)\n",
"baseline_psnr = psnr(img, img_noisy)\n",
"\n",
"print(f\"Noisy image baseline -> MSE: {baseline_mse:.2f}, PSNR: {baseline_psnr:.2f} dB\")\n",
"\n",
"results = []\n",
"for k in ks:\n",
" recon = truncated_svd_image(img_noisy, k)\n",
" results.append((k, mse(img, recon), psnr(img, recon)))\n",
"\n",
"print(\"\\nRank-k reconstructions:\")\n",
"for k, m, p in results:\n",
" print(f\"k = {k:3d} | MSE = {m:10.2f} | PSNR = {p:6.2f} dB\")"
]
},
{
"cell_type": "markdown",
"id": "de9c3f3c",
"metadata": {},
"source": [
"## Automatic search over many values of $k$\n",
"\n",
"This is often useful because the best denoising rank is image-dependent."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e097dcf4",
"metadata": {},
"outputs": [],
"source": [
"candidate_ks = list(range(1, 151, 5))\n",
"\n",
"scores = []\n",
"for k in candidate_ks:\n",
" recon = truncated_svd_image(img_noisy, k)\n",
" scores.append((k, mse(img, recon), psnr(img, recon)))\n",
"\n",
"best_by_mse = min(scores, key=lambda x: x[1])\n",
"best_by_psnr = max(scores, key=lambda x: x[2])\n",
"\n",
"print(\"Best by MSE :\", best_by_mse)\n",
"print(\"Best by PSNR:\", best_by_psnr)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b9dc5c7",
"metadata": {},
"outputs": [],
"source": [
"best_k = best_by_psnr[0]\n",
"best_recon = truncated_svd_image(img_noisy, best_k)\n",
"\n",
"fig, axes = plt.subplots(1, 3, figsize=(15, 5))\n",
"\n",
"if img.ndim == 2:\n",
" axes[0].imshow(img, cmap=\"gray\", vmin=0, vmax=255)\n",
" axes[1].imshow(img_noisy, cmap=\"gray\", vmin=0, vmax=255)\n",
" axes[2].imshow(best_recon, cmap=\"gray\", vmin=0, vmax=255)\n",
"else:\n",
" axes[0].imshow(np.clip(img, 0, 255).astype(np.uint8))\n",
" axes[1].imshow(np.clip(img_noisy, 0, 255).astype(np.uint8))\n",
" axes[2].imshow(np.clip(best_recon, 0, 255).astype(np.uint8))\n",
"\n",
"axes[0].set_title(\"Original\")\n",
"axes[1].set_title(\"Noisy\")\n",
"axes[2].set_title(f\"Best reconstruction (k={best_k})\")\n",
"\n",
"for ax in axes:\n",
" ax.axis(\"off\")\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig('../images/bella_best_truncated.png')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "1a3acfe5",
"metadata": {},
"source": [
"## Remarks and possible extensions\n",
"\n",
"- The same rank $k$ was used for every color channel. You could instead choose different ranks per channel.\n",
"- You can test this on photographs, scanned documents, or screenshots.\n",
"- Truncated SVD is excellent for illustrating low-rank structure, but it is not the only denoising method.\n",
"- A more advanced next step would be to compare SVD denoising against:\n",
" - Gaussian blur,\n",
" - median filtering,\n",
" - wavelet denoising,\n",
" - non-local means,\n",
" - autoencoder-based denoising.\n",
"\n",
"For this notebook, though, the point is to keep the method squarely grounded in linear algebra."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.14.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
File diff suppressed because it is too large Load Diff
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
+455
View File
@@ -0,0 +1,455 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "6ae6c7f8",
"metadata": {},
"source": [
"# Spectral Image Denoising via Truncated SVD\n",
"\n",
"This notebook extracts the image denoising project into a standalone workflow and extends it from **grayscale images** to **actual color images**.\n",
"\n",
"The core idea is the same as in the original write-up: if an image matrix has singular value decomposition\n",
"\\[\n",
"A = U \\Sigma V^T,\n",
"\\]\n",
"then the best rank-$k$ approximation to \\(A\\) in Frobenius norm is obtained by truncating the SVD. This is the **EckartYoungMirsky theorem**.\n",
"\n",
"For a grayscale image, the image is a single matrix. For an RGB image, we treat the image as **three matrices**, one for each channel, and apply truncated SVD to each channel separately."
]
},
{
"cell_type": "markdown",
"id": "31a665c9",
"metadata": {},
"source": [
"## Outline\n",
"\n",
"1. Load an image from disk\n",
"2. Convert it to grayscale or keep it in RGB\n",
"3. Add synthetic Gaussian noise\n",
"4. Compute a truncated SVD reconstruction\n",
"5. Compare the original, noisy, and denoised images\n",
"6. Measure quality using MSE and PSNR\n",
"\n",
"This notebook is written so that you can use **your own image files** directly."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "88584c56",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from PIL import Image\n",
"from pathlib import Path"
]
},
{
"cell_type": "markdown",
"id": "f275cbc9",
"metadata": {},
"source": [
"## Helper functions\n",
"\n",
"We begin with some utilities for:\n",
"- loading images,\n",
"- adding Gaussian noise,\n",
"- reconstructing rank-$k$ approximations,\n",
"- computing image-quality metrics."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21adfcaf",
"metadata": {},
"outputs": [],
"source": [
"def load_image(path, mode=\"rgb\"):\n",
" \"\"\"\n",
" Load an image from disk.\n",
"\n",
" Parameters\n",
" ----------\n",
" path : str or Path\n",
" Path to the image file.\n",
" mode : {\"rgb\", \"gray\"}\n",
" Whether to load the image as RGB or grayscale.\n",
"\n",
" Returns\n",
" -------\n",
" np.ndarray\n",
" Float image array scaled to [0, 255].\n",
" Shape is (H, W, 3) for RGB and (H, W) for grayscale.\n",
" \"\"\"\n",
" path = Path(path)\n",
" img = Image.open(path)\n",
"\n",
" if mode.lower() in {\"gray\", \"grayscale\", \"l\"}:\n",
" img = img.convert(\"L\")\n",
" else:\n",
" img = img.convert(\"RGB\")\n",
"\n",
" return np.asarray(img, dtype=np.float64)\n",
"\n",
"\n",
"def show_image(img, title=None):\n",
" \"\"\"Display a grayscale or RGB image.\"\"\"\n",
" plt.figure(figsize=(6, 6))\n",
" if img.ndim == 2:\n",
" plt.imshow(np.clip(img, 0, 255), cmap=\"gray\", vmin=0, vmax=255)\n",
" else:\n",
" plt.imshow(np.clip(img, 0, 255).astype(np.uint8))\n",
" if title is not None:\n",
" plt.title(title)\n",
" plt.axis(\"off\")\n",
" plt.show()\n",
"\n",
"\n",
"def add_gaussian_noise(img, sigma=25, seed=0):\n",
" \"\"\"\n",
" Add Gaussian noise to an image.\n",
"\n",
" Parameters\n",
" ----------\n",
" img : np.ndarray\n",
" Image array in [0, 255].\n",
" sigma : float\n",
" Standard deviation of the noise.\n",
" seed : int\n",
" Random seed for reproducibility.\n",
"\n",
" Returns\n",
" -------\n",
" np.ndarray\n",
" Noisy image clipped to [0, 255].\n",
" \"\"\"\n",
" rng = np.random.default_rng(seed)\n",
" noisy = img + rng.normal(loc=0.0, scale=sigma, size=img.shape)\n",
" return np.clip(noisy, 0, 255)\n",
"\n",
"\n",
"def truncated_svd_matrix(A, k):\n",
" \"\"\"\n",
" Return the rank-k truncated SVD approximation of a 2D matrix A.\n",
" \"\"\"\n",
" U, s, Vt = np.linalg.svd(A, full_matrices=False)\n",
" k = min(k, len(s))\n",
" return (U[:, :k] * s[:k]) @ Vt[:k, :]\n",
"\n",
"\n",
"def truncated_svd_image(img, k):\n",
" \"\"\"\n",
" Apply truncated SVD to a grayscale or RGB image.\n",
"\n",
" For RGB images, truncated SVD is applied channel-by-channel.\n",
"\n",
" Parameters\n",
" ----------\n",
" img : np.ndarray\n",
" Shape (H, W) for grayscale or (H, W, 3) for RGB.\n",
" k : int\n",
" Truncation rank.\n",
"\n",
" Returns\n",
" -------\n",
" np.ndarray\n",
" Reconstructed image clipped to [0, 255].\n",
" \"\"\"\n",
" if img.ndim == 2:\n",
" recon = truncated_svd_matrix(img, k)\n",
" return np.clip(recon, 0, 255)\n",
"\n",
" if img.ndim == 3:\n",
" channels = []\n",
" for c in range(img.shape[2]):\n",
" channel_recon = truncated_svd_matrix(img[:, :, c], k)\n",
" channels.append(channel_recon)\n",
" recon = np.stack(channels, axis=2)\n",
" return np.clip(recon, 0, 255)\n",
"\n",
" raise ValueError(\"Image must be either 2D (grayscale) or 3D (RGB).\")\n",
"\n",
"\n",
"def mse(A, B):\n",
" \"\"\"Mean squared error between two images.\"\"\"\n",
" return np.mean((A.astype(np.float64) - B.astype(np.float64)) ** 2)\n",
"\n",
"\n",
"def psnr(A, B, max_val=255.0):\n",
" \"\"\"Peak signal-to-noise ratio in decibels.\"\"\"\n",
" err = mse(A, B)\n",
" if err == 0:\n",
" return np.inf\n",
" return 10 * np.log10((max_val ** 2) / err)"
]
},
{
"cell_type": "markdown",
"id": "fe1d4932",
"metadata": {},
"source": [
"## Choose an image\n",
"\n",
"Set `IMAGE_PATH` to any real image on your machine.\n",
"\n",
"Examples:\n",
"- `\"bella.jpg\"`\n",
"- `\"my_photo.png\"`\n",
"- `\"images/dog.jpeg\"`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42bafca5",
"metadata": {},
"outputs": [],
"source": [
"IMAGE_PATH = \"bella.jpg\" # change this to your actual file\n",
"MODE = \"rgb\" # use \"gray\" for grayscale, \"rgb\" for color\n",
"\n",
"img = load_image(IMAGE_PATH, mode=MODE)\n",
"print(\"Image shape:\", img.shape)\n",
"show_image(img, title=f\"Original image ({MODE})\")"
]
},
{
"cell_type": "markdown",
"id": "05e52222",
"metadata": {},
"source": [
"## Add synthetic Gaussian noise\n",
"\n",
"We add noise so that the denoising effect is visible and measurable."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "528e69b3",
"metadata": {},
"outputs": [],
"source": [
"sigma = 25\n",
"seed = 0\n",
"\n",
"img_noisy = add_gaussian_noise(img, sigma=sigma, seed=seed)\n",
"show_image(img_noisy, title=f\"Noisy image (sigma={sigma})\")"
]
},
{
"cell_type": "markdown",
"id": "1bbcc1d8",
"metadata": {},
"source": [
"## Visualizing rank-$k$ reconstructions\n",
"\n",
"For small \\(k\\), the reconstruction captures only coarse structure.\n",
"As \\(k\\) increases, more detail returns. For denoising, there is often a useful middle ground:\n",
"enough singular values to preserve structure, but not so many that we reintroduce noise."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "563df53a",
"metadata": {},
"outputs": [],
"source": [
"ks = [5, 20, 50, 100]\n",
"\n",
"ncols = len(ks) + 2\n",
"fig, axes = plt.subplots(1, ncols, figsize=(4 * ncols, 4))\n",
"\n",
"# Original\n",
"if img.ndim == 2:\n",
" axes[0].imshow(img, cmap=\"gray\", vmin=0, vmax=255)\n",
"else:\n",
" axes[0].imshow(np.clip(img, 0, 255).astype(np.uint8))\n",
"axes[0].set_title(\"Original\")\n",
"axes[0].axis(\"off\")\n",
"\n",
"# Noisy\n",
"if img_noisy.ndim == 2:\n",
" axes[1].imshow(img_noisy, cmap=\"gray\", vmin=0, vmax=255)\n",
"else:\n",
" axes[1].imshow(np.clip(img_noisy, 0, 255).astype(np.uint8))\n",
"axes[1].set_title(\"Noisy\")\n",
"axes[1].axis(\"off\")\n",
"\n",
"# Reconstructions\n",
"for ax, k in zip(axes[2:], ks):\n",
" recon = truncated_svd_image(img_noisy, k)\n",
" if recon.ndim == 2:\n",
" ax.imshow(recon, cmap=\"gray\", vmin=0, vmax=255)\n",
" else:\n",
" ax.imshow(np.clip(recon, 0, 255).astype(np.uint8))\n",
" ax.set_title(f\"k = {k}\")\n",
" ax.axis(\"off\")\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "309579fa",
"metadata": {},
"source": [
"## Quantitative evaluation\n",
"\n",
"We compare each reconstruction against the **clean original image**, not against the noisy one.\n",
"A good denoising rank should typically:\n",
"- reduce MSE relative to the noisy image,\n",
"- increase PSNR relative to the noisy image."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56ce07ee",
"metadata": {},
"outputs": [],
"source": [
"baseline_mse = mse(img, img_noisy)\n",
"baseline_psnr = psnr(img, img_noisy)\n",
"\n",
"print(f\"Noisy image baseline -> MSE: {baseline_mse:.2f}, PSNR: {baseline_psnr:.2f} dB\")\n",
"\n",
"results = []\n",
"for k in ks:\n",
" recon = truncated_svd_image(img_noisy, k)\n",
" results.append((k, mse(img, recon), psnr(img, recon)))\n",
"\n",
"print(\"\\nRank-k reconstructions:\")\n",
"for k, m, p in results:\n",
" print(f\"k = {k:3d} | MSE = {m:10.2f} | PSNR = {p:6.2f} dB\")"
]
},
{
"cell_type": "markdown",
"id": "de9c3f3c",
"metadata": {},
"source": [
"## Automatic search over many values of \\(k\\)\n",
"\n",
"This is often useful because the best denoising rank is image-dependent."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e097dcf4",
"metadata": {},
"outputs": [],
"source": [
"candidate_ks = list(range(1, 151, 5))\n",
"\n",
"scores = []\n",
"for k in candidate_ks:\n",
" recon = truncated_svd_image(img_noisy, k)\n",
" scores.append((k, mse(img, recon), psnr(img, recon)))\n",
"\n",
"best_by_mse = min(scores, key=lambda x: x[1])\n",
"best_by_psnr = max(scores, key=lambda x: x[2])\n",
"\n",
"print(\"Best by MSE :\", best_by_mse)\n",
"print(\"Best by PSNR:\", best_by_psnr)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b9dc5c7",
"metadata": {},
"outputs": [],
"source": [
"best_k = best_by_psnr[0]\n",
"best_recon = truncated_svd_image(img_noisy, best_k)\n",
"\n",
"fig, axes = plt.subplots(1, 3, figsize=(15, 5))\n",
"\n",
"if img.ndim == 2:\n",
" axes[0].imshow(img, cmap=\"gray\", vmin=0, vmax=255)\n",
" axes[1].imshow(img_noisy, cmap=\"gray\", vmin=0, vmax=255)\n",
" axes[2].imshow(best_recon, cmap=\"gray\", vmin=0, vmax=255)\n",
"else:\n",
" axes[0].imshow(np.clip(img, 0, 255).astype(np.uint8))\n",
" axes[1].imshow(np.clip(img_noisy, 0, 255).astype(np.uint8))\n",
" axes[2].imshow(np.clip(best_recon, 0, 255).astype(np.uint8))\n",
"\n",
"axes[0].set_title(\"Original\")\n",
"axes[1].set_title(\"Noisy\")\n",
"axes[2].set_title(f\"Best reconstruction (k={best_k})\")\n",
"\n",
"for ax in axes:\n",
" ax.axis(\"off\")\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "87d08518",
"metadata": {},
"source": [
"## A note on color images\n",
"\n",
"For a grayscale image, SVD applies directly to a single matrix.\n",
"\n",
"For a color image \\(A \\in \\mathbb{R}^{n \\times p \\times 3}\\), we write\n",
"\\[\n",
"A = (A_R, A_G, A_B),\n",
"\\]\n",
"where each channel is an \\(n \\times p\\) matrix. We then compute a rank-\\(k\\) approximation for each channel:\n",
"\\[\n",
"A_R \\approx (A_R)_k,\\qquad\n",
"A_G \\approx (A_G)_k,\\qquad\n",
"A_B \\approx (A_B)_k,\n",
"\\]\n",
"and stack them back together.\n",
"\n",
"This is the most direct extension of the grayscale method, and it works well as a first linear-algebraic treatment of color denoising."
]
},
{
"cell_type": "markdown",
"id": "1a3acfe5",
"metadata": {},
"source": [
"## Remarks and possible extensions\n",
"\n",
"- The same rank \\(k\\) was used for every color channel. You could instead choose different ranks per channel.\n",
"- You can test this on photographs, scanned documents, or screenshots.\n",
"- Truncated SVD is excellent for illustrating low-rank structure, but it is not the only denoising method.\n",
"- A more advanced next step would be to compare SVD denoising against:\n",
" - Gaussian blur,\n",
" - median filtering,\n",
" - wavelet denoising,\n",
" - non-local means,\n",
" - autoencoder-based denoising.\n",
"\n",
"For this notebook, though, the point is to keep the method squarely grounded in linear algebra."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.x"
}
},
"nbformat": 4,
"nbformat_minor": 5
}