In many physics problems we encounter the linear system $Ax=b$. Classically, we solve this system when $A$ is a square, invertible matrix. That case, however, is too restrictive to model the real world well. Using the singular value decomposition introduced earlier, we can generalize to the case where $A$ is non-square; indeed, in data analysis and data modeling the matrix $A$ is almost always non-square.
Here we describe what changes when $A\in\mathbb{R}^{n\times m}$ is non-square. There are two canonical non-square shapes, corresponding to two canonical types of linear system.
- Underdetermined System $(n<m)$
Since $n<m$, the matrix $A$ is short and fat: there are fewer equations than unknowns. The measurements in $b$ are not enough to determine a unique vector $x$; the unknown vector has more degrees of freedom than $b$ has values. In general, an underdetermined system has infinitely many solutions. (One can construct special underdetermined systems with exactly one solution, but these are degenerate enough to ignore.)
- Overdetermined System $(n>m)$
Since $n>m$, the matrix $A$ is tall and skinny: there are more equations than unknowns. Because $x$ does not have enough degrees of freedom to satisfy all the equations, an overdetermined system generally has no solution. (Again, one can construct special overdetermined systems that do have a solution, but these are degenerate enough to ignore.)
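A quick numerical sketch of the two shapes (my own NumPy illustration, not part of the original text; the sizes $2\times 4$ and $4\times 2$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined: 2 equations, 4 unknowns (short and fat).
A_under = rng.standard_normal((2, 4))
# Overdetermined: 4 equations, 2 unknowns (tall and skinny).
A_over = rng.standard_normal((4, 2))

# A generic short-fat A has full row rank but a nontrivial null space;
# adding any null-space vector to one solution yields another, so the
# underdetermined system has infinitely many solutions.
rank_under = np.linalg.matrix_rank(A_under)
null_dim = A_under.shape[1] - rank_under   # 4 - 2 = 2

# A generic tall-skinny A has full column rank, but its column space is
# only a 2-dimensional subspace of R^4, so a generic b in R^4 lies
# outside it and the overdetermined system has no exact solution.
rank_over = np.linalg.matrix_rank(A_over)
print(rank_under, null_dim, rank_over)
```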
The singular value decomposition allows us to approximately invert $A$, computing what is known as the pseudoinverse, and to find a best-fit solution $x$ that either comes as close as possible to solving the equation or solves it with the minimum $L^2$ norm of $x$. Here we introduce the Moore-Penrose pseudoinverse of $A$. From the economy singular value decomposition $A=U\Sigma V^T$, where $U$ and $V$ satisfy $U^TU=I$ and $V^TV=VV^T=I$, the Moore-Penrose pseudoinverse of $A$ is defined as
$$A^{\dagger}=V\Sigma^{-1}U^T.$$
From the properties of $U$ and $V$, we see that the Moore-Penrose pseudoinverse is a left inverse of $A$:
$$A^{\dagger}A=V\Sigma^{-1}U^TU\Sigma V^T=VV^T=I.$$
Now we can give an approximate solution of the linear equation $Ax=b$ using the Moore-Penrose pseudoinverse,
$$\tilde{x}=A^{\dagger}b.$$
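The definition and the left-inverse property can be checked numerically. The sketch below (my own, assuming a generic full-column-rank matrix) builds $A^{\dagger}=V\Sigma^{-1}U^T$ from the economy SVD and compares it with NumPy's built-in pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))   # tall and skinny, full column rank

# Economy SVD: U is 6x3, s holds the 3 singular values, Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Moore-Penrose pseudoinverse A^† = V Σ^{-1} U^T.
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# It agrees with NumPy's built-in pinv ...
assert np.allclose(A_pinv, np.linalg.pinv(A))
# ... and it is a left inverse: A^† A = V V^T = I (3x3).
assert np.allclose(A_pinv @ A, np.eye(3))
```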
We have now given a best-fit solution $\tilde{x}$. Next we state the optimization problem that this solution satisfies in each of the two cases.
- Underdetermined System $(n<m)$
In this case, the best-fit solution satisfies the optimization problem
$$\min_{x}\|x\|_2\quad\text{subject to}\quad Ax=b.$$
That is, $\tilde{x}$ solves the linear equation $Ax=b$ exactly and has the minimum $L^2$ norm among all solutions.
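A short NumPy check of the minimum-norm property (my own sketch, not part of the original derivation): the pseudoinverse solution solves the system exactly, and perturbing it by any null-space direction gives another exact solution with a strictly larger norm.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 4))   # underdetermined: n=2 < m=4
b = rng.standard_normal(2)

# Best-fit (minimum-norm) solution via the pseudoinverse.
x_mn = np.linalg.pinv(A) @ b

# It solves Ax = b exactly ...
assert np.allclose(A @ x_mn, b)

# ... and adding a null-space vector gives another exact solution
# with a strictly larger L2 norm.
U, s, Vt = np.linalg.svd(A)       # full SVD: Vt is 4x4
z = Vt[2]                          # unit null-space direction, A @ z ≈ 0
assert np.allclose(A @ z, 0)
x_other = x_mn + z
assert np.allclose(A @ x_other, b)
assert np.linalg.norm(x_other) > np.linalg.norm(x_mn)
```

The inequality is guaranteed because $\tilde{x}$ lies in the row space of $A$, which is orthogonal to the null space, so norms add in a Pythagorean way.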
- Overdetermined System $(n>m)$
In this case, the best-fit solution satisfies the optimization problem
$$\min_{x}\|Ax-b\|_2.$$
That is, $\tilde{x}$ comes as close as possible to solving the linear equation $Ax=b$ in the $L^2$ sense.
Finally, we clarify a point that is often confusing: the pseudoinverse solution is obtained by the same mechanical steps as the ordinary inverse, but it is only an approximate solution. Substituting $\tilde{x}=A^{\dagger}b$ into the linear equation $Ax=b$ gives
$$A\tilde{x}=U\Sigma V^TV\Sigma^{-1}U^Tb=UU^Tb,$$
where the matrix $UU^T$ is not an identity matrix, and it might not even be close to one. Before interpreting the vector $UU^Tb$, we should discuss the projection operator. Suppose we have a vector $v$ and a matrix $P$, and we want to project $v$ onto the column space of $P$. Write the projection as $Px$, where $x$ is an unknown vector. By the definition of a projection, the error $e=v-Px$ is orthogonal to every column of $P$, which yields the linear equation
$$P^T(v-Px)=0.$$
Solving gives $x=(P^TP)^{-1}P^Tv$, so the projection is $Px=P(P^TP)^{-1}P^Tv$, and we call $P(P^TP)^{-1}P^T$ the projection operator. For the matrix $U$, which satisfies $U^TU=I$, the projection operator simplifies to $U(U^TU)^{-1}U^T=UU^T$. Thus $UU^Tb$ is the projection of $b$ onto the column space of $U$. Furthermore, the column space of $U$ equals the column space of $A$, so $UU^Tb$ is the projection of $b$ onto the column space of $A$. The error crops up precisely because $UU^Tb$ is only the orthogonal projection of $b$ onto the column space of $A$, not $b$ itself.
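The projection picture can be verified directly. In this sketch (my own, assuming a generic tall matrix), the general projector $P(P^TP)^{-1}P^T$ built from $A$ coincides with $UU^T$ from the economy SVD, and $A\tilde{x}$ lands exactly on that projection of $b$:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 2))   # tall and skinny
b = rng.standard_normal(5)

# General projection onto the column space: P (P^T P)^{-1} P^T with P = A.
proj_A = A @ np.linalg.inv(A.T @ A) @ A.T

# For U from the economy SVD, U^T U = I, so the projector is just U U^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
proj_U = U @ U.T

# col(U) = col(A), so the two projectors coincide ...
assert np.allclose(proj_A, proj_U)

# ... and A x̃ = U U^T b is the projection of b onto col(A),
# which differs from b whenever b lies outside that column space.
x_tilde = np.linalg.pinv(A) @ b
assert np.allclose(A @ x_tilde, proj_U @ b)
assert not np.allclose(A @ x_tilde, b)   # generic b is not in col(A)
```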