First, we introduce three kinds of matrix norms and, using the singular value decomposition, verify that each of them depends only on the singular values.
- 2-norm (Spectral norm)
- Frobenius norm
- Nuclear norm
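As a quick numerical check (a sketch using NumPy; the matrix size and random seed are arbitrary), all three norms can be computed directly from the singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)  # singular values, in descending order

spectral = np.linalg.norm(A, 2)       # largest singular value
frobenius = np.linalg.norm(A, 'fro')  # sqrt of the sum of squared singular values
nuclear = np.linalg.norm(A, 'nuc')    # sum of the singular values
```

Each of the three norms agrees with the corresponding function of the singular values to machine precision.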
Based on these results, we see that the three matrix norms above depend only on the singular values of the matrix $A$. We now claim that multiplication by orthogonal matrices does not change the singular values of the original matrix. For simplicity, denote the left and right orthogonal matrices by $P$ and $Q$, respectively. Using the singular value decomposition $A=U\Sigma V^T$, we can derive the following result about the product of $A$ with the orthogonal matrices $P$ and $Q$:
$$PAQ = PU\Sigma V^TQ = (PU)\,\Sigma\,(Q^TV)^T = U_1\Sigma V_1^T,\qquad U_1=PU,\quad V_1=Q^TV,$$
where $U_1U_1^T=PUU^TP^T=I=U_1^TU_1$ shows that $U_1$ is an orthogonal matrix, and the same holds for $V_1$. Hence $U_1\Sigma V_1^T$ is a singular value decomposition of $PAQ$, so the singular values of $PAQ$ are the same as those of $A$. We conclude that multiplication by orthogonal matrices does not affect these matrix norms.
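The invariance claim is easy to check numerically (a sketch; the random orthogonal matrices are built from QR factorizations, an arbitrary construction chosen for convenience):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
# Random orthogonal P (5x5) and Q (3x3) from QR factorizations
P, _ = np.linalg.qr(rng.standard_normal((5, 5)))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))

s_A = np.linalg.svd(A, compute_uv=False)
s_PAQ = np.linalg.svd(P @ A @ Q, compute_uv=False)
# The singular values of PAQ coincide with those of A
```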
Thm (Eckart–Young–Mirsky). The optimal rank-$r$ approximation to $A$, under the spectral norm and the Frobenius norm, respectively, is given by the rank-$r$ SVD truncation $\tilde{A}$,
where $\tilde{A}=\tilde{U}\tilde{\Sigma}\tilde{V}^T$ is the rank-r SVD truncation of $A$. Again, $\tilde{U}$ and $\tilde{V}$ denote the first $r$ leading columns of $U$ and $V$, respectively, and $\tilde{\Sigma}$ is the $r\times r$ leading submatrix of $\Sigma$.
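The truncation and its approximation errors can be sketched numerically as follows (the theorem predicts a spectral error of $\sigma_{r+1}$ and a Frobenius error of $\sqrt{\sum_{i>r}\sigma_i^2}$; size and rank here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
r = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Rank-r truncation: keep the r leading singular triplets
A_trunc = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

spectral_err = np.linalg.norm(A - A_trunc, 2)       # equals sigma_{r+1}
frobenius_err = np.linalg.norm(A - A_trunc, 'fro')  # equals sqrt(sum_{i>r} sigma_i^2)
```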
Proof.
One needs to show that if $B$ is any $n\times m$ matrix with $\mathrm{rank}(B)\leq r$, then $\left\Vert A-B\right\Vert_2\geq \left\Vert A-\tilde{A}\right\Vert_2$ holds. This can be done as follows.
Before the discussion, we introduce the concept of the kernel (null space) of a matrix. The kernel of an $n\times m$ matrix $A$ is defined as
$$\mathcal{N}(A)=\{x\in\mathbb{R}^{m}\mid Ax=0\}.$$
Therefore, the kernel $\mathcal{N}(A)$ is exactly the solution space of the linear system $Ax=0$. By the rank–nullity theorem, when the rank of the $n\times m$ matrix $B$ is $r$, the dimension of $\mathcal{N}(B)$ is $m-r$. Now consider the linear space $V_{r+1}$ spanned by the first $r+1$ right singular vectors $v_1,\ v_2,\dots,v_{r+1}$, whose dimension is $r+1$. By the dimension formula, we derive that
$$\dim\!\left(\mathcal{N}(B)\cap V_{r+1}\right)\geq \dim\mathcal{N}(B)+\dim V_{r+1}-m\geq (m-r)+(r+1)-m=1,$$
which means that $\mathcal{N}(B)\cap V_{r+1}\neq\{0\}$. Therefore, there exists a unit vector $x=\gamma_1v_1+\cdots+\gamma_{r+1}v_{r+1}$ with $\Vert x\Vert_2=1$ such that
$$Bx=0.$$
Based on the definition of the 2-norm, since $Bx=0$ and $x$ is a unit vector, we can derive that
$$\left\Vert A-B\right\Vert_2^2\geq\left\Vert (A-B)x\right\Vert_2^2=\left\Vert Ax\right\Vert_2^2=\sum_{i=1}^{r+1}\gamma_i^2\sigma_i^2\geq\sigma_{r+1}^2\sum_{i=1}^{r+1}\gamma_i^2=\sigma_{r+1}^2=\left\Vert A-\tilde{A}\right\Vert_2^2,$$
which means that the optimal rank-r approximation to $A$ under the 2-norm is given by the rank-r SVD truncation $\tilde{A}$.
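A small numerical experiment is consistent with this optimality claim (a sketch: sampling random rank-$r$ competitors $B=XY^T$ cannot prove the bound, but none of them should beat $\sigma_{r+1}$):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
r = 2
s = np.linalg.svd(A, compute_uv=False)

# Sample random rank-r competitors B = X @ Y.T and record their spectral errors
errors = []
for _ in range(200):
    X = rng.standard_normal((6, r))
    Y = rng.standard_normal((4, r))
    errors.append(np.linalg.norm(A - X @ Y.T, 2))
best_random = min(errors)  # never smaller than sigma_{r+1}
```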
Now we claim that the best rank-$r$ approximation to $A$ in the Frobenius norm, denoted by $\Vert\cdot\Vert_F$, is also given by the truncation $\tilde{A}=\tilde{U}\tilde{\Sigma}\tilde{V}^T$.
First, note that we have
$$\left\Vert A-\tilde{A}\right\Vert_F^2=\sum_{i=r+1}^{\min(n,m)}\sigma_i^2.$$
Hence, we need to show that if $B_r$ is any matrix of rank at most $r$, then
$$\left\Vert A-B_r\right\Vert_F^2\geq\sum_{i=r+1}^{\min(n,m)}\sigma_i^2.$$
By the triangle inequality for the spectral norm, if $A=A'+A''$, then $\sigma_1(A)\leq\sigma_1(A')+\sigma_1(A'')$. Let $A'_k$ and $A''_k$ denote the rank-$k$ approximations to $A'$ and $A''$ by the SVD truncation described above, respectively. Then, for any $i,j\geq1$,
$$\sigma_i(A')+\sigma_j(A'')=\sigma_1(A'-A'_{i-1})+\sigma_1(A''-A''_{j-1})\geq\sigma_1\!\left(A'+A''-A'_{i-1}-A''_{j-1}\right).$$
Since $\mathrm{rank}(A'_{i-1}+A''_{j-1})\leq i+j-2$, and by the spectral-norm result just proved the error of any approximation of rank at most $i+j-2$ is at least $\sigma_{i+j-1}$, we can derive that
$$\sigma_i(A')+\sigma_j(A'')\geq\sigma_1\!\left(A'+A''-(A'_{i-1}+A''_{j-1})\right)\geq\sigma_{i+j-1}(A'+A'').$$
Since $\sigma_{r+1}(B_r)=0$, taking $A'=A-B_r$ and $A''=B_r$ with $j=r+1$, we conclude that for every $i\geq1$,
$$\sigma_{i+r}(A)\leq\sigma_i(A-B_r)+\sigma_{r+1}(B_r)=\sigma_i(A-B_r).$$
Therefore, we have
$$\left\Vert A-B_r\right\Vert_F^2=\sum_{i\geq1}\sigma_i(A-B_r)^2\geq\sum_{i\geq1}\sigma_{i+r}(A)^2=\left\Vert A-\tilde{A}\right\Vert_F^2,$$
which completes the proof.
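The key inequality $\sigma_{i+j-1}(A'+A'')\leq\sigma_i(A')+\sigma_j(A'')$ used above can be checked numerically (a sketch with arbitrary random square matrices playing the roles of $A'$ and $A''$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
A1 = rng.standard_normal((n, n))  # plays the role of A'
A2 = rng.standard_normal((n, n))  # plays the role of A''
s_sum = np.linalg.svd(A1 + A2, compute_uv=False)
s1 = np.linalg.svd(A1, compute_uv=False)
s2 = np.linalg.svd(A2, compute_uv=False)

# sigma_{i+j-1}(A' + A'') <= sigma_i(A') + sigma_j(A''), with 1-based indices
holds = all(
    s_sum[i + j - 2] <= s1[i - 1] + s2[j - 1] + 1e-10
    for i in range(1, n + 1)
    for j in range(1, n + 1)
    if i + j - 1 <= n
)
```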