First, we introduce three kinds of matrix norms and, using the singular value decomposition, verify that each of them depends only on the singular values.
- 2-norm (Spectral norm)
- Frobenius norm
- Nuclear norm
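As a quick numerical check (a sketch using NumPy; the matrix size and random seed are arbitrary), all three norms can be computed directly from the singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)  # singular values, in descending order

spectral = np.linalg.norm(A, 2)       # largest singular value
frobenius = np.linalg.norm(A, 'fro')  # sqrt of the sum of squared singular values
nuclear = np.linalg.norm(A, 'nuc')    # sum of the singular values
```

Each of the three norms agrees with the corresponding function of the singular values to machine precision.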
Based on these results, we see that the three matrix norms above depend only on the singular values of the matrix $A$. We now claim that multiplication by orthogonal matrices does not change the singular values of the original matrix. For simplicity, denote the left and right orthogonal matrices by $P$ and $Q$, respectively. Using the singular value decomposition $A=U\Sigma V^T$, we can derive the following result about the product of $A$ with the orthogonal matrices $P$ and $Q$:
$$PAQ = PU\Sigma V^TQ = (PU)\,\Sigma\,(Q^TV)^T = U_1\Sigma V_1^T,\qquad U_1=PU,\quad V_1=Q^TV,$$
where $U_1U_1^T=PUU^TP^T=I=U_1^TU_1$ shows that $U_1$ is an orthogonal matrix, and the same holds for $V_1$. Hence $U_1\Sigma V_1^T$ is a singular value decomposition of $PAQ$, so the singular values of $PAQ$ are the same as those of $A$. We conclude that multiplication by orthogonal matrices does not affect these matrix norms.
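The invariance claim is easy to check numerically (a sketch; the random orthogonal matrices are built from QR factorizations, an arbitrary construction chosen for convenience):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
# Random orthogonal P (5x5) and Q (3x3) from QR factorizations
P, _ = np.linalg.qr(rng.standard_normal((5, 5)))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))

s_A = np.linalg.svd(A, compute_uv=False)
s_PAQ = np.linalg.svd(P @ A @ Q, compute_uv=False)
# The singular values of PAQ coincide with those of A
```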
Thm (Eckart–Young–Mirsky). The optimal rank-$r$ approximation to $A$, under the spectral norm and the Frobenius norm, respectively, is given by the rank-$r$ SVD truncation $\tilde{A}$,
where $\tilde{A}=\tilde{U}\tilde{\Sigma}\tilde{V}^T$ is the rank-r SVD truncation of $A$. Again, $\tilde{U}$ and $\tilde{V}$ denote the first $r$ leading columns of $U$ and $V$, respectively, and $\tilde{\Sigma}$ is the $r\times r$ leading submatrix of $\Sigma$.
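The truncation and its approximation errors can be sketched numerically as follows (the theorem predicts a spectral error of $\sigma_{r+1}$ and a Frobenius error of $\sqrt{\sum_{i>r}\sigma_i^2}$; size and rank here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
r = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Rank-r truncation: keep the r leading singular triplets
A_trunc = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

spectral_err = np.linalg.norm(A - A_trunc, 2)       # equals sigma_{r+1}
frobenius_err = np.linalg.norm(A - A_trunc, 'fro')  # equals sqrt(sum_{i>r} sigma_i^2)
```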
Proof.
One needs to show that if $B$ is any $n\times m$ matrix with $\mathrm{rank}(B)\leq r$, then $\left\Vert A-B\right\Vert_2\geq \left\Vert A-\tilde{A}\right\Vert_2$ holds. This can be done as follows.
Before the discussion, we introduce the concept of the kernel (null space) of a matrix. The kernel of an $n\times m$ matrix $A$ is defined as
$$\mathcal{N}(A)=\{x\in\mathbb{R}^{m}\mid Ax=0\}.$$
Therefore, the kernel $\mathcal{N}(A)$ is exactly the solution space of the linear system $Ax=0$. By the rank–nullity theorem, when the rank of the $n\times m$ matrix $B$ is $r$, the dimension of $\mathcal{N}(B)$ is $m-r$. Now consider the linear space $V_{r+1}$ spanned by the first $r+1$ right singular vectors $v_1,\ v_2,\dots,v_{r+1}$, whose dimension is $r+1$. By the dimension formula, we derive that
$$\dim\!\left(\mathcal{N}(B)\cap V_{r+1}\right)\geq \dim\mathcal{N}(B)+\dim V_{r+1}-m\geq (m-r)+(r+1)-m=1,$$
which means that $\mathcal{N}(B)\cap V_{r+1}\neq\{0\}$. Therefore, there exists a unit vector $x=\gamma_1v_1+\cdots+\gamma_{r+1}v_{r+1}$ with $\Vert x\Vert_2=1$ such that
$$Bx=0.$$
Based on the definition of the 2-norm, since $Bx=0$ and $x$ is a unit vector, we can derive that
$$\left\Vert A-B\right\Vert_2^2\geq\left\Vert (A-B)x\right\Vert_2^2=\left\Vert Ax\right\Vert_2^2=\sum_{i=1}^{r+1}\gamma_i^2\sigma_i^2\geq\sigma_{r+1}^2\sum_{i=1}^{r+1}\gamma_i^2=\sigma_{r+1}^2=\left\Vert A-\tilde{A}\right\Vert_2^2,$$
which means that the optimal rank-r approximation to $A$ under the 2-norm is given by the rank-r SVD truncation $\tilde{A}$.
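A small numerical experiment is consistent with this optimality claim (a sketch: sampling random rank-$r$ competitors $B=XY^T$ cannot prove the bound, but none of them should beat $\sigma_{r+1}$):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
r = 2
s = np.linalg.svd(A, compute_uv=False)

# Sample random rank-r competitors B = X @ Y.T and record their spectral errors
errors = []
for _ in range(200):
    X = rng.standard_normal((6, r))
    Y = rng.standard_normal((4, r))
    errors.append(np.linalg.norm(A - X @ Y.T, 2))
best_random = min(errors)  # never smaller than sigma_{r+1}
```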
Now we claim that the best rank-$r$ approximation to $A$ in the Frobenius norm, denoted by $\Vert\cdot\Vert_F$, is also given by the truncation $\tilde{A}=\tilde{U}\tilde{\Sigma}\tilde{V}^T$.
First, note that we have
$$\left\Vert A-\tilde{A}\right\Vert_F^2=\sum_{i=r+1}^{\min(n,m)}\sigma_i^2.$$
Hence, we need to show that if $B_r$ is any matrix of rank at most $r$, then
$$\left\Vert A-B_r\right\Vert_F^2\geq\sum_{i=r+1}^{\min(n,m)}\sigma_i^2.$$
By the triangle inequality for the spectral norm, if $A=A'+A''$, then $\sigma_1(A)\leq\sigma_1(A')+\sigma_1(A'')$. Let $A'_k$ and $A''_k$ denote the rank-$k$ approximations to $A'$ and $A''$ by the SVD truncation described above, respectively. Then, for any $i,j\geq1$,
$$\sigma_i(A')+\sigma_j(A'')=\sigma_1(A'-A'_{i-1})+\sigma_1(A''-A''_{j-1})\geq\sigma_1\!\left(A'+A''-A'_{i-1}-A''_{j-1}\right).$$
Since $\mathrm{rank}(A'_{i-1}+A''_{j-1})\leq i+j-2$, and by the spectral-norm result just proved the error of any approximation of rank at most $i+j-2$ is at least $\sigma_{i+j-1}$, we can derive that
$$\sigma_i(A')+\sigma_j(A'')\geq\sigma_1\!\left(A'+A''-(A'_{i-1}+A''_{j-1})\right)\geq\sigma_{i+j-1}(A'+A'').$$
Since $\sigma_{r+1}(B_r)=0$, taking $A'=A-B_r$ and $A''=B_r$ with $j=r+1$, we conclude that for every $i\geq1$,
$$\sigma_{i+r}(A)\leq\sigma_i(A-B_r)+\sigma_{r+1}(B_r)=\sigma_i(A-B_r).$$
Therefore, we have
$$\left\Vert A-B_r\right\Vert_F^2=\sum_{i\geq1}\sigma_i(A-B_r)^2\geq\sum_{i\geq1}\sigma_{i+r}(A)^2=\left\Vert A-\tilde{A}\right\Vert_F^2,$$
which completes the proof.
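The key inequality $\sigma_{i+j-1}(A'+A'')\leq\sigma_i(A')+\sigma_j(A'')$ used above can be checked numerically (a sketch with arbitrary random square matrices playing the roles of $A'$ and $A''$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
A1 = rng.standard_normal((n, n))  # plays the role of A'
A2 = rng.standard_normal((n, n))  # plays the role of A''
s_sum = np.linalg.svd(A1 + A2, compute_uv=False)
s1 = np.linalg.svd(A1, compute_uv=False)
s2 = np.linalg.svd(A2, compute_uv=False)

# sigma_{i+j-1}(A' + A'') <= sigma_i(A') + sigma_j(A''), with 1-based indices
holds = all(
    s_sum[i + j - 2] <= s1[i - 1] + s2[j - 1] + 1e-10
    for i in range(1, n + 1)
    for j in range(1, n + 1)
    if i + j - 1 <= n
)
```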