Dominant Correlation
Here we introduce one of the most useful interpretations of the SVD: in terms of correlations among the columns of $X$ and correlations among the rows of $X$. We claim that the matrices $U$ and $V$ given by the SVD are the eigenvector matrices of the correlation matrices $XX^T$ and $X^TX$, respectively. We now explain this claim.
We first consider the structure of the correlation matrix $X^TX$. Since the data matrix $X$ has size $n\times m$, the correlation matrix $X^TX$ has size $m\times m$.
By the definition of $X^TX$, each entry of $X^TX$ is an inner product between two columns of the data matrix $X$, so $X^TX$ is a correlation matrix among the columns of $X$. If the entry $x_i^Tx_j$ is large in magnitude, the columns $x_i$ and $x_j$ are highly correlated. If it is close to zero, the columns $x_i$ and $x_j$ are nearly orthogonal, meaning they carry nearly uncorrelated information. Therefore, the matrix $X^TX$ measures the correlation among the columns of $X$.
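As a toy illustration of this point (the vectors below are made up for the example), a column that is nearly a scalar multiple of another gives a large inner product, while an orthogonal column gives zero:

```python
import numpy as np

# Hypothetical columns of a data matrix, chosen for illustration.
x1 = np.array([1.0, 2.0, 3.0])
x2 = 2.0 * x1 + np.array([0.0, 0.01, -0.01])   # nearly a scalar multiple of x1
x3 = np.array([1.0, 1.0, -1.0])                # exactly orthogonal to x1

print(x1 @ x2)  # large: x1 and x2 are highly correlated
print(x1 @ x3)  # zero: x1 and x3 are orthogonal
```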
By its structure, $X^TX$ is a symmetric positive semi-definite matrix. This guarantees that it has non-negative real eigenvalues, which correspond directly to the singular values of the data matrix $X$. Now assume that the matrix $X$ has the singular value decomposition $X=U\Sigma V^T$. We can derive that
$$X^TX = V\Sigma^T U^T U \Sigma V^T = V\Sigma^T\Sigma V^T,$$
which means that the columns of $V$ are the eigenvectors of the correlation matrix $X^TX$, with eigenvalues equal to the squares of the singular values of the data matrix $X$. The same holds for the correlation matrix $XX^T$. We can derive that
$$XX^T = U\Sigma V^T V \Sigma^T U^T = U\Sigma\Sigma^T U^T,$$
which means that the columns of $U$ are the eigenvectors of the correlation matrix $XX^T$, again with eigenvalues equal to the squares of the singular values of $X$. Therefore, we can conclude that the singular vector matrices $U$ and $V$ given by the SVD of the data matrix $X$ are the eigenvector matrices of the correlation matrices $XX^T$ and $X^TX$, respectively. The importance of the columns of $U$ and $V$ is quantified by the eigenvalues of $X^TX$ or $XX^T$, which are the squares of the singular values of $X$.
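This correspondence is easy to verify numerically. The sketch below, on a small random matrix made up for the example, checks that the eigenvalues of $X^TX$ equal the squared singular values and that the eigenvectors match the columns of $V$ (up to sign, since eigenvectors are only determined up to a sign flip):

```python
import numpy as np

# Hypothetical small data matrix X (n x m) for illustration.
rng = np.random.default_rng(0)
n, m = 6, 3
X = rng.standard_normal((n, m))

# Economy SVD: X = U Sigma V^T
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Eigendecomposition of the m x m correlation matrix X^T X.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)

# eigh returns eigenvalues in ascending order; reverse to match SVD ordering.
eigvals = eigvals[::-1]
eigvecs = eigvecs[:, ::-1]

# Eigenvalues equal the squared singular values.
assert np.allclose(eigvals, s**2)

# Eigenvectors match the columns of V up to sign.
V = Vt.T
for j in range(m):
    assert np.allclose(np.abs(eigvecs[:, j]), np.abs(V[:, j]))
```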
Here we introduce the method of snapshots to compute the singular value decomposition of the data matrix. If the data matrix $X$ is so large that you cannot store it all in memory at once, you can use the method of snapshots. However, in the vast majority of cases, we do not recommend computing the SVD via correlation matrices: forming $X^TX$ squares the condition number of the problem, which can lose numerical precision.
Although you cannot load all columns of $X$ into memory, you can load two columns at a time. For example, load the first column $x_1$ with itself to compute the inner product $x_1^Tx_1$; then load $x_1$ and the second column $x_2$ to compute $x_1^Tx_2$, and so on. By computing one inner product of two columns of the data matrix at a time, you assemble the correlation matrix $X^TX$. The resulting matrix is $m\times m$, small enough to fit in memory and compute its eigendecomposition. Based on the discussion above, the right singular vectors $V$ and the singular values $\Sigma$ can be obtained from this eigendecomposition. Then you can recover the left singular vectors as $U = XV\Sigma^{-1}$.
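The whole procedure can be sketched as follows. This is a minimal illustration on a tall random matrix made up for the example ($n \gg m$, so $X^TX$ is small); in practice the columns would be streamed from disk rather than held in one array:

```python
import numpy as np

# Hypothetical tall data matrix (n >> m), standing in for data streamed from disk.
rng = np.random.default_rng(1)
n, m = 1000, 5
X = rng.standard_normal((n, m))

# Assemble X^T X one pair of columns at a time, as if loading two columns on demand.
C = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        C[i, j] = X[:, i] @ X[:, j]

# Eigendecomposition of the small m x m correlation matrix.
eigvals, V = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]   # descending order, matching SVD convention
eigvals, V = eigvals[order], V[:, order]
Sigma = np.sqrt(eigvals)            # singular values are sqrt of eigenvalues

# Recover the left singular vectors: U = X V Sigma^{-1}.
U = X @ V / Sigma

# Check against the direct SVD (columns may differ by a sign flip).
U_ref, s_ref, Vt_ref = np.linalg.svd(X, full_matrices=False)
assert np.allclose(Sigma, s_ref)
assert np.allclose(np.abs(U), np.abs(U_ref))

# The recovered factors reproduce X.
assert np.allclose(X, U * Sigma @ V.T)
```

Note that the double loop over column pairs is what makes this memory-friendly: only two length-$n$ columns are ever needed at once, while the eigendecomposition runs on the small $m\times m$ matrix.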