E.g. 1, Deriving PCA by minimizing MSE
$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$, where $x_i \in \mathbb{R}^m$ is the $i$-th sample with $m$ dimensions. Assume for simplicity that $X$ has zero mean.
$W = [w_1, w_2, \ldots, w_k] \in \mathbb{R}^{m \times k}$, where $w_j \in \mathbb{R}^m$ is the $j$-th basis vector with $m$ dimensions.
$z_i = W^\top x_i$, where $z_i \in \mathbb{R}^k$ is the low-dimensional representation of $x_i$.
The optimization problem of PCA is

$$\min_W \sum_{i=1}^n \left\| x_i - W W^\top x_i \right\|^2 \quad \text{s.t.} \quad W^\top W = I.$$

We can simplify the above problem by using $\|x_i - W W^\top x_i\|^2 = x_i^\top x_i - x_i^\top W W^\top x_i$ (which follows from $W^\top W = I$) and $\sum_{i=1}^n x_i^\top W W^\top x_i = \sum_{j=1}^k w_j^\top X X^\top w_j$, as

$$\max_W \sum_{j=1}^k w_j^\top X X^\top w_j \tag{1}$$

$$\text{s.t.} \quad w_j^\top w_j = 1, \quad j = 1, \ldots, k \tag{2}$$

$$\qquad\; w_j^\top w_l = 0, \quad j \neq l \tag{3}$$
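As a sanity check, here is a minimal NumPy sketch (with hypothetical dimensions $m$, $n$, $k$ and a random orthonormal $W$) verifying that the reconstruction MSE equals $\operatorname{tr}(X^\top X) - \sum_j w_j^\top X X^\top w_j$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 5, 100, 2                    # hypothetical dimensions
X = rng.standard_normal((m, n))
X -= X.mean(axis=1, keepdims=True)     # zero-mean samples, as assumed above

# A random orthonormal basis W (m x k) from a QR decomposition
W, _ = np.linalg.qr(rng.standard_normal((m, k)))

mse = sum(np.linalg.norm(X[:, i] - W @ W.T @ X[:, i]) ** 2 for i in range(n))
simplified = np.trace(X.T @ X) - np.trace(W.T @ X @ X.T @ W)
print(np.isclose(mse, simplified))     # True
```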
Introducing the Lagrange multipliers $\lambda_j$ and $\mu_{jl}$ for $j \neq l$ (WLOG $\mu_{jl} = \mu_{lj}$, since eq. (3) is symmetric in $j$ and $l$), the optimization problem is equivalent to

$$\max_W L = \sum_{j=1}^k w_j^\top X X^\top w_j - \sum_{j=1}^k \lambda_j \left( w_j^\top w_j - 1 \right) - \sum_{j \neq l} \mu_{jl}\, w_j^\top w_l.$$

Therefore $\frac{\partial L}{\partial w_j} = 2 X X^\top w_j - 2 \lambda_j w_j - 2 \sum_{l \neq j} \mu_{jl} w_l$. Let it be $0$, we get

$$X X^\top w_j = \lambda_j w_j + \sum_{l \neq j} \mu_{jl} w_l.$$
Left multiply the equation by $w_l^\top$ ($l \neq j$) and use eq. (2) - $w_j^\top w_j = 1$ - and eq. (3) - $w_j^\top w_l = 0$ for $j \neq l$ - we get $\mu_{jl} = w_l^\top X X^\top w_j$. Because rotating the basis vectors within $\operatorname{span}(W)$ changes neither the objective nor the constraints, we may choose the basis that diagonalizes $W^\top X X^\top W$, so that $\mu_{jl} = 0$ for all $j \neq l$, and

$$X X^\top w_j = \lambda_j w_j, \tag{4}$$

from which we can see $\lambda_j$ is an eigenvalue of $X X^\top$ and $w_j$ is the corresponding eigenvector.
Substitute eq. (4) into eq. (1):

$$\sum_{j=1}^k w_j^\top X X^\top w_j = \sum_{j=1}^k \lambda_j w_j^\top w_j = \sum_{j=1}^k \lambda_j,$$

therefore $\lambda_1$, $\lambda_2$, ..., $\lambda_k$ should be the largest $k$ eigenvalues of $X X^\top$ (or, equivalently, of the sample covariance matrix $\frac{1}{n} X X^\top$).
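To make this concrete, here is a small NumPy sketch (dimensions are again hypothetical) confirming that taking $W$ to be the top-$k$ eigenvectors of $X X^\top$ leaves a reconstruction error equal to the sum of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 5, 200, 2                    # hypothetical dimensions
X = rng.standard_normal((m, n))
X -= X.mean(axis=1, keepdims=True)

# eigh returns eigenvalues of the symmetric matrix X X^T in ascending order
evals, evecs = np.linalg.eigh(X @ X.T)
W = evecs[:, -k:]                      # eigenvectors of the largest k eigenvalues

# Reconstruction error equals the sum of the discarded (smallest) eigenvalues
mse = np.linalg.norm(X - W @ W.T @ X, 'fro') ** 2
print(np.isclose(mse, evals[:-k].sum()))   # True
```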
In the above, we haven't used any differential technique, because we haven't defined the derivative of a vector by a matrix, which would be a 3D tensor. However, in some cases such as $\operatorname{tr}(W^\top X X^\top W)$ (w.r.t. $W$), the differential technique still works (see this example).
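For instance, the differential technique gives $d\,\operatorname{tr}(W^\top X X^\top W) = \operatorname{tr}\big((2 X X^\top W)^\top dW\big)$, i.e. the gradient is $2 X X^\top W$. A minimal finite-difference sketch in NumPy (random data, hypothetical sizes) confirming this:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 4, 30, 2                     # hypothetical dimensions
X = rng.standard_normal((m, n))
S = X @ X.T                            # symmetric m x m
W = rng.standard_normal((m, k))

f = lambda V: np.trace(V.T @ S @ V)
grad = 2 * S @ W                       # from d tr(W^T S W) = tr((2 S W)^T dW)

# Central finite-difference check of one arbitrary entry of the gradient
eps = 1e-6
E = np.zeros_like(W)
E[1, 0] = eps
fd = (f(W + E) - f(W - E)) / (2 * eps)
print(np.isclose(fd, grad[1, 0]))      # True
```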
E.g. 2, $F = AXB$, where $A$ is a $p \times m$ matrix and $B$ is an $n \times q$ matrix (so $X$ is $m \times n$ and $F$ is $p \times q$). Working elementwise, $F_{ab} = \sum_{c,d} A_{ac} X_{cd} B_{db}$, so $\frac{\partial F_{ab}}{\partial X_{cd}} = A_{ac} B_{db}$; stacking these entries into vectorized form gives $\operatorname{vec}(F) = (B^\top \otimes A) \operatorname{vec}(X)$ and hence $\frac{\partial \operatorname{vec}(F)}{\partial \operatorname{vec}(X)^\top} = B^\top \otimes A$.
In the above, we haven't used any differential technique, because we haven't defined the derivative of a matrix by a matrix, which would be a 4D tensor. However, in some cases such as $d(AXB) = A \, dX \, B$, the differential technique still works (see this example). Besides, there is another excellent example of matrix-by-matrix derivatives: the derivative of SVD - https://arxiv.org/pdf/1509.07838.pdf.
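To illustrate the vectorization identity numerically, here is a short NumPy sketch (sizes chosen arbitrarily) checking $\operatorname{vec}(AXB) = (B^\top \otimes A)\,\operatorname{vec}(X)$, whose coefficient matrix $B^\top \otimes A$ is exactly the Jacobian $\partial\,\operatorname{vec}(AXB) / \partial\,\operatorname{vec}(X)^\top$:

```python
import numpy as np

rng = np.random.default_rng(3)
p, m, n, q = 3, 4, 5, 2                # hypothetical sizes
A = rng.standard_normal((p, m))
X = rng.standard_normal((m, n))
B = rng.standard_normal((n, q))

# Column-major (Fortran-order) vectorization, the convention behind the identity
vec = lambda M: M.flatten(order='F')

J = np.kron(B.T, A)                    # (pq x mn) Jacobian d vec(AXB) / d vec(X)^T
print(np.allclose(vec(A @ X @ B), J @ vec(X)))   # True
```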