MOSES is a new take on tackling the problem of (M)emory-limited
(O)nline (S)ubspace (Es)timation; in particular, MOSES can estimate
the principal components of data and also reduce its dimension. In
terms of its origins, MOSES slightly generalises the popular
incremental Singular Value Decomposition (iSVD) to work with
thin blocks of data. This simple generalisation is in part what
allows us to complement MOSES with a comprehensive statistical
analysis that was not available for incremental SVD, despite its
empirical success. This generalisation also
enables us to concretely interpret MOSES as an approximate solver
for the underlying non-convex optimisation problem. We also find
that MOSES shows state-of-the-art performance in our numerical
experiments with both synthetic and real-world datasets, while
being orders of magnitude faster than its competitors when the
ambient dimension (`n`) is large (>1000).
The code is generally self-contained and all datasets are included or
generated; thus, in theory, just having Matlab installed should be more
than enough. It has to be noted, though, that due to recent changes in
how Matlab handles character and string arrays, you should use a recent
version of it -- the code was developed and tested in Matlab `2017b` and
also tested on versions `2017a` and `2018a`; moreover, to address different
OSes, care has been taken so that this code runs without any problems both
on Windows-based machines as well as Unix-based ones.
In this instance we perform a comparison, using both synthetic and real data, against a few similar methods which compute, in part or in full, an approximate memory-limited, streaming r-truncated SVD. These methods are the following:
- MOSES (https://arxiv.org/abs/1806.01304)
- Power Method (https://arxiv.org/abs/1307.0032)
- Frequent Directions (https://arxiv.org/abs/1501.01711)
- Robust Frequent Directions (https://arxiv.org/abs/1705.05067)
- GROUSE (https://arxiv.org/abs/1702.01005)
Running the comparison is simple -- just `cd` to the cloned `moses`
directory within Matlab and run `comparison.m`. Running might take a
while; if you want to speed things up, just try altering the setup
parameters shown below:
% experiments to run
run_synthetic = 1; % run synthetic evaluation (set 0 to skip)
run_real = 0; % run real data evaluation (set 0 to skip)
run_speed_test = 0; % run the calc. speed tests (set 0 to skip)
run_moses_scaling = 0; % run the scaling moses tests (set 0 to skip)
% global flags setup
% printing flags
pflag = 1; % print resulting figures to ./graphs/
pdf_print = 0; % print resulting figures as .pdf
fig_print = 1; % print resulting figures as .fig
% execution configuration
use_fast_moses_only = 1;% speed up by using fast moses <-- USE IT :)
use_offline_svds = 1;   % also calculate the offline svds for comparison.
                        % WARNING: THIS OPTION IS PAINFULLY SLOW -- set
                        % it to 0 to drastically speed up execution.
                        % <- DEF. DISABLE IT :)
use_blk_err = 0;        % calc. errors per block, not per column; this
                        % provides a DRASTIC improvement in speed but
                        % less granular error reporting. The block size
                        % for GROUSE is 100; for PM and MOSES it equals
                        % their respective block sizes for each run.
                        % <- Prob. use it
% moses scaling flags
run_full_scaling = 0; % no need to run unless performing full error check
run_exp1 = 1; % run experiment 1: fixed b, r, variable n
run_exp2 = 1; % run experiment 2: fixed r, n, variable b
run_exp3 = 1; % run experiment 3: fixed b, n, variable r
Please note that you can tweak the relevant section values if you want to run slightly different experiments, but if you want to reproduce the results in the paper, please leave these values as-is.
The synthetic dataset is generated using random vectors drawn from a power-law
distribution with the following alpha values in this instance: `0.01`,
`0.1`, `0.5`, and `1`, while lambda is always set to `1`. Practically
speaking, this is materialised by the following segment:
% generate the singular spectrum
dd = lambda*(1:n).^(-alpha);
% generate Sigma
Sigma = diag(dd);
% random initialization of S basis
S = orth(randn(n));
% given S and Sigma generate the dataset (Y)
Y = (S * Sigma * randn(n, T))/sqrt(T-1);
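For instance, the segment above could be wrapped in a loop over the alpha values used in this instance (a sketch only; the `n` and `T` values below are illustrative, not the paper's settings):

```matlab
% generate one synthetic dataset per alpha value (illustrative sizes)
alphas = [0.01, 0.1, 0.5, 1]; lambda = 1;
n = 200; T = 1000;
for alpha = alphas
  dd = lambda*(1:n).^(-alpha);      % power-law singular spectrum
  Sigma = diag(dd);
  S = orth(randn(n));               % random orthonormal basis
  Y = (S * Sigma * randn(n, T))/sqrt(T-1);
  % ... feed Y to each method under comparison ...
end
```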
The real datasets are the ones supplied with this paper, retrieved from here, and they are the following:
- Light Data (48x7712)
- Humidity Data (48x7712)
- Volt Data (46x7712)
- Temperature Data (56x7712)
To compare MOSES against the Power Method, FD/RFD, and GROUSE we employ the following two metrics:
- The Frobenius norm of `Yr` vs `Y` over the columns seen so far, normalised by their respective arrival time.
- The final Frobenius norm of `Yr` vs `Y`, normalised by the final `T`.
To speed up the computation there are two modes that calculate the over-time errors: the first is per-block, while the second is per-column, the latter being significantly slower than the former. For completeness, in our paper we used the per-column error calculation throughout, but similar error metrics can be achieved by using the faster per-block errors.
Toggling this particular mode is done by setting `use_blk_err` in `comparison.m` to either `1` (on) or `0` (off).
The error metrics are calculated using the Frobenius norm of the
difference over the matrix columns seen so far, normalised by the
current time. The full formula to find the error at block `k` (i.e.,
after `t = k*B` columns have been revealed, where `B` is the block
size) would be:
ErrFro(k) = sum(sum((Y(:, 1:t) - YrHat_c).^2, 1))/t;
Where `YrHat_c` is:
SrHatTemp = SrHat(:, 1:r); % r-truncation of the estimated basis
% SrHat in this instance is the previous block's subspace estimation
YrHat_c = (SrHatTemp*SrHatTemp')*Y(:, 1:t);
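For context, the over-time error curve could be accumulated along these lines (a minimal sketch with illustrative variable names; in the actual harness the subspace estimate `SrHat` is updated after every block rather than held fixed):

```matlab
% illustrative sketch: over-time error, one value per block
ErrFro = zeros(1, K);                    % K = number of blocks, B = block size
for k = 1:K
  t = k*B;                               % columns revealed so far
  SrHatTemp = SrHat(:, 1:r);             % r-truncation of the current estimate
  YrHat_c = (SrHatTemp*SrHatTemp')*Y(:, 1:t);
  ErrFro(k) = sum(sum((Y(:, 1:t) - YrHat_c).^2, 1))/t;
end
```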
The other metric is the Frobenius norm of the difference between each of the three `Yr` approximations and the original values `Y`, normalised by the total number of columns seen (`T`).
The formula is:
froT = Sum_{i=1}^{T} || Y_i - Yr_i ||^2 / T
where `Y_i` and `Yr_i` denote the `i`-th columns of `Y` and `Yr` respectively.
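In Matlab this final metric amounts to a one-liner (assuming `Y` and `Yr` hold the original and reconstructed matrices):

```matlab
% final Frobenius-norm error normalised by the number of columns T
T = size(Y, 2);
froT = sum(sum((Y - Yr).^2, 1))/T;   % equivalently: norm(Y - Yr, 'fro')^2/T
```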
This metric shows us another view of how different these resulting matrices are when compared to the original ones.
A number of plots are generated while running the comparison and, for
convenience, they are printed into a generated directory under the
`graphs` directory. Each directory is named using the timestamp of
its creation, and the timestamp format follows the ISO-8601 standard.
Additionally, the printing function is flexible enough to be able to export
in three commonly used formats concurrently -- namely `png`, `pdf`, and
`fig` -- for easier processing. Of course, by toggling the appropriate flags,
printing to `pdf` and `fig` can be disabled, thus saving space. For
brevity these are the following:
% printing flags
pflag = 1; % print resulting figures to ./graphs/
pdf_print = 0; % print resulting figures as .pdf
fig_print = 1; % print resulting figures as .fig
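For reference, a timestamped output directory of this kind can be created along the following lines (a sketch only; the repo's own helpers may do this differently):

```matlab
% create a timestamped directory under ./graphs/ (illustrative)
stamp = datestr(now, 'yyyymmdd_HHMMSS');  % compact ISO-8601-like timestamp
out_dir = fullfile('graphs', stamp);
if ~exist(out_dir, 'dir')
  mkdir(out_dir);
end
```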
A number of different plots are generated for the synthetic data, these are explained below:
- The first one is a subplot containing two plots: the first shows the scree plot of the singular values, while the second shows the mean normalised Frobenius error of the three methods over time, averaged over the total number of simulations run for that particular alpha value.
- The second plot is the second subplot from above, without the scree plot.
- The third plot shows the error over time averaged over the number of trials performed, but just for the Power Method and MOSES; this is done in order to have a better comparison, as the FD and GROUSE results sometimes skewed the graph.
- The fourth plot is again a subplot, in which the first panel shows the resulting final error per iteration for Offline SVD, MOSES, Power Method, FD, and GROUSE, while the second panel shows the error for just MOSES and the Power Method, as they are closer.
- The fifth, and final, plot for the synthetic data shows the final error per iteration for only the Power Method and MOSES.
We only plot the normalised Frobenius error, in a streaming fashion as
the columns are revealed to the three methods over time, for `Yr` vs `Y`
on each of the four datasets described previously;
for readability, all figures are generated on a logarithmic scale
using `semilogy`. Before processing the datasets, it is important to note
that the data are centered using the following:
% update the column number
cols = size(Y, 2);
% center the dataset by using Y = Y - ((Y*(vec_ones*vec_ones'))./cols)
vec_ones = ones(cols, 1);
Y = Y - ((Y*(vec_ones*vec_ones'))./cols);
For the speed tests we test how the algorithms perform as the ambient
dimension (`n`) and rank (`r`) change. For our experiments we used
three rank (`r`) values, `r = [1, 10, 50]`, and an ambient dimension
range of `n = 200:200:1200`. In this instance three plots are
generated for the tests, which are the following:
- Speed for all three methods when using `r = 1` and `n = 200:200:1200`
- Speed for all three methods when using `r = 10` and `n = 200:200:1200`
- Speed for all three methods when using `r = 50` and `n = 200:200:1200`
To evaluate how MOSES scales with respect to block size (`b`),
rank (`r`), and ambient dimension (`n`), we performed three tests
in which we fixed two out of the three parameters and plotted
the average error over `10` trials. The three resulting
plots are the following:
- Scaling experiment 1: fixed `r = 15` and `b = 2r`, variable `n = 200:200:1200`
- Scaling experiment 2: fixed `r = 15` and `n = 1200`, variable `b = r:1:15r`
- Scaling experiment 3: fixed `n = 1200` and `b = 2r`, variable `r = 5:5:25`
The code is organised mainly in the following files:
- `comparison.m`: the main starting point of the experiments; the initial parameters are defined there.
- `fd.m`: implementation of Frequent Directions
- `fd_rotate_sketch.m`: helper method for both Frequent Directions methods
- `fdr.m`: implementation of Robust Frequent Directions
- `grouse.m`: original `GROUSE` algorithm code as provided with the paper
- `mitliag_pm.m`: implementation of the Mitliagkas Power Method for Streaming PCA
- `moses_fast.m`: a more efficient implementation of MOSES
- `moses_scaling.m`: function responsible for running the scaling experiments for MOSES
- `moses_simple.m`: a simple implementation of MOSES
- `my_grouse.m`: wrapper to run `grouse.m`, which sets the execution parameters (as seen here)
- `my_toc.m`: function that processes the `toc` output with better formatting
- `online_svds_real.m`: function which compares the methods against a real dataset
- `online_svds_synthetic.m`: function which compares the methods against a synthetic dataset
- `print_fig.m`: function that, if enabled, outputs the graphs generated for the current run as images or `pdf`
- `README.md`: this file, a brief README
- `real_dataset_eval.m`: wrapper for `online_svds_real` that processes its results and draws the plots
- `setup_vars.m`: function which sets the correct variables based on the OS
- `synthetic_data_gen.m`: function which generates a matrix with random vectors drawn from a power-law distribution
- `synthetic_dataset_eval.m`: wrapper for `online_svds_synthetic` that processes its results and draws the plots
This code is licensed under the terms and conditions of GPLv3 unless otherwise stated. The actual paper is governed by a separate license and the paper authors retain their respective copyrights.
If you find our paper useful or use this code, please consider citing our work as such:
@article{eftekhari2019moses,
title={MOSES: A Streaming Algorithm for Linear Dimensionality Reduction},
author={Eftekhari, Armin and Hauser, Raphael and Grammenos, Andreas},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2019},
publisher={IEEE}
}
Update: As of May 2019 MOSES has been published in TPAMI - yay!
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.