Description
Hey Rexy,
Your WRMF function looks really promising performance-wise, but I've noticed some odd behavior.
First off, this is my use case:
RunNMF <- function(input, rank = NULL, max.iter = 100, rel.tol = 1e-4) {
  require(rsparse)
  require(Matrix)
  # coerce to a sparse matrix; zeros become unstored entries
  input <- Matrix(input, sparse = TRUE)
  model <- WRMF$new(rank = rank, feedback = "explicit", non_negative = TRUE, solver = "cholesky")
  result <- model$fit_transform(input, n_iter = max.iter, convergence_tol = rel.tol)
  return(result)
}
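For context, here's roughly how I call it on a toy stand-in (this A is a hypothetical example I made up for illustration; the log below came from my actual data):
library(Matrix)

# Hypothetical stand-in for my real data: non-negative counts where
# zeros represent missing/unobserved entries
set.seed(42)
A <- matrix(rpois(200 * 50, lambda = 1), nrow = 200)

n <- RunNMF(A, rank = 10)
dim(n)  # expected 200 x 10: one row of embeddings per row of A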
1. WRMF reports convergence when the error loss increases, not when it starts to decrease more slowly
I noticed that your function has some instability that allows the error loss to increase from one iteration to the next. For example, here's output showing the behavior:
> n <- RunNMF(A, rank = 10)
INFO [08:59:26.158] starting factorization with 6 threads
INFO [08:59:26.229] iter 1 loss = 51.5322
INFO [08:59:26.302] iter 2 loss = 4.6352
INFO [08:59:26.373] iter 3 loss = 2.5364
INFO [08:59:26.441] iter 4 loss = 2.6215
INFO [08:59:26.442] Converged after 4 iterations
It "Converged" at the fourth iteration, where the error loss went up from 2.5364 to 2.6215. I'm also not sure how you're measuring convergence; typically it's calculated from the relative change in loss, as 1 - (loss.prev.iter - loss.current.iter) / loss.prev.iter compared against the tolerance. This observation implies there is some inherent instability in the algorithm.
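For comparison, here is a minimal sketch of the stopping rule I would expect, written as standalone R (the function, variable names, and default tol are mine, not rsparse's; the two calls reuse losses from the log above plus one made-up value):
# Hypothetical relative-tolerance stopping rule (a sketch, not rsparse code).
# Stop when the loss is still decreasing, but by less than tol of its value;
# equivalently, when 1 - (loss.prev - loss.cur) / loss.prev exceeds 1 - tol.
has_converged <- function(loss.prev, loss.cur, tol = 1e-4) {
  rel.change <- (loss.prev - loss.cur) / loss.prev
  rel.change >= 0 && rel.change < tol
}

has_converged(2.5364, 2.6215)   # FALSE: the loss went up, so don't stop here
has_converged(2.5364, 2.53635)  # TRUE: the decrease is smaller than tol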
2. The instability in WRMF is caused by considering only positive values
Here's your solver function, rewritten to highlight the specific use case above:
sparse.nnls <- function(A, X) {
  res <- list()
  for (i in 1:ncol(A)) {
    p1 <- A@p[[i]]          # pointer to the start of column i in A@x (0-based)
    p2 <- A@p[[i + 1]]      # pointer to the start of column i + 1
    j <- p1 + 1:(p2 - p1)   # positions in A@x of the entries stored for A[, i]
    A_pos <- A@x[j]         # the stored (positive) values in A[, i]
    ind_pos <- A@i[j] + 1   # their 1-based row indices in A
    X_pos <- X[, ind_pos]   # the corresponding columns of X
    # Fortran77 implementation of fnnls, fit only to the positive entries
    res[[i]] <- nnls(tcrossprod(X_pos), tcrossprod(X_pos, t(A_pos)))$x
  }
  do.call(cbind, res)
}
Note that the indices of positive values in A are pulled from every column, and X is then subset by those indices, so NNLS is run only on the positive entries. This is the problem: running NNLS on only the positive entries undermines the ability of NMF to infer missing values. For data with no missing values this approach makes perfect sense and is awesomely fast. However, if the algorithm really isn't meant to handle missing values, that should be in big bold letters at the top of the package documentation so the theoretical implications are immediately understood by all users.
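To make the pointer arithmetic concrete, here is a tiny standalone illustration of how the @p, @i, and @x slots of a dgCMatrix index one column (the toy matrix is my own, chosen for this demo):
library(Matrix)

# Toy 4 x 3 sparse matrix; zeros are simply not stored
A <- Matrix(c(1, 0, 2,
              0, 0, 3,
              0, 4, 0,
              5, 0, 0), nrow = 4, byrow = TRUE, sparse = TRUE)

i <- 3             # look at column 3
p1 <- A@p[i]       # stored entries before column i starts
p2 <- A@p[i + 1]   # stored entries through the end of column i
j <- (p1 + 1):p2   # positions of column i's entries in A@x
A@x[j]             # 2 3 -- the stored (nonzero) values of A[, 3]
A@i[j] + 1         # 1 2 -- their 1-based row indices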
Indeed, if I rework the solver function above to a dense representation of the data:
dense.nnls <- function(A, X) {
  res <- list()
  for (i in 1:ncol(A)) {
    # Fortran77 implementation of fnnls, fit to the full dense column (zeros included)
    res[[i]] <- nnls(tcrossprod(X), tcrossprod(X, t(A[, i])))$x
  }
  do.call(cbind, res)
}
I no longer get instability in the error loss function.
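If it helps, here is a minimal toy comparison of the two solvers above (it assumes library(nnls) for the fnnls calls; the A and X below are made up, and every column of A keeps at least one stored entry since the sparse version indexes by column pointers):
library(Matrix)
library(nnls)

set.seed(1)
X <- matrix(runif(3 * 6), nrow = 3)    # fixed 3 x 6 factor matrix
A <- Matrix(c(1, 0, 0, 2,              # 6 x 4 target with explicit zeros;
              0, 3, 0, 0,              # every column keeps some entries
              4, 0, 5, 0,
              0, 6, 0, 7,
              8, 0, 9, 0,
              0, 0, 0, 1), nrow = 6, byrow = TRUE, sparse = TRUE)

H_sparse <- sparse.nnls(A, X)              # fits only the stored entries
H_dense <- dense.nnls(as.matrix(A), X)     # fits the zeros as well
max(abs(H_sparse - H_dense))               # typically > 0: the zeros matter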
Sorry I haven't put together a fully reproducible example on real data. I just wanted to raise this as an issue and see if you have any thoughts on the theoretical underpinnings. Maybe I'm wrong, but I'm definitely seeing something here that I don't see in other NMF algorithms. But... WRMF is so fast. It would be nice to make it work.