Skip to content

Latest commit

 

History

History
389 lines (345 loc) · 22.4 KB

TODO

File metadata and controls

389 lines (345 loc) · 22.4 KB

##-*- mode: org -*-

Very Urgent

0-dim sparseMatrices fail for all “Ops” –> ~/R/MM/Pkg-ex/Matrix/bug-0-length-Ops.R

as(m, “sparseMatrix”) must work when length(m) > max.int for ‘matrix’ m

–> see SM (3e6 x 1023) ex. in tests/Simple.R

Matrix-Bugs [#6729] 2021-06 ./tests/AAA_latest.R: provide .KhatriRao() for “general” (notably dense,complex,..) matrices

Using “long vectors” (i.e. 64 bit indices vectors) in CHOLMOD –> cholmod_l_*()

e.g. segfault in crossprod() Csparse_crossprod -> cholmod_att()

‘symmetricMatrix’ objects with *A*symmetric @Dimnames are formally “valid”, but really are not,

see FIXME in symmetricMatrix_validate() in src/dsyMatrix.c —> see TODO below: `forceCspSymmetric()`

Urgent in some sense ---------------------------------------------------

`unique()` and `duplicated()` methods for “sparseVector” & “(sparse)Matrix”; have “*.matrix” S3 methods

(partly DONE via workaround “round up” to 100): print() / show() for small options(max.print=<n>):

–> tests/Simple.R {‘max.print’}

API change: Should Matrix(diag(2), sparse=TRUE, doDiag=TRUE) not rather give “ddiMatrix” ??

Why change? Originally “ddiMatrix” etc extended denseMatrix but now sparseMatrix Currently ‘doDiag’s documentation starts as ‘doDiag: only when ‘sparse = FALSE’, ....’ This would change, and doDiag would be active also for ‘sparse = TRUE’

Do section in ./vignettes/Design-issues.Rnw (& man/symmetricMatrix-class.Rd ?) about dimnames

also mentioning forceSymmetric(); maybe that wd “inherit” arg. ‘symDimnames’ in {T,F,NA} from forceCspSymmetric().

sparse2int() using a X[…] * Y[…] construct which is too large –> Matrix bug #1330:

See FIXME in ./R/spModels.R and

https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1330&group_id=61&atid=294

S[sel,] and S[,sel] <- value should work for sparse S and NA-containing sel.

nnzero() is too slow for a large CsparseMatrix

sparse.model.matrix(.) bug with NA’s & na.action = “na.pass” => ~/R/MM/Pkg-ex/Matrix/sparse.model-bugs_EB.R

sparse.model.matrix(~ poly(x,3)) gives wrong column names => ~/R/MM/Pkg-ex/Matrix/sparse.model-bugs_EB.R

lu() should preserve dimnames in a way such that lu(A) ~= PLU . A can rebuild A.

R/

M[<sparse 2-column Matrix>] indexing should work (but with a warning: use dense!)

doxygen (seed inst/Doxyfile and ../../www/doxygen/UPDATE_me.sh) now fails partly, e.g., for

------- e.g., for src/Csparse.c, Csp_dense_products(…) around lines 600

src/CHOLMOD/MatrixOps/cholmod_symmetry.c is “cool” and fast;

Definitely should use it for solve(<dgCMatrix>) {it seems MATLAB does}; alternatively also is_sym() [in src/cs_utils.c], see below.

diagonalMatrix inherits from sparseMatrix, BUT

“ddiMatrix” does not inherit from “dsparseMatrix”, nor does “ldiMatrix” from “lparseMatrix”. Seems an undesirable inconsistency. Try changing setClass(“ddiMatrix”, contains = c(“diagonalMatrix”, “dMatrix”)) to setClass(“ddiMatrix”, contains = c(“diagonalMatrix”, “dsparseMatrix”))

Look at Paul Bailey’s problem – CHOLMOD error (even seg.fault for him)

–> ~/R/MM/Pkg-ex/Matrix/sparseOrderedLogit.R

Schur() should also get a “matrix” method,

so people like RP may stop whining about its non-availability in “base R” (2015-07-09)

BunchKaufman()’s result is not really useful yet, but it is used on C

level e.g. for solve(<dsyMatrix>). NB: is a generalized LDL’ [with pivoting!]. Should define expand() method or similar, see man/BunchKaufman-methods.Rd and R/dsyMatrix.R (at end).

src/cs_utils.c : I think is_sym() [only used in Matrix_cs_to_SEXP()] can be made sped up:

leave the for loops, as soon as is_lower == is_upper == 0.

kronecker(<symmetric>, <symmetric>) should return symmetricMatrix, notably

when one of the arguments is diagonal

as(<traditional_matrix>, “CsparseMatrix”) -> dense_to_Csparse() is inefficient:

it first copies the matrix to a dgeMatrix {re-allocating!}, then goes to sparse via cholmod_(l_)dense_to_sparse. ==>

Do this directly in C (also working around “too long // segfault problem we have there):

matrix_to_Csparse() plus .m2dgC() etc R short cuts

extend C’s matrix_to_Csparse() to optionally check for diagonal, (upper or lower) triangular, and/or symmetric case

<nsparseMatrix> %*% <sparseMatrix>, crossprod() & tcrossprod() often return

a pattern, i.e., nsparseMatrix as well because cholmod_ssmult() just does that even if only one of the two matrices is a pattern matrix. The latter case is really wrong. The above behavior seems many years old.. and sometimes is interesting and good, using Boolean arithmetic: T+T := T|T = T

For 1.2-0, changed the result to return numeric when one of the two matrices is not nsparse. ==> Provide the previous functionality via a Matrix package R function: ==> We’ve introduced ‘%&%’ for Matrix 1.2-0 and ‘boolArith = TRUE’ for crossprod/tcrossprod.

(%*% (t)crossprod, see above) Should we always return numeric, i.e.,

behave the same as for ‘ndenseMatrix’ or ‘lsparseMatrix’ or traditional logical matrices?

norm(matrix(1:4,2), type=”2”) should work as in base __AND__ we shold support type=”2” (-> svd())

[t]crossprod() could/should become more lenient with *vector*s: adapt R-devel (= R 3.2.0)’s rules:

see misc/products-Mv.R and *.Rout – now tests/matprod.R (“3.2.0”)

for sparseVector o (sparse)vector

consider analagous changes to base-R

m %*% M (crossprod, ..) with 0-dim. result give garbage

M[i,j] should not drop dimnames (R-forge bug 2556, see ~/R/MM/Pkg-ex/Matrix/dimnames-prod.R)

“Math”/”Math2” fail entirely for sparseVectors

rbind2(<sparse>, <dense>) did not work, now is completely wrong !! (e.g. <dgC>, <dge>)

qr.coef() has wrong (column)names, even in full-rank case: see man/qr-methods.Rd (“FIXME”); maybe related to

qr.R(), qrR() etc have wrong currently lose column/row names {compared to base R’s qr.R}, see,

drop0(R. <- qr.R(qx), tol=1e-15) # columns are int b1 c1 c2 b2 c3 {in man/qr-methods.Rd}

should as.matrix() eventually become a no-op, as for Rmpfr::”mpfrMatrix” ?? – NO!

Big advantages:

1) Functions such as base::scale.default() will work automagically

2) If sM <- as.matrix(<sparseVector>) .. then identical(as.matrix(sM) , sM) – not currently !!

Bigger drawbacks: Really I have to define Matrix methods for base functions that just worked perfectly via as.matrix

1a) eigen() base::eigen uses as.matrix() = asRbasematrix(); is not generic; called from nearPD()

==> I’ve introduced “Matrix” S4 methods (and hence made eigen() S4 generic)

1b) svd() same as eigen(); also called from norm(*, “2”)

{as eigen()} would also need “Matrix” S4 methods

1c) qr() needs additional dgeMatrix methods (as base::qr.default uses as.matrix())

and now warns, also, e.g., from rcond(<non-square dgeMatrix>)

2) base :: lower.tri() and upper.tri() also use as.matrix() but are not generic => would need to make them S4 genric

for now: just redefinition in inst/test-tools-Matrix.R notably for CheckMatrix(), but also

for use in diverse ./tests/*.R.

For R-devel (-> 3.5.0 in April 2018): lower.tri() / upper.tri() do not use as.matrix()

3) Documented in more than one place that base functions work thanks to as.matrix()

How to go there: For a while as.matrix() should give deprecation warning: use as(*,”matrix”) and

---- give substitute .asmatrix(), but that’s not faster; simply calls as(*,”matrix”)

In R/Auxiliaries.R .Matrix.avoiding.as.matrix <- TRUE – for experiments only

turn off warning via options(Matrix.quiet.as.matrix = TRUE)

BunchKaufman() got a “matrix” method.

New smallish ideas, relatively urgent for MM -----------------------------

qr1() as non-pivoting rank-correcting – .Call(lapack_qr, ..) in src/dense.c

generalize new “indMatrix” class, to allow 0 repetitions

of some samples, i.e., columns of all 0 s. It’s mathematically more natural –> typically will be useful.

polnish translation (e-mail!)

FIXME(2) and (3) in R/products.R: t(.Call(Csparse_dense_*))

cor(<sparseMatrix>) and cov(<sparseMatrix>) at least for y=NULL (“no y”).

-> ~/R/MM/Pkg-ex/Matrix/cor_sparse-propos.R <- http://stackoverflow.com/questions/5888287/ -> ~/R/MM/Pkg-ex/Matrix/cor_cos.R and ~/R/MM/Pkg-ex/Matrix/cor_cos_testing Provide cor.sparse() and other association measures for sparse matrices.

Add larger collection of random matrix generator functions,

typically sparse ones: Have rsparseMatrix() [exported] already; then rspMat(), rUnitTri(), mkLDL() [!] in inst/test-tools-Matrix.R ; then, e.g., rBlockTri() in man/bdiag.Rd. (man/* ?; tests/* )

port isSeq() to C [ R/Auxiliaries.R ]

Investigate the “band changing (and getting) ideas ‘band<-’ etc,

from Jeremy D Silver, per posts to R-devel on Aug.26,2011 {MM: ~/R/MM/Pkg-ex/Matrix/bands-Jeremy_Silver-ex.R }

Similarly (maybe covered by his suggestion?): provide inverse of bandSparse()

in the sense that if ‘dg.mat’ is a (“LINPACK/EISPACK”-format) dense (n x K) matrix containing K diagonals, and BS <- bandSparse(.., diagonals=dg.mat); dg.m <- getbands(BS,..) would exactly return the ‘dg.mat’ matrix.

finalize and activate the unused code in src/t_sparseVector.c

cbind2() / rbind2() for sparseMatrices: dimnames propagation should

happen in C, see R/bind2.R and src/Csparse.c (Csparse_horzcat etc).

use getOption(“Matrix.quiet”) in more places [–> less messages/warnings]

Check for DimNames propagation in coercion and other operations.

for (%*%, crossprod, tcrossprod), now systematically checked in tests/matprod.R

For colSums(), rowSums() [R-forge bug # 6018] –> ‘FIXME’ in R/colSums.R

Report the problem in the Linux ldexp manual page. The second and

third calls in the Synopsis should be to ldexpf and ldexpl.

provide methods for “dspMatrix” and “dppMatrix”!

2012-07: DONE with Ops, etc, also pack() / unpack(); not yet: “Math”

“corMatrix” extends “dpoMatrix”.. – but we miss a packed corMatrix: “copMatrix” or “crpMatrix”

(well, this is “related to” the fact that we do not have too many packed matrix methods).

combine the C functions for multiplication by special forms and

solution wrt special forms by using a ‘right’ argument and a ‘classed’ argument. [done with dgeMatrix_matrix_mm(); not yet for other classes; and for _crossprod()]

Cache ‘@factors’ components also from R, e.g., for “Tsparse..”

via .set.factors()

chol() and Cholesky() caching unfinished: the name [Ss][Pp][Dd]Cholesky

depends on (perm, LDL, super) arguments:

.chkName.CHM(name, perm, LDL, super) and .CHM.factor.name()

use the above

partly DONE; new arg ‘cache=FALSE’: allow cache=FALSE to disable the caching

0-based vs 1-based indexing: grep -nHE -e ‘[01]-(orig|ind|base)’ *.R

Can I find a uniform language ‘1-based indexing’ or ‘0-origin indexing’ ?

More systemtic possible via new argumnet ‘orig_1’ in m_encodeInd(), m_encodeInd2()

-> src/Mutils.c

Generalization of Existing Classes and Methods ---------------------------

“Math2” , “Math”, “Summary”: keep diagonal, triangular and symmetric Matrices

when appropriate: particularly desirable for “Math2”: round(), signif()

“Arith” (and Ops ?): keep diagonal, triangular and symmetric Matrices where appropr.

For triangular matrices, ensure the four rules of “triangular matrix algebra”

(Golub+Van Loan 1996, 3.1.8, p.93)”

since 2008-03-06 for Csparse

since 2010-07-23 for <dtr> %*% <dtr>

e.g. for <ltr> %*% <dtC>

“d” <-> “l” coercion for all “[TCR]” sparse matrices is really trivial:

“d” -> “l” : drops the ‘x’ slot “l” -> “d” : construct an ‘x’ slot of all ‘1’ We currently have many of these conversions explicitly, e.g. setAs(“dsTMatrix”, “lsTMatrix”, function(from) new(“lsTMatrix”, i = from@i, j = from@j, uplo = from@uplo, Dim = from@Dim, Dimnames = from@Dimnames)) but I would rather want to automatically construct all these coercion methods at once by a “method constructor”, i.e., for all “dsparse*” -> “lsparse*” and vice versa. How can one do this {in a documented way} ?

Think of constructing setAs(…) calls automatically in order to

basically enable all “sensible” as(fromMatrix, toMatrix) calls, possibly using canCoerce(.)

When we have a packed matrix, it’s a waste to go through “full” to “sparse”:

==> implement setAs(“dspMatrix”, “sparseMatrix”) setAs(“dppMatrix”, “sparseMatrix”) setAs(“dtpMatrix”, “sparseMatrix”) and the same for “lsp” , “ltp” and “nsp” , “ntp” !

tcrossprod(x, y) : do provide methods for y != NULL

calling Lapack’s DGEMM for “dense” [2005-12-xx: done for dgeMatrix at least]

Factorizations: LU done; also Schur() for sparse Matrices.

use .Call(Csparse_drop, M, tol) in more places,

both with ‘tol = 0.’ to drop “values that happen to be 0” and for zapsmall() methods for Csparse*

implement .Call(Csparse_scale, ....) interfacing to cholmod_scale()

in src/CHOLMOD/Include/cholmod_matrixops.h : for another function specifically for multiplying a cholmod_sparse object by a diagonal matrix. Use it in %*% and [t]crossprod methods.

make sure all group methods have (maybe “bail-out”) setMethod for “Matrix”.

e.g. zapsmall(<pMatrix>) fails “badly”

<sparse> %*% <dense> {also in crossprod/tcrossprod} currently always

returns <dense>, since –> Csparse_dense_prod –> cholmod_sdmult and that does only return dense. When the sparse matrix is very sparse, i.e. has many rows with only zero entries, it would make much sense to return sparse.

! <symmetricMatrix> loses symmetry, both for dense and sparse matrices.

!M where M is “sparseMatrix”, currently always gives dense. This only makes sense when M is “really sparse”.

diag(m) <- val currently automatically works via m[cbind(i,i)] <- val

This (`[<-` method) is now “smart” for diagonalMatrix, but needs also to be for triangularMatrix, and probably also “dense*general*Matrix” since the above currently goes via “matrix” and back instead of using the ‘x’ slot directly; in particular, the triangular* “class property” is lost! [current ??]

”[<-” now uses src/t_Csparse_subassign.c (no memory explosion). However it’s still too slow

when the replacement region is large, or also when do many millions of one-element assignments (say in a 100’000^2 Matrix).

Cholesky(), chol() etc ---------------------------------------------------

chol() should “work”: proper result or “good” error message.

(mostly done ?)

example(Cholesky, echo=FALSE) ; cm <- chol(mtm); str(cm); str(mtm)

shows that chol() does not seem to make use of an already present factorization and rather uses one with more ‘0’ in x slot.

examples for solve( Cholesky(.), b, system = c(“A”, “LDLt”....))

probably rather in man/CHMfactor-class.Rd than man/Cholesky.Rd

LDL(<CHMsimpl>) looks relatively easy; via “tCsparse_diag()”

{diagonal entries of triangular Csparse} –> see comment in determinant(<dsC>) in R/dsCMatrix.R, will give faster determinant

Allow Cholesky(A,..) when A is not symmetric AND

we really mean to factorize AA’ ( + beta * I)

update(Cholesky(..), ): make *also use of the possibility to update

with non-symmetric A and then AA’ + mult * I is really meant. .updateCHMfactor() ## allows that already(?)

add examples (and tests!) for update(<CHMfactor>, ..) and

Cholesky(......, Imult), also tests for hidden {hence no examples} ldetL2up() { R/CHMfactor.R }; see ex in man/wrld_1deg.Rd MM: See e.g. ~/R/MM/Pkg-ex/Matrix/CholUpdate.R – for solve(<CHM>, <type>)

implement fast diag(<triangularCsparse>) via calling new

src/Csparse.c’s diag_tC_ptr() .

  • diag_tC_ptr() functionality now exported via R/dsCMatrix.R .diag.dsC() : the name is silly, but functionality nice. See (hidden) example in man/Cholesky.Rd

chol(<nsCMatrix>) gives “temporarily disabled”

but should give the symbolic factorization; similarly Cholesky(.) is not enabled

“Basic” new functionality – “nice to have” (non-urgent) -----------------

tr(A %*% B) {and even tr(A %*% B %*% C) …} are also needed

frequently in some computations {conditional normal distr. …}. Since this can be done faster than by sum(diag(A %*% B)) even for traditional matrices, e.g. sum(A * t(B)) or {sometimes even faster for “full” mat} crossprod(as.vector(A), as.vector(t(B))) and even more so for, e.g. <sparse> %*% <dense> {used in Soeren’s ‘gR’ computations}, we should also provide a generic and methods.

diag(A %*% B) might look like a “generalization” of tr(A %*% B),

but as the above tricks show, is not really. Still, it’s well worth to provide diag.prod(A, B):

Well, if A %*% B is square, diag(A %*% B) === colSums(t(A) * B) and we should probably teach people about that !

eigen() should become generic, and get a method at least for diagonal,

but also for symmetric -> dsyMatrix [LAPACK dsyev() uses UPLO !], but also simply for dgeMatrix (without going via tradition matrices). What about Sparse? There’s fill-in, but it may still be sensible, e.g. mlist <- list(1, 2:3, diag(x=5:3), 27, cbind(1,3:6), 100:101) ee <- eigen(tcrossprod(bdiag(lapply(mlist, as.matrix)))) Matrix( signif(ee$vectors, 3) )

Everything else aka “Miscellaneous” --------------------------------------

qr.R(qr(x)) may differ for the “same” matrix, depending on it being

sparse or dense: “qr.R(<sparse>) may differ from qr.R(<dense>) because of permutations”

column names are not produced, whereas dense qr.R(.) has column names.

We provide `qrR()` .. but not entirely happily:

Users are still a bit frustrated and it currently influences rcond() as well.

rcond(<sparseMatrix>) for square currently goes via dense – BAD –

can we go via qr() in any case? In some cases, e.g. lmer()’s “Lambda” (block triangular, small blocks) rcond(L) := 1 / (norm(L) * norm(solve(L))) is simple {and remains sparse, as solve(L) is still block triangular}

facmul() has no single method defined; it looks like a good idea though

(instead of the infamous qr.qy, qr.qty,.... functions)

symmpart() and skewpart() for sparse matrices still use (x +/- t(x))/2

and could be made more efficient. Consider going via asTuniq() or something very close to .Arith.Csparse() in R/Ops.R For a traditional “matrix” object, we should speedup, using C code ..

many setAs(*, “[dl]..Matrix”) are still needed, as long as e.g.

replCmat() uses as_CspClass() and drop0(.) which itself call as_CspClass() quite a bit. –> try to replace these by as(*, “CsparseMatrix”); forceSymmetric, etc.

writeMM(obj, file=stdout()) creates file “1” since file is silently

assumed to be a string, i.e. cannot be a connection. An R (instead of C) version should be pretty simple, and would work with connections automatically [“lsparse” become either “real” or “pattern”, “depending if they have NAs or not].

<diagMatrix> o <ddenseMatrix> still works via sparse in some cases, but

could return <diagMatrix> in the same cases where <diagMatrix> o <numeric> does.

look at solve.QP.compact() in \pkg{quadprog} and how to do that using

our sparse matrices. Maybe this needs to be re-implemented using CHOLMOD routines.

We allow “over-allocated” (i,x)-slots for CsparseMatrix objects,

as per Csparse_validate() and the tests in tests/validObj.R. This is as in CHOLMOD/CSparse, where nzmax (>= .@p[n]) corresponds to length(.@i), and makes sense e.g. for M[.,.] <- v assignments which could allocate in chunks and would not need to re-allocate anything in many cases. HOWEVER, replCmat() in R/Csparse.R is still far from making use of that.

Thanks to base::rbind, cbind now doing S4 dispatch on C level

In all(M1 == M2) for sparse large matrices M1, M2 (e.g. M2 <- M1 !),

the intermediate ‘M1 == M2’ typically is dense, hence potentially using humongous amount of memory. We should/could devise something like allCompare(M1, M2, `==`) which would remain sparse in all its computations.


Reconsider the linkages in the include files for the SuiteSparse

packages. It may be better simply to add all the src/<nm>/Include directories to the include path for all compilations. I don’t think there is a big overhead. Right now we need to modify the include file src/SPQR/Include/SuiteSparseQR_C.h so that it does not expect to have src/UFsparse and src/CHOLMOD/Include on the include path. Maybe just those two should be added to the include path.

(systematically check that LAPACK-calling functions check for

0-dimensional input themselves; LAPACK gives an integer error code)

the f[,5762] <- thisCol now go via Csparse_subassign() call …

[ in tests/indexing.R ]. Still would be nice to be able to use abIndex (see replTmat in R/Tsparse.R)

{IS THIS CURRENT?}

Sept. 2009: Subject: chol2inv() |-> solve(<CHMfactor>)

when testing and documenting chol2inv(), I found that it’s pretty simple to also define a method for “CHMfactor” objects, namely simply the solve(*, Diagonal(.) “A”) method. This is not particularly exciting, and also does not, I think help for defining a chol2inv() method for sparse (upper) triangular matrices.

sort(<sparseVector>, partial=..), needed, for mean(*, trim = .) or median().

Note that defining xtfrm() does not “help” (as sort() then goes via dense index). See “mean” in R/Matrix.R

How can we ensure that inst/include/cholmod.h remains

correct and equivalent to src/CHOLMOD/Include/cholmod_core.h and siblings ??? {currently need to do this manually (Emacs M-x compare-windows) for the typedefs}

SMALL_4_Alloca := 10000; check all uses of alloca()/Alloca() in src/*.[ch]

ensuring that the size allocated cannot grow with the vector/matrix/nnzero sizes of the input. [see the change needed in svn r2770 in src/dtCMatrix.c !]