Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ This approach enables PCA for populations with large pedigrees.
To install the package from GitHub (it's not yet on CRAN), run:
```
devtools::install_github("HighlanderLab/RandPedigreePCA", subdir = "randPedPCA",
ref="v0.9.7", build_vignettes = T)
ref="v0.9.8", build_vignettes = T)
```
## First steps
For a demonstration, check out
Expand Down
4 changes: 2 additions & 2 deletions randPedPCA/DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,14 +1,14 @@
Package: randPedPCA
Type: Package
Title: Fast PCA for large pedigrees
Version: 0.9.7
Version: 0.9.8
Authors@R: c(
person("Hanbin", "Lee", email = "hblee@umich.edu", role = "aut"),
person("Hannes", "Becher", email = "h.becher@ed.ac.uk", role = "cre"),
person("Ros", "Craddock", email="", role= "aut"),
person("Gregor", "Gorjanc", email="", role= "aut"))
Description: randPedPCA is for computing PCAs of large pedigrees such as found
in breeding populations. randPedPCA exploits sparse matrices and random
in breeding populations. randPedPCA exploits sparse matrices and randomised
linear algebra to deliver a gazillion-times speedup compared to naive SVD
(and eigen decomposition).
License: MIT + file LICENSE
Expand Down
1 change: 1 addition & 0 deletions randPedPCA/NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ export(dspc)
export(hutchpp)
export(importLinv)
export(plot3D)
export(plot3DWithProj)
export(randRangeFinder)
export(randSVD)
export(rppca)
Expand Down
14 changes: 7 additions & 7 deletions randPedPCA/R/downsample.R
Original file line number Diff line number Diff line change
Expand Up @@ -11,15 +11,15 @@
#' as a proportion or integer or a index vector, see details.
#'
#' @details
#' The parameter `to` is used to specify and possibly which individuals are sampled.
#' If NA, all individuals are retained. If `to` is of length one and is between 0 and 1,
#' The parameter \code{to} is used to specify and possibly which individuals are sampled.
#' If NA, all individuals are retained. If \code{to} is of length one and is between 0 and 1,
#' then it is interpreted as a proportion. If it is greater than 1, it is taken to be
#' the number of individuals to be sampled (possibly rounded by sample.int). If
#' `to` is a logical or an integer vector, it is used for logical or integer indexing, respectively.
#' The integer indices of the sample individuals are written to the `ds` slot.
#' If `ds` exists, it is overwritten with a warning.
#' the number of individuals to be sampled (possibly rounded by \code{sample.int}). If
#' \code{to} is a logical or an integer vector, it is used for logical or integer indexing, respectively.
#' The integer indices of the sample individuals are written to the \code{ds} slot.
#' If \code{ds} exists, it is overwritten with a warning.
#'
#' @return An (invisible) object of class `rppca` with a slot `ds` added.
#' @return An (invisible) object of class \code{rppca} with a slot \code{ds} added.
#'
#' @export
dspc <- function(pc, to=10000){
Expand Down
6 changes: 3 additions & 3 deletions randPedPCA/R/generateSpam.R
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,12 @@

#' Generate spam object from L inverse file
#'
#' RandPedPCA relies in the `spam` onject format. But matrices are commonly
#' RandPedPCA relies in the \code{spam} onject format. But matrices are commonly
#' stored in other formats.
#'
#' @param pth path to matrix market file for L inverse matrix in dgTMatrix format
#'
#' @return A `spam` sparse matrix
#' @return A \code{spam} sparse matrix
#' @export
#' @importFrom methods as
#' @importFrom Matrix readMM
Expand All @@ -26,7 +26,7 @@ importLinv <- function(pth){
#'
#' @param sprs A sparse matrix.
#'
#' @return A `spam` sparse matrix
#' @return A \code{spam} sparse matrix
#' @export
#' @importFrom methods as
#' @importFrom spam as.spam.dgCMatrix
Expand Down
4 changes: 2 additions & 2 deletions randPedPCA/R/hutchpp.R
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,13 @@ makeRandMat = function(m, n) 2*matrix(sample(1:2, size = m*n, replace = TRUE, p
#' matrix B which is related to A through some function. The oracle function has
#' to be chosen so that oracle(B, G) returns the product A %*% G. By default,
#' the oracle function is set to work on a pedigree's L inverse matrix. But this
#' inplementation is genral and should work - given a custom oracle function -
#' implementation is general and should work - given a custom oracle function -
#' on other input too.
#'
#' In the context of pedigree PCA, this is used to estimate the trace of an
#' (implicitly) centred additive relationship matrix.
#'
#' There logical paramter center allows for a pedigree's L matrix to be
#' There logical parameter center allows for a pedigree's L matrix to be
#' (implicitly) centred. This is important because centring changes the total
#' variance of the data and thus the trace of A.
#'
Expand Down
79 changes: 76 additions & 3 deletions randPedPCA/R/plot3D.R
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,9 @@
#' @param ... additional arguments passed to rgl::plot3d
#'
#' @details
#' Note, different to `plot.rppca`, which is relatively slow, `plot3D` does
#' not down-sample the principal components and it ignores the `ds` slot of an
#' `rppca` object if present.
#' Note, different to \code{plot.rppca}, which is relatively slow, \code{plot3D} does
#' not down-sample the principal components and it ignores the \code{ds} slot of an
#' \code{rppca} object if present.
#'
#'
#' @export
Expand Down Expand Up @@ -58,3 +58,76 @@ plot3D <- function(x, dims=c(1,2,3), xlab=NULL, ylab=NULL, zlab=NULL, ...) {
}
}
}



# internal
makeProjList <- function(A, offsets, ff=0.1){
x <- A[,1:3]
if(missing(offsets)) {
rr <- apply(A, 2, range)
dd <- apply(rr, 2, diff)

offsets <- c(max(rr[,1]) + dd[1] * ff,
min(rr[,2]) - dd[2] * ff,
min(rr[,3]) - dd[3] * ff)
}
xyProj = x[,1:3]
xyProj[,3] <- offsets[3]
xzProj = x[,1:3]
xzProj[,2] <- offsets[2]
yzProj = x[,1:3]
yzProj[,1] <- offsets[1]
list(xyProj,
xzProj,
yzProj)
}



#' 3D plot of PC scores with projections on coordinate planes
#'
#' @param pc An \code{rppca} object
#' @param dims \code{integer} \code{vector}, which PCs to plot, defauts to 1:3
#' @param plotProj \code{logical}, whether to plot the projections
#' @param grid \code{logical}, wheter to plot grids
#' @param col the dot colours, integer or string, scalar or vector
#' @param ff \code{numeric}, offset for projection (proportion of the orthogonal axis's range)
#' @param theta,phi polar coordinates in degrees. \code{theta} rotates round the
#' vertical axis. \code{phi} rotates round the horizontal axis.
#'
#' @return nothing
#' @export
#'
#' @examples
#' ped <- pedigree(pedMeta$fid,
#' pedMeta$mid,
#' pedMeta$id
#' )
#' pc <- rppca(ped)
#' plot3DWithProj(pc, col=as.numeric(factor(pedMeta$population)))
#' @export
plot3DWithProj <- function(pc,
dims=c(1,2,3),
plotProj=T,
grid=T,
col=1,
ff=0.5,
theta=-45,
phi=25){

plot3D(pc, dims=dims, col=col, size=10)
if(plotProj) {
sh <- makeProjList(pc$x[,dims], ff=ff)
rgl::points3d(sh[[1]], col=col, size=3)
rgl::points3d(sh[[2]], col=col, size=3)
rgl::points3d(sh[[3]], col=col, size=3)
}
if(grid){
rgl::grid3d("x+")
rgl::grid3d("y-")
rgl::grid3d("z-")

}
rgl::view3d(theta=theta, phi=phi)
}
87 changes: 46 additions & 41 deletions randPedPCA/R/rppca.R
Original file line number Diff line number Diff line change
Expand Up @@ -2,16 +2,16 @@



#' Generat range matrix for SVD
#' Generate range matrix for SVD
#'
#' @param L a pedigree's L inverse matrix in sparse 'spam' format
#' @param rank `integer` how many principal components to return
#' @param depth `integer` number of iterations for generating the range matrix
#' @param numVectors `integer` > `rank` to specify the oversampling for the
#' @param cent `logical` whether or not to (implicitly) centre the additive
#' relationship matrix
#' @param rank An \code{integer}, how many principal components to return
#' @param depth \code{integer}, number of iterations for generating the range matrix
#' @param numVectors An \code{integer > rank}, to specify the oversampling for the
#' @param cent \code{logical} whether or not to (implicitly) 'centre' the additive
#' relationship matrix, or more precisely, its underlying 'data matrix' L
#'
#' @return The range matrix for `randSVD`
#' @return The range matrix for \code{randSVD}
#' @export
#'
#' @importFrom spam backsolve
Expand All @@ -37,17 +37,17 @@ randRangeFinder <- function(L, rank, depth, numVectors, cent=F){

#' Singular value decomposition in sparse triangular matrix
#'
#' Uses random linear algebra, see Halko et al. (2010). Singular value
#' Uses randomised linear algebra, see Halko et al. (2010). Singular value
#' decomposition (SVD) decomposes a matrix \eqn{X=U\Sigma W^T}
#'
#' @param L a pedigree's L inverse matrix in sparse 'spam' format
#' @param rank `integer` how many principal components to return
#' @param depth `integer` number of iterations for generating the range matrix
#' @param numVectors `integer` > `rank` to specify the oversampling for the
#' @param cent `logical` whether or not to (implicitly) centre the additive
#' @param rank An \code{integer}, how many principal components to return
#' @param depth \code{integer}, the number of iterations for generating the range matrix
#' @param numVectors An \code{integer > rank} to specify the oversampling for the
#' @param cent \code{logical}, whether or not to (implicitly) centre the additive
#' relationship matrix
#'
#' @return A list of three: u (=U), d (=Sigma), and v (=W^T)
#' @return A list of three: \code{u} (=U), \code{d} (=Sigma), and \code{v} (=W^T)
#' @export
#'
#' @importFrom spam backsolve
Expand All @@ -67,14 +67,14 @@ randSVD <- function(L, rank, depth, numVectors, cent=F){
return(list(u=U[,1:rank], d=D[1:rank], v=V[1:rank,]))
}

#' Compute the number of vectors used for trace estimation
#' Compute the number of vectors to use for Hutchinson trace estimation
#'
#' Follows Skorski, M. (2021). Modern Analysis of Hutchinson’s Trace Estimator.
#' 2021 55th Annual Conference on Information Sciences and Systems (CISS), 1–5.
#' https://doi.org/10.1109/CISS50987.2021.9400306
#'
#' @param e `numeric` denoting the relative error margin
#' @param d `numeric`. 1-d is the probability the the relative error is bounded
#' @param e A \code{numeric} denoting the relative error margin
#' @param d A \code{numeric}. 1-d is the probability the the relative error is bounded
#' by e.
#'
#' @return a scalar
Expand All @@ -88,10 +88,12 @@ getNumVectorsHutchinson <- function(e, d){
#' Using Hutchinson's method
#'
#' @param L A pedigree's L inverse matrix
#' @param numVectors, an `integer` specifying how many random vectors to use
#' @param numVectors, an \code{integer} specifying how many random vectors to use
#'
#' If you do not have a good reason to do otherwise, use the function \code{hutchpp} instead.
#'
#' The higher `numVectors`, the higher the accuracy and the longer the runtime.
#' Accuracy can be estimated with the function `getNumVectorsHutchinson`.
#' The higher \code{numVectors}, the higher the accuracy and the longer the running time.
#' Accuracy can be estimated with the function \code{getNumVectorsHutchinson}.
#'
#' @return a scalar
randTraceHutchinson <- function(L, numVectors){
Expand All @@ -107,41 +109,44 @@ randTraceHutchinson <- function(L, numVectors){
return(mean(Ests))
}

#' Fast pedigree PCA using sparse matrices and random linear algebra
#' Fast pedigree PCA using sparse matrices and randomised linear algebra
#'
#' @param pdg A representation of a pedigree, see Details.
#' @param method `string` only randSVD (the default) is implemented
#' @param rank `integer` how many principal components to return
#' @param depth `integer` number of iterations for generating the range matrix
#' @param numVectors `integer` > `rank` to specify the oversampling for the
#' @param method \code{string} only randSVD (the default) is implemented
#' @param rank \code{integer} how many principal components to return
#' @param depth \code{integer} number of iterations for generating the range matrix
#' @param numVectors \code{integer > rank} to specify the oversampling for the
#' range matrix
#' @param totVar `scalar` (optional) the total variance, required for
#' @param totVar \code{scalar} (optional) the total variance, required for
#' computation of variance proportions when using an L-inverse matrix a input
#' @param center `logical` whether or not to (implicitly) centre the additive
#' @param center \code{logical} whether or not to (implicitly) centre the additive
#' relationship matrix
#' @param ... optional arguments passed to methods
#'
#' @details
#' The output slots are named like those of R's built in `prcomp` function.
#' The output slots are named like those of R's built in \code{prcomp} function.
#' Rotation is not returned by default as it is the transpose of the PC scores,
#' which are returned in `x`. `scale` and `center` are set to `FALSE`.
#' which are returned in \code{x}. \code{scale} and \code{center} are set to \code{FALSE}.
#'
#' @returns
#' A `list` containing:
#' * the principal components
#' * sdev, the variance components of each PC. Note that the total variance is
#' A \code{list} containing:
#' \describe{
#' \item{\code{x}}{the principal components}
#' \item{\code{sdev}}{the variance components of each PC. Note that the total variance is
#' not known per se and this these components cannot be used to compute the
#' proportion of the total variance accounted for by each PC. However, if
#' `nVecTraceEst` is specified, `rppca` will estimate the total variance and
#' return variance proportions.
#' * vProp, the estimated variance proportions accounted for by each PC.
#' Only returned if `totVar` is set.
#' * scale, always `FALSE`
#' * center, `logical` whether or not the implicit input matrix was centred
#' * rotation, the right singular values of the (implicit) relationship matrix.
#' Only returned if `returnRotation == TRUE`
#' * varProps, proportion of the total variance explained by each PC. Only
#' returned if starting from a pedigree object.
#' \code{nVecTraceEst} is specified, \code{rppca} will estimate the total variance and
#' return variance proportions.}
#' \item{\code{vProp}}{the estimated variance proportions accounted for by each PC.
#' Only returned if \code{totVar} is set.}
#' \item{\code{scale}}{always \code{FALSE}}
#' \item{\code{center}}{\code{logical} indicating whether or not the implicit data matrix was centred}
#' \item{\code{rotation}}{the right singular values of the relationship matrix.
#' Only returned if \code{returnRotation == TRUE}}
#' \item{\code{varProps}}{proportion of the total variance explained by each PC. Only
#' returned if starting from a pedigree object without centring, or if \code{totVar} is supplied.
#' }
#' }
#' @export rppca
#' @rdname rppca
#'
Expand Down
Loading