Skip to content

Unnecessary overwriting in mice.impute.2l.bin()? #686

Open
@devincaughey

Description

@devincaughey

Forgive me if I am missing something, but I think I found a very minor issue in the following section of mice.impute.2l.bin():

    clmis <- x[wy, clust]
    for (i in clmis) {
        bi.star <- t(MASS::mvrnorm(n = 1L, mu = rep(0, nrow(psi.star)), 
            Sigma = psi.star))
        idx <- wy & (x[, clust] == i)
        logit <- X[idx, , drop = FALSE] %*% beta.star + Z[idx, 
            , drop = FALSE] %*% matrix(bi.star, ncol = 1)
        vec <- rbinom(nrow(logit), 1, as.vector(1/(1 + exp(-logit))))
        if (is.factor(y)) {
            vec <- factor(vec, c(0, 1), levels(y))
        }
        y[idx] <- vec
    }

Each i in the for loop is a level-1 observations value of the level-2 cluster variable clust. However, the result of each iteration is to assign a vector of imputed values vec to y[idx], which contains every level-1 observation in cluster i. If there are, say, 1000 observations in i, then y[idx] is unnecessarily overwritten 999 times. I think this could be avoided by changing the first line to clmis <- unique(x[wy, clust]). Not a big deal, but perhaps a little more efficient.

As an aside, I noticed this issue because I rewrote this section of mice.impute.2l.bin() so that instead of drawing each cluster intercept bi.star anew from its marginal distribution, it only does so for clusters that were dropped from the imputation model because they have no observed values of y. For clusters with observed values, the posterior prediction conditions on the random effect estimate from the imputation model. This is contrary to what is described in this paper (p. 163), which does not include random effects among the parameters conditioned on. However, it struck me as more natural than acting as if the missing observations in a partially observed cluster were independent of the observed observations in the same cluster. Of course, if this idea is wrong-headed, I'd be happy to be corrected. Here's the relevant code implementing my approach:

    for (i in clmis) {
        ri <- rownames(rancoef) == i    # Which random coefficient is cluster i's?
        if (any(ri)) {                  # If any coefficient is i's...
            bi.star = t(rancoef[ri, ])  # ...use it for imputation.
        } else {                        # If not...
            bi.star <-                  # ...draw a new one.
                t(MASS::mvrnorm(n = 1L, mu = rep(0, nrow(psi.star)), 
                                Sigma = psi.star))
        }

Thanks,
Devin

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions