Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zombie fix attempts #1704

Merged
merged 3 commits into from
Nov 14, 2024
Merged

Zombie fix attempts #1704

merged 3 commits into from
Nov 14, 2024

Conversation

paul-buerkner
Copy link
Owner

@paul-buerkner paul-buerkner commented Nov 13, 2024

In this PR, I will try several things to fix #1658. @avehtari or @n-kall I would be gratefull if you could check it out on your systems because I currently don't have access to Ubuntu.

List of things tried out in this PR so far:

  • copy the plapply code directly into log_lik (don't call plapply itself anymore)
  • always use PSOCK clusters even on unix systems

@n-kall
Copy link

n-kall commented Nov 13, 2024

Edit: I don't get zombies on my ubuntu machine with the current master branch.

I think that first try fixed it. I don't seem to get zombies with the minimal example anymore. @avehtari how about you?

@avehtari
Copy link
Contributor

Using the github master I didn't get zombies with the minimal example, but after running at least 333 lines of brms_demo.R I get zombies when calling brms::log_lik() also with this PR

@paul-buerkner
Copy link
Owner Author

paul-buerkner commented Nov 13, 2024 via email

@avehtari
Copy link
Contributor

A new minimal example in the issue

@paul-buerkner
Copy link
Owner Author

paul-buerkner commented Nov 13, 2024

Thanks. Okay, here is my next try. It contains two changes.

First, log_lik will no longer automatically use the mc.cores option, at least for now. Instead, it will only use parallel execution if cores is explicitely specified in log_lik. That is, your latest example

library(brms)
options(brms.backend = "cmdstanr", mc.cores = 2)

m0 <- brms::brm(yield ~ 0, npk, refresh=0)
ll<-brms::log_lik(m0) # no zombies

m1 <- brms::brm(yield ~ 1, npk, refresh=0)
ll<-brms::log_lik(m1) # yes zombies

will now almost surely not created zombies anymore simply because parallel execution of log_lik is not enabled via options(mc.cores = 2) anymore. This at least prevents the issue when log_lik is more or less accidentally executed in parallel.

However,

library(brms)
options(brms.backend = "cmdstanr", mc.cores = 2)

m0 <- brms::brm(yield ~ 0, npk, refresh=0, cores = 2)
ll<-brms::log_lik(m0) # no zombies

m1 <- brms::brm(yield ~ 1, npk, refresh=0, cores = 2)
ll<-brms::log_lik(m1) # yes zombies

may still created zombies.

In an attempt to "solve" this as well, I now force the user of PSOCK clusters in plapply. Perhaps when using PSOCK clusters, zombies will no longer occur? This is what I would like you to try out next with the latest version of the log-lik-zombies branch.

@n-kall
Copy link

n-kall commented Nov 14, 2024

I confirm that with master and previous fix, @avehtari's new minimal example was producing zombies.
Now with the new patch, zombies are not created even if I add cores = 2 or cores = 16 to the brm call or the log_lik call, and I do see additional R processes spawning as expected

@n-kall
Copy link

n-kall commented Nov 14, 2024

Edit: Aside note, I accidentally passed the mc.cores and cores args to brm call and with rstan I get the following uninformative error. With cmdstanr backend I get no error

devtools::load_all()
options(brms.backend = "rstan")

m0 <- brms::brm(yield ~ 0, npk, refresh = 0)
ll<-brms::log_lik(m0) # no zombies

m1 <- brms::brm(yield ~ 1, npk, refresh = 0, mc.cores = 4, cores = 4) # here mc.cores is not an accepted arg
#> Error in x@mode :
#>   no applicable method for `@` applied to an object of class "try-error"
#> In addition: Warning message:
#> In parallel::mclapply(1:chains, FUN = callFun, mc.preschedule = FALSE,  :
#>   4 function calls resulted in an error

@paul-buerkner
Copy link
Owner Author

paul-buerkner commented Nov 14, 2024 via email

@avehtari
Copy link
Contributor

No more zombies (cmdstanr/rstan) with

m0 <- brms::brm(yield ~ 0, npk, refresh=0)
ll<-brms::log_lik(m0, cores=4) 

m1 <- brms::brm(yield ~ 1, npk, refresh=0)
ll<-brms::log_lik(m1, cores=4) 

I can also see 4 cores being used.

Also brms_demo.R does not produce zombies, but I guess it is now using just one core for log_lik computations inside powerscaling

@paul-buerkner
Copy link
Owner Author

Lovely, thank you both! I will clean up the PR and then merge :-)

@paul-buerkner paul-buerkner merged commit ed43494 into master Nov 14, 2024
0 of 5 checks passed
@paul-buerkner
Copy link
Owner Author

Merged now :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

brms::log_lik creating zombies if cmdstanr is loaded
3 participants