Will cmdstanr error out if model compilation/initialization hangs? #1044

Open
@emstruong

Description

I've run cmdstanr a few tens of thousands of times via brms on an HPC node. Very rarely, I get an error message like

! Native call to `processx_exec` failed
Caused by error in `chain_call(c_processx_exec, command, c(command, args), pty, pty_options, …` at initialize.R:138:3:
! cannot start processx process '/tmp/RtmpF0MfcO/model_a748e80cd26feeedb6f52f5458fcda4b' (system error 13, Permission denied) @unix/processx.c:611 (processx_exec)

Or

! Native call to `processx_exec` failed
Caused by error in `chain_call(c_processx_exec, command, c(command, args), pty, pty_options, …` at initialize.R:138:3:
! cannot start processx process '/tmp/RtmpF0MfcO/model_c245fc9080e08ffee6f5db7c5de9e950' (system error 2, No such file or directory) @unix/processx.c:611 (processx_exec)
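
For reference, the two "system error" numbers in these messages match the standard POSIX errno values: 2 is ENOENT (the file to execute was not found) and 13 is EACCES (the file exists but could not be executed). The following is a minimal sketch in Python, not R, and it does not use cmdstanr or processx at all; it only shows what the operating system reports in each case. The file names and the `try_exec` helper are hypothetical, and a POSIX system is assumed.

```python
import errno
import os
import subprocess
import tempfile


def try_exec(path):
    """Try to start `path` as a program; return the errno on failure, 0 on success."""
    try:
        subprocess.run([path], check=False)
        return 0
    except OSError as exc:
        return exc.errno


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for a compiled model binary in a temp directory.
    missing = os.path.join(tmp, "model_missing")
    not_executable = os.path.join(tmp, "model_no_exec_bit")

    # Create a file that exists but has no execute permission bit set.
    with open(not_executable, "w") as fh:
        fh.write("#!/bin/sh\nexit 0\n")
    os.chmod(not_executable, 0o644)

    missing_errno = try_exec(missing)            # file was never created
    noexec_errno = try_exec(not_executable)      # file exists, not executable

print("missing file is ENOENT:", missing_errno == errno.ENOENT)
print("non-executable file is EACCES:", noexec_errno == errno.EACCES)
```

Both failures occur at the moment the program is launched, so they describe the state of the file at that path when the launch was attempted. Whether high system load can lead to that state in cmdstanr is exactly the open question here; this sketch does not answer it.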

Because this does not happen every time I compile a model through brms, and happens only rarely, I do not think it is an issue with the data I'm feeding the model or with any other aspect of my code. However, these errors seem to occur when many thousands of models have been fitted within one session and the system load is very high. I suspect that model compilation hangs when the system load is too high. Or perhaps it is not compilation at all, but model initialization that takes too long.

So my question is: how does cmdstanr react when it has waited too long for a model to compile? Is it possible that processx produces the errors above when the system has hung for too long?

I found some associated processx code used in cmdstanr here:

cmdstanr/R/run.R

Lines 755 to 760 in c681d32

poll = function(ms) { # time in milliseconds
  processx::poll(private$processes_, ms)
},
wait = function(s) { # time in seconds
  Sys.sleep(s)
},

The HPC node is a Linux system, so I don't think this is related to how cmdstanr uses processx on macOS or WSL systems.
