Using withTimeout when resolving a remote future may corrupt its state #266

@renkun-ken

I need to run some machine learning algorithms on a remote GPU server. It is likely that some heavy training is already going on there and that all CPU and GPU resources are occupied. In that case, both starting a remote process and communicating with it are likely to hang. I therefore need an external timeout (rather than a remote call to withTimeout, as suggested in #169) to bound the time this takes, so that other measures can be taken if the timeout occurs.

library(future)

v <- R.utils::withTimeout({
  p <- remote({
    Sys.sleep(10)
    1
  }, workers = "<remote-ip>", user = "<remote-user>", persistent = FALSE, earlySignal = TRUE)

  value(p)
}, timeout = 1)

However, the behavior is quite random when I run this repeatedly. In some cases it works as expected, in others it returns 1 immediately without sleeping, and in yet others it ends with the following error:

Error: Unexpected result (of class ‘NULL’ != ‘FutureResult’) retrieved for ClusterFuture future (label = ‘<none>’, expression = ‘{; Sys.sleep(10); 1; }’): 

which suggests that the internal state of the future has somehow been corrupted.
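
For comparison, here is a minimal sketch of an external timeout that never interrupts value() partway through. It polls resolved() until a deadline and calls value() only once the result is ready. The helper name value_within and its interval argument are hypothetical and not part of the future package. I have not verified that this avoids the problem above, and it would not help if remote() itself hangs while setting up the connection.

```r
library(future)

# Hypothetical helper (not part of the future package): wait up to `timeout`
# seconds for a future by polling resolved(), so that value() is called only
# after the result is ready and is never interrupted mid-communication.
value_within <- function(f, timeout, interval = 0.1) {
  deadline <- Sys.time() + timeout
  while (!resolved(f)) {
    if (Sys.time() >= deadline) {
      stop("future not resolved within ", timeout, " seconds")
    }
    Sys.sleep(interval)  # pause between polls
  }
  value(f)
}

# Intended usage with the remote future from above (placeholders as before):
# p <- remote({
#   Sys.sleep(10)
#   1
# }, workers = "<remote-ip>", user = "<remote-user>", persistent = FALSE)
# v <- tryCatch(value_within(p, timeout = 1), error = function(e) NULL)
```

On timeout the future is simply abandoned on the client side. The remote expression presumably keeps running on the worker until it finishes.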
