Number of threads decreased to 1 after re-entering RStudio Server session #112

Closed
renkun-ken opened this issue Dec 8, 2017 · 9 comments
@renkun-ken
Contributor

renkun-ken commented Dec 8, 2017

I'm using the latest development version of fst and I find it quite mysterious that after re-entering my RStudio session, the number of threads indicated by threads_fst() is changed to 1 from 40.

Steps to reproduce:

  1. Start a new session in RStudio
  2. Run fst::threads_fst() which, on my server, returns 40
  3. Refresh the webpage of RStudio, leading to re-entering the session
  4. Run fst::threads_fst() again and the number of threads becomes 1

My session info:

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.3 parallel_3.4.3 tools_3.4.3    yaml_2.1.15    Rcpp_0.12.14   fst_0.7.3     

I'm using RStudio Server 1.1.383.

@renkun-ken renkun-ken changed the title Number of threads decreased to 1 after re-enter RStudio session Number of threads decreased to 1 after re-entering RStudio Server session Dec 8, 2017
@MarcusKlik
Collaborator

Hi @renkun-ken, thanks for reporting that!

If I understand correctly you have a web interface to RStudio Server and the actual R session is running remotely on the server.

So what exactly happens when you refresh the web page? Does the server start a completely new R session and kill the previous one?

The only way that the number of cores would be set to 1 would be if fst can't detect OpenMP when re-entering. That would be strange but can be tested with:

fst:::hasopenmp()  # TRUE if OpenMP detected
#> [1] TRUE

Would you be so kind as to test that?
The other reason for the number of threads to be set to 1 is when fst thinks it's in a forked session. The logic used there is comparable to that used in the data.table package. Would it be possible to test if data.table has the same problem using:

data.table::getDTthreads()
#> [1] 8

Thanks!

@MarcusKlik MarcusKlik self-assigned this Dec 8, 2017
@MarcusKlik MarcusKlik added the bug label Dec 8, 2017
@MarcusKlik MarcusKlik added this to the Fst package v0.9.0 milestone Dec 8, 2017
@renkun-ken
Contributor Author

renkun-ken commented Dec 8, 2017

I ran some tests with both fst::threads_fst() and data.table::getDTthreads(), and it seems that the R session I re-enter through RStudio Server may be a forked one. Here's my test code:

while (TRUE) {
  cat("[", format(Sys.time()), "] fst::threads_fst() = ", fst::threads_fst(), 
    ", data.table::getDTthreads() = ", data.table::getDTthreads(), "\n", sep = "")
  Sys.sleep(1)
}

At 21:58:00 I closed the webpage. A while later I re-entered the session and saw this log:


[2017-12-08 21:58:00] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:01] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:02] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:03] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:04] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:05] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:06] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:07] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:08] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:09] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:10] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:11] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:12] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:13] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:14] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:15] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:16] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:17] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:18] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:19] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:20] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:21] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:22] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:23] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:24] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:25] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:26] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:27] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:28] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:29] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:30] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:31] fst::threads_fst() = 1, data.table::getDTthreads() = 1

It's quite clear that the R session is not suspended. However, at the moment I re-entered the session (21:58:23), I may have entered a forked session, so the number of threads dropped to 1.

I'm not sure why it behaves this way. Maybe it's not an issue with fst or data.table, but this behavior surely makes RStudio Server less predictable to use with fork-detecting packages. I'll consider raising issues with both data.table and RStudio.
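The same fall-back can probably be observed without RStudio Server by forcing a fork with parallel::mclapply(). The following is only a sketch: `threads_around_fork` is a hypothetical helper (not part of fst or data.table), and the values it reports depend on the installed package versions and their fork settings, so no expected output is shown.

```r
library(parallel)  # ships with R; mclapply() forks on Unix-alikes only

# Hypothetical helper: query a thread-count getter before, inside and
# after a forked mclapply() call.
# `get_threads` is any zero-argument function, e.g. fst::threads_fst.
threads_around_fork <- function(get_threads, workers = 2L) {
  before <- get_threads()
  # Each element is evaluated in a forked child process
  # (mc.cores > 1 is not supported on Windows).
  during <- unlist(mclapply(seq_len(workers),
                            function(i) get_threads(),
                            mc.cores = workers))
  after <- get_threads()
  list(before = before, during = during, after = after)
}

# Only run the check when fst is installed and forking is available.
if (requireNamespace("fst", quietly = TRUE) && .Platform$OS.type == "unix") {
  str(threads_around_fork(fst::threads_fst))
}
```

Passing data.table::getDTthreads instead of fst::threads_fst gives the corresponding check for data.table.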

@MarcusKlik
Collaborator

MarcusKlik commented Dec 8, 2017

Hi @renkun-ken, that's a smart way of testing that, nice work!

In data.table's code, there is an explanation of why OpenMP should not switch back to multi-threaded mode after parallel's fork has completed (doing so causes problems with the Intel compiler), so it is left to the user to switch to more threads again. I followed that advice for fst, so we can't really determine from your experiment whether the fork was very brief (perhaps only to facilitate re-entering) or persists after re-entering.

I could add some code to check that or make it the user's choice to switch back to multi-threaded mode after the fork was ended, say:

fst::threads_fst(8, reset_after_fork = TRUE)
#> [1] 8

That would be an option at the user's own risk, however :-)

@renkun-ken
Contributor Author

renkun-ken commented Dec 9, 2017

Thanks for pointing to data.table's code and for the clarification. I'd prefer not to make it more complex. For now, I'll call threads_fst() before calling fst functions whenever I want multi-threading.

@renkun-ken
Contributor Author

renkun-ken commented Dec 19, 2017

After some intensive use, I'd prefer adding a threads= argument to both read_fst and write_fst, because it's too easy for the thread count to fall back to 1 when using RStudio Server or calling mclapply. @MarcusKlik what do you think?

@MarcusKlik
Collaborator

MarcusKlik commented Dec 19, 2017

Hi @renkun-ken, thanks, yes, that would be better than calling fst::threads_fst every time before you call fst::write_fst, especially because fst also switches back to single-threaded mode after some other code or package produces a fork (the user might not even notice, as with the RStudio Server setup).

Judging from the data.table issues, we have to switch back to prevent problems in some cases. Perhaps a dual option would be most useful, so when the user does:

# set number of threads to 10
fst::write_fst(dt, "myfile.fst", threads = 10)

that number of threads is used regardless of any other setting. And with:

fst::write_fst(dt, "myfile.fst")

the default thread behavior is used. That default can be set with:

fst::threads_fst(8, single_threaded_on_fork = TRUE, reset_after_fork = FALSE)

That specifies the threading during and after a fork.
Would that be a good option?

thanks
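Until a per-call argument exists, a similar effect can be approximated in user code. The sketch below assumes only that fst::threads_fst() returns the current thread count when called without arguments and sets it when given a number; `with_fst_threads` and its `set_threads` parameter are hypothetical names introduced for illustration, not part of fst.

```r
# Hypothetical helper, not part of fst: evaluate `expr` with `threads`
# threads and put the previous setting back afterwards.
# `set_threads` defaults to fst::threads_fst and is only looked up when used.
with_fst_threads <- function(threads, expr, set_threads = fst::threads_fst) {
  previous <- set_threads()                    # query the current setting
  set_threads(threads)                         # apply the per-call value
  on.exit(set_threads(previous), add = TRUE)   # restore on exit, even on error
  force(expr)                                  # `expr` is lazy, so it runs here
}

# Usage, only when fst is installed:
if (requireNamespace("fst", quietly = TRUE)) {
  path <- tempfile(fileext = ".fst")
  with_fst_threads(2, fst::write_fst(data.frame(x = 1:10), path))
}
```

Because the thread count is re-asserted on every call, a silent fall-back to one thread caused by an earlier fork does not affect the wrapped read or write.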

@renkun-ken
Contributor Author

@MarcusKlik, it is definitely a good option. Thanks!

@MarcusKlik
Collaborator

Hi @renkun-ken, with the latest dev version, the default behavior of fst after a fork can now be set with parameter reset_after_fork in threads_fst(). When reset_after_fork = TRUE, the number of threads will be restored to the number of active threads before the fork.

On the data.table repository, some problems have been reported with the Intel compiler when threads are restored after a fork. For those cases, reset_after_fork = FALSE can be used or the fst_restore_after_fork option can be set to FALSE.

I'm very interested to see if this solves your issues with RStudio Server as well!

Thanks

@MarcusKlik
Collaborator

Hi @renkun-ken, I believe we can close this issue: the default behavior of fst is now to restore the number of threads to the original setting after a fork has ended.

Please let me know if re-entering an RStudio session still disables multi-threading and I'll re-open.

Thanks for testing and for submitting the issue to RStudio!
