Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bind package version into .dll/.so #3056

Closed
mattdowle opened this issue Sep 20, 2018 · 36 comments
Closed

Bind package version into .dll/.so #3056

mattdowle opened this issue Sep 20, 2018 · 36 comments
Milestone

Comments

@mattdowle
Copy link
Member

mattdowle commented Sep 20, 2018

So that R code in a package calling an old version of that package's .dll on Windows can't happen. Almost every time we upgrade, someone somewhere on Windows hits this problem; e.g. #3055.
The worry is that it happens more than we know, but users get wrong results silently. It's quite fortunate when a mismatch is severe enough to cause an error, so at least the user knows something is wrong.

But how to achieve this? It might mean checking the version matches between R package and its .dll on each and every .Call() call, which might be expensive, involve a lot of code changes at every interface point, and might be insufficient if we miss some interface points either now or in the future.

Maybe it's better to fix the issue upstream in R itself (install.packages) or RStudio if it's an RStudio-only problem. Since it must affect other packages too. We need someone with i) the will and ii) Windows, to solve it once and for all, and we need precise information from users on Windows who hit this problem to tell us the conditions under which they upgraded and the problem happens. (My guess is that they upgrade in one R session while another R session has the package loaded using the .dll).

@shrektan
Copy link
Member

shrektan commented Sep 21, 2018

I have some experience on this and it's true I have to be very careful when updating the R package that uses compiled codes on Windows.

Whenever the R package with compiled code is being loaded, the DLL file will be locked on Windows and can't be unlinked. If you are trying to upgrade the package via install.package() at the same time, R will complain that the DLL file can't be unlinked but the package itself will be installed anyway, which causes a lot of pain for R users because we normally open multiple sessions... So whenever I want to upgrade my package, I have to close all of the R scripts that were running in advance.

UPDATE Installed the binary version will only show a warning but the package can't be used anymore in a new session. However, I remember installing the package from the source will success but with a wrong DLL because I've fallen into this pit several times before (I will try later and report).

UPDATE-again I can confirm my previous UPDATE that: install from binary will only show a warning but the package won't be installed. However, install from source will show a warning but the installing will succeeds and leads to a broken package that uses the wrong DLL file. (See the screenshot below)

It's not an RStudio only issue

I do the following tests without using RStudio.

INSTALL FROM THE BINARY

  • R will throw a warning
  • The old package folder will be removed except the locked DLL file
  • You are not able to load the package since only a DLL file is kept

installpackage

INSTALL FROM THE SOURCE

  • Throws a warning, which might be difficult to notice, because:
    • you probably install multiple packages at the same time
    • or you probably are using an x64 Windows. R will compile both of the i386 and x64 DLL files by default with i386 first. If it's the i386 DLL being locked, you have minimal chances to notice because it's in the middle of the long compiling messages.
  • The package installed successfully anyway.
  • End up with using a wrong DLL file (depends on it's the i386 DLL file or the x64 DLL file that's being locked).

fromsource1

fromsource2

The Steps to Reproduce

  • Find a Windows computer with R installed.
  • Run remove.packages("data.table") to remove data.table.
  • Run library("data.table") and it throws errors to ensure data.table has been truely removed.
  • Run install.packages("data.table", type = "binary") to install the binary version (right now the latest binary version of data.table for R3.3.3 is 1.10.4-3). Run packageVersion("data.table") to ensure the right version is installed.
  • Run library("data.table") to lock the datatable.dll. Record the DLL's time by file.info("r-lib-path/data.table/libs/x64/datatable.dll")$mtime.
  • Run install.packages("data.table", type = "source") in a new session. Run packageVersion("data.table") shows 1.10.6 - ensure the latest version of data.table is installed. Run file.info("r-lib-path/data.table/libs/x64/datatable.dll")$mtimeagain and find that the time is the same as previous (should've been changed).
  • Run install.packages("data.table", type = "binary") again, since the DLL is still locked in the first R session, you are expecting the installing won't succeed.
  • Run library("data.table") and R will throw errors.

@shrektan
Copy link
Member

I think we have two options on this:

  1. Report it to R Core and hope they will fix this in install.packages()
  2. As you said, bind the version info in the dll file. However, I think we only need to check it whenever the data.table package is being loaded (e.g. check the version in .onLoad(libname, pkgname))

@jangorecki
Copy link
Member

jangorecki commented Sep 21, 2018

@shrektan thanks for confirming in R session. Your proposed solution looks interesting!
Alternatively, probably not very optimal way would be to copy dll on package load, and use a fresh copy, so original dll is never locked. Copied dll could be re-used by multiple sessions. Copy would append pkg version like data.table_1.11.6.dll then any session that attempts to load that version of pkg will not need to copy dll but only re-use already existing dll. So locks would be made only on pkg_ver.dll files, and install.packages would always update pkg.dll. This seems to be safe, but as said might not be optimal.

@MichaelChirico
Copy link
Member

Is this not an issue for, say Rcpp? I don't really see anything in Rcpp:::.onLoad/Rcpp:::.onAttach... Any insights @eddelbuettel?

@eddelbuettel
Copy link
Contributor

eddelbuettel commented Sep 21, 2018

I am fairly certain I recall snarky comments from Duncan Murdoch in the direction of Redmond, WA. This is AFAIK very much an OS-level issue R cannot do anything about [unless you consider the guerilla hacks outlines above legit; I am not sure I'd recommend going there].

(For what it is worth we get minor DLL hell on Linux too. Try loading package c (depending on a) when b (depending on a) is already loaded -- and you upgraded a. Happend to me yesterday when zoo and data.table updated at CRAN and I tried to load new packages (pulling a new .so) when I already had long-running sessions (yay, ESS!) containing an older build.)

@jangorecki
Copy link
Member

jangorecki commented Sep 21, 2018

Both solutions mentioned above should be cross platform. The issue is pretty serious because it can lead to wrong results silently, so some solution should be implemented. The one with suffixing so/dll with version numbers seems to be more unix-like way of handling linux packages. I would go that way. @eddelbuettel any suggestion about next action point about it? filling bugzilla issue?

@eddelbuettel
Copy link
Contributor

eddelbuettel commented Sep 21, 2018

Principle of least surprise: don't do anything.

Leaving dangling randome copies of shared libraries on my system is not something I would want an R package I respect to do, sorry.

@eddelbuettel
Copy link
Contributor

As a follow-up, I really don't want to sound like the Grinch but if it cannot be done reliably then maybe it should not be done. Consider eg what Duncan just posted about trying to debug rgl by loading / unloading: https://stat.ethz.ch/pipermail/r-package-devel/2018q3/003150.html

I still say "Don't do it". But that maybe me. And how I work with littler on the command-line.

@mattdowle
Copy link
Member Author

mattdowle commented Sep 21, 2018

Reading @shrektan's comment above, it seems R's install.packages() does know if the dll is in use or not, but somehow still results in a situation that the new version of R code can be used with the old version of dll, at least in some circumstances. So it seems like a relatively straightforward improvement to install.packages could be made, since the hard work of detecting if the dll is in use has already been done.

@nilescbn Did you install from source or from binary? @shrektan tests show that if you installed from binary then the mismatch should not have occurred. You must have installed from source for this mismatch to happen. Did you see the message that @shrektan underlined in red in the screenshot? It would make sense you did install from source because you were very quick to report the problem just after 1.11.6 went to CRAN. So quick that it was likely that binaries weren't available yet and that's the very reason you installed from source. We need you to confirm so that we can suggest the correct fix to r-core.

@eddelbuettel
Copy link
Contributor

Do we know if Python, Ruby, ... manage this better? AFAIK (and I know little Windoze) this is a documented defiency in in the OS. Dancing around seems crazy.

But then again, more power to you all as you have done some plainly "impossible" things before too.

@mattdowle
Copy link
Member Author

I'll be dancing if this gets fixed. That's for sure.

@shrektan
Copy link
Member

shrektan commented Sep 22, 2018

@mattdowle tasklist /m thelocked.dll won't work. I've opened an R session, executed library("data.table") and checked manually that the datatable.dll can't be removed. However, tasklist /m datatable.dllsays something like Info: No matched running tasks (translated from Chinese). Moreover, I can see no datatable.dll being attached to R.exe by running tasklist /m alone.

R.exe                         6080 ntdll.dll, wow64.dll, wow64win.dll,         
                                   wow64cpu.dll 

Anyway, I think the better solution is, as you said, file.remove() the old package installed files first and stop as soon as possible if exists locked DLL files.

@mattdowle
Copy link
Member Author

mattdowle commented Sep 22, 2018

@shrektan I've removed my misguided comment above and replaced it. I hadn't read your original comment properly when I wrote mine. Based on your earlier comment above with the screenshots, I think base R is already doing a pretty good job then, but there's a glitch in the case when installing from source.

In other words, there's already a warning: R knows this dll-in-use problem has happened! It's not that the warning should be an error, because people will still miss it whether it's a warning or an error. The problem is that the package is left in a mismatching R/dll state when installed from source. When it is installed from binary, it is ok in the sense that the mismatch R/dll state does not occur (it's left in a non-working state instead). The difference being to do with type=source or type=binary is not directly to do with whether the dll was created locally or downloaded from CRAN, it's just that a different logic path happens inside install.packages due to the type and there's a logic weakness in one of those code paths. Just talking out loud. Relying completely on your comment earlier above with the screen shots.

@jangorecki
Copy link
Member

So the easiest solution would be for base R to raise error instead of warning in such cases. Then it won't be possible to upgrade if you are using pkg in another session, and this is not that big problem as wrong answer silently.

@mattdowle
Copy link
Member Author

mattdowle commented Sep 22, 2018

No. That's why I wrote :

It's not that the warning should be an error, because people will still miss it whether it's a warning or an error.

In other words, if install.packages() fails with error but still leaves the package in a mismatch R/dll state, that's still not good enough. You may be thinking that the warning message is at the beginning of install.packages() before it starts to upgrade and so turning it into error will solve the problem. Maybe that is the case but I doubt it. It feels to me that the R code gets installed first and then the .dll later. If that's the case and the .dll can't be installed, the just-upgraded R code should be removed or something like that so the package is left in a non-working state that can't be loaded. So that there's no possibility of users getting silent wrong results due to still using the old version's dll. Or the dll could be upgraded first by install.packages() and only if successful would it proceed to upgrade the R code.

This seems to be this line and it happens at the end by the looks of it :
https://github.com/wch/r-source/blob/trunk/src/library/utils/R/windows/install.packages.R#L184

with an ominous comment :

restorePrevious <- TRUE # Might not be used

@shrektan
Copy link
Member

@mattdowle I've modified my previous comment and made it clearer. Actually, I double checked my steps (see the end of my first comment) and I'm pretty sure this is what happens (at least on my computer).

INSTALL FROM THE BINARY

  • R will throw a warning
  • The old package folder will be removed except the locked DLL file
  • You are not able to load the package since only a DLL file is kept

INSTALL FROM THE SOURCE

  • Throws a warning, which might be difficult to notice, because:
    • you probably install multiple packages at the same time
    • or you probably are using an x64 Windows. R will compile both of the i386 and x64 DLL files by default with i386 first. If it's the i386 DLL being locked, you have minimal chances to notice because it's in the middle of the long compiling messages.
  • The package installed successfully anyway.
  • End up with using a wrong DLL file (depends on it's the i386 DLL file or the x64 DLL file that's being locked).

I agree with your options:

  • Not only R should throw errors directly rather than an easily-overlooked warning.
  • But also R should either leave the package in a non-working state (a.k.a, installing from source should have the same behavior as binary) or simply restore the previous package files.

The key point is to prohibit users from using a mismatched DLL file silently.

@eddelbuettel
Copy link
Contributor

Maybe install.packages() could get a new option "safeMode" or "checkFirst" or something similar and "pre-test" in that case and not do it, or do it in another process, or ...

@mattdowle
Copy link
Member Author

@eddelbuettel why (on earth) optional?

@eddelbuettel
Copy link
Contributor

Fair point. I was just wary of any change in behavior. For another classic on that from just yesterday, see here.

@mattdowle
Copy link
Member Author

I filed this issue with r-core :
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17478

@nilescbn
Copy link

Looks like you all have moved well into the issue here... If still relevant, @mattdowle, I can't remember for certain but would bet I did install from source. I say this because if RStudio prompts me with the question of whether to install from source because the source version is "later" than the binary, I always choose source (unless it's a package like XML which fails repeatedly when doing that).

In trying to reproduce the error again this morning, RStudio did not prompt me. I understand that the binary "catches up" to the source version on CRAN after a few days, so perhaps that's what has happened.

Also this morning, when attempting to install v 1.11.6 without restarting the session, I got this warning:

Warning in install.packages :
  package ‘data.table’ is in use and will not be installed

I don't recall ever seeing that message before. And I definitely didn't see it last week.

@mattdowle
Copy link
Member Author

@nilescbn Thanks for confirming. What you're describing all fits.

@mattdowle mattdowle added this to the 1.12.0 milestone Oct 1, 2018
@mattdowle
Copy link
Member Author

mattdowle commented Oct 1, 2018

Thanks to @gaborcsardi suggesting in R#17478 to use tools::checkMD5sums('data.table'). I've added that now.

@shrektan When PR #3088 is merged, could you regenerate the broken state condition as you did before (having another session open with data.table loaded) but this time upgrading to v1.11.9 by setting type="source". You should see the warning as before about the in-use dll problem. But this time when you try to load v1.11.9 afterwards, data.table should emit the new error and halt asking you to reinstall data.table.

I'm not sure this will work because the MD5 file is not created when installing from source, is it? At least, it's not created for me on Linux when I install from source. So if the user installs from the source on Windows (as they are prompted to do just after a new release to CRAN when CRAN source version > CRAN binary version) I guess there is no MD5 file. In that case, tools::checkMD5sums('data.table') doesn't find the MD5 file and returns NA. Whether the new check works or not depends on whether install.packages() removes the old MD5 file before ending up in a broken state, I guess.

We could emit a warning on load on Windows if there is no MD5 file. To ensure Windows users only use binaries from CRAN. But is that too strong?

@jangorecki
Copy link
Member

jangorecki commented Oct 2, 2018

Definitely too strong. Many people uses Rtools and latest version of data.table. We could emit warning once, or add some function to disable that warning easily, so user don't have to set options in Rprofile. Maybe we can have c file storing just version, and compare version from C with DESCRIPTION. And eventually automate that in during build, but not sure how portable that would be.

@HughParsonage
Copy link
Member

HughParsonage commented Oct 3, 2018

One possibility would be to error on load if

(a) on Windows
(b) no or mismatching MD5
(c) a specific environment variable is unset

onLoad <- function(libname, pkgname) {
  # Runs when loaded but not attached to search() path; e.g., when a package just Imports (not Depends on) data.table
  if (.Platform$OS.type == "windows" &&
      !isTRUE(tools::checkMD5sums("data.table")) &&
      !identical(Sys.getenv("_R_DATATABLE_REQUIRES_BINARY_"), "FALSE")) {
    # checkMD5sums outputs messages using cat() and returns NA when MD5 file is not available. The MD5 file is included in the
    # binary builds that CRAN produces.
    stop("Using data.table on Windows but the MD5 checksum is invalid.\n\n",
         "On Windows, data.table should be installed from a binary ",
         "with a valid MD5 checksum. If you are an advanced user and ",
         "are certain the DLL will be properly updated, you can install ",
         "from source and use data.table by setting the following ",
         "environment variable before loading.\n\t",
         'Sys.setenv("_R_DATATABLE_REQUIRES_BINARY_" = "FALSE").\n')
    # https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17478
  }

This way the unusual case of installing from source can still be used by setting an environment variable but for the majority of cases it's caught.

@jangorecki
Copy link
Member

But it seems that MD5 is not present when installing from source thus it is pretty much useless to depend on that. And setting ENV var is even more annoying then setting R option. Next time user will boot machine and try to run script it will see error again, and then user will need to google (and learn very useful stuff of course, but user might not appreciate it) how to set it permanently. I would emit warning once (per version), and mark somewhere internally that user has been warned. Or go for more reliable solution, as the one mentioned above, to store version in C and compare in onLoad.

@mattdowle
Copy link
Member Author

Warning (not error) on load would be better, agreed. I don't think it's 'pretty much useless' to use MD5 though. Vast majority of WIndows users install and use the binary from CRAN. When they upgrade the day after a new release and CRAN source version > CRAN binary version, they are prompted to install from source, which many of them do. The absence of MD5 is probably (not yet confirmed) an indicator that they did that. It was a temporary measure for them to install from source. When the binary version is available on CRAN a day or two later, they should reinstall using the binary to remove the startup message/warning. I think they will appreciate that. Most users will never see the startup warning.
In addition, the root cause should be fixed in R too. But folks won't benefit from that until they upgrade to the future version of R.
Note that I asked if the MD5 is created when installing from source. Nobody has actually confirmed that yet so we should not assume it isn't. When I look at the AppVeyor artifact .zip, it contains an MD5, for example. If install.packages(type="source") does not create an MD5 in the installation directory on Windows, is there a way to make it create one? That could be a way for advanced Windows users who routinely compile from source to avoid the startup warning: have them create the MD5 file rather than set an environment variable or option.

@HughParsonage
Copy link
Member

I disagree jangorecki.

But it seems that MD5 is not present when installing from source thus it is pretty much useless to depend on that.

I claim that installing data.table from source is very unusual for Windows users. I for one have not been able to compile data.table from source because of the openmp dependency. I'd say that only a very few advanced users would require it. One very important type of user might be badly affected: the actual creation of CRAN binaries -- that would be a problem, obviously. Otherwise installing the binary is almost always the correct decision.

You can set system or user level environment variables, so if a user really really wants to install from source he only needs to set the environment variable once. It should persist between boots. And the error message I provided makes it clear how to install immediately.

A big problem with a single warning for each install is that users might automatically install via update.packages() and the warning will be missed. I have a few scripts that run update.packages(ask = FALSE) every hour. Obviously once data.table is updated the package source is available well before the binary so that is what is installed; once the binary becomes available it's too late -- the package is already 'up-to-date' as far as R is concerned.

I think the cost of inconvenience is worth the risk.

@jangorecki
Copy link
Member

OK, it was just me compiling data.table frequently. Back then (when I was on Windows) there were no openmp in data.table, so that might have been easier, just Rtools. It would be best to rely on md5 actually, but that requires R to compute it always. Or even better R should verify md5 after installation.

@shrektan
Copy link
Member

shrektan commented Oct 5, 2018

Actually, I've never had difficulties when compiling from source on Windows (openmp is enabled by default?) but always on OSX (have to modify ~/.R/Makevars first)...

@mattdowle I've tried installing 1.11.9 (since not on CRAN yet I compiled a source file and use install.package("xxx.tar.gz", type = "source", repos = NULL)). As expected, it can be loaded despite of using a wrong DLL file, since there's no MD5 file to compare.

@aadler
Copy link

aadler commented Dec 11, 2018

@mattdowle It appears that when building a source package from packgage files, the file with the MD5 hashes (named MD5) is added to the .tar.gz in the top directory. When installing from that source, the MD5s are checked, but not saved into the library.

@kevinushey
Copy link
Contributor

kevinushey commented Dec 16, 2018

What about renaming the data.table shared object so that it uses the current package version in the name? That way, there's no risk for collisions at all when updating data.table to a new version. It could even just be some unique identifier rather than the version (e.g. a UUID) so that name collisions are no longer even possible.

The library name would need to be updated in the following places:

  1. The useDynLib() call in the NAMESPACE file;
  2. The src/Makevars usages where the built shared object is renamed;
  3. The R_init_datatable() method would need to change to accommodate the version.

This could probably be automated with a configure script if need be. data.table could clean up old versions of the .dll on package load as necessary, as well (assuming the package was installed into a library that the user can read / write to, which is typically true on Windows)

@mattdowle
Copy link
Member Author

Thanks for the suggestion. I seem to remember something like that (version number in .so/.dll name) being suggested before but there were some reservations.

How about adding this to the C:
const char *version="1.11.8";
and then passing the R level version through to a C checker on init/load to ensure they are equal. Would still need to be updated manually but that's easy to be done in release procedures.

(I never got a response from R-core to the issue I raised.)

@mattdowle
Copy link
Member Author

@shrektan Can you try once more to reproduce the mismatch state please to see if PR #3237 catches it? Since the dllVersion is a new function call you should actually see an error saying that dllVersion can't be found in the old dll for 1.11.8. At least that's an error and the package won't load, which is something. In time when 1.12.0 becomes the old dll, the new long version mismatch message will kick in, which I tested locally.

@shrektan
Copy link
Member

@mattdowle I can confirm it works. Installing from source leads to the error saying object dllVersion can't be found.

@mattdowle
Copy link
Member Author

mattdowle commented Dec 20, 2018

Thanks @shrektan ! Let's close for now then and monitor closely. As we see reports of the new data.table check, we can follow up with R-core on the root cause which affects all packages (which have compiled code) in the issue I raised : https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17478.
If R does improve the package installer on Windows, it will only be improved in a future version of R. So this data.table check will still be useful for at least 3 years for those using current versions of R. As far as I know, as I explained in the bugzilla report, the problem only occurs just after release to CRAN in the few days before the CRAN binary is available since this is when install.packages() installs from source on Windows. But its status is still "unconfirmed" on bugzilla so it needs attention to confirm this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants