Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGZILLA #17478] install.packages() on Windows can upgrade R code but leave old dll in place #6652

Closed
MichaelChirico opened this issue May 19, 2020 · 14 comments

Comments

@MichaelChirico
Copy link
Owner

install.packages() on Windows can leave the package in an apparently functional state where the new R code is installed but the old dll is left in place. This mismatch can cause errors if the user is lucky, silent wrong results if they are not lucky.

The problem does not happen when the user installs the Windows binary from CRAN. It only happens when the user installs from source. Which install.packages() suggests to them when the source version on CRAN is later than CRAN's Windows binary. This typically happens just after the package is updated on CRAN.

Reproducible example is discussed and confirmed here : Rdatatable/data.table#3056


METADATA

  • Bug author - Matt Dowle
  • Creation time - 2018-09-25 00:43:21 UTC
  • Bugzilla link
  • Status - CLOSED FIXED
  • Alias - None
  • Component - Installation
  • Version - R 3.5.0
  • Hardware - Other Windows 64-bit
  • Importance - P5 normal
  • Assignee - R-core
  • URL -
  • Modification time - 2019-06-20 07:30 UTC
@MichaelChirico
Copy link
Owner Author

This is a known issue, I believe. The old dll file is locked if the package is loaded in some R session, either the current one, or another one.

install.packages() refuses to reinstall the package if it is attached in the current session, but it tries to reinstall it if it is only loaded, but not attached. If the DLL file is locked, then you get a "warning":

Warning: cannot remove prior installation of package ‘ps’

One "workaround" is to always run install.packages() with options(warn = 2), then at least you don't get a broken package. The problem with this is that it also stops for other warnings, e.g. for missing metadata files in the CRANextras repository. (See https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17467)

The checkMD5sums() tool is also useful:

tools::checkMD5sums("data.table")

file ‘libs/i386/datatable.dll’ has the wrong MD5 checksum

Btw. of course this is also happening when installing binary packages. It happens whenever the package is loaded in some R session.


METADATA

  • Comment author - Gabor Csardi
  • Timestamp - 2018-10-01 09:13:24 UTC

@MichaelChirico
Copy link
Owner Author

Thanks Gabor, this was really helpful.

One "workaround" is to always run install.packages() with options(warn = 2),
then at least you don't get a broken package.

Did you try this and confirm it avoids the broken package state? If so then perhaps it is an easy fix to change that warning to an error in R then! I'll be surprised because the R code has already been upgraded by that point but I hope it's true. I have likely misunderstood the source code of install.packages in this area.

R does a great job in detecting the in-use dll when installing from binary. It halts with error and leaves the package in an unusable state. This is the desired behaviour when installing from source too, as users are prompted to when CRAN source version > CRAN binary version. Rather than leaving the package in a broken state; i.e. new R code calling old C code in the old dll.

On checkMD5sums(), that's great, thanks. Should every package that has a dll add this check to their .onLoad(), and/or could it be added into R itself so that every package doesn't have to? I'll add it to data.table's .onLoad() so that users with older versions of R can benefit from the extra check without needing to upgrade to latest R if and when it gets an improvement in this area.


METADATA

  • Comment author - Matt Dowle
  • Timestamp - 2018-10-01 22:23:44 UTC

@MichaelChirico
Copy link
Owner Author

Did you try this and confirm it avoids the broken package state?

I don't remember, but you are probably right. Nevertheless, even if it is broken, at least you get an error, so you can act on it. Not sure what else you can do, really.

R does a great job in detecting the in-use dll when installing from binary.
It halts with error and leaves the package in an unusable state. 

Maybe sometimes it does, but AFAIR it did not do that for me. I think binary installation is broken as well, the warning I copied above was from a binary installation.

The only case the installation was refused was when the package was attached in the current session. A harder problem is when the DLL is locked by another R process, that can't be detected without trying the installation.

Should every package that has a dll add this check to their .onLoad(), and/or
could it be added into R itself so that every package doesn't have to?

That would make a lot of sense, but I would say fixing the installer is also important.

Btw. sessioninfo::session_info() now also marks the packages with DLL mismatch.


METADATA

  • Comment author - Gabor Csardi
  • Timestamp - 2018-10-01 23:02:23 UTC

@MichaelChirico
Copy link
Owner Author

Nevertheless, even if it is broken, at least you get an error, so you can act
on it. 

However, the consequences for the user in not acting on the error can be disastrous (wrong results silently) and persists until they next upgrade.

Not sure what else you can do, really.

As I've already suggested, what else can be done is leave the package in a non-working state so it can't be loaded or used until it is fixed by reinstalling it. It is possible to do because install.packages() already does that when installing from binary, at least sometimes. This is better than leaving it in a mismatch R/dll faulty state. We know this faulty state persists because users keep reporting problems caused by it and the nature of the reports are because of R code calling old version C code.

A harder problem is when the DLL is locked by another R process, that can't
be detected without trying the installation.

Yes it's hard. But R's install.packages() already has code that is good at solving this problem. It already does that very well. The problem is that piece of very good logic is not being run in all cases, it seems.

Maybe it's not that hard. If the first thing install.packages() did was try to remove the DLL, wouldn't that be quite easy? If it couldn't remove the DLL it wouldn't proceed further, leaving the package not upgraded but working correctly. The root problem perhaps is that install.packages() does the DLL last after it already upgraded the R code in the package. Although I am rather guessing here based on looking at the source a few days ago.

Btw. sessioninfo::session_info() now also marks the packages with DLL
mismatch.

I see that sessioninfo is your package. It looks excellent and too important to be a separate package. Could it be promoted into utils::sessionInfo()?


METADATA

  • Comment author - Matt Dowle
  • Timestamp - 2018-10-02 01:06:20 UTC

@MichaelChirico
Copy link
Owner Author

Created attachment 2432 [details]
minimal reproducible example terminal output


METADATA

  • Comment author - Toby Hocking
  • Timestamp - 2019-06-05 23:40:59 UTC

INCLUDED PATCH

@MichaelChirico
Copy link
Owner Author

Hi I am having the same issue. Several others on R-devel http://r.789695.n4.nabble.com/R-pkg-install-should-fail-for-unsuccessful-DLL-copy-on-windows-td4757281.html have expressed a desire for an error in this case, instead of a warning.

At the very least I would expect that if I set options(warn=2) then R should convert that to an error, but it does not. Attached is a minimal reproducible example command line.


METADATA

  • Comment author - Toby Hocking
  • Timestamp - 2019-06-05 23:41:38 UTC

@MichaelChirico
Copy link
Owner Author

(In reply to Toby Hocking from comment #6)
[...]

At the very least I would expect that if I set options(warn=2) then R should
convert that to an error, but it does not. Attached is a minimal
reproducible example command line.

options(warn=2) is not ideal, because there are other warnings, possibly. E.g. if a PACKAGES file is not found, if tar has unknown pax headers, etc. There are a bunch of them, and they tend to be (mostly) harmless. FWIw the remotes package does this, i.e. it injects options(warn = 2) into the install process, but we had to create an escape hatch to work around the other warnings.

I think install.packages() should just fail in this case.

There are actually two different implementations to patch, because the binary package installation is different, that's just az unzip call essentially. But that is problematic as well, because it does not error, either, only warns. The end result in this case is that the package's directory is wiped out in the library, except the locked dll file.

This is the related issue in remotes, it also includes an investigation of the various cases (binary, sources, etc.): r-lib/remotes#368


METADATA

  • Comment author - Gabor Csardi
  • Timestamp - 2019-06-06 08:22:19 UTC

@MichaelChirico
Copy link
Owner Author

In data.table I added a version check between the .dll and the R code by placing the version number as a compiled constant C string with a wrapper to check it matches the R level package version number, Rdatatable/data.table#3237. This ensures that the working-but-silently-wrong state cannot occur with data.table 1.12.0+ (Jan 2019), even with old versions R if and when it is fixed in R.

When the working-but-silently-wrong state occurs, data.table prints the following message :

"The datatable.dll version (1.12.0) does not match the package (1.12.2). Please close all R sessions to release the old DLL and reinstall data.table in a fresh R session. The root cause is that R's package installer can in some unconfirmed circumstances leave a package in a state that is apparently functional but where new R code is calling old C code silently: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17478. Once a package is in this mismatch state it may produce wrong results silently until you next upgrade the package. Please help by adding precise circumstances to 17478 to move the status to confirmed. This mismatch between R and C code can happen with any package not just data.table. It is just that data.table has added this check."


METADATA

  • Comment author - Matt Dowle
  • Timestamp - 2019-06-11 20:31:31 UTC

@MichaelChirico
Copy link
Owner Author

Addressed in 76692 and 76693. Now R on Windows refuses to install a source package using install.packages() if that package is in use by the R session invoking install.packages(). Previously this check only applied to binary packages. In addition, the installation from a source package will fail with error when any file of the existing installation is locked (delete operation fails). This may unfortunately impact some users who used to update loaded packages even on Windows without running into any problems.


METADATA

  • Comment author - Tomas Kalibera
  • Timestamp - 2019-06-14 07:53:46 UTC

@MichaelChirico
Copy link
Owner Author

Based on your description the binary install case is still not fixed. I.e. if the package is loaded in another R session, then install.packages() will still wipe out the previous installation, and leave only the locked DLL behind.


METADATA

  • Comment author - Gabor Csardi
  • Timestamp - 2019-06-14 08:13:45 UTC

@MichaelChirico
Copy link
Owner Author

(In reply to Gabor Csardi from comment #10)

Based on your description the binary install case is still not fixed. I.e.
if the package is loaded in another R session, then install.packages() will
still wipe out the previous installation, and leave only the locked DLL
behind.

Right, so far this has been addressed only in the source installs.


METADATA

  • Comment author - Tomas Kalibera
  • Timestamp - 2019-06-14 08:33:08 UTC

@MichaelChirico
Copy link
Owner Author

The problem with installing a binary package when an existing installation has a locked file can be solved via a per-directory lock. In released versions of R, one can pass "lock=TRUE" to install.packages() or change option "install.lock" to TRUE. See ?install.packages() for more. I've changed the default in R-devel to TRUE.


METADATA

  • Comment author - Tomas Kalibera
  • Timestamp - 2019-06-19 10:45:58 UTC

@MichaelChirico
Copy link
Owner Author

Many thanks, Tomas! It's fantastic news that this is resolved. I'll update data.table's message in the next release, and I'll update the twitter thread.


METADATA

  • Comment author - Matt Dowle
  • Timestamp - 2019-06-19 19:52:08 UTC

@MichaelChirico
Copy link
Owner Author

Thank you for the report. The patches have been ported to R-patched. It indeed remains important that packages being installed are not in use by any R session (even on a different machine when the library is shared), there are a number of ways this could go wrong.


METADATA

  • Comment author - Tomas Kalibera
  • Timestamp - 2019-06-20 07:30:41 UTC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant