Calculate IPID from source before building #2752

Closed
wants to merge 6,040 commits

Conversation

vagrawal

@vagrawal vagrawal commented Aug 6, 2015

This patch allows cabal to calculate the IPID before building the package. The benefits of this method are that we can stop a duplicate compilation by looking at the IPID, and that the IPID generated this way is unique and changes with the source (see also #2745: when doing `cabal build`, only the installation IPID changes now, not the inplace IPID). Currently the patch generates the IPID strictly from source (by calculating a hash of the sdist), but a compiled package can still mutate without the source changing. It is not immediately clear which inputs are best to include in the hash, but there are some candidates (a sketch of the current approach follows the list):

  • Command line flags + InstallPlan
  • Command line flags + InstallPlan + ProgramConfiguration
  • LocalBuildInfo (moving IPID calculation to after configuration step)
  • LocalBuildInfo + ConfigEx Flags + Install Flags
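
For illustration, here is a minimal sketch of the current source-hash approach (not the literal patch code; `computeSourceIPID` and `sdistFiles` are hypothetical names, the latter standing in for the list of files that `cabal sdist` would package), using GHC.Fingerprint from base:

    import GHC.Fingerprint (fingerprintFingerprints, fingerprintString)
    import qualified Data.ByteString.Char8 as BS

    -- Fingerprint each file that would go into the sdist tarball, then
    -- combine the per-file fingerprints into one hash for the IPID suffix.
    computeSourceIPID :: String -> [FilePath] -> IO String
    computeSourceIPID pkgId sdistFiles = do
      hashes <- mapM (fmap (fingerprintString . BS.unpack) . BS.readFile) sdistFiles
      return (pkgId ++ "-" ++ show (fingerprintFingerprints hashes))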

I had also done some work to change the library path to the IPID (not very hard), but then found a recent change by @ezyang with a broader plan including Backpack, so I have reverted it; I still think it is the better way to go (and it brings us one step closer to #2745). Please comment on whether I should reinstate it. I can also do the locking-before-building part as suggested by @ttuegel; please let me know whether it is worth doing now, as @dcoutts said it is not critical and can be left for later.

edsko and others added 30 commits April 6, 2015 12:56
The only problematic thing is that when we call `cabal clean` or `cabal
haddock` (and possibly others), _without_ first having called `configure`, we
attempt to build the setup script without calling the solver at all. This means
that if you do, say,

    cabal configure
    cabal clean
    cabal clean

for a package with a custom setup script that really needs setup dependencies
(for instance, because there are two versions of Cabal in the global package DB
and the setup script needs the _older_ one), then the first call to `clean` will
succeed, but the second call will fail because we will try to build the setup
script without the solver and that will fail.
This happened independently in a number of places, which was bad; and was about
to get worse with the base 3/4 thing.
Never consider flag choices as independent from their package.
(previously the default was the topdown solver for GHC < 7). Also adds a
deprecation warning when the topdown solver is selected.
Correct the link - package version policy.
This is consistent with the current -auto-all behavior.

This fixes haskell#2479.
cabal check will fail on -fprof-auto passed as a ghc-option
Fixed bash completion for sandbox subcommands
Slightly better error message in case of backjump limit reached
bootstrap.sh: fixes linker matching to avoid cases where tested linker names appear unexpectedly in compiler output (fixes haskell#2542)

This fixes cases where strings like "ld" appear in compiler output on lines
that have nothing to do with the linker command invocation.
bootstrap.sh: fixes linker matching to avoid cases where tested linker names appear unexpectedly in compiler output (fixes haskell#2542)
I was the one who introduced that bit, and it seems like it may well have caused more trouble than it helped... so, sorry :(

Anyway, clang has since then grown the ability to `-print-prog-name`, so I've done away with all that grepping through -### stuff.
@creswick @23Skidoo, could you see if this works for you?
Rewrite linker check to avoid grepping "ld"s.
Add Functor, Foldable and Traversable instances to Condition
Paths_ module: don't require base >= 4.
Print a more friendly message when http_proxy is down.
@ezyang
Contributor

ezyang commented Aug 25, 2015

OK, a few more problems which I've noticed when testing this revision with GHC:

  • We should track the "optimization level" to make it possible for someone to install a library multiple times with --disable-optimisation and --enable-optimisation (we have a test case for this in GHC, and optimisation level DEFINITELY affects ABI).
  • We're not erroring if you 'Setup register' a package into a database that already has this package; we definitely should error by default.

I can contribute fixes for these.

ezyang added a commit to ezyang/cabal that referenced this pull request Aug 25, 2015
This flag (through a terrible hack) also supports install
directory variables (e.g. $pkgver).  Depends on PR haskell#2752.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
@ezyang
Contributor

ezyang commented Aug 25, 2015

Another to-do: we need to make it possible for cabal-install to get the IPID (for build paths and the like). I think I'm going to go ahead and implement this.

UPDATE: This doesn't seem to be on the critical path; cabal-install seems to work fine without the IPID.

@ezyang
Contributor

ezyang commented Aug 27, 2015

We're not erroring if you 'Setup register' a package into a database that already has this package; we definitely should error by default.

So, the easiest way to fix this is to change our invocation of 'ghc-pkg' from 'reregister' to 'register'. But this means we have to be careful to unregister a package from the inplace database before attempting to register it again.
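
Roughly, a sketch of that dance (assuming we shell out to ghc-pkg via System.Process; the exact --package-db flag spelling varies across GHC versions):

    import System.Process (callProcess, readProcessWithExitCode)

    -- Unregister any existing copy from the inplace DB first, ignoring
    -- failure (the package may simply not be registered yet), then use
    -- plain 'register' instead of 'update'/'reregister'.
    registerInplace :: FilePath -> String -> FilePath -> IO ()
    registerInplace inplaceDB pkgId confFile = do
      _ <- readProcessWithExitCode "ghc-pkg"
             ["unregister", pkgId, "--package-db=" ++ inplaceDB] ""
      callProcess "ghc-pkg"
        ["register", confFile, "--package-db=" ++ inplaceDB]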

While implementing this, I noticed that the UniqueIPID test was a bit bad: it was registering in the GLOBAL user database (it passes the --inplace flag, but that does not do what the author thought it did). I also have a patch to fix this.

@ezyang
Contributor

ezyang commented Aug 27, 2015

Currently, this patch applies the new IPID calculation to ALL versions of GHC (it is not feature-gated). This is desirable because it means that Cabal doesn't have to maintain two code paths (sdist-computed IPIDs and ABI-hash IPIDs) and we can simplify code accordingly. But there are backwards-compatibility considerations...

GHC 7.8 and earlier, prior to this patch. There's no support for multiple instances in the package database, so we want ghc-pkg to complain if we install the same package name/version pair multiple times. The current behavior is that './Setup register' on a package ALWAYS works (by invoking 'ghc-pkg update'), and it will delete any installed packages that have the same package ID (package name and version). This means that './Setup register' may silently break packages in your package database, and './Setup copy' may silently clobber files for an already existing package in your database. If you use cabal-install, Cabal will ALWAYS refuse to reinstall. If you force a reinstall and are lucky and the ABIs are the same, nothing will be broken; if you are not so lucky, some number of packages will become unavailable. One thing to note is that it is fairly difficult to end up with a segfaulting set of packages: the easiest way to cause a segfault is to './Setup copy' (clobbering interface/object files) without subsequently './Setup register'ing.

GHC 7.8 and earlier, after this patch. Let me first describe the state with this literal patch, then some possible alternative worlds we could live in. With this patch, './Setup register' and './Setup copy' can now cause a segfault whenever a pair of matching IPIDs doesn't imply a compatible ABI: this is because we invoke 'ghc-pkg update', which will just overwrite an existing package with the same IPID without invalidating any preexisting packages which depended on it (this is akin to the "being lucky and the ABI not changing" case). GHC also (internally) assumes that matching IPIDs imply ABI compatibility (and uses this property to de-duplicate packages which show up in multiple package databases) and has no way of detecting this situation. If you force a Cabal reinstall, similar admonitions apply.

Now for alternatives. Earlier in this thread I proposed that we refuse to register a package (unless you pass --force) if the IPID already exists. However, there is no single ghc-pkg command which actually has this effect: 'ghc-pkg register' will refuse to install if there is another package with the same package ID (package name + package version, reporting "package foo-0.1 is already installed"), 'ghc-pkg update' will happily overwrite an existing IPID, and 'ghc-pkg register --multi-instance' doesn't exist prior to 7.10. So what we would have to do is call ghc-pkg once before the registration to check for our IPID, and then register (non-atomic, so there is a race condition). And of course, we can't update ghc-pkg, because it ships with GHC.

Another possibility is to just not bother; @ttuegel has suggested that './Setup register' and './Setup copy' should unconditionally emit warnings to make sure the user knows what they are doing (although, IMO, a warning that goes ahead anyway is kind of pointless, because you may have already messed up your package database by that point).

GHC 7.10, prior to this patch. Primary difference: Cabal computes a "package key", which is then passed to GHC at compile time to be used for symbol names and type equality. The default behavior of ghc-pkg is the same as for GHC 7.8, but there is a --multi-instance flag which changes it: ghc-pkg will no longer fail if there is a package already in the database with a matching package ID, as long as there is nothing in the database with the same IPID. However, there is a BUG in the implementation which causes it to raise spurious case-sensitivity errors: https://phabricator.haskell.org/D1177

GHC 7.10, after this patch. Well, technically this patch doesn't do it, but the companion patch #2748 enables multiple instances. With these two patches, if we register an existing IPID we will overwrite it; if it doesn't already exist, it will simply be added to the database (without deleting any existing packages).

As a refinement, if we want to prevent overwriting an existing IPID, we can just change our invocation of ghc-pkg update to ghc-pkg register. The non-atomic GHC 7.8 implementation will also work in this case, however.

My recommendation. It seems most uniform if we continue to use 'ghc-pkg update' to register our packages, but add an extra check for whether the exact IPID already exists in the database. We fail in that case, but the failure can be overridden by passing a new --force flag to 'cabal register'. Furthermore, we pass --multi-instance to ghc-pkg whenever we can.

UPDATE: One complication: GHC 7.6's ghc-pkg does not support the --ipid flag, so you have to implement this indirectly by querying for the package name + version and then checking whether the returned IPID matches.
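
A sketch of the combined check (`hasIpidSupport` is a hypothetical capability test, and the exact flag spellings should be verified against each GHC version):

    import System.Exit (ExitCode (..))
    import System.Process (readProcessWithExitCode)

    -- True if the given IPID is already present in the database. Newer
    -- ghc-pkgs can be queried by IPID directly; on GHC 7.6 we query the
    -- name-version pair and compare the 'id' field it reports.
    ipidAlreadyRegistered :: Bool -> String -> String -> IO Bool
    ipidAlreadyRegistered hasIpidSupport ipid pkgId
      | hasIpidSupport = do
          (code, _, _) <- readProcessWithExitCode "ghc-pkg"
                            ["describe", "--ipid", ipid] ""
          return (code == ExitSuccess)
      | otherwise = do
          (code, out, _) <- readProcessWithExitCode "ghc-pkg"
                              ["field", pkgId, "id"] ""
          return (code == ExitSuccess && ipid `elem` words out)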

@ezyang
Contributor

ezyang commented Aug 27, 2015

OK, implemented here: #2792

@ezyang
Contributor

ezyang commented Aug 28, 2015

OK, another major bug: cabal-install needs to get the IPID so that it can compute the install directory correctly.

@ezyang
Contributor

ezyang commented Aug 29, 2015

Thought about this some more; as before, there are delicate backwards-compatibility considerations, stemming from the fact that a './Setup' executable (in the case of the Custom build type) can be compiled against an older version of Cabal.

The ideal situation is that the installation directory is based on the installed package ID. OK, but how do we get the IPID from the configure step, where it is currently computed? There are two options:

  1. cabal-install computes it directly (using its version of Cabal), or
  2. cabal-install gets it from calling Setup configure and having configure somehow report the IPID.

In fact, it seems that we'll have to do a combination of both. Since there's no support for getting an IPID from 'Setup configure' with old versions of Cabal, in that case we are forced to use method (1). (The result may bear no relation to the IPID that register eventually registers the package under; if it really is an old version of Cabal, it will be registered under the ABI hash. But using our calculated IPID for the directory is harmless.)

EDIT: But method (2) doesn't work on its own, because the library directory is fixed during configure, which is exactly when the IPID is calculated. So cabal-install should always compute the IPID directly, and it should also pass it to Cabal when possible, using the --ipid flag.
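
In other words, something like this (a sketch only; `supportsIpidFlag` and the 1.23 cutoff are hypothetical stand-ins for the real capability check):

    import Data.Version (Version, makeVersion)

    -- Only Setup executables built against a Cabal carrying this patch
    -- understand --ipid; older ones get the plain argument list.
    supportsIpidFlag :: Version -> Bool
    supportsIpidFlag cabalVersion = cabalVersion >= makeVersion [1, 23]

    configureArgs :: Version -> String -> [String] -> [String]
    configureArgs setupCabalVersion ipid baseArgs
      | supportsIpidFlag setupCabalVersion = baseArgs ++ ["--ipid=" ++ ipid]
      | otherwise                          = baseArgs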

@23Skidoo
Member

Thought about this some more; as before, there are delicate backwards-compatibility considerations, stemming from the fact that a './Setup' executable (in the case of the Custom build type) can be compiled against an older version of Cabal.

Well, you can require it to be compiled against a recent version. We do this for --allow-newer. The downside is that the resulting error can be surprising for users.

@ezyang
Contributor

ezyang commented Aug 29, 2015

Well, this code is going to apply to all packages being built with Cabal, so a requirement like that would basically prevent people from ever using Setup with an old Cabal. Not good.

@ezyang
Contributor

ezyang commented Aug 29, 2015

I've pushed a fix for this problem. Seems to work OK.

@ezyang
Contributor

ezyang commented Aug 29, 2015

Haha! Testing caught a file-handle exhaustion problem (only on GHC 7.8 and earlier). I know what the problem is; cooking up a fix.

@ezyang
Contributor

ezyang commented Aug 29, 2015

OK, so I fixed the FD exhaustion problem by forcing the fingerprint, but now there is a second problem: readFile is not guaranteed to work if the locale is not set up correctly. Ideally, we'd like to just hash the underlying bytes, so I guess we should switch the implementation to use fingerprintData.
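
For reference, a minimal sketch of the fingerprintData approach (a strict ByteString read closes the handle promptly, and hashing raw bytes sidesteps the locale entirely):

    import qualified Data.ByteString as BS
    import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
    import Foreign.Ptr (castPtr)
    import GHC.Fingerprint (Fingerprint, fingerprintData)

    -- Hash the raw bytes of a file, independent of locale or encoding.
    fingerprintFile :: FilePath -> IO Fingerprint
    fingerprintFile path = do
      bytes <- BS.readFile path
      unsafeUseAsCStringLen bytes $ \(ptr, len) ->
        fingerprintData (castPtr ptr) len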

@ezyang
Contributor

ezyang commented Aug 29, 2015

OK, also fixed that by using fingerprintData: 3535425 and 209147f (I'll rebase soon).

@BardurArantsson
Collaborator

QQ: If I want to test (including @ezyang's fixes), which commit should I be testing?

@ezyang
Contributor

ezyang commented Sep 10, 2015

Use my branch cabal-no-pks https://github.com/ezyang/cabal/tree/cabal-no-pks

@BardurArantsson
Collaborator

I've tried that branch on a few of my own (smallish) projects, and all seems to work OK. (Not that I'm doing anything particularly advanced, mind you.)

Am I right in thinking that the "global" package database gets ignored if you've previously built dependencies for your project and you then create a sandbox to build a project in? (Hope that sentence makes sense, I'm on painkillers ATM. Damn teeth!) That seems to be the case for one of my multi-package projects. (It's not a big deal, just a little observation that I thought I'd get cleared up.)

Overall, I'm very excited to see this work (hopefully) go into master soon! Hopefully we can reengineer/reimagine sandboxes to be much much simpler once it's in.

@BardurArantsson
Collaborator

... and of course, just as I say that, I hit a little snag:

[1 of 1] Compiling Main             ( src-test/Main.hs, dist/dist-sandbox-2bc1c660/build/tests/tests-tmp/Main.o )
Linking dist/dist-sandbox-2bc1c660/build/tests/tests ...
Creating package registration file:
/tmp/pkgConf-cqrs-memory-0.1011311762291653377373.0
Installing library in
/home/bardur/wd/cqrs/.cabal-sandbox/lib/x86_64-linux-ghc-7.10.2/cqrs-memory-0.10.0-1i9ETqIuXRojhGsgmdpxO1
Registering cqrs-memory-0.10.0...
cabal: identical key 1i9ETqIuXRojhGsgmdpxO1 already exists in DB; ghc-pkg
unregister it first.
ExitFailure 1
cabal: Error: some packages failed to install:
cqrs-memory-0.10.0 failed during the final install step. The exception was:
ExitFailure 1

(FWIW, the scenario here is that I initially ran "cabal install ./cqrs-memory" and then ran "cabal install ./cqrs-memory --enable-tests". Apparently these "hash" to the same IPID.)

One can argue about whether the unregister/reinstall should just happen automatically, but the error message could certainly be improved. I tried "the obvious thing", namely

ghc-pkg unregister 1i9ETqIuXRojhGsgmdpxO1

but that doesn't work because ghc-pkg doesn't understand the hash->pkg link. So I tried

ghc-pkg unregister cqrs-memory

but that doesn't work either. Interestingly, ghc-pkg list doesn't list cqrs-memory either.

What command am I supposed to be running to get rid of the already-registered "duplicate"?

(Incidentally, the error message itself should probably show the precise command that you can run, if any.)

@ezyang
Contributor

ezyang commented Sep 12, 2015

Oh, I bet it's because the sandbox logic isn't unregistering packages from the sandbox database. (That's also why ghc-pkg isn't showing it: the package is in the sandbox DB. Use -f to select the sandbox DB, but we should just fix this case.) The error message could be a lot better here; it should print an IPID instead of a package key, I guess?

@BardurArantsson
Collaborator

Oh, right: Here's the command that works:

ghc-pkg -f .cabal-sandbox/x86_64-linux-ghc-7.10.2-packages.conf.d/ unregister cqrs-memory

The message should probably just print that command?

@vagrawal
Author

Hopefully we can reengineer/reimagine sandboxes to be much much simpler once it's in.

I had already done much of the work (at vagrawal@adf17a7 and https://phabricator.haskell.org/D1119). In short, it adds various nix-like facilities: a central package DB for all packages, garbage collection, allowing multiple instances (already in this patch), immutability without GC, and many other improvements like uninstalling, upgrading, and creating a lightweight sandbox manually with ghc-pkg. It still has some small issues, like the one in austin's comment on Phabricator, but overall it seems to work nicely. I think it will take a day or two more to make the required changes. I was waiting for this patch to get merged, as it could potentially change things in my patch, and I will make the required changes after #2817 is merged.

I had written a sort of concise specification for that patch for my mentor once; you can check it out at https://gist.github.com/fugyk/bea5e97e9f24c51c0174. The last commit was done in a hurry, so the current patch might not work as expected.

... and of course, just as I say that, I hit a little snag:

There are quite a few known snags in this patch. I will try to list some of them here:

  • It is hard to reason about whether the same source hash (even with the same command-line flags) and the same GHC version imply the same ABI. There are many other variables, like external programs (e.g. LLVM) and the OS version, which are not included in the hash.
  • Even the command-line flags are not fully included in the hash, including flags like --enable-optimization which can change the ABI. We could include them by cherry-picking them by hand, but that does not seem like a good idea.
  • It is not atomic, meaning it could calculate a hash that is not the one actually used for building. That could lead to some strange situations (described in the earlier discussion).
  • As @BardurArantsson pointed out, the error message is not good, but telling the user to manually unregister the package does not seem good either.
  • Maintaining compatibility: @ezyang has done great work to make this compatible with older GHCs, but it makes the code harder for other people to modify.

I have to say, I am not in favour of merging the package key with the IPID, for the above reasons, but I guess it is needed for some simplicity in the infrastructure.

@BardurArantsson
Collaborator

There are many other variables, like external programs (e.g. LLVM) and the OS version, which are not included in the hash.

This particular thing is probably not a huge issue, I think. For most people who are not on a rolling-release distribution (no idea about Windows here), their LLVM (or whatever) probably isn't going to change in any significant way.

@23Skidoo
Member

@BardurArantsson

ghc-pkg -f .cabal-sandbox/x86_64-linux-ghc-7.10.2-packages.conf.d/ unregister cqrs-memory

cabal sandbox hc-pkg unregister cqrs-memory should also work.

@23Skidoo
Member

Even the command-line flags are not fully included in the hash, including flags like --enable-optimization which can change the ABI. We could include them by cherry-picking them by hand, but that does not seem like a good idea.

It's not very elegant, but there's no way around it, I think.

@BardurArantsson
Collaborator

@23Skidoo : Thanks, that's a bit more elegant :).

About the command-line flags... what about just doing a really dumb thing, e.g. sorting + uniq'ing them and then hashing? I think it's probably better here to just assume that all command-line flags have an effect on the output -- exceptions could then be explicitly removed from that list if deemed necessary. That way you'd at least always get rebuilds when appropriate, at the cost of sometimes getting rebuilds that technically weren't necessary. I think that's a better solution than one where you risk not getting rebuilds when you should. Does that make sense? (Toy sketch below.)

EDIT: Actually, command line flag order might even matter -- I'm not sure.
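
A toy version of what I mean (sketch only; per the EDIT above, drop the sort if order turns out to matter):

    import Data.List (nub, sort)
    import GHC.Fingerprint (Fingerprint, fingerprintString)

    -- Treat every flag as potentially ABI-relevant: normalise by sorting
    -- and de-duplicating, then fold the lot into a single fingerprint.
    hashFlags :: [String] -> Fingerprint
    hashFlags = fingerprintString . unwords . nub . sort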

@ezyang
Contributor

ezyang commented Sep 23, 2015

See also #2830

@vagrawal
Author

This PR has been closed due to a GitHub bug. I was trying to update my email address in git for the submission of code samples on the GSoC website, but through carelessness I overwrote all of the history. After reverting, this PR no longer updates and is stuck in a permanently closed state. I contacted GitHub support and got the reply that it can't be reopened and that I would have to create a new PR. I am not opening a new one, as both of the closed PRs are superseded by @ezyang's PR.
