-
Notifications
You must be signed in to change notification settings - Fork 712
Clean up graph traversal code #2488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The fundamental difference between Cabal and cabal-install is that the former deals with installed libraries, and -- in principle -- knows about _library_ dependencies only, whereas the latters deals with setup, executable, test-suite and benchmark dependencies in addition to library dependencies. Currently we classify all of these simply as 'dependencies' but that will change shortly. In this commit we take a first step towards this by moving the PackageFixedDeps class, which deals with dependencies of packages rather than installed libraries, from Cabal to cabal-install. The commit is pretty simple; we just move the type class and update import statements where necessary.
Introduce a new superclass HasInstalledPackageId: class Package pkg => HasInstalledPackageId pkg where installedPackageId :: pkg -> InstalledPackageId class HasInstalledPackageId pkg => PackageInstalled pkg where installedDepends :: pkg -> [InstalledPackageId] Most functions that deal with the package index now just require HasInstalledPackageId; only the functions that actually require the dependencies still rely on PackageInstalled. The point is that a ConfiguredPackage/ReadyPackage/PlanPackage can reasonably be made an instance of HasInstalledPackageId, but not of PackageInstalled; that will be the topic of the next, much larger, pull request.
Both cabal-install and `Cabal` define a notion of a package index. `Cabal` defines data PackageIndex a = PackageIndex !(Map InstalledPackageId a) !(Map PackageName (Map Version [a])) whereas `cabal-install` defines newtype PackageIndex pkg = PackageIndex (Map PackageName [pkg]) Note that Cabal.PackageIndex is indexed by installed package IDs, whereas CabalInstall.PackageIndex is indexed by package names. There are a bunch of "graph traversal" functions that similarly duplicated between `Cabal` and `cabal-install`; in `Cabal` we have brokenPackages :: PackageInstalled a => PackageIndex a -> [(a, [InstalledPackageId])] dependencyClosure :: PackageInstalled a => PackageIndex a -> [InstalledPackageId] -> Either (PackageIndex a) [(a, [InstalledPackageId])] dependencyCycles :: PackageInstalled a => PackageIndex a -> [[a]] dependencyGraph :: PackageInstalled a => PackageIndex a -> (Graph.Graph, Graph.Vertex -> a, InstalledPackageId -> Maybe Graph.Vertex) dependencyInconsistencies :: PackageInstalled a => PackageIndex a -> [(PackageName, [(PackageId, Version)])] reverseDependencyClosure :: PackageInstalled a => PackageIndex a -> [InstalledPackageId] -> [a] reverseTopologicalOrder :: PackageInstalled a => PackageIndex a -> [a] topologicalOrder :: PackageInstalled a => PackageIndex a -> [a] which are mirrored in `cabal-install` as brokenPackages :: PackageFixedDeps pkg => PackageIndex pkg -> [(pkg, [PackageIdentifier])] dependencyClosure :: PackageFixedDeps pkg => PackageIndex pkg -> [PackageIdentifier] -> Either (PackageIndex pkg) [(pkg, [PackageIdentifier])] dependencyCycles :: PackageFixedDeps pkg => PackageIndex pkg -> [[pkg]] dependencyGraph :: PackageFixedDeps pkg => PackageIndex pkg -> (Graph.Graph, Graph.Vertex -> pkg, PackageIdentifier -> Maybe Graph.Vertex) dependencyInconsistencies :: PackageFixedDeps pkg => PackageIndex pkg -> [(PackageName, [(PackageIdentifier, Version)])] reverseDependencyClosure :: PackageFixedDeps pkg => PackageIndex pkg -> [PackageIdentifier] -> [pkg] reverseTopologicalOrder :: PackageFixedDeps pkg => PackageIndex pkg -> [pkg] topologicalOrder :: PackageFixedDeps pkg => PackageIndex pkg -> [pkg] This by itself makes a certain amount of sense, but here's where the situation gets confusing. `cabal-install` defines a `PlanIndex` as type PlanIndex = Cabal.PackageIndex PlanPackage Note that is using `Cabal`'s notion of a PackageIndex, not `cabal-install`'s; it makes sense that a PlanIndex is indexed by installed package IDs rather than package names (even if currently we have to fake installed package IDs. Almost all of the functions listed above, however, are only called on `PlanIndex`s. This means that we invoke the functions from `Cabal`, not the functions from `cabal-install`; in fact, almost all these functions in `cabal-install` are completely unused right now. In `cabal-install` but calls from `Cabal` ---------------------------------------------------------- closed brokenPackages acyclic dependencyCycles consistent dependencyInconsistencies problems brokenPackages', dependencyCycles', dependencyInconsistencies' This is more than just a code clean-up issue. As mentioned in the previous PR, the fundamental difference between Cabal and cabal-install is their view of dependencies: Cabal knows only about installed libraries and their library dependencies, whereas cabal knows about packages and the dependencies of their setup scripts, executables, test-suites, benchmarks, as well as their library dependencies. By calling the graph-traversal functions from `Cabal` rather than from `cabal-install`, any of these additional dependencies are either completely ignored, or else the distinction is lost (depending on how we implemented installedDepends for plan packages); and neither option is correct. For example, in `new` from Distribution.Client.InstallPlan (in `cabal-install`) we call `dependendyGraph` on the plan index; since the plan index is defined in terms of Cabal's plan index, we call Cabal's `dependencyGraph` here, but that means that this graph will completely lack any setup dependencies. The reverse graph is used in (only one place): `packagedThatDependOn`, which in turn is (only) used in `failed`. But this is wrong: if a package fails to install, if another package depends on it through a setup dependency, then that second package should also be marked as impossible to install. What needs to happen is that we modify the graph traversal functions from `cabal-install` to take a PackageIndex from `Cabal` (so that we can apply them to a PlanIndex), but use the dependencies from `FixedPackageDeps` rather than the flat or incomplete dependencies we get from `PackageInstalled`. In fact, the whole `PackageInstalled` instance for `ConfiguredPackage`, `ReadyPackage` and `PlanPackage` should go: returning only part of the dependencies, or else all dependencies flattened, is just too error prone. This first commit only documents the problem (this commit message) and moves the above functions to a new module called Distribution.Client.PlanIndex. Cleaning this up is complicated by the fact that we _do_ still call two of the above functions on a `CabalInstall.PackageIndex`: * `pruneInstallPlan` from `Distribution.Client.Freeze` calls `dependencyClosure` * The top-down solver calls `dependencyGraph` If we change the above functions to work on a `Cabal.PackageIndex` instead these two exceptions will break, so we need to look at that first.
Introduce dependencyClosure :: InstallPlan -> [PackageIdentifier] -> Either (PackageIndex PlanPackage) [(PlanPackage, [InstalledPackageId])] And use this in the definition of `pruneInstallPlan` in `freeze`, to avoid first converting an install plan from a `Cabal.PackageIndex` to a `CabalInstall.PackageIndex`. This resolves the first of the two irregularities mentioned in the previous commit.
Give the top-down solver it's own copy of `dependencyGraph`. This means that we now have three independent implementations of `dependencyGraph`: - `dependencyGraph` in `Cabal` takes a package index indexed by installed package IDs and only has access to library dependencies. - `dependencyGraph` in `Distribution.Client.PlanIndex` in `cabal-install` takes a package index indexed by installed package IDs and has access to all dependencies. - `dependencyGraph` in the top-down solver in `cabal-install` takes a package index indexed by package _names_, and has access to all dependencies. Ideally we would switch the top-down solver over to use a package indexed by installed package IDs, so that this duplication could be avoided, but that's a bit of work and the top-down solver is legacy code anyway. Can still do that later, of course. Moreover, this makes the top-down solver monomorphic where possible, and introduce its own SourceDeps class so that it is independent of the FixedDeps class (which we will change over to use InstalledPackageIds instead).
Introduce ConfiguredId: -- | A ConfiguredId is a package ID for a configured package. -- -- Once we configure a source package we know it's InstalledPackageId (at -- least, in principle, even if we have to fake it currently). It is still -- however useful in lots of places to also know the source ID for the -- package. We therefore bundle the two. -- -- An already installed package of course is also "configured" (all it's -- configuration parameters and dependencies have been specified). -- -- TODO: I wonder if it would make sense to promote this datatype to Cabal -- and use it consistently instead of InstalledPackageIds? data ConfiguredId = ConfiguredId { confSrcId :: PackageId , confInstId :: InstalledPackageId } And use it for ConfiguredPackage. As the comment says, though, I wonder if we should use this in more places. One slightly tricky thing here is that the output of both solvers had to be modified to keep installed package IDs where possible; in the modular solver this was easy enough, as it does this properly, but in the top-down solver this is a bit of a hack; however, I’ve documented the hack in detail inline in the code. NOTE: Although this change is currently mostly cosmetic, in the future, once we drop the single instance restriction, it is very important that we don't convert from installed package IDs to source IDs and then back to installed package IDs, as this conversion will be lossy.
Take advantage in cabal-install of the new HasInstalledPackageId/PackagedInstall split in Cabal. The graph traversal functions in cabal-install, previously redundant, are now back in use. Their types match the ones in Cabal, with only the difference in the PackageInstalled (Cabal) versus PackageFixedDeps (cabal-install) type class. The only PackageInstalled instance left in Cabal is for InstalledPackage, which is a thin wrapper around InstalledPackageInfo; with these refactorings in place, InstalledPackage is there only to support the TopDown solver. The fact that we won't have PackageInstalled instances anymore for PlanPackage and co means that we are forced to call the correct graph traversal functions (from cabal-install, rather than from Cabal).
Build fails on Travis, it looks like you forgot to update |
I am assuming you mean |
/cc @ezyang |
Oof, now I feel a bit bad about myself, because the Cabal/cabal-install package index unification was my fault, and it looks like I did get it wrong (for precisely the reasons described in 1beba1b). It would be good if we had some tests for those dependency cases, they're easy to forget about. One thing I would like to know (for my own curiosity): does this patchset fix bugs, or is it purely refactoring? It sure looks like it fixes bugs... |
@ezyang So far in this patch set no real behavioural changes, I think -- except from "Avoid forgetting installed IDs" edsko@daecdef perhaps, but even there we were okay so far because the roundtrip installed ID -> fake ID -> installed ID is lossless as long as the single instance restriction applies. Real changes coming up soon in the next patch set, and that will also include (the start of) a unit test suite for the solver. |
Looks like various |
|
@23Skidoo done. |
@tibbe Thanks! |
This addresses commit commemts * #2488 (comment) * #2488 (comment) * #2488 (comment) from @23Skidoo.
@23Skidoo I've addressed your comments in edsko@0ff6341. This leaves:
and potentially (less important, I think)
I will deal try to remove As regards this comment:
Yes, so far, but that will change in the next PR (or perhaps the one after that) once we start changing |
Uh, no idea why Travis is failing, I didn't change nuthin' :) |
Sorry, the Travis breakage was my fault. Should be fixed now.
Sure, if that's your preference. You may want to create a ticket and assign yourself to it so that you don't forget about the |
It's now only used in cabal-install.
I'm ok with this. |
Clean up graph traversal code
OK, let's merge. Potential comments from @dcoutts will be addressed later. |
Thanks! Note that @dcoutts will be unavailable for the next two weeks but I have discussed this PR, as well as the most of the ones that will follow, with him in detail. |
This pull request is the second of a set of pull requests working towards an implementation of setup dependencies, and depends on #2487 .
There is a lot of duplication between Cabal and cabal-install, and in fact, a lot of dead code in cabal-install. There is still duplication, but no dead code anymore, and at least the situation is clarified now. Details in the commit messages.
/cc @dcoutts @kosmikus