Skip to content

Clean up graph traversal code #2488

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 26, 2015
Merged

Clean up graph traversal code #2488

merged 10 commits into from
Mar 26, 2015

Conversation

edsko
Copy link
Contributor

@edsko edsko commented Mar 21, 2015

This pull request is the second of a set of pull requests working towards an implementation of setup dependencies, and depends on #2487 .

There is a lot of duplication between Cabal and cabal-install, and in fact, a lot of dead code in cabal-install. There is still duplication, but no dead code anymore, and at least the situation is clarified now. Details in the commit messages.

/cc @dcoutts @kosmikus

edsko added 7 commits March 21, 2015 12:37
The fundamental difference between Cabal and cabal-install is that the former
deals with installed libraries, and -- in principle -- knows about _library_
dependencies only, whereas the latters deals with setup, executable, test-suite
and benchmark dependencies in addition to library dependencies. Currently we
classify all of these simply as 'dependencies' but that will change shortly.

In this commit we take a first step towards this by moving the PackageFixedDeps
class, which deals with dependencies of packages rather than installed
libraries, from Cabal to cabal-install.

The commit is pretty simple; we just move the type class and update import
statements where necessary.
Introduce a new superclass HasInstalledPackageId:

    class Package pkg => HasInstalledPackageId pkg where
      installedPackageId :: pkg -> InstalledPackageId

    class HasInstalledPackageId pkg => PackageInstalled pkg where
      installedDepends :: pkg -> [InstalledPackageId]

Most functions that deal with the package index now just require
HasInstalledPackageId; only the functions that actually require the
dependencies still rely on PackageInstalled.

The point is that a ConfiguredPackage/ReadyPackage/PlanPackage can reasonably
be made an instance of HasInstalledPackageId, but not of PackageInstalled; that
will be the topic of the next, much larger, pull request.
Both cabal-install and `Cabal` define a notion of a package index.
`Cabal` defines

    data PackageIndex a = PackageIndex !(Map InstalledPackageId a) !(Map PackageName (Map Version [a]))

whereas `cabal-install` defines

    newtype PackageIndex pkg = PackageIndex (Map PackageName [pkg])

Note that Cabal.PackageIndex is indexed by installed package IDs, whereas
CabalInstall.PackageIndex is indexed by package names.

There are a bunch of "graph traversal" functions that similarly duplicated
between `Cabal` and `cabal-install`; in `Cabal` we have

    brokenPackages            :: PackageInstalled a => PackageIndex a -> [(a, [InstalledPackageId])]
    dependencyClosure         :: PackageInstalled a => PackageIndex a -> [InstalledPackageId] -> Either (PackageIndex a) [(a, [InstalledPackageId])]
    dependencyCycles          :: PackageInstalled a => PackageIndex a -> [[a]]
    dependencyGraph           :: PackageInstalled a => PackageIndex a -> (Graph.Graph, Graph.Vertex -> a, InstalledPackageId -> Maybe Graph.Vertex)
    dependencyInconsistencies :: PackageInstalled a => PackageIndex a -> [(PackageName, [(PackageId, Version)])]
    reverseDependencyClosure  :: PackageInstalled a => PackageIndex a -> [InstalledPackageId] -> [a]
    reverseTopologicalOrder   :: PackageInstalled a => PackageIndex a -> [a]
    topologicalOrder          :: PackageInstalled a => PackageIndex a -> [a]

which are mirrored in `cabal-install` as

    brokenPackages            :: PackageFixedDeps pkg => PackageIndex pkg -> [(pkg, [PackageIdentifier])]
    dependencyClosure         :: PackageFixedDeps pkg => PackageIndex pkg -> [PackageIdentifier] -> Either (PackageIndex pkg) [(pkg, [PackageIdentifier])]
    dependencyCycles          :: PackageFixedDeps pkg => PackageIndex pkg -> [[pkg]]
    dependencyGraph           :: PackageFixedDeps pkg => PackageIndex pkg -> (Graph.Graph, Graph.Vertex -> pkg, PackageIdentifier -> Maybe Graph.Vertex)
    dependencyInconsistencies :: PackageFixedDeps pkg => PackageIndex pkg -> [(PackageName, [(PackageIdentifier, Version)])]
    reverseDependencyClosure  :: PackageFixedDeps pkg => PackageIndex pkg -> [PackageIdentifier] -> [pkg]
    reverseTopologicalOrder   :: PackageFixedDeps pkg => PackageIndex pkg -> [pkg]
    topologicalOrder          :: PackageFixedDeps pkg => PackageIndex pkg -> [pkg]

This by itself makes a certain amount of sense, but here's where the situation
gets confusing. `cabal-install` defines a `PlanIndex` as

    type PlanIndex = Cabal.PackageIndex PlanPackage

Note that is using `Cabal`'s notion of a PackageIndex, not `cabal-install`'s; it
makes sense that a PlanIndex is indexed by installed package IDs rather than
package names (even if currently we have to fake installed package IDs.

Almost all of the functions listed above, however, are only called on
`PlanIndex`s. This means that we invoke the functions from `Cabal`, not the
functions from `cabal-install`; in fact, almost all these functions in
`cabal-install` are completely unused right now.

    In `cabal-install`     but calls from `Cabal`
    ----------------------------------------------------------
    closed                 brokenPackages
    acyclic                dependencyCycles
    consistent             dependencyInconsistencies
    problems               brokenPackages', dependencyCycles',
                             dependencyInconsistencies'

This is more than just a code clean-up issue. As mentioned in the previous PR,
the fundamental difference between Cabal and cabal-install is their view of
dependencies: Cabal knows only about installed libraries and their library
dependencies, whereas cabal knows about packages and the dependencies of their
setup scripts, executables, test-suites, benchmarks, as well as their library
dependencies.

By calling the graph-traversal functions from `Cabal` rather than from
`cabal-install`, any of these additional dependencies are either completely
ignored, or else the distinction is lost (depending on how we implemented
installedDepends for plan packages); and neither option is correct.

For example, in `new` from Distribution.Client.InstallPlan (in `cabal-install`)
we call `dependendyGraph` on the plan index; since the plan index is defined in
terms of Cabal's plan index, we call Cabal's `dependencyGraph` here, but that
means that this graph will completely lack any setup dependencies. The reverse
graph is used in (only one place): `packagedThatDependOn`, which in turn is
(only) used in `failed`. But this is wrong: if a package fails to install, if
another package depends on it through a setup dependency, then that second
package should also be marked as impossible to install.

What needs to happen is that we modify the graph traversal functions from
`cabal-install` to take a PackageIndex from `Cabal` (so that we can apply them
to a PlanIndex), but use the dependencies from `FixedPackageDeps` rather than
the flat or incomplete dependencies we get from `PackageInstalled`. In fact,
the whole `PackageInstalled` instance for `ConfiguredPackage`, `ReadyPackage`
and `PlanPackage` should go: returning only part of the dependencies, or else
all dependencies flattened, is just too error prone.

This first commit only documents the problem (this commit message) and moves the
above functions to a new module called Distribution.Client.PlanIndex.

Cleaning this up is complicated by the fact that we _do_ still call two of the
above functions on a `CabalInstall.PackageIndex`:

* `pruneInstallPlan` from `Distribution.Client.Freeze` calls `dependencyClosure`
* The top-down solver calls `dependencyGraph`

If we change the above functions to work on a `Cabal.PackageIndex` instead these
two exceptions will break, so we need to look at that first.
Introduce

    dependencyClosure :: InstallPlan
                      -> [PackageIdentifier]
                      -> Either (PackageIndex PlanPackage) [(PlanPackage, [InstalledPackageId])]

And use this in the definition of `pruneInstallPlan` in `freeze`, to avoid
first converting an install plan from a `Cabal.PackageIndex` to a
`CabalInstall.PackageIndex`.

This resolves the first of the two irregularities mentioned in the previous
commit.
Give the top-down solver it's own copy of `dependencyGraph`. This means that we
now have three independent implementations of `dependencyGraph`:

- `dependencyGraph` in `Cabal` takes a package index indexed by installed
  package IDs and only has access to library dependencies.
- `dependencyGraph` in `Distribution.Client.PlanIndex` in `cabal-install` takes
  a package index indexed by installed package IDs and has access to all
  dependencies.
- `dependencyGraph` in the top-down solver in `cabal-install` takes a package
  index indexed by package _names_, and has access to all dependencies.

Ideally we would switch the top-down solver over to use a package indexed by
installed package IDs, so that this duplication could be avoided, but that's a
bit of work and the top-down solver is legacy code anyway. Can still do that
later, of course.

Moreover, this makes the top-down solver monomorphic where possible, and
introduce its own SourceDeps class so that it is independent of the FixedDeps
class (which we will change over to use InstalledPackageIds instead).
Introduce ConfiguredId:

    -- | A ConfiguredId is a package ID for a configured package.
    --
    -- Once we configure a source package we know it's InstalledPackageId (at
    -- least, in principle, even if we have to fake it currently). It is still
    -- however useful in lots of places to also know the source ID for the
    -- package.  We therefore bundle the two.
    --
    -- An already installed package of course is also "configured" (all it's
    -- configuration parameters and dependencies have been specified).
    --
    -- TODO: I wonder if it would make sense to promote this datatype to Cabal
    -- and use it consistently instead of InstalledPackageIds?
    data ConfiguredId = ConfiguredId {
        confSrcId  :: PackageId
      , confInstId :: InstalledPackageId
      }

And use it for ConfiguredPackage. As the comment says, though, I wonder if we
should use this in more places.

One slightly tricky thing here is that the output of both solvers had to be
modified to keep installed package IDs where possible; in the modular solver
this was easy enough, as it does this properly, but in the top-down solver this
is a bit of a hack; however, I’ve documented the hack in detail inline in the
code.

NOTE: Although this change is currently mostly cosmetic, in the future, once we
drop the single instance restriction, it is very important that we don't
convert from installed package IDs to source IDs and then back to installed
package IDs, as this conversion will be lossy.
Take advantage in cabal-install of the new
HasInstalledPackageId/PackagedInstall split in Cabal.  The graph traversal
functions in cabal-install, previously redundant, are now back in use. Their
types match the ones in Cabal, with only the difference in the PackageInstalled
(Cabal) versus PackageFixedDeps (cabal-install) type class.

The only PackageInstalled instance left in Cabal is for InstalledPackage, which
is a thin wrapper around InstalledPackageInfo; with these refactorings in
place, InstalledPackage is there only to support the TopDown solver. The fact
that we won't have PackageInstalled instances anymore for PlanPackage and co
means that we are forced to call the correct graph traversal functions (from
cabal-install, rather than from Cabal).
@23Skidoo
Copy link
Member

Build fails on Travis, it looks like you forgot to update exposed-modules in cabal-install.cabal.

@edsko
Copy link
Contributor Author

edsko commented Mar 23, 2015

I am assuming you mean other-modules; sorry about that. Added now.

@23Skidoo
Copy link
Member

/cc @ezyang

@ezyang
Copy link
Contributor

ezyang commented Mar 24, 2015

Oof, now I feel a bit bad about myself, because the Cabal/cabal-install package index unification was my fault, and it looks like I did get it wrong (for precisely the reasons described in 1beba1b). It would be good if we had some tests for those dependency cases, they're easy to forget about.

One thing I would like to know (for my own curiosity): does this patchset fix bugs, or is it purely refactoring? It sure looks like it fixes bugs...

@edsko
Copy link
Contributor Author

edsko commented Mar 24, 2015

@ezyang So far in this patch set no real behavioural changes, I think -- except from "Avoid forgetting installed IDs" edsko@daecdef perhaps, but even there we were okay so far because the roundtrip installed ID -> fake ID -> installed ID is lossless as long as the single instance restriction applies.

Real changes coming up soon in the next patch set, and that will also include (the start of) a unit test suite for the solver.

@23Skidoo
Copy link
Member

There is still duplication, but no dead code anymore

Looks like various FakeMap-related functions from D.S.PackageIndex (fakeLookupInstalledPackageId, brokenPackages', dependencyClosure', ...) are now either no longer used or only used inside cabal-install. Are they used inside GHC? If not, can we move all FakeMap-related machinery to PlanIndex?

@ezyang
Copy link
Contributor

ezyang commented Mar 24, 2015

FakeMap machinery was specifically for cabal-install, no they are not used in GHC.

@23Skidoo
Copy link
Member

@ezyang
Thanks, makes sense.

@edsko
LGTM modulo some comments. Will wait for a comment from @kosmikus or @dcoutts before merging.

@tibbe
Could you please give @edsko commit rights to the Cabal repo?

@tibbe
Copy link
Member

tibbe commented Mar 24, 2015

@23Skidoo done.

@23Skidoo
Copy link
Member

@tibbe Thanks!

@edsko
Copy link
Contributor Author

edsko commented Mar 25, 2015

@23Skidoo I've addressed your comments in edsko@0ff6341. This leaves:

  • Getting rid of the FakeMap stuff from Cabal

and potentially (less important, I think)

  • Change the names of PackageInstalled and PackageFixedDeps to make it more obvious that these classes are related
  • Use ConfiguredId instead of InstalledPackageId throughout
  • Use Data.Set or ordNub in the definition of resolveInstalledIds

I will deal try to remove FakeMap from Cabal, but I'm keen to get the rest of my changes merged in as well, all the above is just preliminary to the real work :) So I suggest I submit those other two PRs first and then once we're all happy with that I will try to remove FakeMap from Cabal. I suggest we leave the rest until later.

As regards this comment:

brokenPackages is just copy-pasted brokenPackages' from D.S.PackageIndex, correct? And analogously for the rest of the functions in this module that now accept FakeMap?

Yes, so far, but that will change in the next PR (or perhaps the one after that) once we start changing PackageFixedDeps to actually have more fine-grained dependencies.

@edsko
Copy link
Contributor Author

edsko commented Mar 25, 2015

Uh, no idea why Travis is failing, I didn't change nuthin' :)

@23Skidoo
Copy link
Member

Sorry, the Travis breakage was my fault. Should be fixed now.

So I suggest I submit those other two PRs first and then once we're all happy with that I will try to remove FakeMap from Cabal. I suggest we leave the rest until later.

Sure, if that's your preference. You may want to create a ticket and assign yourself to it so that you don't forget about the FakeMap issue.

It's now only used in cabal-install.
@edsko
Copy link
Contributor Author

edsko commented Mar 26, 2015

@23Skidoo, @ezyang Ok, removed FakeMap completely from Cabal, it's now only used in cabal-install. So are we okay to merge this PR (and #2487), pending OK from @kosmikus or @dcoutts ?

@23Skidoo
Copy link
Member

@edsko

So are we okay to merge this PR (and #2487), pending OK from @kosmikus or @dcoutts ?

Yes, everything looks good to me.

@kosmikus
Copy link
Contributor

I'm ok with this.

@edsko
Copy link
Contributor Author

edsko commented Mar 26, 2015

@23Skidoo Since @kosmikus is okay with this, will we merge?

23Skidoo added a commit that referenced this pull request Mar 26, 2015
@23Skidoo 23Skidoo merged commit 26017fe into haskell:master Mar 26, 2015
@23Skidoo
Copy link
Member

OK, let's merge. Potential comments from @dcoutts will be addressed later.

@edsko
Copy link
Contributor Author

edsko commented Mar 26, 2015

Thanks!

Note that @dcoutts will be unavailable for the next two weeks but I have discussed this PR, as well as the most of the ones that will follow, with him in detail.

@edsko edsko deleted the pr/graph-traversal branch March 27, 2015 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants