Skip to content

Conversation

@mpizenberg
Copy link
Member

No description provided.

@Eh2406
Copy link
Member

Eh2406 commented Aug 10, 2021

I think the interesting case is:

  • Root depends on A@* and B@*
  • A depends on X@1.0andC@*`
  • B depends on X@2.0andC@*`
  • C depends on X@*
  • X comes in v1.0 and v2.0 and nether have dependencies

@mpizenberg
Copy link
Member Author

I think the interesting case is: ...

And which of these are public/private?

When running this with everything public:

        index.add_deps("root", (1, 0, 0), &[("a", Public, ..)]);
        index.add_deps("root", (1, 0, 0), &[("b", Public, ..)]);
        index.add_deps("a", (1, 0, 0), &[("x", Public, (1, 0, 0)..(1, 0, 1))]);
        index.add_deps("a", (1, 0, 0), &[("c", Public, ..)]);
        index.add_deps("b", (1, 0, 0), &[("x", Public, (2, 0, 0)..(2, 0, 1))]);
        index.add_deps("b", (1, 0, 0), &[("c", Public, ..)]);
        index.add_deps("c", (1, 0, 0), &[("x", Public, ..)]);
        index.add_deps::<R>("x", (1, 0, 0), &[]);
        index.add_deps::<R>("x", (2, 0, 0), &[]);

The solver fails with the following error message:

Because b$root@1.0.0 seeded depends on x$root@1.0.0 seeded 2.0.0 and a$root@1.0.0 seeded 1.0.0 depends on x$root@1.0.0 seeded 1.0.0, b$root@1.0.0 seeded ∗, a$root@1.0.0 seeded 1.0.0 are incompatible.
And because root$root@1.0.0 seeded 1.0.0 depends on a$root@1.0.0 seeded and root$root@1.0.0 seeded 1.0.0 depends on b$root@1.0.0 seeded, root$root@1.0.0 seeded 1.0.0 is forbidden.

So in less strict language, it says: "Because b depends on x @ 2 and a depends on x @ 1, b and a are incompatible. And because root depends on both a and b, root is forbidden".

@mpizenberg
Copy link
Member Author

If I make the c-x dependency private, it also fails saying

Because b$root@1.0.0 seeded depends on x$root@1.0.0 seeded 2.0.0 and a$root@1.0.0 seeded 1.0.0 depends on x$root@1.0.0 seeded 1.0.0, b$root@1.0.0 seeded ∗, a$root@1.0.0 seeded 1.0.0 are incompatible.
And because root$root@1.0.0 seeded 1.0.0 depends on a$root@1.0.0 seeded and root$root@1.0.0 seeded 1.0.0 depends on b$root@1.0.0 seeded, root$root@1.0.0 seeded 1.0.0 is forbidden.

Traduction: "Because b depends on x @ 2 and a depends on x @ 1, b and a are incompatible. And because root depends on both a and b, root is forbidden"

@Eh2406
Copy link
Member

Eh2406 commented Aug 10, 2021

And which of these are public/private?

Oops, yes I forgot to say that. All private, except the C on X witch is public. To have it in one place:

  • Root privately depends on A@* and B@*
  • A privately depends on X@1.0andC@*`
  • B privately depends on X@2.0andC@*`
  • C publicly depends on X@*
  • X comes in v1.0 and v2.0 and nether have dependencies

@mpizenberg
Copy link
Member Author

It seems to find the following solution, is that what you were expecting?

        let mut index = Index::new();
        index.add_deps("root", (1, 0, 0), &[("a", Private, ..)]);
        index.add_deps("root", (1, 0, 0), &[("b", Private, ..)]);
        index.add_deps("a", (1, 0, 0), &[("x", Private, (1, 0, 0)..(1, 0, 1))]);
        index.add_deps("a", (1, 0, 0), &[("c", Private, ..)]);
        index.add_deps("b", (1, 0, 0), &[("x", Private, (2, 0, 0)..(2, 0, 1))]);
        index.add_deps("b", (1, 0, 0), &[("c", Private, ..)]);
        index.add_deps("c", (1, 0, 0), &[("x", Public, ..)]);
        index.add_deps::<R>("x", (1, 0, 0), &[]);
        index.add_deps::<R>("x", (2, 0, 0), &[]);
        let solution = resolve(&index, "root$root@1.0.0 seeded", (1, 0, 0));
        assert_map_eq(
            &solution.unwrap(),
            &select(&[
                ("root$root@1.0.0 final", (1, 0, 0)),
                ("a$root@1.0.0 final", (1, 0, 0)),
                ("b$root@1.0.0 final", (1, 0, 0)),
                ("c$a@1.0.0 final", (1, 0, 0)),
                ("c$b@1.0.0 final", (1, 0, 0)),
                ("x$a@1.0.0 final", (1, 0, 0)),
                ("x$b@1.0.0 final", (2, 0, 0)),
            ]),
        );

@Eh2406
Copy link
Member

Eh2406 commented Aug 11, 2021

So that is a solution. It implies that c will be built twice. Once depending on x v1 for a and once depending on x v2 for b .
I have been assuming that this case will lead to an error instead of duplication. But now that I am writing this up, I don't know if that is clear cut in the RFC or just my assumption. Cargo had no support for duplication when this RFC was written, but it has added some with the resolver="2" work.

Of course, the cost of duplication vs the cost of error is a engineering decision for each community to make. Even if this does not work for what Cargo wants, it is worth documenting in case it is what some other package manager wants.

@mpizenberg
Copy link
Member Author

mpizenberg commented Aug 11, 2021

So that is a solution. It implies that c will be built twice

Isn't it already the case with cargo today? I mean if the two versions of "x" span different major buckets I thought cargo was ok with this? Of course if they were within the same major bucket, cargo would not have a solution, while this would still has one. To me this can change by simply imposing the bucket constraint in addition to the public/private constraint. Meaning, instead of each Final package having no dependency, it would have a dependency to the corresponding bucket. And this means than even accross private boundaries, there would be only one version by major bucket allowed. To me this would be the behavior presented in the RFC, but that's going a bit further than the "pure" public/private behavior that I'm presenting in this section.

@Eh2406
Copy link
Member

Eh2406 commented Aug 11, 2021

I think the technique from this section is useful, and I am very happy you are documenting it! (Whether or not it is useful to Cargo.)

Isn't it already the case with cargo today?

Cargo ends up putting (name, leading_non_zero_ver) in a HashSet. So the two versions of X are accounted for.
It does not handle (at the dependency resolver level) having two versions of C at the same version, that only differ in what dependencies they have. So with stable Cargo that ignores public/private C gets built once with X v2.0 as a dep and A can get a type mismatch error. If we turn on the alfa public/private support, then we get a error (with a alfa quality useless message) because C can not export the same dependency on X as 1.0 and 2.0. I will need to reread the RFC to see if that error is in the text or just my interpretation. It may be duplicating more of the dependency tree is an ok interpretation of the RFC, and I will need to remove that HashMap.

@mpizenberg
Copy link
Member Author

It does not handle (at the dependency resolver level) having two versions of C at the same version

ohhh right. I mixed the fact that there can be two versions of x with the fact that there can be two compilations (with different deps) for the same version.

And now I remember that resolver v2 is especially useful for that situation arise quite often with cargo workspaces with different set of features.

@mpizenberg
Copy link
Member Author

mpizenberg commented Aug 11, 2021

One interesting thing with the technique described in the public/private section of the guide, is that it will tell you the version you need for the subgraph of each private dependency. So for example, with that dependency graph (equal signs are private deps):

           ==f1
          /
r---a1---b1---c1---d1
     \         \
      ==e1      ==g1
         \
          --h1---i1===d2

it will tell you the solution is:

("root$root@1.0.0", (1, 0, 0)),
("a$root@1.0.0", (1, 0, 0)),
("b$root@1.0.0$a@1.0.0", (1, 0, 0)),
("c$root@1.0.0$a@1.0.0$b@1.0.0", (1, 0, 0)),
("d$root@1.0.0$a@1.0.0$b@1.0.0$c@1.0.0", (1, 0, 0)),
("e$a@1.0.0", (1, 0, 0)),
("f$b@1.0.0", (1, 0, 0)),
("g$c@1.0.0", (1, 0, 0)),
("h$a@1.0.0", (1, 0, 0)),
("i$a@1.0.0", (1, 0, 0)),
("d$i@1.0.0", (2, 0, 0)),

So for the subgraph of "root", you need "a", "b", "c", and "d" at versions 1. For the subgraph of "a" you need "e" at version 1, ... for the subgraph "i" you need "d" at version 2.

@mpizenberg
Copy link
Member Author

It may be duplicating more of the dependency tree is an ok interpretation of the RFC, and I will need to remove that HashMap.

I would tend to believe that the intent was not to duplicate deps. Because when we think about it, it's like saying that there can be only one version per major bucket, and here we have two for "c". Well, two times the same version, but two versions anyway, and I suppose that's what the "max one version per major bucket" rule is trying to avoid?

@mpizenberg
Copy link
Member Author

It does not handle (at the dependency resolver level) having two versions of C at the same version, that only differ in what dependencies they have.

With the current seed markers implementation, it is not possible to prevent duplicate versions in two different public subgraphs, as if we wanted to prevent a duplicate C@1 in you example, one for each version of X. Or at least, I have tried to think of ways to add this restriction and I can't think of one that would work. It's possible to prevent different versions (as in one per bucket), but not the same ones.

That is because the conversion from Markers(Set<Seed>) with multiple seed markers to Constraint(Seed) with one constraint for each seed happens for the current package version for which we are asking dependencies.
That is because a package version that was picked with multiple seed markers cannot say both at the same time:

  • it depends on this package version for each of those seed markers (to make pubgrub check version uniqueness per seed / public subgraph)
  • and two of these markers for the same version are incompatible

since the two are contradictory.

But something would be possible if instead of generating Constraint variants for a package version picked, we were generating those for dependency ranges. Concretely, if "a" @ 1 depends on "b" in 1..2, the current approach is:

  • "a$seed1$seed2" @ 1 depends on "a$seed1" @ 1 and "a$seed2" @ 1 and "b$seed1$seed2" in 1..2
  • pick "b$seed1$seed2" @ 1
  • "b$seed1$seed2" @ 1 depends on "b$seed1" @ 1 and "b$seed2" @ 1

And with the dependency ranges approach, it would be instead:

  • "a$seed1$seed2" @ 1 depends "b$seed1$seed2" in 1..2 and "b$seed1" in 1..2 and "b$seed2" in 1..2

So now instead of requiring "b$seed1" and "b$seed2" at the exact same version 1, we require both in range 1..2. We could then pick version 1 for "b$seed1". Then when asking about its dependencies, instead of saying it has no dependencies, it could say it is incompatible with each of the other existing seed variants for that same version. I can see two problems though.

  1. First, it requires changing get_dependencies from results with ranges to result with terms. That's doable and in my opinion this change could also be useful in other places.
  2. Second, this is not backtrack-proof. Since a dependency provider would need to keep track of already existing seed variants to emit those incompatibilities. And after backtracking, we might need to ask again, but the pubgrub algorithm is caching those and will not ask twice for the same package. This could require another change, where instead of just returning the dependencies, the provider also says if it is safe to cache those or not. If not, pubgrub will always call get_dependencies for this package. But prevent caching could prevent one of our earliest big perf improvements.

Otherwise, we need an immutable way of expressing those incompatibilities. I think that could also be done via different virtual versions generated on the fly by the provider. For each package version, we create a unique "package@v" package. This package would exist in one virtual version per seed. I think that would be backtracking-proof and not even require terms.

@mpizenberg mpizenberg marked this pull request as ready for review August 13, 2021 16:10
@mpizenberg
Copy link
Member Author

I think this section is now ready for review. It does not cover exactly the public/private feature of the rust RFC but is interesting on its own.

@Eh2406
Copy link
Member

Eh2406 commented Aug 17, 2021

Sorry for the delayed review. This is definitely interesting and worth documenting in the guide!
My one comment is that the text could use an explanation of how the Markers part of PkgSeeds works and why it is needed.

@Ericson2314
Copy link

The "seed" method is sound, but overly conservative. The public dependency graph does can have multiple roots per connected component, and each of those roots induces via transitive closure something that must be coherent (single versions), but where those single-root closures only partially overlap, they need be only partially coherent.

Concretely the quasi-diamond:

b -> d public
b -> x  ==1 public
c -> d public
c -> x  ==2 public
a -> b private
a -> c private

is fine, even if a and b are resolved to get the "same" d.

@mpizenberg
Copy link
Member Author

I don't understand the "partial" overlap argument, could you reformulate?

The quasi-diamond described above does not sound fine to me since a can access both x@1 and x@2 which is not allowed by the semantics described in this section.

      --d
     /
  ==b---x1
a<
  ==c---x2
     \
      --d

@Ericson2314 I did not mention it explicitly in the rust RFC issue but this section does not correspond exactly to the behavior of the public/private described for cargo. This is more of a "pure" public/private scheme. Adjustments are possible, and some are discussed in messages above like this one to adapt it to some of cargo specifics.

@Ericson2314
Copy link

Ericson2314 commented Aug 19, 2021

Sorry, my diamond was wrong, but more indirection should fix it:

d -> f public
d -> x  ==1 public
e -> f public
e -> x  ==2 public
b -> d private
c -> e private
a -> b public or private, doesn't matter
a -> c public or private, doesn't matter

the "views" that must be consistent are (Private ∪ Id) . Public*, which I had messed up on you got right. Now a cannot witness d or e via b and c.

@Eh2406
Copy link
Member

Eh2406 commented Aug 20, 2021

I don't see the problem. We will end up with 2 independent copies of f one seeded at b and one seeded at c, and the same thing will happen to x. How is this an example of it being overly conservative?

@Ericson2314
Copy link

Well ti's the fate of f (formerly d) that i was trying to ask about. I think I can make a still more complex example to make that clearer:

d -> f public
d -> x  ==1 public
e -> f public
e -> x  ==2 public
b -> d private
c -> e private
b -> f public
c -> f public
a -> b public
a -> c public

OK, so now b and c depend publicly on f, so it is essential that f for those and d and e all be the same f.

I mentioned (Private ∪ Id) . Public* above. This relation induce the per-package views that must be consistent. Calling that relation V, that give us:

  • d V = {d, f, x}
  • e V = {e, f, x}
  • b V = {b, d, f, x}
  • c V = {c, e, f, x}
  • a V = {b, c, f}

Each of those views must be assigned consistent versions, and those views overlap. Solving that, we must have one globally unique f, but thankfully we never have {d, e, x} included in any one of those sets, so we can assign x two different ways as required.

@Ericson2314
Copy link

The underlying "Conflict-driven clause learning" I think can handle this fine, but trying to smash this into PubGrub as is strikes me as a awkward.

It's a non-trivial upfront investment, but I feel like the right approach is this:

  1. Observe that PubGrub is an SMT solver for a single theory, based on CDCL which is a SAT technique.
  2. Try to factor apart the theory-agnostic parts of pubrub-rs unto a CDCL crate.
  3. Formulate the theory of public and private deps more directly
  4. Plug that back in with the CLDL crate analogous to how PubGrup is CDCL + the theory of public deps.

@mpizenberg
Copy link
Member Author

I think I understand your point but I'm not sure having the minimal amount of clauses is that much efficient when building these clauses is more expensive (traversing the graph). It might be, then great, but another thing is I don't have any experience in SMT solver theory or CDCL theory except what I had to understand while building pubgrub-rs and writing its guide. Though if somebody implements a CDCL crate and shows us how to replace parts of pubgrub with it, while having comparable or better performances that would be very welcome!

@mpizenberg
Copy link
Member Author

By the way, from what I gathered, pugrub uses CDNL instead of CDCL. CDCL builds a solution to satisfy a conjunctive normal form (conjunction of clauses) while CDNL builds a solution to unsatisfy a disjunctive normal form (disjunction of nogoods).

In addition, pugrub is a lazy CDNL algorithm since the disjunction of nogoods (incompatibilities) is built on the fly with the solution.

@Eh2406
Copy link
Member

Eh2406 commented Aug 20, 2021

That is an interesting case! I think I will need to understand much better how Markers work in order to know what this proposal will do. More generally these kinds of complex examples are why I will not be convinced we have nailed the semantics correctly until we have prop/fuzz test that compare to a SMT/SAT definition. (I have the test in the Cargo repo, waiting for an implementation that works.) (To be clear it is worth documenting, even if I don't have confidence that I know the semantics.)

The talk of SMT theorys in the abstract tends to make my head spin. But I think I am agreeing with you if I say, a good solution for some of the harder resolvers will involve some kind of lower level API to the pubgrub algorithm, one based on "Incompatibilitys" directly instead of using dependencies. That is not surprising to me the point of the "Advanced usage and limitations" section of the guide is to explore what changes to the API may be needed to best accommodate all users.

@mpizenberg
Copy link
Member Author

@Eh2406 I have update a bit the explanation related to seed markers. Could you re-read the entire section https://github.com/pubgrub-rs/guide/blob/public-private/src/limitations/public_private.md and let me know if things are clearer or not?

@Eh2406
Copy link
Member

Eh2406 commented Aug 22, 2021

This looks good to me, lets merge it!

@Ericson2314
Copy link

@mpizenberg my concern was more soundness and completeness than efficiency. For that last example (assuming I didn't mess it up again :)) what should happen with the current algorithm?

@mpizenberg
Copy link
Member Author

@mpizenberg my concern was more soundness and completeness than efficiency. For that last example (assuming I didn't mess it up again :)) what should happen with the current algorithm?

So I've just added two tests to see how the algorithm reacts. In the first test I made everything depends on the same version of "f", and in the second test, "d" and "e" depend on different versions of "f".

    /// first test
    ///
    ///       --f --f
    ///      /   /
    ///   --b===d---x1
    /// a<
    ///   --c===e---x2
    ///      \   \
    ///       --f --f

    /// second test
    ///
    ///       --f*   --f1
    ///      /      /
    ///   --b======d
    /// a<
    ///   --c======e
    ///      \      \
    ///       --f*   --f2

When the same version of "f" is required, it happily finds a solution. However, if "d" depends on "f1" and "e" depends on "f2" then the solver fails with the following error message:

Because f$b@1.0.0 1.0.0 depends on f$b@1.0.0 constraint 1.0.0 and f$a@1.0.0$b@1.0.0 [ 0.0.0, 1.0.0 [  [ 1.0.1, ∞ [ depends on f$b@1.0.0 constraint 2.0.0, f$a@1.0.0$b@1.0.0 [ 0.0.0, 1.0.0 [  [ 1.0.1, ∞ [, f$b@1.0.0 1.0.0 are incompatible.
And because d$b@1.0.0 depends on f$b@1.0.0 1.0.0, f$a@1.0.0$b@1.0.0 [ 0.0.0, 1.0.0 [  [ 1.0.1, ∞ [, d$b@1.0.0 ∗ are incompatible.
And because b$a@1.0.0 1.0.0 depends on f$a@1.0.0$b@1.0.0 and b$a@1.0.0 1.0.0 depends on d$b@1.0.0, b$a@1.0.0 1.0.0 depends on f$a@1.0.0$b@1.0.0 1.0.0.
And because f$a@1.0.0$b@1.0.0 1.0.0 depends on f$a@1.0.0 constraint 1.0.0 and f$a@1.0.0$c@1.0.0 2.0.0 depends on f$a@1.0.0 constraint 2.0.0, b$a@1.0.0 ∗, f$a@1.0.0$c@1.0.0 2.0.0 are incompatible. (1)

Because f$c@1.0.0 2.0.0 depends on f$c@1.0.0 constraint 2.0.0 and f$a@1.0.0$c@1.0.0 [ 0.0.0, 2.0.0 [  [ 2.0.1, ∞ [ depends on f$c@1.0.0 constraint 1.0.0, f$c@1.0.0 2.0.0, f$a@1.0.0$c@1.0.0 [ 0.0.0, 2.0.0 [  [ 2.0.1, ∞ [ are incompatible.
And because e$c@1.0.0 depends on f$c@1.0.0 2.0.0, e$c@1.0.0 ∗, f$a@1.0.0$c@1.0.0 [ 0.0.0, 2.0.0 [  [ 2.0.1, ∞ [ are incompatible.
And because c$a@1.0.0 1.0.0 depends on e$c@1.0.0 and c$a@1.0.0 1.0.0 depends on f$a@1.0.0$c@1.0.0, c$a@1.0.0 1.0.0 depends on f$a@1.0.0$c@1.0.0 2.0.0.
And because b$a@1.0.0 ∗, f$a@1.0.0$c@1.0.0 2.0.0 are incompatible (1), b$a@1.0.0 ∗, c$a@1.0.0 1.0.0 are incompatible.
And because a$a@1.0.0 1.0.0 depends on c$a@1.0.0 and a$a@1.0.0 1.0.0 depends on b$a@1.0.0, a$a@1.0.0 1.0.0 is forbidden.

If I rewrite this error message in a slightly more readable versions, that gives:

Because f$b @ 1.0.0 is marked with the b seed @ 1.0.0 and f$a$b for all versions except 1.0.0 is marked with the b seed @ 2.0.0, f$b @ 1.0.0 is incompatible with all versions except 1.0.0 of f$a$b.
And because d$b depends on f$b @ 1.0.0, d$b is also incompatible with all except 1.0.0 of f$a$b.
And because b$a depends on both f$a$b and d$b, b$a depends on f$a$b @ 1.0.0.
And because f$a$b @ 1.0.0 is marked with the a seed @ 1.0.0 and f$a$c 2.0.0 is marked by the a seed @ 2.0.0, b$a and f$a$c @ 2.0.0 are incompatible (1)

Because f$c @ 2.0.0 is marked with the c seed @ 2.0.0 and f$a$c for all versions except 2.0.0 is marked with the c seed @ 1.0.0, f$c @ 2.0.0 is incompatible with all versions except 2.0.0 of f$a$c.
And because e$c depends on f$c @ 2.0.0, e$c is also incompatible with all except 2.0.0 of f$a$c.
And because c$a depends on both e$c and f$a$c, c$a depends on f$a$c @ 2.0.0.
And because b$a and f$a$c 2.0.0 are incompatible (1), b$a and c$a are incompatible.
And because a$a depends on both c$a and b$a, a$a is forbidden.

@Ericson2314
Copy link

Thanks! I'm surprised the first one works out after all. I guess don't wait to merge on my account :). I'll try to go play around with it.

@Ericson2314
Copy link

Ah, so I see there was already success_when_after_double_private_fork. This I think should fail, and explains why ericson_succeeds didn't either.

The current algorithm has views that are Public* not (Private ∪ Id) . Public*. But the latter is correct. In success_when_after_double_private_fork example, a will be able to witness x1 and x2 via b1 and c1, but this shouldn't be allowed to happen.

@Ericson2314
Copy link

I think the current algorithm is correct for Public* views. Those do seem to avoid these complicated interactions I was worried about. The rooted sets indeed can be solved independently because there are no "partial" coherency obligations -- either two views share a parent in which case they are merged and the whole thing must be coherent, or they don't and are completely independent.

It's only (Private ∪ Id) . Public* where we get the complicated interactions of views that partially entangle themselves. My examples were all too complex, trying and failing to find a completeness error in the current algorithm when we have a soundness one. By contrast, (Private . Public)* should be adaptable to the current algorithm --- it amounts to just first flipping some private edges public and then proceeding as usual --- but while this fixes soundness it looses completeness.

@mpizenberg
Copy link
Member Author

mpizenberg commented Aug 23, 2021

Sorry @Ericson2314 I don't agree with you. According to the semantics of public/private dependencies as presented in the guide section, it is normal that the first test passes. Just as it is normal that the success_when_after_double_private_fork test passes. This is because as the name suggests, there are two levels of private indirections: a depends privately on b1 which depends privately on d1 so a does not ever interacts with d. Same on c1 side, which also has a private dependency on d2. So both versions of d are completely independent.

either two views share a parent in which case they are merged and the whole thing must be coherent, or they don't and are completely independent

There always is a parent between two views, so please be more specific. Is that a direct parent you mean? I strongly believe the algorithm is correct.

@mpizenberg mpizenberg merged commit 81f3898 into main Sep 7, 2021
@mpizenberg mpizenberg deleted the public-private branch September 7, 2021 22:28
@Ericson2314
Copy link

Ericson2314 commented Sep 17, 2021

Wow, I am really sorry and embarrassed for repeatedly making these simple basic mistakes in this thread.

You're right about the tests, I misread the = vs - or something. Rather than trying to think from the documentation I looked at the code this time. I see there are sets of seeds in the general case; that's a good sign and something that I had missed reading the docs. I'll make sure to have a counter-example in Code to sanity-check myself if I think of anything else!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants