Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two equal priority queues with keys can compare as unequal #78

Open
konsumlamm opened this issue Apr 17, 2022 · 28 comments
Open

Two equal priority queues with keys can compare as unequal #78

konsumlamm opened this issue Apr 17, 2022 · 28 comments

Comments

@konsumlamm
Copy link
Collaborator

Two equal queues may compare as unequal, since duplicate keys are allowed and the Eq implementation takes only care of comparing the keys in order.

Example:

ghci> import qualified Data.PQueue.Min as PMin
ghci> PMin.fromAscList [(1, 'a'), (1, 'b')] == PMin.fromAscList [(1, 'b'), (1, 'a')]
False

This should be documented, as I don't think this can be fixed without making the implementation significantly slower.

@treeowl
Copy link
Collaborator

treeowl commented Apr 17, 2022

Hrmmmm..... Which operations don't respect ==?

@konsumlamm
Copy link
Collaborator Author

konsumlamm commented Apr 17, 2022

Hrmmmm..... Which operations don't respect ==?

I'm not sure what you mean, the issue is that entries with equal keys (but different values) may be in any order.

@treeowl
Copy link
Collaborator

treeowl commented Apr 17, 2022

The usual expectation is that if

f :: A -> B
x, y :: A
x == y = True

then

f x == f y = True

We say that f respects ==.

Typically, "safe" API functions should respect ==. If there's a relatively small and well defined set of operations that don't, that can be acceptable for the sake of practicality (see HashMap, for example, where two maps may compare equal even though converting them both to lists may produce differently ordered lists and they may be traversed in different orders). But if it turns out that the most essential priority queue operations don't respect ==, then that == will break everyone's intuition. In that case, it should preferably be changed, or if there's no way to do anything sensible then the Eq instance should likely be removed.

We should start by looking at how insertion and view interact with ==, since those are the most important operations.

@treeowl
Copy link
Collaborator

treeowl commented Apr 17, 2022

Similarly for binary operations. Given

a == a' = True
b == b' = True

users will want to be able to rely on

merge a b == merge a' b' = True

My bet is that the only ways to make sensible and abstract Eq instances for the Prio queues are to consider the results of minView (hence toList) and/or merge non-deterministic. Considering only minView non-deterministic is easy:

a == b = sort (toList a) == sort (toList b)

I believe we currently consider only merge non-deterministic in our Eq instance, which leads to very difficult/brittle code. So I think we should to the simple thing instead, if we want an Eq instance.

@konsumlamm
Copy link
Collaborator Author

So I did a bunch of (10000) QuickCheck tests and it seems that insert respects ==. I think before changing the Eq instance to (==) `on` (sort . toList), we should be sure that the current implementation is a problem.

@treeowl
Copy link
Collaborator

treeowl commented Apr 18, 2022

Try merges?

@konsumlamm
Copy link
Collaborator Author

With a better testing method, I actually found an example where insert doesn't respect ==:

ghci> import qualified Data.PQueue.Prio.Min as P
ghci> a = P.fromList [(0, 0), (0, 1), (-1, 0), (-1, 0)]
ghci> b = P.fromList [(-1, 0), (0, 0), (0, 1), (-1, 0)]
ghci> a == b
True
ghci> P.insert 0 0 a == P.insert 0 0 b
False

@treeowl
Copy link
Collaborator

treeowl commented Apr 18, 2022

Ouch. That's really nasty. I propose either:

  1. Define a == b = (==) `on` sort . toList (or equivalent), and document that minView should be considered non-deterministic relative to ==.

  2. Remove the Eq instance for Prio queues.

Can you think of any situation where it's essential to have an Eq instance for these things?

@konsumlamm konsumlamm mentioned this issue Apr 23, 2022
@konsumlamm
Copy link
Collaborator Author

Can you think of any situation where it's essential to have an Eq instance for these things?

Hmm, not really, though I feel a bit uneasy about removing an instance, as that would be a breaking change (which would be fine with a major version bump ig). I took a look at how other languages (C++, Java, Rust, Scala) handle this problem and it seems none of those implement custom equality for priority queues.

@treeowl
Copy link
Collaborator

treeowl commented Apr 14, 2023

Those languages are presumably using pointer equality. I believe the correct thing is the sorting ==, and documentation that these queues are "nondeterministic". That is, all the extraction and traversal functions may access values associated with a given key in any order.

@konsumlamm
Copy link
Collaborator Author

At least Rust and C++ don't, Scala 3 doesn't either afaict, Java always has a default implementation, so that's kinda unfair.

@treeowl
Copy link
Collaborator

treeowl commented Apr 26, 2023

So you prefer to remove the instance rather than implement the sorting one?

@konsumlamm
Copy link
Collaborator Author

I definitely prefer removing the instance over requiring Ord on the values, but I'm not sure that's my favourite option.

The alternative would be to fix the instance. We could compare the sizes and then iterate through the first argument and for each key-value pair look up if it also exists in the second argument. That would be slower, but it should work.

@treeowl
Copy link
Collaborator

treeowl commented Apr 26, 2023

That would give a pretty awful user experience, I think. We'd have p == q potentially much slower than compare p q == EQ.

@konsumlamm
Copy link
Collaborator Author

The Ord instance should be compatible anyway, so if we change the behaviour of the Eq instance, we also need to change the Ord instance.

@treeowl
Copy link
Collaborator

treeowl commented Apr 26, 2023

Right, but you won't object to Ord sorting, which will make equal-via-compare faster than ==

@konsumlamm
Copy link
Collaborator Author

Right. I just benched a (naive) implementation of my proposal and it's a lot slower. So I'm fine with removing the instance, though perhaps let's first add a deprecation comment (unfortunately, instances cannot be deprecated).

@treeowl
Copy link
Collaborator

treeowl commented Apr 26, 2023

Why don't we just do the sorting thing? It has clear semantics, which are almost obvious from the Ord constraint.

@konsumlamm
Copy link
Collaborator Author

I explained why I don't like that option in #106.

@treeowl
Copy link
Collaborator

treeowl commented May 2, 2023

So you don't want a strong constraint because it won't work for some things, and you'd rather remove the instance and have it work for nothing. I want the strong constraint. Whom should we ask to settle this disagreement?

@treeowl
Copy link
Collaborator

treeowl commented May 23, 2023

If we use the Ord a constraint, as I desire, we can certainly also include

slowSameMinPQueue :: (Ord k, Eq a) => MinPQueue k a -> MinPQueue k a -> Bool

And we can even provide a newtype to use it:

newtype SlowEqMinPQueue k a = SlowEqMinPQueue { unSlowEqMinPQueue :: MinPQueue k a }
instance (Ord k, Eq a) => Eq (SlowEqMinPQueue k a) where
  (==) = slowSameMinPQueue

This can be used as a DerivingVia target in some cases.

newtype Floop k a = Floop (Either Int (MinPQueue k a))
  deriving Eq via (Either Int (SlowEqMinPQueue k a))

@jappeace
Copy link

PMin.fromAscList [(1, 'a'), (1, 'b')] == PMin.fromAscList [(1, 'b'), (1, 'a')]

in my opinion these aren't equal queues. I believe a priority queue encodes a queue with a priority, if the priorities are all the same I'd expect it to behave like a queue, (also known as a list).

if you'd enforce these as equal you'd end up with something akin to a poset.

@konsumlamm
Copy link
Collaborator Author

I wanted to disagree at first, but now I think you're actually making a good point. If two priority queues produce (when calling toList or repeatedly calling minView) the same elements, but in a different order, they're not really equal. Unfortunately, if two priority queues are equal (by today's implementation), inserting an element may change that (see #78 (comment)). The "problem" is that inserting has no guarantees where the element is inserted with respect to equal keys. However, I think that's not a big problem, as long as we document it, e.g. HashMap has that problem too.

@treeowl
Copy link
Collaborator

treeowl commented May 24, 2023

@jappeace that's not the traditional definition. It's possible to implement such a structure (see the stable-heap package, or implement a priority queue using something like Hinze-Paterson monoidal finger trees), but that isn't how binomial queues or most other priority queue implementations work. The priority queue abstraction per se says nothing about which entry you get when, and we don't guarantee one in particular.

@treeowl
Copy link
Collaborator

treeowl commented May 24, 2023

@konsumlamm that's why I think it's best to view extraction and traversal operations as nondeterministic. They can take equals to non-equals.

@treeowl
Copy link
Collaborator

treeowl commented May 24, 2023

The only way to test "equal for all purposes, deterministically" is plain structural equality. But that's flipping useless.

@konsumlamm
Copy link
Collaborator Author

I started a discussion about this on Discourse: https://discourse.haskell.org/t/equality-for-priority-queues/6355.

@treeowl
Copy link
Collaborator

treeowl commented Jun 1, 2023

That's been good so far. It hasn't changed my extreme negative opinion of the quadratic instance, but I have warmed somewhat to the idea of removing the instance altogether (and adding separate functions to test equality and compare).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants