Add note on asymptotical optimality of compose implementations #960

alexfmpe · 2023-07-09T17:12:49Z

No description provided.

alexfmpe · 2023-07-12T11:00:24Z

containers/src/Data/IntMap/Internal.hs

@@ -769,6 +769,8 @@ disjoint t1@(Bin p1 m1 l1 r1) t2@(Bin p2 m2 l2 r2)
 -- 'compose' that forced the values of the output 'IntMap'. This version does
 -- not force these values.
 --
+-- __Note:__ This is not asymptotically optimal. See note at 'Data.Map.compose', keeping in mind \( \log(m) \) is \( O(\min(m,W)) \), but not the other way around.


Actually, I'm not sure what would even be asymptotically optimal here. The current impl is O(n * min(m,W)), but one could write

compose :: IntMap c -> IntMap Int -> IntMap c
compose bc !ab
  | null bc = empty
  | otherwise = mapMaybe (f bc !?) ab
  where f = Map.fromAscList . IntMap.toAscList

It seems to me this would be O(n * log m + m), and the extra m makes some sense given we can't directly use element comparisons on lookup on IntMap, so either we convert to Map or rely on the word size bound.

No clue how the constants here play out, would need to benchmark.
I also wonder if there's a faster way to implement f as I couldn't find any Map/IntMap conversion functions in their respective modules.
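The one-line snippet above can be expanded into a compilable sketch. The name `composeViaMap` and the qualified imports are chosen here for illustration; they are not part of containers. The O(n * log m + m) figure is the estimate from the comment, not a measured result.

```haskell
{-# LANGUAGE BangPatterns #-}

import           Data.IntMap (IntMap)
import qualified Data.IntMap as IntMap
import qualified Data.Map    as Map

-- | Hypothetical variant of 'IntMap.compose' that first converts the
-- lookup table to a 'Map', so every lookup is comparison-based.
composeViaMap :: IntMap c -> IntMap Int -> IntMap c
composeViaMap bc !ab
  | IntMap.null bc = IntMap.empty
  | otherwise      = IntMap.mapMaybe (bcAsMap Map.!?) ab
  where
    -- 'IntMap.toAscList' yields keys in ascending order, which is the
    -- precondition of 'Map.fromAscList'; the conversion is linear in m.
    bcAsMap = Map.fromAscList (IntMap.toAscList bc)

main :: IO ()
main = do
  let bc = IntMap.fromList [(1, "one"), (2, "two")]
      ab = IntMap.fromList [(10, 1), (20, 2), (30, 3)]
  -- Key 30 points at 3, which is absent from bc, so it is dropped.
  print (composeViaMap bc ab)
  -- Sanity check against the library function on this input.
  print (composeViaMap bc ab == IntMap.compose bc ab)
```

Whether the conversion pays off in practice depends on the constants, which would still need benchmarking as noted above.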

Lysxia · 2023-07-13

The question of optimality for IntMap doesn't seem to be well-posed, so it's probably not worth commenting on. The cost of compose is the cost of n lookups; that seems a satisfactory enough answer that formalizing the question of optimality is unnecessary, if it is possible at all.

It's easy to talk about the optimality of Map because it matches the information-theoretic lower bound, which is independent of the representation and therefore easy to formalize. For IntMap, a meaningful lower bound would have to take into account the particular data type involved, which seems difficult to formalize.

treeowl · 2023-09-27T10:46:28Z

@alexfmpe, could you update to reflect @Lysxia's comment?

Lysxia · 2023-09-27T15:37:01Z

In other words, I think you can just drop the comment about optimality.

alexfmpe · 2024-07-01T14:52:14Z

Completely forgot about this. Took out the "this is not optimal" bit.

meooow25 · 2024-08-02T19:14:25Z

I thought about the claim here a bit, and I'd like to understand this better.

Point 1
The proof of the lower bound seems to apply to two arbitrary collections of sizes $m$ and $n$. But we know that Map b c provides the sorted order of bs, which might decrease the lower bound. Does the lower bound still hold?

Point 2
This is perhaps less relevant to this library, but still related. Since the proof seems to apply to two arbitrary collections, how might one write an optimal compose :: (Ord a, Ord b) => [(b, c)] -> [(a, b)] -> [(a, Maybe c)] taking $O(n \log m)$?

@alexfmpe @Lysxia

Lysxia · 2024-08-02T20:13:51Z

Point 1
The lower bound applies independently of the data structure, as long as it's comparison-based. So knowing that a Map is sorted doesn't help. One way to bypass this lower bound for example is to replace Map b c with b -> Maybe c, then it's no longer guaranteed that the b -> Maybe c mapping is implemented using comparisons.

Point 2
This lower bound result only says that a comparison-based compose needs at least n log(m) comparisons. (a) An optimal algorithm may require many more comparisons (equivalently, there may be other lower bounds that are higher; for example, for a non-sorted list, you have to look at all of the keys anyway). (b) An algorithm may do a lot more work besides doing comparisons. For example, if you use a sorted list, a lookup needs only O(log n) comparisons (by binary search), but it still takes linear time in the worst case because you have to traverse the list.

meooow25 · 2024-08-06T15:35:56Z

Point 2
(a) That makes sense, thanks.
(b) Ah yes, I was not thinking of non-comparison work, so I really should have asked "..making $O(n \log m)$ comparisons". But this is addressed by (a).

Point 1

> So knowing that a Map is sorted doesn't help.

I'm not convinced of this. Consider instead if we had to implement Map b c -> Map b a -> SomeCollection (a, c). Since both maps have their bs sorted, I think we can do this in $O \left(\min(n,m) \cdot \log \frac{\max(n,m)}{\min(n,m)} \right)$ instead, using the set intersection algorithm.
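A minimal sketch of that alternative signature, taking a plain list as the `SomeCollection`. The name `joinOnKey` is hypothetical; `Map.intersectionWith` is the existing containers function, whose documentation lists a bound of O(m * log(n/m + 1)) for m <= n.

```haskell
import           Data.Map (Map)
import qualified Data.Map as Map

-- | Hypothetical helper: pair up the values stored under the same key.
-- Both inputs are keyed (and therefore sorted) by b, so the work is a
-- key intersection rather than n independent lookups.
joinOnKey :: Ord b => Map b c -> Map b a -> [(a, c)]
joinOnKey bc ba = Map.elems (Map.intersectionWith (\c a -> (a, c)) bc ba)

main :: IO ()
main = do
  let bc = Map.fromList [(1, 'x'), (2, 'y'), (4, 'z')] :: Map Int Char
      ba = Map.fromList [(2, "two"), (3, "three"), (4, "four")]
  -- Only keys 2 and 4 occur in both maps.
  print (joinOnKey bc ba)
```

The difference from compose is that here the bs of both arguments arrive already sorted, whereas in compose the bs of the second argument are values and come in no particular order.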

Lysxia · 2024-08-07T09:58:44Z

That's an interesting comparison that helps me see some implicit assumptions I made, thanks. There is a difference in the amount of information initially known about the input.

If we abstract away the shape of the trees, a Map a b is the same as a sequence of pairs of as and bs, with the invariant that the as are sorted. The lower bound given for compose here doesn't depend on the representation of the map, as long as it contains the same information about the as, bs, and cs.

This lower bound uses the same information-theoretic idea used to prove the $n\log(n)$ bound for sorting. The algorithm is split in two logical steps (they can be interleaved in reality): first it chooses some comparisons to perform, then it constructs the output depending only on the information provided by the comparisons and the initial invariant that the map keys are sorted. If you're familiar with parametricity/theorems for free, such a "construction" can be encoded as a polymorphic function forall a b c. Map b c -> Map a b -> Map a c which has access to the Map constructors. The second step will choose the same construction for all inputs consistent with the comparisons done in the first step. For fixed input sizes $n$ and $m$, at least $(n+1)^m$ constructions are necessary, corresponding to each possible partial mapping from the positions of the a keys of the Map a b argument to the positions of the c values of the Map b c argument. So we need $\log((n+1)^m)$ comparisons to split the set of all inputs into that many "consistency classes".
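The counting step can be written out explicitly, using the same $n$ and $m$ as in the paragraph above. The sketch assumes each comparison has at most three outcomes (less, equal, greater), an assumption not stated in the thread that only changes the base of the logarithm. If an algorithm makes at most $k$ comparisons on any input, it can distinguish at most $3^k$ consistency classes, so:

```latex
\[
  3^{k} \;\ge\; (n+1)^{m}
  \quad\Longrightarrow\quad
  k \;\ge\; \log_{3}\!\big((n+1)^{m}\big) \;=\; m \log_{3}(n+1) \;=\; \Omega(m \log n).
\]
```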

alexfmpe · 2024-08-07T12:40:49Z

> So knowing that a Map is sorted doesn't help.

I'd say it does, but for something else. The "number of comparisons to know which a goes with which c" is separate from any information about the ordering of a values. They are ordered in our output because we preserve it from the second argument. If we were instead doing [(b,c)] -> [(a,b)] -> Map a c then we'd at least need something like an extra |a| * log |a| comparisons, because we also "learn" the ordering of a values.

meooow25 · 2024-08-12T17:55:02Z

I'm on board that we need $\log((n+1)^m)$ bits of information; I'm just not sure that one of the maps being sorted does not provide a part of that information.

> There is a difference in the amount of information initially known about the input.

Perhaps this is it. It is true that we know nothing about how bs in the unsorted map relate to bs in the sorted map.

Anyway, I will take your word for it and we can add this to the docs.

containers/src/Data/IntMap/Internal.hs

containers/src/Data/Map/Internal.hs

alexfmpe · 2024-08-20T23:28:15Z

meooow25 · 2024-08-21T17:53:08Z

After some tweaks:

meooow25 · 2024-08-23T17:20:09Z

Thanks!

alexfmpe mentioned this pull request Jul 9, 2023

Composing maps? #647

Closed

alexfmpe commented Jul 12, 2023

alexfmpe force-pushed the compose-asymptotical branch from 99aec6d to e08b262 Compare July 1, 2024 14:51

Lysxia approved these changes Jul 11, 2024

meooow25 requested changes Aug 12, 2024

Add note on performance of compose implementations

199ba3a

alexfmpe force-pushed the compose-asymptotical branch from e08b262 to 199ba3a Compare August 20, 2024 23:27

Minor edits

c7a8f6c

meooow25 approved these changes Aug 21, 2024

Lysxia self-requested a review August 21, 2024 17:56

Lysxia approved these changes Aug 21, 2024

meooow25 merged commit 549d22b into haskell:master Aug 23, 2024
11 checks passed

Lysxia Jul 13, 2023
