Add note on asymptotical optimality of compose implementations #960

Merged (2 commits) on Aug 23, 2024

Conversation

alexfmpe
Contributor

@alexfmpe alexfmpe commented Jul 9, 2023

No description provided.

@alexfmpe alexfmpe mentioned this pull request Jul 9, 2023
@@ -769,6 +769,8 @@ disjoint t1@(Bin p1 m1 l1 r1) t2@(Bin p2 m2 l2 r2)
-- 'compose' that forced the values of the output 'IntMap'. This version does
-- not force these values.
--
-- __Note:__ This is not asymptotically optimal. See note at 'Data.Map.compose', keeping in mind \( \log(m) \) is \( O(\min(m,W)) \), but not the other way around.
Contributor Author

Actually, I'm not sure what would even be asymptotically optimal here. The current impl is O(n * min(m,W)), but one could write

compose :: IntMap c -> IntMap Int -> IntMap c
compose bc !ab
  | null bc = empty
  | otherwise = mapMaybe (f bc !?) ab
  where
    f = Map.fromAscList . IntMap.toAscList

It seems to me this would be O(n * log m + m), and the extra m makes some sense given that we can't directly use element comparisons for lookups in an IntMap, so either we convert to Map or we rely on the word size bound.

No clue how the constants here play out, would need to benchmark.
I also wonder if there's a faster way to implement f as I couldn't find any Map/IntMap conversion functions in their respective modules.
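
For reference, the idea above can be written as a self-contained sketch. The name `composeViaMap` is hypothetical and this is not the library's implementation. It assumes `Map.fromDistinctAscList` is an acceptable stand-in for `f`, since `IntMap.toAscList` yields distinct keys in ascending order and so the conversion is linear in m:

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.IntMap as IntMap
import Data.IntMap (IntMap)
import qualified Data.Map as Map

-- | Hypothetical variant of 'compose' for IntMap: convert the first
-- argument to a Map once (O(m)), then do n comparison-based lookups
-- (O(log m) each).
composeViaMap :: IntMap c -> IntMap Int -> IntMap c
composeViaMap bc !ab
  | IntMap.null bc = IntMap.empty
  | otherwise = IntMap.mapMaybe (bcMap Map.!?) ab
  where
    -- Keys come out of toAscList sorted and distinct, so the linear-time
    -- builder can be used instead of fromAscList.
    bcMap = Map.fromDistinctAscList (IntMap.toAscList bc)
```

Whether this beats direct IntMap lookups in practice would still need benchmarking, as noted above.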

Contributor

The question of optimality for IntMap doesn't seem to be well-posed, so it's probably not worth commenting on. The cost of compose is the cost of n lookups; that seems a satisfactory enough answer that formalizing the question of optimality is unnecessary, if it is possible at all.

It's easy to talk about the optimality of Map because it matches the information-theoretic lower bound, which is independent of the representation and therefore easy to formalize. For IntMap, a meaningful lower bound would have to take into account the particular data type involved, which seems difficult to formalize.
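
To make "the cost of n lookups" concrete, here is a minimal sketch of compose as one lookup per entry of the second map. The name `composeLookups` is hypothetical and this is not claimed to be the library's exact definition:

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.IntMap as IntMap
import Data.IntMap (IntMap)

-- | Sketch: each of the n values in 'ab' is looked up once in 'bc'.
-- The total cost is therefore n times the cost of one IntMap lookup.
composeLookups :: IntMap c -> IntMap Int -> IntMap c
composeLookups bc !ab
  | IntMap.null bc = IntMap.empty
  | otherwise = IntMap.mapMaybe (bc IntMap.!?) ab
```

Entries of `ab` whose value is not a key of `bc` are dropped from the result.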

@treeowl
Contributor

treeowl commented Sep 27, 2023

@alexfmpe, could you update to reflect @Lysxia's comment?

@Lysxia
Contributor

Lysxia commented Sep 27, 2023

In other words, I think you can just drop the comment about optimality.

@alexfmpe alexfmpe force-pushed the compose-asymptotical branch from 99aec6d to e08b262 Compare July 1, 2024 14:51
@alexfmpe
Contributor Author

alexfmpe commented Jul 1, 2024

Completely forgot about this. Took out the "this is not optimal" bit

@meooow25
Contributor

meooow25 commented Aug 2, 2024

I thought about the claim here a bit, and I'd like to understand this better.

Point 1
The proof of the lower bound seems to apply to two arbitrary collections of sizes $m$ and $n$. But we know that Map b c provides the sorted order of bs, which might decrease the lower bound. Does the lower bound still hold?

Point 2
This is perhaps less relevant to this library, but still related. Since the proof seems to apply to two arbitrary collections, how might one write an optimal compose :: (Ord a, Ord b) => [(b, c)] -> [(a, b)] -> [(a, Maybe c)] taking $O(n \log m)$?

@alexfmpe @Lysxia

@Lysxia
Contributor

Lysxia commented Aug 2, 2024

Point 1
The lower bound applies independently of the data structure, as long as it's comparison-based. So knowing that a Map is sorted doesn't help. One way to bypass this lower bound for example is to replace Map b c with b -> Maybe c, then it's no longer guaranteed that the b -> Maybe c mapping is implemented using comparisons.

Point 2
This lower bound result only says that a comparison-based compose needs at least n log(m) comparisons. (a) An optimal algorithm may require much more (equivalently, there may be other lower bounds that are higher; for example for a non-sorted list, you have to look at all of the keys anyway). (b) An algorithm may do a lot more work besides doing comparisons. For example, if you use a sorted list, a lookup needs only O(log n) comparisons (by binary search), but it still takes linear time in the worst case because you have to traverse it.
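
As an illustration of point (a), here is one straightforward way to write the list version. The name `composeList` is hypothetical. It assumes it is acceptable to first build a `Map` from the unsorted `[(b, c)]` list, which already costs O(m log m) before any lookup happens, so the total is O((n + m) log m) rather than O(n log m):

```haskell
import qualified Data.Map as Map

-- | Sketch: index the (b, c) pairs once, then look up each b.
-- With duplicate b keys, Map.fromList keeps the last value.
composeList :: Ord b => [(b, c)] -> [(a, b)] -> [(a, Maybe c)]
composeList bcs abPairs = [ (a, Map.lookup b bcMap) | (a, b) <- abPairs ]
  where
    bcMap = Map.fromList bcs
```

The `Ord a` constraint from the signature in the question is not needed here, because the a keys are only passed through.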

@meooow25
Contributor

meooow25 commented Aug 6, 2024

Point 2
(a) That makes sense, thanks.
(b) Ah yes, I was not thinking of non-comparison work, so I really should have asked "..making $O(n \log m)$ comparisons". But this is addressed by (a).

Point 1

> So knowing that a Map is sorted doesn't help.

I'm not convinced on this. Consider instead if we had to implement Map b c -> Map b a -> SomeCollection (a, c). Since both maps have bs sorted, I think we can do this in $O \left(\min(n,m) \cdot \log \frac{\max(n,m)}{\min(n,m)} \right)$ instead using the set intersection algorithm.
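
A minimal sketch of that alternative problem, using `Map.intersectionWith`, whose documented cost is O(m log(n/m + 1)) for m <= n. The name `joinOnKey` and the choice of a list as the output collection are assumptions made for illustration:

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- | Sketch: both inputs are keyed (and hence sorted) by b, so the
-- matching pairs can be found with a single map intersection.
joinOnKey :: Ord b => Map b c -> Map b a -> [(a, c)]
joinOnKey bc ba = Map.elems (Map.intersectionWith (\c a -> (a, c)) bc ba)
```

Note that this differs from compose, where the second map is keyed by a and its b values appear in no particular order.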

@Lysxia
Contributor

Lysxia commented Aug 7, 2024

That's an interesting comparison that helps me see some implicit assumptions I made, thanks. There is a difference in the amount of information initially known about the input.

If we abstract away the shape of the trees, a Map a b is the same as a sequence of pairs of as and bs, with the invariant that the as are sorted. The lower bound given for compose here doesn't depend on the representation of the map, as long as the representation carries the same information about the as, bs, and cs.

This lower bound uses the same information-theoretic idea used to prove the $n\log(n)$ bound for sorting. The algorithm is split in two logical steps (they can be interleaved in reality): first it chooses some comparisons to perform, then it constructs the output depending only on the information provided by the comparisons and the initial invariant that the map keys are sorted. If you're familiar with parametricity/theorems for free, such a "construction" can be encoded as a polymorphic function forall a b c. Map b c -> Map a b -> Map a c which has access to the Map constructors. The second step will choose the same construction for all inputs consistent with the comparisons done in the first step. For fixed input sizes $n$ and $m$, at least $(n+1)^m$ constructions are necessary, corresponding to each possible partial mapping from the positions of the a keys of the Map a b argument to the positions of the c values of the Map b c argument. So we need $\log((n+1)^m)$ comparisons to split the set of all inputs into that many "consistency classes".
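
As a worked instance of this counting argument (a sketch, assuming each comparison has two outcomes; three-way comparisons only change the base of the logarithm, which is a constant factor):

$$
\#\text{comparisons} \;\ge\; \log_2\big((n+1)^m\big) \;=\; m \log_2(n+1)
$$

For example, with $n = 3$ and $m = 2$ there are $(3+1)^2 = 16$ partial mappings, so at least $\log_2 16 = 4$ two-outcome comparisons are needed in the worst case.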

@alexfmpe
Contributor Author

alexfmpe commented Aug 7, 2024

> So knowing that a Map is sorted doesn't help.

I'd say it does, but for something else. The "number of comparisons to know which a goes with which c" is separate from any information about the ordering of a values. They are ordered in our output because we preserve it from the second argument. If we were instead doing [(b,c)] -> [(a,b)] -> Map a c then we'd at least need something like an extra |a| * log |a| comparisons, because we also "learn" the ordering of a values.

@meooow25
Contributor

I'm on board that we need $\log ((n+1) ^m)$ bits of information, I'm just not sure that one of the maps being sorted does not provide a part of that information.

> There is a difference in the amount of information initially known about the input.

Perhaps this is it. It is true that we know nothing about how bs in the unsorted map relate to bs in the sorted map.

Anyway, I will take your word for it and we can add this to the docs.

@alexfmpe alexfmpe force-pushed the compose-asymptotical branch from e08b262 to 199ba3a Compare August 20, 2024 23:27
@alexfmpe
Contributor Author

optimal

@meooow25
Contributor

After some tweaks:

Screenshot

@Lysxia Lysxia self-requested a review August 21, 2024 17:56
@meooow25 meooow25 merged commit 549d22b into haskell:master Aug 23, 2024
11 checks passed
@meooow25
Contributor

Thanks!

4 participants