Closed
Description
Sister issue to #949. I adapted the old algorithm from #658 into a fold to see if it improves things for IntSet like it did for Set.
fromDistinctAscList :: [Key] -> IntSet
fromDistinctAscList = finish . Foldable.foldl' next StateEmpty
where
next StateEmpty z = State (prefixOf z) (bitmapOf z) Nada
next (State kx bm stk) z
| kx == pz = State pz (bm .|. bmz) stk
| otherwise = State pz bmz (linkTop (branchMask z kx) kx (Tip kx bm) stk)
where
pz = prefixOf z
bmz = bitmapOf z
finish StateEmpty = Nil
finish (State kx bm stk0) = linkAll kx (Tip kx bm) stk0
{-# INLINE fromDistinctAscList #-}
linkTop :: Mask -> Prefix -> IntSet -> Stack -> Stack
linkTop !m !px !tx (Push py ty stk)
| shorter m mxy = linkTop m pxy (Bin pxy mxy ty tx) stk
where
mxy = branchMask px py
pxy = mask px mxy
linkTop _ px tx stk = Push px tx stk
{-# INLINABLE linkTop #-}
linkAll :: Prefix -> IntSet -> Stack -> IntSet
linkAll !_ tx Nada = tx
linkAll px tx (Push py ty stk)
| m < 0 = Bin p m tx ty
| otherwise = linkAll p (Bin p m ty tx) stk
where
m = branchMask px py
p = mask px m
{-# INLINABLE linkAll #-}
data FromDistinctMonoState = State {-# UNPACK #-} !Prefix {-# UNPACK #-} !BitMap !Stack | StateEmpty
data Stack = Push {-# UNPACK #-} !Prefix !IntSet !Stack | Nada
Benchmarking 4 scenarios
bench "fromDistinctAscList:dense" $ whnf IS.fromDistinctAscList elems -- elems = [1..n]
, bench "fromDistinctAscList:dense:fusion" $
whnf (\n -> IS.fromDistinctAscList [1..n]) bound
, bench "fromDistinctAscList:sparse" $ whnf IS.fromDistinctAscList elems_sparse -- elems_sparse = map (*64) [1..n]
, bench "fromDistinctAscList:sparse:fusion" $
whnf (\n -> IS.fromDistinctAscList (map (*64) [1..n])) bound
GHC 9.2.5, current
fromDistinctAscList:dense: OK (0.22s)
13.6 μs ± 701 ns, 4.0 KB allocated, 3 B copied, 7.0 MB peak memory
fromDistinctAscList:dense:fusion: OK (0.20s)
24.3 μs ± 1.5 μs, 291 KB allocated, 150 B copied, 7.0 MB peak memory
fromDistinctAscList:sparse: OK (0.20s)
52.2 μs ± 3.3 μs, 256 KB allocated, 16 KB copied, 7.0 MB peak memory
fromDistinctAscList:sparse:fusion: OK (0.12s)
68.3 μs ± 6.1 μs, 541 KB allocated, 17 KB copied, 7.0 MB peak memory
GHC 9.2.5, new
fromDistinctAscList:dense: OK (0.23s)
14.0 μs ± 1.1 μs, 6.0 KB allocated, 3 B copied, 7.0 MB peak memory, same as baseline
fromDistinctAscList:dense:fusion: OK (0.15s)
4.49 μs ± 349 ns, 6.0 KB allocated, 3 B copied, 7.0 MB peak memory, 81% less than baseline
fromDistinctAscList:sparse: OK (0.14s)
70.2 μs ± 5.5 μs, 383 KB allocated, 12 KB copied, 7.0 MB peak memory, 34% more than baseline
fromDistinctAscList:sparse:fusion: OK (0.13s)
60.9 μs ± 5.3 μs, 382 KB allocated, 12 KB copied, 7.0 MB peak memory, 10% less than baseline
GHC 9.6.2, current
fromDistinctAscList:dense: OK (0.13s)
17.0 μs ± 1.5 μs, 4.0 KB allocated, 4 B copied, 7.0 MB peak memory
fromDistinctAscList:dense:fusion: OK (0.21s)
24.7 μs ± 1.5 μs, 291 KB allocated, 150 B copied, 7.0 MB peak memory
fromDistinctAscList:sparse: OK (0.19s)
45.9 μs ± 2.9 μs, 256 KB allocated, 16 KB copied, 7.0 MB peak memory
fromDistinctAscList:sparse:fusion: OK (0.26s)
62.3 μs ± 2.7 μs, 542 KB allocated, 17 KB copied, 7.0 MB peak memory
GHC 9.6.2, new
fromDistinctAscList:dense: OK (0.29s)
16.9 μs ± 766 ns, 6.0 KB allocated, 3 B copied, 7.0 MB peak memory, same as baseline
fromDistinctAscList:dense:fusion: OK (0.14s)
4.19 μs ± 361 ns, 6.0 KB allocated, 3 B copied, 7.0 MB peak memory, 83% less than baseline
fromDistinctAscList:sparse: OK (0.22s)
51.8 μs ± 2.8 μs, 383 KB allocated, 12 KB copied, 7.0 MB peak memory, 12% more than baseline
fromDistinctAscList:sparse:fusion: OK (0.19s)
46.0 μs ± 3.0 μs, 383 KB allocated, 12 KB copied, 7.0 MB peak memory, 26% less than baseline
As expected
- It is more efficient in both time and memory when there is fusion
- It is less efficient in memory when there is no fusion
However, it is worse in time for fromDistinctAscList:sparse
, which is disappointing.
I'll try to figure out if it can be improved. Happy to hear if you have any ideas for that.
If it does not improve, I think this is still not a bad idea. The improvement on fromDistinctAscList:dense:fusion
looks great. We could use rewrite rules to try to fuse and fall back to the current implementation, as we considered in #949.