Skip to content

Fusible IntSet.fromDistinctAscList definition #951

Closed
@meooow25

Description

@meooow25

Sister issue to #949. I adapted the old algorithm from #658 into a fold to see if it improves things for IntSet like it did for Set.

fromDistinctAscList :: [Key] -> IntSet
fromDistinctAscList = finish . Foldable.foldl' next StateEmpty
  where
    next StateEmpty z = State (prefixOf z) (bitmapOf z) Nada
    next (State kx bm stk) z
      | kx == pz  = State pz (bm .|. bmz) stk
      | otherwise = State pz bmz (linkTop (branchMask z kx) kx (Tip kx bm) stk)
      where
        pz = prefixOf z
        bmz = bitmapOf z
    finish StateEmpty = Nil
    finish (State kx bm stk0) = linkAll kx (Tip kx bm) stk0
{-# INLINE fromDistinctAscList #-}

linkTop :: Mask -> Prefix -> IntSet -> Stack -> Stack
linkTop !m !px !tx (Push py ty stk)
  | shorter m mxy = linkTop m pxy (Bin pxy mxy ty tx) stk
  where
    mxy = branchMask px py
    pxy = mask px mxy
linkTop _ px tx stk = Push px tx stk
{-# INLINABLE linkTop #-}

linkAll :: Prefix -> IntSet -> Stack -> IntSet
linkAll !_ tx Nada = tx
linkAll px tx (Push py ty stk)
  | m < 0     = Bin p m tx ty
  | otherwise = linkAll p (Bin p m ty tx) stk
  where
    m = branchMask px py
    p = mask px m
{-# INLINABLE linkAll #-}

data FromDistinctMonoState = State {-# UNPACK #-} !Prefix {-# UNPACK #-} !BitMap !Stack | StateEmpty
data Stack = Push {-# UNPACK #-} !Prefix !IntSet !Stack | Nada

Benchmarking 4 scenarios

  bench "fromDistinctAscList:dense" $ whnf IS.fromDistinctAscList elems -- elems = [1..n]
, bench "fromDistinctAscList:dense:fusion" $
    whnf (\n -> IS.fromDistinctAscList [1..n]) bound
, bench "fromDistinctAscList:sparse" $ whnf IS.fromDistinctAscList elems_sparse -- elems_sparse = map (*64) [1..n]
, bench "fromDistinctAscList:sparse:fusion" $
    whnf (\n -> IS.fromDistinctAscList (map (*64) [1..n])) bound

GHC 9.2.5, current

  fromDistinctAscList:dense:         OK (0.22s)
    13.6 μs ± 701 ns, 4.0 KB allocated,   3 B  copied, 7.0 MB peak memory
  fromDistinctAscList:dense:fusion:  OK (0.20s)
    24.3 μs ± 1.5 μs, 291 KB allocated, 150 B  copied, 7.0 MB peak memory
  fromDistinctAscList:sparse:        OK (0.20s)
    52.2 μs ± 3.3 μs, 256 KB allocated,  16 KB copied, 7.0 MB peak memory
  fromDistinctAscList:sparse:fusion: OK (0.12s)
    68.3 μs ± 6.1 μs, 541 KB allocated,  17 KB copied, 7.0 MB peak memory

GHC 9.2.5, new

  fromDistinctAscList:dense:         OK (0.23s)
    14.0 μs ± 1.1 μs, 6.0 KB allocated,   3 B  copied, 7.0 MB peak memory,       same as baseline
  fromDistinctAscList:dense:fusion:  OK (0.15s)
    4.49 μs ± 349 ns, 6.0 KB allocated,   3 B  copied, 7.0 MB peak memory, 81% less than baseline
  fromDistinctAscList:sparse:        OK (0.14s)
    70.2 μs ± 5.5 μs, 383 KB allocated,  12 KB copied, 7.0 MB peak memory, 34% more than baseline
  fromDistinctAscList:sparse:fusion: OK (0.13s)
    60.9 μs ± 5.3 μs, 382 KB allocated,  12 KB copied, 7.0 MB peak memory, 10% less than baseline

GHC 9.6.2, current

  fromDistinctAscList:dense:         OK (0.13s)
    17.0 μs ± 1.5 μs, 4.0 KB allocated,   4 B  copied, 7.0 MB peak memory
  fromDistinctAscList:dense:fusion:  OK (0.21s)
    24.7 μs ± 1.5 μs, 291 KB allocated, 150 B  copied, 7.0 MB peak memory
  fromDistinctAscList:sparse:        OK (0.19s)
    45.9 μs ± 2.9 μs, 256 KB allocated,  16 KB copied, 7.0 MB peak memory
  fromDistinctAscList:sparse:fusion: OK (0.26s)
    62.3 μs ± 2.7 μs, 542 KB allocated,  17 KB copied, 7.0 MB peak memory

GHC 9.6.2, new

  fromDistinctAscList:dense:         OK (0.29s)
    16.9 μs ± 766 ns, 6.0 KB allocated,   3 B  copied, 7.0 MB peak memory,       same as baseline
  fromDistinctAscList:dense:fusion:  OK (0.14s)
    4.19 μs ± 361 ns, 6.0 KB allocated,   3 B  copied, 7.0 MB peak memory, 83% less than baseline
  fromDistinctAscList:sparse:        OK (0.22s)
    51.8 μs ± 2.8 μs, 383 KB allocated,  12 KB copied, 7.0 MB peak memory, 12% more than baseline
  fromDistinctAscList:sparse:fusion: OK (0.19s)
    46.0 μs ± 3.0 μs, 383 KB allocated,  12 KB copied, 7.0 MB peak memory, 26% less than baseline

As expected

  • It is more efficient in both time and memory when there is fusion
  • It is less efficient in memory when there is no fusion

However, it is worse in time for fromDistinctAscList:sparse, which is disappointing.
I'll try to figure out if it can be improved. Happy to hear if you have any ideas for that.

If it does not improve, I think this is still not a bad idea. The improvement on fromDistinctAscList:dense:fusion looks great. We could use rewrite rules to try to fuse and fall back to the current implementation, as we considered in #949.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions