unstreamM etc destroys performance; expose unstreamPrimM & document most-performant method

I was messing around with squeezing performance out of this library.

```haskell
-- utility function used in the rest of the code
fillOrderPart :: (Monad m, Ord a, Num a) => a -> StateT a m a
fillOrderPart c = state $ \r -> let x = min c r in (c - x, r - x)

fillOrder1V :: (Data.Vector.Generic.Vector v i, Ord i, Num i) => v i -> i -> (v i, i)
fillOrder1V = runState . Data.Vector.Generic.mapM fillOrderPart
{-
benchmarking static/vector-unboxed/fillOrder1
time                 53.16 μs   (53.02 μs .. 53.31 μs)
benchmarking static/vector-storable/fillOrder1
time                 52.89 μs   (52.59 μs .. 53.22 μs)
-}

fillOrder1VM :: (Data.Vector.Generic.Vector v i, Ord i, Num i) => v i -> i -> (v i, i)
fillOrder1VM book order = runST $ flip runStateT order $ do
  book' <- VG.thaw book
  VGM.mapM_ fillOrderPart book'
  VG.unsafeFreeze book'
{-
benchmarking static/vector-unboxed-copy-mut/fillOrder1
time                 11.20 μs   (11.19 μs .. 11.21 μs)
benchmarking static/vector-storable-copy-mut/fillOrder1
time                 11.51 μs   (11.46 μs .. 11.55 μs)
-}

fillOrder1VB2 :: (Data.Vector.Generic.Vector v i, Ord i, Num i) => v i -> i -> (v i, i)
fillOrder1VB2 book order =
  -- FIXME: unstreamM performs horribly to list construction, sadly
  runState (VG.unstreamM $ VFB.mapM fillOrderPart (VG.stream book)) order
{-
benchmarking static/vector-unboxed-bundle-unstreamM/fillOrder1
time                 26.30 μs   (26.24 μs .. 26.35 μs)
benchmarking static/vector-storable-bundle-unstreamM/fillOrder1
time                 46.10 μs   (45.85 μs .. 46.26 μs)
-}

-- bug in vector; defined but not exported. we re-defined it for our use here
unstreamPrimM :: (PrimMonad m, VG.Vector v a) => VFB.MBundle m u a -> m (v a)
{-# INLINE [1] unstreamPrimM #-}
unstreamPrimM s = VGM.munstream s >>= VG.unsafeFreeze

fillOrder1VB :: (Data.Vector.Generic.Vector v i, Ord i, Num i) => v i -> i -> (v i, i)
fillOrder1VB book order =
  runST $ flip runStateT order $ unstreamPrimM $ VFB.mapM fillOrderPart $ VG.stream book
{-
benchmarking static/vector-unboxed-bundle/fillOrder1
time                 1.426 μs   (1.421 μs .. 1.432 μs)
benchmarking static/vector-storable-bundle/fillOrder1
time                 855.9 ns   (852.2 ns .. 859.3 ns)
-}
```

By contrast, here is the equivalent rust code:

```rust
fn fill_order(book: &mut Vec<u64>, order: u64) -> u64 {
  let mut r = order;
  for c in book.iter_mut() {
    let x = r.min(*c);
    r -= x;
    *c -= x;
  }
  r
}
// fill_order              time:   [868.87 ns 870.21 ns 871.76 ns]
```

As you can see, the properly-written fusion version performs as well as rust. This was a very pleasant surprise for me. HOWEVER - the correct way is not documented anywhere!!! In particular, the convenience function `unstreamPrimM` is for some reason defined in the source code of `Vector.Generic`, used no-where else, **not exported nor advertised**, yet is **absolutely vital** for reaching this nirvana of performance.

By contrast, unstreamM is what's exposed in the API and destroys the performance so it performs even worse than the manual imperative mutable version. Even worse, all the utility functions are written in terms of unstreamM, e.g. VG.mapM, etc etc. Yes this means they have a convenient `Monad m =>` constraint, but any non-haskell-expert that cares about performance would benchmark it and write off the library as "Haskell is slow". Providing mirror utilities that have a `PrimMonad m =>` constraint that use `unstreamPrimM` instead of `unstreamM`, as well as a few examples, would help this effort.

For reference, the above took me about half a day. Not everybody exploring Haskell has that sort of time or patience.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

unstreamM etc destroys performance; expose unstreamPrimM & document most-performant method #416

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

unstreamM etc destroys performance; expose unstreamPrimM & document most-performant method #416

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions