Skip to content

Much faster count function in plain Haskell #109

Closed
@chrisdone

Description

@chrisdone

The count function is implemented in terms of some C code that looks at each byte: https://github.com/haskell/bytestring/blob/master/cbits/fpstring.c#L75

If you implement count with elemIndex (which uses C's memchr underneath), it's much faster on non-consecutive chars:

count1 :: Word8 -> ByteString -> Int
count1 w8 = go 0
  where
    go c str =
      case S.elemIndex w8 str of
        Nothing -> c
        Just off -> go (c + 1) (S.drop (off+1) str)
{-# INLINE count1 #-}

For non-consecutive characters, such as finding all < in an XML document, it's obviously way faster because memchr inspects more than one byte at a time.

For consecutive chars, this is slower. It might be worth including both, or mentioning it in the docs.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions