
Notes on performance #144

Closed
@jamesdbrock

This is what the benchmarks currently look like:

Text.Parsing.StringParser.CodeUnits

StringParser.runParser parse23Units
mean   = 10.10 ms
stddev = 1.13 ms
min    = 9.46 ms
max    = 24.07 ms

Text.Parsing.Parser.String

runParser parse23
mean   = 44.20 ms
stddev = 6.38 ms
min    = 42.25 ms
max    = 113.16 ms

Data.String.Regex

Regex.match pattern23
mean   = 728.23 μs
stddev = 339.32 μs
min    = 613.72 μs
max    = 2.97 ms

I would like to reduce the roughly 4× slowdown of Text.Parsing.Parser.String relative to Text.Parsing.StringParser.CodeUnits (44.20 ms vs. 10.10 ms mean).

The difference could be due to:

  1. CodePoint rather than Char. Since Unicode correctness #119, everything goes through the anyCodePoint parser, but I benchmarked it at the time and it made no difference.
  2. String tail state. Every time the parser advances by one character, we uncons the input string and save the tail as the new state. On this branch I tried keeping only a code-unit index into the input string instead, and it made no difference: https://github.com/jamesdbrock/purescript-parsing/tree/cursor
  3. Text.Parsing.Parser.String input position tracking with Pos { line :: Int, column :: Int }. On the same branch I tried changing that to Pos Int, and it made no difference: https://github.com/jamesdbrock/purescript-parsing/tree/cursor
  4. Monad transformers. In the benchmark profile, most of the time appears to be spent in Control.Monad.State.Trans.bind and Text.Parsing.Parser.Combinators.tryRethrow. This might be the entire problem, but improving it won't be easy.
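To make item 2 concrete, here is a minimal sketch of the two input representations. Because PureScript compiles to JavaScript, it is written in TypeScript. All names (`TailState`, `CursorState`, `anyCodePointTail`, `anyCodePointCursor`, `collectTail`, `collectCursor`) are hypothetical and are not taken from purescript-parsing. Both readers consume one code point per step, in the spirit of item 1. The only difference is whether the state holds the remaining string or an index into the original string.

```typescript
// Hypothetical sketch, not purescript-parsing code: two ways to track the
// unconsumed input of a parser that reads one code point at a time.

// Tail state: after each step, the remaining string becomes the new state.
interface TailState { rest: string }

function anyCodePointTail(s: TailState): [number, TailState] | null {
  const cp = s.rest.codePointAt(0);
  if (cp === undefined) return null; // end of input
  const width = cp > 0xffff ? 2 : 1; // astral code points use two UTF-16 code units
  return [cp, { rest: s.rest.slice(width) }];
}

// Cursor state: keep the whole input plus a UTF-16 code-unit index into it.
interface CursorState { input: string; index: number }

function anyCodePointCursor(s: CursorState): [number, CursorState] | null {
  const cp = s.input.codePointAt(s.index);
  if (cp === undefined) return null; // index is past the end of the input
  const width = cp > 0xffff ? 2 : 1;
  return [cp, { input: s.input, index: s.index + width }];
}

// Consume the whole input with the tail representation, collecting code points.
function collectTail(input: string): number[] {
  const out: number[] = [];
  let state: TailState = { rest: input };
  let step = anyCodePointTail(state);
  while (step !== null) {
    out.push(step[0]);
    state = step[1];
    step = anyCodePointTail(state);
  }
  return out;
}

// Same loop, using the cursor representation.
function collectCursor(input: string): number[] {
  const out: number[] = [];
  let state: CursorState = { input, index: 0 };
  let step = anyCodePointCursor(state);
  while (step !== null) {
    out.push(step[0]);
    state = step[1];
    step = anyCodePointCursor(state);
  }
  return out;
}
```

Both `collectTail("a😀b")` and `collectCursor("a😀b")` return `[97, 128512, 98]`. As reported above, switching the real parser between these two representations made no measurable difference.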
