Large character classes combined with {m,n} is very slow and memory-hungry #3

Open
@neongreen

ChrisKuklewicz/regex-tdfa#14, originally reported by @jaspervdj


module Main where
import qualified Text.Regex.TDFA as Tdfa

main :: IO ()
main = do
    let pattern = "^[\x0020-\xD7FF]{1,255}$"
        input   = take 100 $ cycle "abcd"

        regex :: Tdfa.Regex
        regex = Tdfa.makeRegexOpts Tdfa.defaultCompOpt Tdfa.defaultExecOpt pattern

        matches :: [Tdfa.MatchArray]
        matches = Tdfa.match regex input

    print matches

This takes over 6 seconds on my machine and uses around 3 GB(!) of memory. A workaround is to drop the {m,n} part, use "^[\x0020-\xD7FF]+$" instead, and check the length explicitly in Haskell.
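
For reference, the workaround could look roughly like the sketch below. The helper name `matchesBounded` is made up for illustration and is not part of regex-tdfa. The sketch also assumes the input is a single line, so that checking the length of the whole string is equivalent to the {1,255} bound.

```haskell
module Main where

import qualified Text.Regex.TDFA as Tdfa

-- Same character class as in the report, but with + instead of {1,255}.
-- The upper bound on the length is enforced separately in Haskell.
unboundedRegex :: Tdfa.Regex
unboundedRegex =
    Tdfa.makeRegexOpts Tdfa.defaultCompOpt Tdfa.defaultExecOpt
        "^[\x0020-\xD7FF]+$"

-- Hypothetical helper (not a regex-tdfa function): True when the input has
-- between 1 and 255 characters, all of them in U+0020..U+D7FF.
-- Assumes single-line input; the lower bound of 1 comes from the + itself.
matchesBounded :: String -> Bool
matchesBounded input =
    null (drop 255 input)                  -- at most 255 characters
        && Tdfa.matchTest unboundedRegex input

main :: IO ()
main = do
    print (matchesBounded (take 100 (cycle "abcd")))  -- input from the report
    print (matchesBounded (replicate 256 'a'))        -- one character too long
    print (matchesBounded "")                         -- empty input
```

The length check runs first, so an over-long input is rejected without running the regex at all.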

Labels: bug