Name	Name	Last commit message	Last commit date
Latest commit History 95 Commits
Text/Regex	Text/Regex
.gitignore	.gitignore
.travis.yml	.travis.yml
CHANGELOG.md	CHANGELOG.md
LICENSE	LICENSE
README.md	README.md
Setup.hs	Setup.hs
regex-tdfa.cabal	regex-tdfa.cabal
stack.yaml	stack.yaml

Name

Last commit message

Last commit date

Text/Regex

regex-tdfa

This is regex-tdfa which is a pure Haskell regular expression library (for POSIX extended regular expressions) originally written by Christopher Kuklewicz.

The name "tdfa" stands for Tagged-DFA.

The Relevant Links

Terse documentation is available in Text.Regex.TDFA haddock.

This was also documented at the Haskell wiki. The original Darcs repository was at code.haskell.org. When not updated, this was forked and maintained by Roman Cheplyaka as regex-tdfa-rc.

The new git repository is at github, which is primarily maintained by Artyom (neongreen).

Known Bugs and Infelicities

Regexes with large character classes combined with {m,n} are very slow and memory-hungry (#14).

An example of such a regex is ^[\x0020-\xD7FF]{1,255}$.
POSIX submatch semantics are broken in some rare cases (#12).

Testing

Tests for this package live in regex-tdfa-unittest. To run them, do:

stack build
stack build regex-tdfa-unittest
stack exec regex-tdfa-unittest

About this package

This was inspired by the algorithm (and Master's thesis) behind the regular expression library known as TRE or libtre. This was created by Ville Laurikari and tackled the difficult issue of efficient sub-match capture for POSIX regular expressions.

By building on this thesis and adding a few more optimizations, regex-tdfa matching input text of length N should have O(N) runtime, and should have a maximum memory bounded by the pattern size that does not scale with N. It should do this while returning well defined (and correct) values for the parenthesized sub-matches.

Regardless of performance, nearly every single OS and Libra for POSIX regular expressions has bugs in sub-matches. This was detailed on the Regex POSIX Haskell wiki page, and can be demonstrated with the regex-posix-unittest suite of checks. Test regex-tdfa-unittest should show regex-tdfa passing these same checks. I owe my understanding of the correct behvior and many of these unit tests to Glenn Fowler at AT&T ("An Interpretation of the POSIX regex Standard").

Other related packages

You can find several other related packages by searching for "tdfa" on hackage.

Document notes

This was written 2016-04-30.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

regex-tdfa

The Relevant Links

Known Bugs and Infelicities

Testing

About this package

Other related packages

Document notes

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors 11

Uh oh!

Languages

License

haskell-hvr/regex-tdfa

Folders and files

Latest commit

History

Repository files navigation

regex-tdfa

The Relevant Links

Known Bugs and Infelicities

Testing

About this package

Other related packages

Document notes

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors 11

Uh oh!

Languages

Packages