In this, implement grep like functionality
Inspired by codecrafters.
Tested with Python 3.12+
Supports:
- Basic character support, including \d, \w and wildcards
- Character Classes - both positive and negative
[abc]and[^abc] - Beginner and End Carats
(^, $) - Quantifiers like
*,+,? - Parenthesis support so
(a*b)+ops are possible - Support for Alternation
|
- We started with very basic recursive algorithm
- However, we soon realised that we needed to tokenise pattern since it make quantifier search pretty easy
For example, with no tokenisation, it is hard to check for
\\w+,[abc]*or2?since each is different length But with tokens, you can check that iftoken[n+1]is quantifier, thentoken[n]would be thing where this applies - This approach scaled very well for lot of use cases
- However, once we tried to handle parenthesis - we started running into issues. It quickly became apparant that our token array would quickly became nested structure
- Looking and investigating into that, it turned out that we would be better served by converting Token Arrays wholesale into AST-like structure
- At this stage, we converted Parser, AST and Matcher into their own separate modules