Interegular integration, allowing checks for intersections between regexes. #715

Closed
wants to merge 721 commits into from

Conversation

MegaIng
Member

MegaIng commented Oct 7, 2020

Fixes #76.

This is a first attempt.

You can install interegular via pip: `pip install interegular`. If you want, you can also take a look at the interegular source code.

@erezsh
Member

erezsh commented Oct 9, 2020

A few notes:

  1. Basically, this is the real functional code (which resides in interegular):

        for a, b in combinations(keys, 2):
            if not self.isdisjoint(a, b):
                yield a, b

     I see you already cache the FSMs, so all that's required to prevent duplicate work is to call memoized_disjoint instead.

  2. Why does lark have to call mark and is_marked? Seems unnecessary. If you want to avoid duplicate warnings, just keep a set.

  3. Why compare regexps (expensive) and only afterwards check a.priority == b.priority? The priority check should come first; otherwise, comparing them with interegular is pointless.

  4. skip_validation means you never use the comparator. So why even create it in the first place?
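
A minimal sketch of the ordering suggested in the notes above: test the cheap priority condition first, run the expensive comparison only when it could matter, and keep a plain set to avoid reporting the same pair twice. Everything here is hypothetical illustration rather than code from the PR or from interegular: `Terminal`, `find_collisions`, and the caller-supplied `overlaps` predicate (standing in for whatever FSM intersection test is used) are assumed names.

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical stand-in for a lexer terminal; only the fields used here.
Terminal = namedtuple("Terminal", ["name", "pattern", "priority"])


def find_collisions(terminals, overlaps):
    """Yield each colliding pair of terminals at most once.

    `overlaps(a, b)` is a caller-supplied predicate that is assumed to be
    expensive, e.g. a wrapper around an FSM intersection test.
    """
    reported = set()  # name pairs already yielded, so nothing is reported twice
    for a, b in combinations(terminals, 2):
        # Cheap test first: skip the expensive comparison when the
        # priorities differ, since the pair would not be reported anyway.
        if a.priority != b.priority:
            continue
        key = frozenset((a.name, b.name))
        if key in reported:
            continue
        if overlaps(a, b):
            reported.add(key)
            yield a, b
```

With this shape, the expensive predicate is never called for pairs whose priorities differ, and no marking state has to live on the terminals themselves.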

@erezsh
Member

erezsh commented Oct 10, 2020

P.S. re point 3, you can do something like classify(regexps, lambda r: r.priority) to get all the subgroups that should be tested together.
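
As a rough illustration of that grouping idea, the sketch below buckets terminals by priority and only generates pairs within each bucket. The `classify` defined here is a local stand-in written for this example (it is not claimed to match the signature of any helper in lark), and `Terminal` and `candidate_pairs` are likewise hypothetical names.

```python
from collections import defaultdict, namedtuple
from itertools import combinations

# Hypothetical stand-in for a lexer terminal; only the fields used here.
Terminal = namedtuple("Terminal", ["name", "priority"])


def classify(seq, key):
    """Local stand-in: group items of `seq` into lists keyed by `key(item)`."""
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)
    return dict(groups)


def candidate_pairs(terminals):
    """Yield only the pairs that share a priority and so need comparing."""
    for group in classify(terminals, lambda t: t.priority).values():
        yield from combinations(group, 2)
```

Grouping first means pairs with different priorities are never generated, so no per-pair priority check is needed before the regex comparison.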

erezsh and others added 28 commits October 19, 2021 12:08
Added lark syntax highlighting and a few tiny changes.
Since 1.0 isn't Python 2 compatible (according to Reddit post) which makes "& 3" redundant too :)
jmishra01 and others added 27 commits December 5, 2022 07:16
A generator is a memory-efficient approach.
Also improve performance of "iter_subtrees_topdown"
The performance of the "iter_subtrees_topdown" method degrades as the size of the tree increases. Using an instance of the list method improves the performance.
Fix EOF line information in InteractiveParser.resume_parse()
Use generator instead of list expand or add method
Previous version used `_testlist_comp` which allowed for either one `test` or 2 and more `test_or_star_expr`
This version allows a list with one `star_expr` which is valid both in Python and in official Python grammar.
Moreover it merges rules used in set and list (since those terminals differ only in one thing: set literal cannot be empty)
Updated Python grammar list literal to support `[*x]`
Found via `codespell -L nd,iif,ot,datas`
Examples: Update version for PyQt5
Support for Python-style comments in Lark grammar
[M:grammar.md] doc: added Python-style comments.
…teregular-integration

# Conflicts:
#	lark/lexer.py
#	lark/load_grammar.py
#	setup.py
@MegaIng
Member Author

MegaIng commented Mar 2, 2023

Well, this is now a messed up git history.

@MegaIng MegaIng closed this Mar 2, 2023
Development

Successfully merging this pull request may close these issues.

Detect collision in regular expressions when creating a lexer