Capture regular expression groups when lexing. #27
Conversation
Thanks for contributing! Is the primary motivation here performance? I'm not sure it's possible to make this compatible with the RPython version, so that will require some thinking before this can land.
I've been investigating the translation failure. My understanding was that container types (lists, tuples, etc.) must be internally type-homogeneous (groups are always tuples of strings), and that None was an allowed exception (since there may be no groups, None is a possible value instead of a tuple). I may be missing something, though, as I'm very new to RPython.

The primary motivation is de-duplication of work. The regex used during tokenization is already capturing the groups, but the tokenizer just throws that information away. In the string-parsing example, the key elements needed (the Python-style string flags and the contents of the quoted string) would have to be re-extracted in the parser.
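To illustrate the duplication being described, here is a minimal sketch (the pattern and flag set are hypothetical, not the project's actual lexer rules) showing that the match object produced during tokenization already holds the captured groups:

```python
import re

# Hypothetical lexer rule for Python-style quoted strings:
# group 1 captures the prefix flags, group 2 the string contents.
QUOTED = re.compile(r'([rbu]*)"([^"]*)"')

m = QUOTED.match('rb"data"')
assert m is not None
# The match already holds the captured groups; a lexer that keeps
# only m.group(0) (the whole token text) throws this away.
print(m.groups())  # ('rb', 'data')
```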
Lists need to be homogeneous internally; tuples are allowed to be heterogeneous.
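A small sketch of the typing rule being referenced (plain Python will happily run all of this; it is RPython's annotator, at translation time, that enforces the restriction):

```python
# List members must share one type for RPython's annotator;
# tuple members may differ, since tuples are fixed-shape.
homogeneous = ["rb", "data"]   # list of str: fine in RPython
heterogeneous = (1, "data")    # tuple mixing int and str: also fine
# mixed = [1, "data"]          # a mixed-type list would be rejected
#                              # by the RPython annotator
print(homogeneous, heterogeneous)
```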
Indeed, lists would make more sense now that I know tuples are even weirder than I expected. ;) Let me patch and see if this fixes the test failure locally.
So, I've converted the regex group storage to use a list; however, this has not corrected the somewhat mystifying translation error I'm getting:
I think the answer is that the code in the
As certain token constructs represent wrapped elements, such as text enclosed in quotes, the parser step would need to pre-process the token to strip the quotes and identify flags (in the case of Python-style prefixed strings, anyway). Why do the work twice?
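The "work twice" point can be sketched as follows; the pattern and function names are hypothetical, but the contrast is the one being argued: without the lexer's groups, the parser must re-run the regex (or hand-strip quotes) to recover the same pieces.

```python
import re

# Hypothetical quoted-string rule: flags in group 1, contents in group 2.
QUOTED = re.compile(r'([rbu]*)"([^"]*)"')

def parse_without_groups(token_value):
    # The parser repeats the match the lexer already performed.
    m = QUOTED.match(token_value)
    return m.group(1), m.group(2)

def parse_with_groups(groups):
    # With the lexer's captured groups preserved, the parser
    # simply reads them back.
    return groups[0], groups[1]

value = 'r"hello"'
groups = list(QUOTED.match(value).groups())
assert parse_without_groups(value) == parse_with_groups(groups)
```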
The attached changes add slots and update `__repr__` implementations where needed, and include a test for the "quoted string" case, demonstrating use. Documentation is also updated to clearly demonstrate the "quoted string" use case and to update the presented object `repr` output.
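As a rough sketch of the shape being described (the class and attribute names here are assumptions, not the PR's actual code), a token that carries its captured groups, with `__slots__` and a matching `__repr__`, might look like:

```python
class Token:
    # __slots__ keeps token instances lightweight, mirroring the
    # PR's addition of slots; "groups" stores the captured regex
    # groups as a list, per the RPython homogeneity discussion.
    __slots__ = ("name", "value", "groups")

    def __init__(self, name, value, groups=None):
        self.name = name
        self.value = value
        self.groups = groups if groups is not None else []

    def __repr__(self):
        return "Token(%r, %r, %r)" % (self.name, self.value, self.groups)

tok = Token("STRING", 'r"hello"', ["r", "hello"])
print(repr(tok))  # Token('STRING', 'r"hello"', ['r', 'hello'])
```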