Skip to content

[mypyc] Add a str.format specializer which only supports empty brackets #10697

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Jul 2, 2021

Conversation

97littleleaf11
Copy link
Collaborator

@97littleleaf11 97littleleaf11 commented Jun 23, 2021

Description

This PR adds a str.format specializer which only supports empty brackets. Also, the detection code can deal with the bracket literals, for example, "{{}}".

The specializer first split the formatting string by "{}". Then it replaces the brackets by PyObject_Str. Finally, all these separated substrings would be sent to a C helper function to generate a joint one.

Test Plan

  • An IRbuild test
  • A run test

mypyc_benchmark:

running str_format_format_method
..........
interpreted: 0.597254s (avg of 5 iterations; stdev 1.7%)
compiled:    0.312174s (avg of 5 iterations; stdev 2.9%)

compiled is 1.913x faster

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad to see the performance improvement! Left some mostly minor comments. Once you handle {{ and }} this seems enough features for a first iteration.

@97littleleaf11
Copy link
Collaborator Author

Add several optimization:

  • Skip empty literals
  • Skip str_op on str_rprimitive

running str_format_format_method
..........
interpreted: 0.576623s (avg of 5 iterations; stdev 0.47%)
compiled: 0.280255s (avg of 5 iterations; stdev 0.63%)

compiled is 2.057x faster

@97littleleaf11 97littleleaf11 requested a review from JukkaL June 24, 2021 11:48
@JukkaL
Copy link
Collaborator

JukkaL commented Jun 24, 2021

The build is failing. Can you have a look?

@97littleleaf11
Copy link
Collaborator Author

@JukkaL I fixed it by replacing "{{" with "{"

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! Left various suggestions about more test coverage. Right now test coverage is still a bit limited, and there may be some bugs still lurking in the code.

assert "{{}}{}".format(name) == "{}Eric"
assert "Hi! {{{}}}".format(name) == "Hi! {Eric}"
assert "Hi! {{ {}".format(name) == "Hi! { Eric"
assert "Hi! {{ {} }}}}".format(name) == "Hi! { Eric }}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add more tests. We want fairly complete test coverage to make sure nothing has regressed. It's important both valid and error cases produce the same output when optimized. For example, any exceptions should be identical.

Examples (but there are probably more):

  • Simplest possible case: ''.format()
  • Plain text only: 'xyz'.format()
  • Extra positional arguments: 'a{}b'.format(1, 2, 3)
  • Too few arguments
  • Various format specifiers that are supported within {...} (these should still work, even if they aren't optimized)
  • Keyword arguments to format
  • *args to format
  • Invalid format string

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra positional arguments case this compile error: error: Not all arguments converted during string formatting

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test cases could still be useful. Can you double check if they aren't covered:

  • '{}{}'.format(1, 2) (empty literals)
  • 'x{}y{}'.format(*args) (*args)

return None


def can_optimize_format(format_str: str) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add unit tests for this, since it has pretty tricky logic. Also consider various error cases and edge cases.

@97littleleaf11 97littleleaf11 requested a review from JukkaL June 30, 2021 18:07
Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! This is almost ready to merge. Just a few remaining comments.

@@ -159,3 +159,20 @@ L2:
L3:
unreachable

[case testStringFormatMethod]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a few more IR build test cases, such as these:

  1. an empty literal
  2. {{ and }} in the format string

assert "{{}}{}".format(name) == "{}Eric"
assert "Hi! {{{}}}".format(name) == "Hi! {Eric}"
assert "Hi! {{ {}".format(name) == "Hi! { Eric"
assert "Hi! {{ {} }}}}".format(name) == "Hi! { Eric }}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test cases could still be useful. Can you double check if they aren't covered:

  • '{}{}'.format(1, 2) (empty literals)
  • 'x{}y{}'.format(*args) (*args)

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@JukkaL JukkaL merged commit e82e8ec into python:master Jul 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants