-
-
Notifications
You must be signed in to change notification settings - Fork 31.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-32677: Add .isascii() to str, bytes and bytearray #5342
Conversation
Objects/unicodeobject.c
Outdated
unicode_isascii_impl(PyObject *self) | ||
/*[clinic end generated code: output=c5910d64b5a8003f input=73b30d38f16965cd]*/ | ||
{ | ||
if (PyUnicode_READY(self) == -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP 7: please add { ... }
Objects/unicodeobject.c
Outdated
{ | ||
if (PyUnicode_READY(self) == -1) | ||
return NULL; | ||
if (PyUnicode_IS_ASCII(self)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be simplified to PyBool_FromLong(PyUnicode_IS_ASCII(self));
Objects/unicodeobject.c
Outdated
/*[clinic input] | ||
str.isascii as unicode_isascii | ||
|
||
Return True if the string is an ascii string, False otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: You've used all uppercase version (ASCII) in the documentation and NEWS file. I'd suggest using the same style (whether ASCII or ascii) everywhere.
Doc/library/stdtypes.rst
Outdated
.. method:: str.isascii() | ||
|
||
Return true if all characters in the string are ASCII, false otherwise. | ||
ASCII characters are characters which :func:`ord` returns less than 128. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a suggestion: We way reuse curses.ascii.isascii()
documentation to describe what we meant by an ASCII character: https://docs.python.org/3/library/curses.ascii.html#curses.ascii.isascii
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest: "ASCII characters have code points in the range U+0000-U+007F."
Lib/test/test_unicode.py
Outdated
self.assertTrue("".isascii()) | ||
self.assertTrue("\0".isascii()) | ||
self.assertTrue("\x7f".isascii()) | ||
self.assertFalse("\x80".isascii()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to test larger code points as well:
self.assertFalse("\xe9".isascii())
self.assertFalse("\u20ac".isascii())
self.assertFalse("\U0010ffff".isascii())
My 3 favorite code points to test Latin1, BMP and non-BMP :-)
(only these tests, i don't think that it's useful to check for variant with spaces before/after)
Doc/library/stdtypes.rst
Outdated
.. method:: str.isascii() | ||
|
||
Return true if all characters in the string are ASCII, false otherwise. | ||
ASCII characters have code points in the range U+0000-U+007F. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can simplify the description as:
Return true if all characters have code points in the range U+0000-U+007F or if the string is empty, false otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I just have a minor comment on str.isascii() docstring.
/*[clinic input] | ||
str.isascii as unicode_isascii | ||
|
||
Return True if all characters in the string are ASCII, False otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick, maybe copy from the doc: "Return true if the string is empty or all characters in the string are ASCII," rather than "Empty string is ASCII too." below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Return true if the string is empty or all characters in the string are ASCII, False otherwise." overs 80 columns.
And clinic show error when I wrap the line.
All other docstrings in unicodeobject has short (<80) summaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, that's a nasty issue. Ignore my comment and leave the docstring as it is ;-)
if (*p >= 128) { | ||
Py_RETURN_FALSE; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to optimize this function, I suggest you to look at ascii_decode() of Objects/unicodeobject.c which is heavily optimized to scan ASCII characters in a uint8_t* string. It works on "unsigned long" words rather than working on bytes.
But it should be done in a second PR. Right now, I would prefer to push this PR before 3.7b1 (monday).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree.
These were all discovered by running the `stubtest_stdlib` test without the `--ignore-missing-stub` option. `ChainMap.copy()` and `ChainMap.fromkeys()` appear to have been there since at least Python 3.6 (cpython source code for the class here: https://github.com/python/cpython/blob/2c56c97f015a7ea81719615ddcf3c745fba5b4f3/Lib/collections/__init__.py#L853). `Counter.total()` was added in 3.10: https://docs.python.org/3/library/collections.html#collections.Counter.total. `UserString.isascii()` appears to have been added in Python 3.7; it doesn't appear in the ChangeLog, but you can see it was added in this commit here: python/cpython#5342.
Voting on Python-ideas ML for now.
https://bugs.python.org/issue32677