bpo-32677: Add .isascii() to str, bytes and bytearray #5342

methane · 2018-01-26T11:10:24Z

Voting on the python-ideas mailing list for now.

https://bugs.python.org/issue32677

vstinner · 2018-01-26T11:16:07Z

Objects/unicodeobject.c

+unicode_isascii_impl(PyObject *self)
+/*[clinic end generated code: output=c5910d64b5a8003f input=73b30d38f16965cd]*/
+{
+    if (PyUnicode_READY(self) == -1)


PEP 7: please add braces { ... } around the body of the if statement.

vstinner · 2018-01-26T11:16:26Z

Objects/unicodeobject.c

+{
+    if (PyUnicode_READY(self) == -1)
+        return NULL;
+    if (PyUnicode_IS_ASCII(self)) {


This can be simplified to: return PyBool_FromLong(PyUnicode_IS_ASCII(self));

berkerpeksag · 2018-01-26T11:57:14Z

Objects/unicodeobject.c

+/*[clinic input]
+str.isascii as unicode_isascii
+
+Return True if the string is an ascii string, False otherwise.


Style nit: you've used the all-uppercase spelling (ASCII) in the documentation and the NEWS file. I'd suggest using the same spelling (either ASCII or ascii) everywhere.

berkerpeksag · 2018-01-26T12:19:32Z

Doc/library/stdtypes.rst

+.. method:: str.isascii()
+
+   Return true if all characters in the string are ASCII, false otherwise.
+   ASCII characters are characters which :func:`ord` returns less than 128.


Just a suggestion: we may reuse the curses.ascii.isascii() documentation to describe what we mean by an ASCII character: https://docs.python.org/3/library/curses.ascii.html#curses.ascii.isascii

I suggest: "ASCII characters have code points in the range U+0000-U+007F."
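The suggested wording maps directly onto a code-point test. A minimal Python sketch, assuming Python 3.7+ so the built-in str.isascii() is available for comparison; is_ascii_reference is a hypothetical helper written for illustration, not part of the PR:

```python
def is_ascii_reference(s: str) -> bool:
    # Hypothetical helper: a string is ASCII when every code point lies
    # in U+0000..U+007F. all() over an empty iterable returns True, so
    # the empty string counts as ASCII too.
    return all(ord(ch) <= 0x7F for ch in s)


# Compare the reference definition against the built-in method.
for sample in ["", "\0", "hello", "\x7f", "\x80", "caf\xe9"]:
    assert is_ascii_reference(sample) == sample.isascii()
```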

vstinner · 2018-01-26T13:26:53Z

Lib/test/test_unicode.py

+        self.assertTrue("".isascii())
+        self.assertTrue("\0".isascii())
+        self.assertTrue("\x7f".isascii())
+        self.assertFalse("\x80".isascii())


I suggest testing larger code points as well:

self.assertFalse("\xe9".isascii())
self.assertFalse("\u20ac".isascii())
self.assertFalse("\U0010ffff".isascii())

My three favorite code points for testing Latin-1, BMP and non-BMP :-)

(Only these tests; I don't think it's useful to check variants with spaces before/after.)
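The three suggested code points each come from a different range: Latin-1, the rest of the Basic Multilingual Plane, and the supplementary planes. They were presumably chosen because CPython's compact string representation (PEP 393) stores them with 1-, 2- and 4-byte code units respectively. A quick sketch (Python 3.7+) that makes the difference visible through their UTF-8 lengths; the samples dict is only for illustration:

```python
# Code points suggested in the review, with their UTF-8 lengths.
samples = {
    "\xe9": 2,        # U+00E9: Latin-1 range, but above U+007F
    "\u20ac": 3,      # U+20AC EURO SIGN: Basic Multilingual Plane
    "\U0010ffff": 4,  # U+10FFFF: last code point, outside the BMP
}

for ch, utf8_len in samples.items():
    assert not ch.isascii()                 # none of them is ASCII
    assert len(ch.encode("utf-8")) == utf8_len
    print(f"U+{ord(ch):04X}", utf8_len)
```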

vstinner · 2018-01-26T13:28:15Z

Doc/library/stdtypes.rst

+.. method:: str.isascii()
+
+   Return true if all characters in the string are ASCII, false otherwise.
+   ASCII characters have code points in the range U+0000-U+007F.


Maybe you can simplify the description to:

Return true if all characters have code points in the range U+0000-U+007F or if the string is empty, false otherwise.
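Spelling out the empty-string case is useful because isascii() behaves differently from most other is*() predicates there. A small check, assuming Python 3.7+:

```python
# isascii() is vacuously True for empty input, on str, bytes and bytearray.
assert "".isascii() and b"".isascii() and bytearray().isascii()

# Most other predicates need at least one character and return False.
for name in ("isalpha", "isdigit", "isalnum", "isspace", "isupper"):
    assert getattr("", name)() is False, name
```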

vstinner

LGTM, I just have a minor comment on the str.isascii() docstring.

vstinner · 2018-01-27T01:59:58Z

Objects/unicodeobject.c

+/*[clinic input]
+str.isascii as unicode_isascii
+
+Return True if all characters in the string are ASCII, False otherwise.


Nitpick: maybe copy from the doc, "Return true if the string is empty or all characters in the string are ASCII," rather than adding "Empty string is ASCII too." below.

"Return true if the string is empty or all characters in the string are ASCII, False otherwise." exceeds 80 columns,
and Argument Clinic shows an error when I wrap the line.

All other docstrings in unicodeobject.c have short (<80 columns) summaries.

Oh wow, that's a nasty issue. Ignore my comment and leave the docstring as it is ;-)
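The summary-length constraint can be checked from Python. A minimal sketch, assuming Python 3.7+; for Argument Clinic methods the signature is exposed separately as __text_signature__, so __doc__ starts with the summary line. summary_line is a hypothetical helper:

```python
def summary_line(func) -> str:
    # Hypothetical helper: first line of a callable's docstring.
    doc = func.__doc__ or ""
    return doc.splitlines()[0] if doc else ""


line = summary_line(str.isascii)
print(len(line), line)
assert 0 < len(line) < 80  # the summary fits within 80 columns
```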

vstinner · 2018-01-27T02:03:02Z

Objects/bytes_methods.c

+        if (*p >= 128) {
+            Py_RETURN_FALSE;
+        }
+    }


If you want to optimize this function, I suggest you look at ascii_decode() in Objects/unicodeobject.c, which is heavily optimized to scan ASCII characters in a uint8_t* string. It works on "unsigned long" words rather than on individual bytes.

But that should be done in a second PR. Right now, I would prefer to push this PR before 3.7b1 (Monday).
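To illustrate the idea behind the word-at-a-time suggestion (this is not CPython's actual ascii_decode()), here is a minimal Python sketch: read 8 bytes at a time as one integer and test every high bit at once with a 0x80...80 mask, then fall back to a byte loop for the tail. The 8-byte word size and the names WORD, HIGH_BITS and isascii_wordwise are assumptions for illustration; a C version would also need to handle pointer alignment, which this sketch ignores, and in pure Python it is not faster than the built-in.

```python
WORD = 8                         # assumed word size (64-bit unsigned long)
HIGH_BITS = 0x8080808080808080   # high bit of each of the 8 bytes


def isascii_wordwise(data) -> bool:
    """Hypothetical helper: True if every byte in data is < 128."""
    n = len(data)
    i = 0
    # Whole words: any byte >= 128 has its high bit set, so a single AND
    # with the mask detects a non-ASCII byte anywhere in the word.
    while i + WORD <= n:
        word = int.from_bytes(data[i:i + WORD], "little")
        if word & HIGH_BITS:
            return False
        i += WORD
    # Byte-at-a-time loop for the remaining tail, like the C loop in the diff.
    for b in data[i:]:
        if b >= 128:
            return False
    return True


# Cross-check against a straightforward per-byte definition.
for s in [b"", b"hello, world", b"x" * 17, b"abcdefg\x80", bytes(range(128))]:
    assert isascii_wordwise(s) == all(b < 128 for b in s)
```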

These were all discovered by running the `stubtest_stdlib` test without the `--ignore-missing-stub` option. `ChainMap.copy()` and `ChainMap.fromkeys()` appear to have been there since at least Python 3.6 (cpython source code for the class here: https://github.com/python/cpython/blob/2c56c97f015a7ea81719615ddcf3c745fba5b4f3/Lib/collections/__init__.py#L853). `Counter.total()` was added in 3.10: https://docs.python.org/3/library/collections.html#collections.Counter.total. `UserString.isascii()` appears to have been added in Python 3.7; it doesn't appear in the ChangeLog, but you can see it was added in this commit here: python/cpython#5342.
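A short check of the UserString.isascii() method mentioned above, assuming Python 3.7+ (collections.UserString wraps a plain str in its data attribute):

```python
from collections import UserString

# isascii() on a UserString should agree with calling it on the
# wrapped str directly.
for text in ("", "plain", "caf\xe9", "\u20ac5"):
    wrapped = UserString(text)
    assert wrapped.isascii() == wrapped.data.isascii() == text.isascii()
```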

methane added 2 commits January 26, 2018 19:53

Implement str.isascii()

1d4c0f4

Add tests and doc

4b01174

the-knights-who-say-ni added the CLA signed label Jan 26, 2018

bedevere-bot added the awaiting merge label Jan 26, 2018

Add NEWS entry

8b6452f

vstinner reviewed Jan 26, 2018


fix

120579a

berkerpeksag reviewed Jan 26, 2018


methane added 2 commits January 26, 2018 21:02

s/ascii/ASCII/

22a8400

Fix UserString

56b7727

methane requested a review from rhettinger as a code owner January 26, 2018 12:05

berkerpeksag reviewed Jan 26, 2018


Update doc with Victor's suggestion

3fb3240

vstinner reviewed Jan 26, 2018


methane added 3 commits January 27, 2018 01:44

Update doc

949b3ad

Add bytes.isascii() and bytearray.isascii()

5289bae

Add test for bytes.isascii()

40e08a0

methane changed the title from "[DO NOT MERGE] bpo-32677: Add str.isascii()" to "bpo-32677: Add str.isascii()" Jan 26, 2018

methane changed the title from "bpo-32677: Add str.isascii()" to "bpo-32677: Add .isascii() to str, bytes and bytearray" Jan 26, 2018

methane added 2 commits January 27, 2018 09:49

Fix test_doctest

4138202

Update NEWS entry

91a6b18

vstinner approved these changes Jan 27, 2018


methane merged commit a49ac99 into python:master Jan 27, 2018

bedevere-bot removed the awaiting merge label Jan 27, 2018

methane deleted the str-isascii branch January 27, 2018 05:06

AlexWaygood mentioned this pull request Nov 26, 2021

Add missing methods to collections classes python/typeshed#6388

Merged



methane commented Jan 26, 2018 • edited by bedevere-bot
