Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-32677: Add .isascii() to str, bytes and bytearray #5342

Merged
merged 12 commits into from
Jan 27, 2018

Conversation

methane
Copy link
Member

@methane methane commented Jan 26, 2018

Voting on Python-ideas ML for now.

https://bugs.python.org/issue32677

unicode_isascii_impl(PyObject *self)
/*[clinic end generated code: output=c5910d64b5a8003f input=73b30d38f16965cd]*/
{
if (PyUnicode_READY(self) == -1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 7: please add { ... }

{
if (PyUnicode_READY(self) == -1)
return NULL;
if (PyUnicode_IS_ASCII(self)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be simplified to PyBool_FromLong(PyUnicode_IS_ASCII(self));

/*[clinic input]
str.isascii as unicode_isascii

Return True if the string is an ascii string, False otherwise.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: You've used all uppercase version (ASCII) in the documentation and NEWS file. I'd suggest using the same style (whether ASCII or ascii) everywhere.

@methane methane requested a review from rhettinger as a code owner January 26, 2018 12:05
.. method:: str.isascii()

Return true if all characters in the string are ASCII, false otherwise.
ASCII characters are characters which :func:`ord` returns less than 128.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion: We way reuse curses.ascii.isascii() documentation to describe what we meant by an ASCII character: https://docs.python.org/3/library/curses.ascii.html#curses.ascii.isascii

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest: "ASCII characters have code points in the range U+0000-U+007F."

self.assertTrue("".isascii())
self.assertTrue("\0".isascii())
self.assertTrue("\x7f".isascii())
self.assertFalse("\x80".isascii())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to test larger code points as well:

self.assertFalse("\xe9".isascii())
self.assertFalse("\u20ac".isascii())
self.assertFalse("\U0010ffff".isascii())

My 3 favorite code points to test Latin1, BMP and non-BMP :-)

(only these tests, i don't think that it's useful to check for variant with spaces before/after)

.. method:: str.isascii()

Return true if all characters in the string are ASCII, false otherwise.
ASCII characters have code points in the range U+0000-U+007F.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can simplify the description as:

Return true if all characters have code points in the range U+0000-U+007F or if the string is empty, false otherwise.

@methane methane changed the title [DO NOT MERGE] bpo-32677: Add str.isascii() bpo-32677: Add str.isascii() Jan 26, 2018
@methane methane changed the title bpo-32677: Add str.isascii() bpo-32677: Add .isascii() to str, bytes and bytearray Jan 26, 2018
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I just have a minor comment on str.isascii() docstring.

/*[clinic input]
str.isascii as unicode_isascii

Return True if all characters in the string are ASCII, False otherwise.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick, maybe copy from the doc: "Return true if the string is empty or all characters in the string are ASCII," rather than "Empty string is ASCII too." below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Return true if the string is empty or all characters in the string are ASCII, False otherwise." overs 80 columns.
And clinic show error when I wrap the line.

All other docstrings in unicodeobject has short (<80) summaries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, that's a nasty issue. Ignore my comment and leave the docstring as it is ;-)

if (*p >= 128) {
Py_RETURN_FALSE;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to optimize this function, I suggest you to look at ascii_decode() of Objects/unicodeobject.c which is heavily optimized to scan ASCII characters in a uint8_t* string. It works on "unsigned long" words rather than working on bytes.

But it should be done in a second PR. Right now, I would prefer to push this PR before 3.7b1 (monday).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree.

@methane methane merged commit a49ac99 into python:master Jan 27, 2018
@methane methane deleted the str-isascii branch January 27, 2018 05:06
AlexWaygood added a commit to AlexWaygood/typeshed that referenced this pull request Nov 26, 2021
These were all discovered by running the `stubtest_stdlib` test without the `--ignore-missing-stub` option.

`ChainMap.copy()` and `ChainMap.fromkeys()` appear to have been there since at least Python 3.6 (cpython source code for the class here: https://github.com/python/cpython/blob/2c56c97f015a7ea81719615ddcf3c745fba5b4f3/Lib/collections/__init__.py#L853).

`Counter.total()` was added in 3.10: https://docs.python.org/3/library/collections.html#collections.Counter.total.

`UserString.isascii()` appears to have been added in Python 3.7; it doesn't appear in the ChangeLog, but you can see it was added in this commit here: python/cpython#5342.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants