
Conversation

@methane methane commented Jul 6, 2018

Stop using the key=lambda idiom. This behavior is the same as in the C version of the encoder.

https://bugs.python.org/issue23493
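
For context, the change boils down to dropping the key function from the sort call in the pure-Python encoder (json/encoder.py). A minimal illustrative sketch of the before/after behavior (not the exact diff; the dict here is just an example):

# Illustrative sketch only, not the exact diff from json/encoder.py.
dct = {"b": 1, "a": 2}

# Before: sort the items by key only, via a lambda.
items_before = sorted(dct.items(), key=lambda kv: kv[0])

# After: sort the (key, value) pairs directly, as the C encoder does.
# With unique keys the first tuple elements always differ, so the values
# are never compared and the result is identical.
items_after = sorted(dct.items())

assert items_before == items_after == [("a", 2), ("b", 1)]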

methane added 2 commits July 6, 2018 16:48
Stop using key=lambda idiom.  This behavior is same to
C version encoder.
@methane methane added the performance Performance or resource usage label Jul 6, 2018

@serhiy-storchaka serhiy-storchaka left a comment

Confirm that there is a benefit from this change.

@vstinner

vstinner commented Jul 6, 2018

Is it possible that sorted() compares two values and that the comparison raises a TypeError with this change? (Is it possible to have a dictionary with two keys which have the same order?)

@vstinner vstinner left a comment

LGTM since the C code calls list(dct.items()).sort() (no key argument).

items = list(dct.items())
items.sort()

and

items = sorted(dct.items())

are equivalent and sorted() is shorter, so LGTM :-)

@methane

methane commented Jul 6, 2018

Is it possible that sorted() compares two values and that the comparison raises a TypeError with this change? (Is it possible to have a dictionary with two keys which have the same order?)

It happens with the C encoder.

$ python3
Python 3.7.0 (default, Jun 28 2018, 15:11:21)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class D(dict):
...     def items(self):
...         return [('foo', 42), ('foo', 'bar')]
...
>>> d = D({'foo': None})
>>> d
{'foo': None}
>>> import json
>>> json.dumps(d)
'{"foo": 42, "foo": "bar"}'
>>> json.dumps(d, sort_keys=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/inada-n/pyenv/versions/3.7.0/lib/python3.7/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/home/inada-n/pyenv/versions/3.7.0/lib/python3.7/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/home/inada-n/pyenv/versions/3.7.0/lib/python3.7/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
TypeError: '<' not supported between instances of 'str' and 'int'

>>> json.encoder.c_make_encoder = None
>>> json.dumps(d)
'{"foo": 42, "foo": "bar"}'

@tirkarthi

Some numbers with ./configure --enable-optimizations

Before patch:

cpython git:(master)   rlwrap ./python
Python 3.8.0a0 (heads/master:c929df3, Jul  6 2018, 11:38:33)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> import json
>>> timeit.timeit('sorted(d.items())', setup='d = dict.fromkeys(map(str, range(1000)))', number=1000)
0.3216385352425277
>>> timeit.timeit('sorted(d.items())', setup='d = dict.fromkeys(map(str, range(1000)))', number=1000)
0.38790418207645416
>>> timeit.timeit('sorted(d.items())', setup='d = dict.fromkeys(map(str, range(1000)))', number=1000)
0.3015676769427955
>>> timeit.timeit('sorted(d.items())', setup='d = dict.fromkeys(map(str, range(1000)))', number=1000)
0.3441896280273795

After patch:

➜  cpython git:(master) ✗ rlwrap ./python
Python 3.8.0a0 (heads/master-dirty:c929df3, Jul  6 2018, 11:44:57)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> import json
>>> timeit.timeit('sorted(d.items())', setup='d = dict.fromkeys(map(str, range(1000)))', number=1000)
0.3076137569732964
>>> timeit.timeit('sorted(d.items())', setup='d = dict.fromkeys(map(str, range(1000)))', number=1000)
0.3111478630453348
>>> timeit.timeit('sorted(d.items())', setup='d = dict.fromkeys(map(str, range(1000)))', number=1000)
0.3163580968976021
>>> timeit.timeit('sorted(d.items())', setup='d = dict.fromkeys(map(str, range(1000)))', number=1000)
0.3175586462020874

>>> # itemgetter
>>> import operator
>>> timeit.timeit('sorted(d.items(), key=operator.itemgetter(0))', setup='import operator; d = dict.fromkeys(map(str, range(1000)))', number=1000)
0.37424366315826774
>>> timeit.timeit('sorted(d.items(), key=operator.itemgetter(0))', setup='import operator; d = dict.fromkeys(map(str, range(1000)))', number=1000)
0.37700869189575315
>>> timeit.timeit('sorted(d.items(), key=operator.itemgetter(0))', setup='import operator; d = dict.fromkeys(map(str, range(1000)))', number=1000)
0.3674369710497558

I'm a little confused: the original issue proposed operator.itemgetter as the fastest option, while here it's the slowest. Any information on the difference between the proposed change and the actual one? Maybe things have changed since Python 3.4, when the issue was initially posted, and that is why the key argument is removed in this version?

Thanks

@methane

methane commented Jul 6, 2018

https://tools.ietf.org/html/rfc7159#section-4
The RFC says: "The names within an object SHOULD be unique."

@methane

methane commented Jul 6, 2018

@tirkarthi You compared sorted(d.items()) with sorted(d.items(), key=itemgetter(0)).
The original issue compared sorted(d.items(), key=lambda x: x[0]) with sorted(d.items(), key=itemgetter(0)).
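
For reference, a small self-contained script that times all three variants side by side (timings depend on the machine and build, so no numbers are claimed here; the 1000-key dict mirrors the setup used above):

# Times the three sort variants discussed in this thread.
import timeit

setup = "from operator import itemgetter; d = dict.fromkeys(map(str, range(1000)))"

for stmt in (
    "sorted(d.items())",                        # this PR (matches the C encoder)
    "sorted(d.items(), key=itemgetter(0))",     # proposed in the original issue
    "sorted(d.items(), key=lambda kv: kv[0])",  # current pure-Python encoder
):
    print(stmt, timeit.timeit(stmt, setup=setup, number=1000))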

@vstinner

vstinner commented Jul 6, 2018

>>> class D(dict):
...     def items(self):
...         return [('foo', 42), ('foo', 'bar')]

Honestly, if you use such a weird dictionary, you are going to get worse issues than a TypeError on sorting, no?

Ignore my question, your PR is fine.

@tirkarthi

Thanks @methane, I missed it. It makes sense now.

➜  cpython git:(master) ✗ rlwrap ./python
Python 3.8.0a0 (heads/master-dirty:c929df3, Jul  6 2018, 11:44:57)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> import json
>>> timeit.timeit('sorted(d.items(), key=lambda x: x[0])', setup='import operator; d = dict.fromkeys(map(str, range(1000)))', number=1000)
0.41919384989887476
>>> timeit.timeit('sorted(d.items(), key=lambda x: x[0])', setup='import operator; d = dict.fromkeys(map(str, range(1000)))', number=1000)
0.4452404109761119
>>> timeit.timeit('sorted(d.items(), key=lambda x: x[0])', setup='import operator; d = dict.fromkeys(map(str, range(1000)))', number=1000)
0.42263348307460546

@vstinner

vstinner commented Jul 6, 2018

Some numbers with ./configure --enable-optimizations (...) import timeit

Noooo, please! If you spend hours compiling Python with PGO, don't use the timeit module, which is not reliable! Use perf timeit:
http://perf.readthedocs.io/en/latest/cli.html#timeit-versus-perf-timeit

Your results have a large deviation:

0.3216385352425277
0.38790418207645416
0.3015676769427955
0.3441896280273795

perf timeit should help to reduce the deviation. My doc to get reliable benchmarks:
http://perf.readthedocs.io/en/latest/run_benchmark.html#how-to-get-reproductible-benchmark-results

@tirkarthi

tirkarthi commented Jul 6, 2018

Thanks @vstinner. My "after patch" numbers don't make sense; I misread the posted benchmark as being about the sorted function rather than the json module, so I was only comparing sorted calls. Sorry for the noise.

@methane

methane commented Jul 6, 2018

Even with perf, the performance of json.dumps(d, sort_keys=True, indent=" ") is unstable.

$ ./python -m perf timeit -s 'import json; d = {"k%d" % i: True for i in range(100)}' -- 'json.dumps(d, indent=" ", sort_keys=True)'
.....................
Mean +- std dev: 64.7 us +- 4.5 us

$ ./python -m perf timeit -s 'import json; d = {"k%d" % i: True for i in range(100)}' -- 'json.dumps(d, indent=" ", sort_keys=True)'
.....................
WARNING: the benchmark result may be unstable
* the standard deviation (8.89 us) is 13% of the mean (68.1 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

Mean +- std dev: 68.1 us +- 8.9 us

$ ./python -m perf timeit -s 'import json; d = {"k%d" % i: True for i in range(100)}' -- 'json.dumps(d, indent=" ", sort_keys=True)'
.....................
WARNING: the benchmark result may be unstable
* the standard deviation (8.02 us) is 12% of the mean (67.7 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

Mean +- std dev: 67.7 us +- 8.0 us

It seems sorting is not the bottleneck. We can close the issue without merging this.

@methane methane closed this Jul 6, 2018
@methane methane deleted the json-sort-keys branch July 6, 2018 12:26
@vstinner

vstinner commented Jul 6, 2018

It seems sorting is not the bottleneck. We can close the issue without merging this.

I suggest merging this change anyway, to respect PEP 399: the Python and C implementations must behave the same. Currently, the Python implementation is not the default, but it behaves differently :-(
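
A minimal sketch of how the two implementations can be spot-checked against each other, reusing the json.encoder.c_make_encoder trick shown earlier in this thread (no indent is passed because the C fast path is only used when indent is None; the input dict and expected string are just an example):

# Sketch: compare C and pure-Python encoder output for sort_keys=True.
import json

data = {"b": 1, "a": [2, 3], "c": {"y": None, "x": True}}

c_output = json.dumps(data, sort_keys=True)   # C encoder fast path

json.encoder.c_make_encoder = None            # force the pure-Python encoder
py_output = json.dumps(data, sort_keys=True)

assert c_output == py_output == '{"a": [2, 3], "b": 1, "c": {"x": true, "y": null}}'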

@methane methane restored the json-sort-keys branch July 6, 2018 17:15
@methane methane added skip news and removed performance Performance or resource usage labels Jul 6, 2018
@methane methane reopened this Jul 6, 2018
@methane methane changed the title bpo-23493: json: Optimize sort_keys in Python encoder bpo-23493: json: Make sort_keys in Python encoder same to C Jul 6, 2018
@methane methane changed the title bpo-23493: json: Make sort_keys in Python encoder same to C bpo-23493: json: Change sort_keys in Python encoder same to C Jul 6, 2018
@methane methane merged commit e25399b into python:master Jul 6, 2018
@methane methane deleted the json-sort-keys branch July 6, 2018 23:55