Skip to content

Use native bytearray truncation #120

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

tomchristie
Copy link
Contributor

Switching from .compress() to using native buffer truncation.

I figure despite some related work on #115 it's worth looking at this PR in isolation.

We probably want this to be blocked on #116.

Benchmarking on Python 3.7...

Before:

$ PYTHONPATH=. venv/bin/python bench/benchmarks/benchmarks.py
6901.9 requests/sec
7055.0 requests/sec
7084.2 requests/sec
7070.6 requests/sec
7107.9 requests/sec
7075.8 requests/sec
7079.4 requests/sec

$ PYTHONPATH=. venv/bin/python bench/benchmarks/benchmarks.py
6972.9 requests/sec
6996.9 requests/sec
6947.3 requests/sec
6999.9 requests/sec
7040.6 requests/sec
7030.0 requests/sec
6997.6 requests/sec

After:

$ PYTHONPATH=. venv/bin/python bench/benchmarks/benchmarks.py
7054.6 requests/sec
7133.3 requests/sec
7148.3 requests/sec
7116.4 requests/sec
7121.0 requests/sec
7128.2 requests/sec
7156.5 requests/sec

$ PYTHONPATH=. venv/bin/python bench/benchmarks/benchmarks.py
7018.9 requests/sec
7074.7 requests/sec
7075.0 requests/sec
7070.6 requests/sec
7028.2 requests/sec
7118.3 requests/sec
7131.2 requests/sec

@tomchristie
Copy link
Contributor Author

We also ought to follow up on the action in the comment "make sure PyPy also has the optimization", but I'm not sure where to start with that.

# bytearray is amortized O(n), thanks to some excellent work by Antoine
# Martin:
#
# https://bugs.python.org/issue19087
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be helpful to keep the paragraph above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, yes. I've addressed it as an inline comment.

if not out:
return None
self._start += len(out)
self._data[:count] = b""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a del would be a little clearer but maybe it's just me. (Also below).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, yup!

@bluetech
Copy link
Contributor

We also ought to follow up on the action in the comment "make sure PyPy also has the optimization", but I'm not sure where to start with that.

With this code:

size = 50 * 2**20
b = bytearray(size)
for i in range(size):
    del b[0]

Results:

python3.8:           4.257s
pypy3 version 7.3.2: 0.326s
pypy2 version 7.3.2: 0.326s
python2.7:           DNF

So it's either that pypy has this optimization, or it's doing some magic JIT handling of this specific code. I tried some tricks to foil any such optimization and it seems legit.

If we're willing to have a timing-based unit test for this, a size = 3 * 2**20 with a timeout of 5s would do the trick IMO, even with a super fast CPU running the quadratic case or a very slow CPU running the linear case. But it will probably be flaky anyway...

Co-authored-by: Ran Benita <ran@unusedvar.com>
@njsmith
Copy link
Member

njsmith commented Dec 23, 2020

I think this has been superseded by #115

@pgjones
Copy link
Member

pgjones commented Dec 26, 2020

Agree with #115 superseding this. (Please reopen if not).

@pgjones pgjones closed this Dec 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants