Refactor and use more buffer view coercion #121

Closed
wants to merge 67 commits into from
Changes from 13 commits
67 commits
37f34fa
Add alias to `buffer` on Python 2/3
jakirkham Nov 9, 2018
a2f667f
Add `to_buffer` function
jakirkham Nov 9, 2018
290581a
Use `to_buffer` to implement `buffer_tobytes`
jakirkham Nov 9, 2018
aa25a3f
Use `to_buffer` in `BZ2` to avoid a copy
jakirkham Nov 9, 2018
f17767d
Use `to_buffer` in `GZip` encoding and decoding
jakirkham Nov 9, 2018
987087f
Use `to_buffer` in `LZMA` to avoid a copy
jakirkham Nov 9, 2018
c2841af
Use `to_buffer` in `Zlib` encoding and decoding
jakirkham Nov 9, 2018
d50b664
Drop `handle_datetime` as `to_buffer` handles this
jakirkham Nov 9, 2018
9b51e4f
Only cast datetime/timedeltas in `to_buffer`
jakirkham Nov 9, 2018
f585c63
Flip Python 2/3 cases in `to_buffer`, fix pragmas
jakirkham Nov 9, 2018
bf5638e
Raise in `to_buffer` if `object` array is provided
jakirkham Nov 9, 2018
4425833
Use `buffer_tobytes` on non-`object` types in test
jakirkham Nov 9, 2018
d03f4c6
Check `buffer_tobytes` raises on `object` arrays
jakirkham Nov 9, 2018
c2ae1c4
Drop object checks from BZ2, GZip, LZMA, and Zlib
jakirkham Nov 9, 2018
c7b3f35
Cast datetime/timedelta arrays to uint64
jakirkham Nov 9, 2018
1052108
Revert "Use `buffer_tobytes` on non-`object` types in test"
jakirkham Nov 9, 2018
2764be8
Merge branch 'master' into use_more_buffers
jakirkham Nov 9, 2018
732fac7
Simply catch all non-`memoryview` coercible types
jakirkham Nov 10, 2018
7368f07
Test coercing `mmap`'s to `bytes` as well
jakirkham Nov 10, 2018
923d354
Tests `to_buffer` alongside `buffer_tobytes`
jakirkham Nov 10, 2018
1477dff
Breakup array handling, object check, & Python 2/3
jakirkham Nov 10, 2018
7fbb4d6
Use `try...except...else` with `memoryview`
jakirkham Nov 10, 2018
ddfbaee
Reshape all `memoryview`s using NumPy
jakirkham Nov 10, 2018
8b8d93e
Use `b` instead of `m`
jakirkham Nov 10, 2018
27b6ce3
Move reshape into `else`
jakirkham Nov 10, 2018
43ba708
Swap `object` array cases
jakirkham Nov 10, 2018
33bb8a3
Mark `to_buffer`'s NumPy array as readonly
jakirkham Nov 10, 2018
f934c85
Test `to_buffer` returns readonly buffers
jakirkham Nov 10, 2018
ce5bce1
Coerce `memoryview`s and `buffer`s to `ndarray`s
jakirkham Nov 11, 2018
32721f6
Use `np.getbuffer` instead of `buffer`
jakirkham Nov 11, 2018
30b63d2
Fix-up some codecs to handle `ndarray`
jakirkham Nov 11, 2018
98a7860
Check that `to_buffer` returns an `ndarray`
jakirkham Nov 11, 2018
c046a72
Ensure `array` has the correct type on Python 2
jakirkham Nov 11, 2018
5fb505d
State `arange` type in `test_buffer_coercion`
jakirkham Nov 11, 2018
efc7552
Test that `to_buffer` maintains types
jakirkham Nov 11, 2018
230da21
Drop the cast to `memoryview` in `to_buffer`
jakirkham Nov 11, 2018
af0946d
Use to_buffer directly instead of buffer_tobytes
jakirkham Nov 11, 2018
a0d299d
Drop `buffer_tobytes`
jakirkham Nov 11, 2018
dc58d20
Check that the buffer and the ndarray share memory
jakirkham Nov 11, 2018
d6bac35
Test type matches against `ndarray`
jakirkham Nov 11, 2018
cd550a6
Drop tests of `ndarray` coercion
jakirkham Nov 11, 2018
ad90e70
Skip coercion of `ndarray`s
jakirkham Nov 11, 2018
ab159bf
Join type checks after coercing non-`ndarray`s
jakirkham Nov 11, 2018
5b780d3
Limit casting of `array` to Python 2
jakirkham Nov 11, 2018
18f2f33
Drop disabling writing in `to_buffer`
jakirkham Nov 11, 2018
3aea13f
Move array coercion to `ndarray_from_buffer`
jakirkham Nov 11, 2018
398a3e9
Handle casting outside of `ndarray_from_buffer`
jakirkham Nov 12, 2018
592be49
Drop `ndarray_from_buffer`'s `dtype` argument
jakirkham Nov 12, 2018
fa92401
Handle reshaping outside of `ndarray_from_buffer`
jakirkham Nov 12, 2018
16701a8
Rewrite `buffer_copy` using `to_buffer`
jakirkham Nov 12, 2018
7a2b3b7
Raise for unicode `array` in `ndarray_from_buffer`
jakirkham Nov 12, 2018
074bf31
Check that unicode `array`'s raise an error
jakirkham Nov 12, 2018
de0a947
Revert "Check that unicode `array`'s raise an error"
jakirkham Nov 12, 2018
eacfdd7
Revert "Raise for unicode `array` in `ndarray_from_buffer`"
jakirkham Nov 12, 2018
9d43a3b
Cast around unicode array
jakirkham Nov 12, 2018
d0b0ef7
Test handling of unicode `array`s
jakirkham Nov 12, 2018
a789c4d
Add utility `getbuffer` function
jakirkham Nov 12, 2018
e01195c
Use `getbuffer` for checking shared memory
jakirkham Nov 12, 2018
94fcc37
Preserve `memoryview` of `array`'s `itemsize`
jakirkham Nov 12, 2018
5aa7b89
Re-raise `TypeError` on Python 3
jakirkham Nov 13, 2018
f0b5508
Test that non-buffer types raise errors
jakirkham Nov 13, 2018
78d39a2
Merge 'zarr-developers/master' into 'jakirkham/use_more_buffers'
jakirkham Nov 16, 2018
96625e8
Drop PY2K/PY3K `buffer` assignment
jakirkham Nov 22, 2018
ae0434f
Inline `getbuffer`
jakirkham Nov 22, 2018
3ed988b
Cast `memoryview` to `uint8` for `shares_memory`
jakirkham Nov 22, 2018
1f1f9f4
Add no cover pragmas for shared memory test
jakirkham Nov 23, 2018
95f38a5
Merge 'zarr-developers/master' into 'jakirkham/use_more_buffers'
jakirkham Nov 23, 2018
15 changes: 3 additions & 12 deletions numcodecs/bz2.py
@@ -1,14 +1,13 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, print_function, division
import bz2 as _bz2
import array


import numpy as np


from numcodecs.abc import Codec
from numcodecs.compat import buffer_copy, handle_datetime
from numcodecs.compat import buffer_copy, to_buffer


class BZ2(Codec):
@@ -28,29 +27,21 @@ def __init__(self, level=1):

def encode(self, buf):

# deal with lack of buffer support for datetime64 and timedelta64
buf = handle_datetime(buf)

if isinstance(buf, np.ndarray):

# cannot compress object array
if buf.dtype == object:
raise ValueError('cannot encode object array')

# if numpy array, can only handle C contiguous directly
if not buf.flags.c_contiguous:
buf = buf.tobytes(order='A')
buf = to_buffer(buf)

# do compression
return _bz2.compress(buf, self.level)

# noinspection PyMethodMayBeStatic
def decode(self, buf, out=None):

# BZ2 cannot handle ndarray directly at all, coerce everything to
# memoryview
if not isinstance(buf, array.array):
buf = memoryview(buf)
buf = to_buffer(buf)

# do decompression
dec = _bz2.decompress(buf)
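
The `bz2.py` change above relies on the standard-library `bz2` functions accepting any object that exports a contiguous buffer, so no `tobytes` copy is needed first. A minimal sketch of that idea for Python 3 only; it is not the PR's code, and `roundtrip_bz2` is a hypothetical helper name used for illustration:

```python
import bz2

import numpy as np


def roundtrip_bz2(arr, level=1):
    """Compress and decompress a C-contiguous array through its buffer.

    Hypothetical helper, not part of numcodecs.
    """
    # memoryview exposes the array's memory without copying it.
    view = memoryview(arr)
    compressed = bz2.compress(view, level)
    return bz2.decompress(compressed)


data = np.arange(100, dtype='i4')
# decompress returns bytes; reinterpret them with the original dtype.
restored = np.frombuffer(roundtrip_bz2(data), dtype=data.dtype)
```

The decoded result is a plain `bytes` object, so the caller still has to restore dtype and shape.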
40 changes: 26 additions & 14 deletions numcodecs/compat.py
@@ -14,29 +14,47 @@

if PY2: # pragma: py3 no cover

buffer = buffer
text_type = unicode
binary_type = str
integer_types = (int, long)
reduce = reduce

else: # pragma: py2 no cover

buffer = memoryview
text_type = str
binary_type = bytes
integer_types = int,
from functools import reduce


def to_buffer(v):
    """Obtain a `buffer` or `memoryview` for `v`."""
    b = v
    if isinstance(b, np.ndarray):
        # compare with `==`; `is` tests identity, not string equality
        if b.dtype.kind == 'O':
            raise ValueError('cannot encode object array')
        b = b.reshape(-1, order='A')
        if b.dtype.kind in 'Mm':
            b = b.view(np.uint8)
        b = buffer(b)
    elif PY2:  # pragma: py3 no cover
        if not isinstance(b, array.array) and memoryview(b).format == 'O':
            raise ValueError('cannot encode object array')
        b = buffer(b)
    else:  # pragma: py2 no cover
        b = buffer(b)
        if b.format == 'O':
            raise ValueError('cannot encode object array')
        b = b.cast('B').cast(b.format)

    return b
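
To make the control flow of `to_buffer` easier to follow, here is a simplified, Python 3-only sketch of the same idea. It is an illustration under stated assumptions, not the PR's implementation: the name `to_buffer_sketch` is hypothetical, the Python 2 branch is dropped, and `memoryview` stands in for the `buffer` alias.

```python
import numpy as np


def to_buffer_sketch(v):
    """Return a flat memoryview over the memory of `v` (Python 3 only).

    Hypothetical, simplified variant of `to_buffer` for illustration.
    """
    if isinstance(v, np.ndarray):
        if v.dtype.kind == 'O':
            raise ValueError('cannot encode object array')
        # order='A' flattens C- or F-contiguous arrays as a view;
        # other layouts are copied by NumPy.
        v = v.reshape(-1, order='A')
        if v.dtype.kind in 'Mm':
            # datetime64/timedelta64 are reinterpreted as raw bytes.
            v = v.view(np.uint8)
        return memoryview(v)

    b = memoryview(v)  # raises TypeError for non-buffer objects
    if b.format == 'O':
        raise ValueError('cannot encode object array')
    # Flatten to bytes, then restore the original item format.
    return b.cast('B').cast(b.format)


fortran = np.asfortranarray(np.arange(12, dtype='f8').reshape(3, 4))
flat = to_buffer_sketch(fortran)
```

Using `order='A'` means a Fortran-contiguous array is exposed in its in-memory order rather than being copied into C order.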


def buffer_tobytes(v):
"""Obtain a sequence of bytes for the memory buffer used by `v`."""
if isinstance(v, binary_type):
return v
elif isinstance(v, np.ndarray):
return v.tobytes(order='A')
elif PY2 and isinstance(v, array.array): # pragma: py3 no cover
return v.tostring()
else:
return memoryview(v).tobytes()
return memoryview(to_buffer(v)).tobytes()


def buffer_copy(buf, out=None):
@@ -98,9 +116,3 @@ def ensure_text(l, encoding='utf-8'):
return l
else: # pragma: py3 no cover
return text_type(l, encoding=encoding)


def handle_datetime(buf):
if hasattr(buf, 'dtype') and buf.dtype.kind in 'Mm':
return buf.view('u8')
return buf
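
The removed `handle_datetime` helper worked around NumPy not exporting `datetime64` and `timedelta64` arrays through the buffer protocol. The sketch below shows that limitation and the integer-view workaround. The variable names are illustrative, and the exact exception type raised by NumPy is an assumption, so the example catches both `ValueError` and `TypeError`.

```python
import numpy as np

dates = np.array(['2018-11-09', '2018-11-23'], dtype='M8[D]')

try:
    memoryview(dates)
    exported = True
except (ValueError, TypeError):
    # NumPy is expected to refuse exporting datetime64 as a buffer.
    exported = False

# Reinterpreting the same memory as unsigned 64-bit integers works.
as_u8 = dates.view('u8')
view = memoryview(as_u8)

# Viewing the integers with the original dtype recovers the dates.
restored = np.frombuffer(view, dtype='u8').view(dates.dtype)
```

Because `view` only reinterprets existing memory, neither direction of the conversion copies the data.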
17 changes: 3 additions & 14 deletions numcodecs/gzip.py
@@ -8,7 +8,7 @@


from .abc import Codec
from .compat import buffer_copy, handle_datetime, buffer_tobytes, PY2
from .compat import buffer_copy, to_buffer


class GZip(Codec):
@@ -28,22 +28,13 @@ def __init__(self, level=1):

def encode(self, buf):

# deal with lack of buffer support for datetime64 and timedelta64
buf = handle_datetime(buf)

if isinstance(buf, np.ndarray):

# cannot compress object array
if buf.dtype == object:
raise ValueError('cannot encode object array')

# if numpy array, can only handle C contiguous directly
if not buf.flags.c_contiguous:
buf = buf.tobytes(order='A')

if PY2: # pragma: py3 no cover
# ensure bytes, PY2 cannot handle things like bytearray
buf = buffer_tobytes(buf)
buf = to_buffer(buf)

# do compression
compressed = io.BytesIO()
@@ -58,9 +49,7 @@ def encode(self, buf):
# noinspection PyMethodMayBeStatic
def decode(self, buf, out=None):

if PY2: # pragma: py3 no cover
# ensure bytes, PY2 cannot handle things like bytearray
buf = buffer_tobytes(buf)
buf = to_buffer(buf)

# do decompression
buf = io.BytesIO(buf)
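
The GZip codec goes through `io.BytesIO` and `gzip.GzipFile` rather than one-shot compress functions, and both accept bytes-like objects on Python 3. A minimal sketch of that round trip follows; `gzip_encode` and `gzip_decode` are hypothetical names and this is not the codec's actual code:

```python
import gzip
import io

import numpy as np


def gzip_encode(buf, level=1):
    """Gzip-compress any C-contiguous buffer (hypothetical helper)."""
    # Cast to unsigned bytes so lengths are counted in bytes.
    view = memoryview(buf).cast('B')
    compressed = io.BytesIO()
    with gzip.GzipFile(fileobj=compressed, mode='wb',
                       compresslevel=level) as gz:
        gz.write(view)
    return compressed.getvalue()


def gzip_decode(buf):
    """Decompress gzip data held in any bytes-like object."""
    with gzip.GzipFile(fileobj=io.BytesIO(buf), mode='rb') as gz:
        return gz.read()


data = np.arange(50, dtype='i8')
decoded = np.frombuffer(gzip_decode(gzip_encode(data)), dtype=data.dtype)
```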
11 changes: 4 additions & 7 deletions numcodecs/lzma.py
@@ -16,7 +16,7 @@

import numpy as np
from .abc import Codec
from .compat import buffer_copy, handle_datetime
from .compat import buffer_copy, to_buffer

# noinspection PyShadowingBuiltins
class LZMA(Codec):
@@ -48,25 +48,22 @@ def __init__(self, format=1, check=-1, preset=None, filters=None):

def encode(self, buf):

# deal with lack of buffer support for datetime64 and timedelta64
buf = handle_datetime(buf)

if isinstance(buf, np.ndarray):

# cannot compress object array
if buf.dtype == object:
raise ValueError('cannot encode object array')

# if numpy array, can only handle C contiguous directly
if not buf.flags.c_contiguous:
buf = buf.tobytes(order='A')
buf = to_buffer(buf)

# do compression
return _lzma.compress(buf, format=self.format, check=self.check,
preset=self.preset, filters=self.filters)

def decode(self, buf, out=None):

buf = to_buffer(buf)

# do decompression
dec = _lzma.decompress(buf, format=self.format,
filters=self.filters)
3 changes: 2 additions & 1 deletion numcodecs/tests/common.py
@@ -187,7 +187,8 @@ def check_backwards_compatibility(codec_id, arrays, codecs, precision=None, pref
# setup
i = int(arr_fn.split('.')[-2])
arr = np.load(arr_fn)
arr_bytes = buffer_tobytes(arr)
if arr.dtype.kind != 'O':
arr_bytes = buffer_tobytes(arr)
if arr.flags.f_contiguous:
order = 'F'
else:
9 changes: 9 additions & 0 deletions numcodecs/tests/test_compat.py
@@ -2,13 +2,22 @@
from __future__ import absolute_import, print_function, division
import array

import pytest

import numpy as np


from numcodecs.compat import buffer_tobytes


def test_buffer_tobytes_raises():
a = np.array([u'Xin chào thế giới'], dtype=object)
with pytest.raises(ValueError):
buffer_tobytes(a)
with pytest.raises(ValueError):
buffer_tobytes(memoryview(a))
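
The test above exercises both an `object` array and a `memoryview` wrapping one. The second case works because NumPy exports object arrays with the buffer format code `'O'`, which survives re-wrapping in a `memoryview`. A small sketch of that detection logic, where `is_object_buffer` is a hypothetical helper rather than anything in numcodecs:

```python
import numpy as np


def is_object_buffer(v):
    """Report whether `v` exposes Python object pointers.

    Hypothetical helper for illustration only.
    """
    if isinstance(v, np.ndarray):
        return v.dtype.kind == 'O'
    # Non-array inputs are inspected through the buffer protocol.
    return memoryview(v).format == 'O'


objs = np.array([u'Xin chào thế giới'], dtype=object)
direct = is_object_buffer(objs)
wrapped = is_object_buffer(memoryview(objs))
plain = is_object_buffer(b'adsdasdas')
```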


def test_buffer_tobytes():
bufs = [
b'adsdasdas',
17 changes: 3 additions & 14 deletions numcodecs/zlib.py
@@ -7,7 +7,7 @@


from .abc import Codec
from .compat import buffer_copy, handle_datetime, buffer_tobytes, PY2
from .compat import buffer_copy, to_buffer


class Zlib(Codec):
@@ -27,32 +27,21 @@ def __init__(self, level=1):

def encode(self, buf):

# deal with lack of buffer support for datetime64 and timedelta64
buf = handle_datetime(buf)

if isinstance(buf, np.ndarray):

# cannot compress object array
if buf.dtype == object:
raise ValueError('cannot encode object array')

# if numpy array, can only handle C contiguous directly
if not buf.flags.c_contiguous:
buf = buf.tobytes(order='A')

if PY2: # pragma: py3 no cover
# ensure bytes, PY2 cannot handle things like bytearray
buf = buffer_tobytes(buf)
buf = to_buffer(buf)

# do compression
return _zlib.compress(buf, self.level)

# noinspection PyMethodMayBeStatic
def decode(self, buf, out=None):

if PY2: # pragma: py3 no cover
# ensure bytes, PY2 cannot handle things like bytearray
buf = buffer_tobytes(buf)
buf = to_buffer(buf)

# do decompression
dec = _zlib.decompress(buf)