Refactor and use more buffer view coercion #121

Closed
wants to merge 67 commits into from
Commits
37f34fa
Add alias to `buffer` on Python 2/3
jakirkham Nov 9, 2018
a2f667f
Add `to_buffer` function
jakirkham Nov 9, 2018
290581a
Use `to_buffer` to implement `buffer_tobytes`
jakirkham Nov 9, 2018
aa25a3f
Use `to_buffer` in `BZ2` to avoid a copy
jakirkham Nov 9, 2018
f17767d
Use `to_buffer` in `GZip` encoding and decoding
jakirkham Nov 9, 2018
987087f
Use `to_buffer` in `LZMA` to avoid a copy
jakirkham Nov 9, 2018
c2841af
Use `to_buffer` in `Zlib` encoding and decoding
jakirkham Nov 9, 2018
d50b664
Drop `handle_datetime` as `to_buffer` handles this
jakirkham Nov 9, 2018
9b51e4f
Only cast datetime/timedeltas in `to_buffer`
jakirkham Nov 9, 2018
f585c63
Flip Python 2/3 cases in `to_buffer`, fix pragmas
jakirkham Nov 9, 2018
bf5638e
Raise in `to_buffer` if `object` array is provided
jakirkham Nov 9, 2018
4425833
Use `buffer_tobytes` on non-`object` types in test
jakirkham Nov 9, 2018
d03f4c6
Check `buffer_tobytes` raises on `object` arrays
jakirkham Nov 9, 2018
c2ae1c4
Drop object checks from BZ2, GZip, LZMA, and Zlib
jakirkham Nov 9, 2018
c7b3f35
Cast datetime/timedelta arrays to uint64
jakirkham Nov 9, 2018
1052108
Revert "Use `buffer_tobytes` on non-`object` types in test"
jakirkham Nov 9, 2018
2764be8
Merge branch 'master' into use_more_buffers
jakirkham Nov 9, 2018
732fac7
Simply catch all non-`memoryview` coercible types
jakirkham Nov 10, 2018
7368f07
Test coercing `mmap`'s to `bytes` as well
jakirkham Nov 10, 2018
923d354
Tests `to_buffer` alongside `buffer_tobytes`
jakirkham Nov 10, 2018
1477dff
Breakup array handling, object check, & Python 2/3
jakirkham Nov 10, 2018
7fbb4d6
Use `try...except...else` with `memoryview`
jakirkham Nov 10, 2018
ddfbaee
Reshape all `memoryview`s using NumPy
jakirkham Nov 10, 2018
8b8d93e
Use `b` instead of `m`
jakirkham Nov 10, 2018
27b6ce3
Move reshape into `else`
jakirkham Nov 10, 2018
43ba708
Swap `object` array cases
jakirkham Nov 10, 2018
33bb8a3
Mark `to_buffer`'s NumPy array as readonly
jakirkham Nov 10, 2018
f934c85
Test `to_buffer` returns readonly buffers
jakirkham Nov 10, 2018
ce5bce1
Coerce `memoryview`s and `buffer`s to `ndarray`s
jakirkham Nov 11, 2018
32721f6
Use `np.getbuffer` instead of `buffer`
jakirkham Nov 11, 2018
30b63d2
Fix-up some codecs to handle `ndarray`
jakirkham Nov 11, 2018
98a7860
Check that `to_buffer` returns an `ndarray`
jakirkham Nov 11, 2018
c046a72
Ensure `array` has the correct type on Python 2
jakirkham Nov 11, 2018
5fb505d
State `arange` type in `test_buffer_coercion`
jakirkham Nov 11, 2018
efc7552
Test that `to_buffer` maintains types
jakirkham Nov 11, 2018
230da21
Drop the cast to `memoryview` in `to_buffer`
jakirkham Nov 11, 2018
af0946d
Use to_buffer directly instead of buffer_tobytes
jakirkham Nov 11, 2018
a0d299d
Drop `buffer_tobytes`
jakirkham Nov 11, 2018
dc58d20
Check that the buffer and the ndarray share memory
jakirkham Nov 11, 2018
d6bac35
Test type matches against `ndarray`
jakirkham Nov 11, 2018
cd550a6
Drop tests of `ndarray` coercion
jakirkham Nov 11, 2018
ad90e70
Skip coercion of `ndarray`s
jakirkham Nov 11, 2018
ab159bf
Join type checks after coercing non-`ndarray`s
jakirkham Nov 11, 2018
5b780d3
Limit casting of `array` to Python 2
jakirkham Nov 11, 2018
18f2f33
Drop disabling writing in `to_buffer`
jakirkham Nov 11, 2018
3aea13f
Move array coercion to `ndarray_from_buffer`
jakirkham Nov 11, 2018
398a3e9
Handle casting outside of `ndarray_from_buffer`
jakirkham Nov 12, 2018
592be49
Drop `ndarray_from_buffer`'s `dtype` argument
jakirkham Nov 12, 2018
fa92401
Handle reshaping outside of `ndarray_from_buffer`
jakirkham Nov 12, 2018
16701a8
Rewrite `buffer_copy` using `to_buffer`
jakirkham Nov 12, 2018
7a2b3b7
Raise for unicode `array` in `ndarray_from_buffer`
jakirkham Nov 12, 2018
074bf31
Check that unicode `array`'s raise an error
jakirkham Nov 12, 2018
de0a947
Revert "Check that unicode `array`'s raise an error"
jakirkham Nov 12, 2018
eacfdd7
Revert "Raise for unicode `array` in `ndarray_from_buffer`"
jakirkham Nov 12, 2018
9d43a3b
Cast around unicode array
jakirkham Nov 12, 2018
d0b0ef7
Test handling of unicode `array`s
jakirkham Nov 12, 2018
a789c4d
Add utility `getbuffer` function
jakirkham Nov 12, 2018
e01195c
Use `getbuffer` for checking shared memory
jakirkham Nov 12, 2018
94fcc37
Preserve `memoryview` of `array`'s `itemsize`
jakirkham Nov 12, 2018
5aa7b89
Re-raise `TypeError` on Python 3
jakirkham Nov 13, 2018
f0b5508
Test that non-buffer types raise errors
jakirkham Nov 13, 2018
78d39a2
Merge 'zarr-developers/master' into 'jakirkham/use_more_buffers'
jakirkham Nov 16, 2018
96625e8
Drop PY2K/PY3K `buffer` assignment
jakirkham Nov 22, 2018
ae0434f
Inline `getbuffer`
jakirkham Nov 22, 2018
3ed988b
Cast `memoryview` to `uint8` for `shares_memory`
jakirkham Nov 22, 2018
1f1f9f4
Add no cover pragmas for shared memory test
jakirkham Nov 23, 2018
95f38a5
Merge 'zarr-developers/master' into 'jakirkham/use_more_buffers'
jakirkham Nov 23, 2018
6 changes: 4 additions & 2 deletions numcodecs/astype.py
@@ -50,7 +50,8 @@ def __init__(self, encode_dtype, decode_dtype):
def encode(self, buf):

# view input data as 1D array
arr = ndarray_from_buffer(buf, self.decode_dtype)
arr = ndarray_from_buffer(buf)
arr = arr.view(self.decode_dtype).reshape(-1, order='A')

# convert and copy
enc = arr.astype(self.encode_dtype)
@@ -60,7 +61,8 @@ def encode(self, buf):
def decode(self, buf, out=None):

# view encoded data as 1D array
enc = ndarray_from_buffer(buf, self.encode_dtype)
enc = ndarray_from_buffer(buf)
enc = enc.view(self.encode_dtype).reshape(-1, order='A')

# convert and copy
dec = enc.astype(self.decode_dtype)
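
The `view(...)` followed by `reshape(-1, order='A')` pattern that this file now applies after `ndarray_from_buffer` can be tried in isolation. A minimal sketch, assuming Python 3 and using `np.frombuffer` as a stand-in for `ndarray_from_buffer` (the variable names are illustrative and not taken from the PR):

```python
import numpy as np

# Raw bytes holding six little-endian int32 values.
source = np.arange(6, dtype='<i4')
buf = source.tobytes()

# Stand-in for ndarray_from_buffer: a plain uint8 view over the bytes.
raw = np.frombuffer(buf, dtype='u1')

# Reinterpret the bytes as the decode dtype, then flatten to 1D.
typed = raw.view('<i4').reshape(-1, order='A')

# Convert and copy to the encode dtype, as the codec does with astype.
enc = typed.astype('<i2')
```

For a contiguous 1D input like this one, neither `view` nor `reshape` copies, so `typed` still shares memory with `raw`; only `astype` allocates a new array.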
24 changes: 3 additions & 21 deletions numcodecs/bz2.py
@@ -1,14 +1,10 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, print_function, division
import bz2 as _bz2
import array


import numpy as np


from numcodecs.abc import Codec
from numcodecs.compat import buffer_copy, handle_datetime
from numcodecs.compat import buffer_copy, to_buffer


class BZ2(Codec):
@@ -28,29 +24,15 @@ def __init__(self, level=1):

def encode(self, buf):

# deal with lack of buffer support for datetime64 and timedelta64
buf = handle_datetime(buf)

if isinstance(buf, np.ndarray):

# cannot compress object array
if buf.dtype == object:
raise ValueError('cannot encode object array')

# if numpy array, can only handle C contiguous directly
if not buf.flags.c_contiguous:
buf = buf.tobytes(order='A')
buf = to_buffer(buf)

# do compression
return _bz2.compress(buf, self.level)

# noinspection PyMethodMayBeStatic
def decode(self, buf, out=None):

# BZ2 cannot handle ndarray directly at all, coerce everything to
# memoryview
if not isinstance(buf, array.array):
buf = memoryview(buf)
buf = memoryview(to_buffer(buf))

# do decompression
dec = _bz2.decompress(buf)
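
The reason `encode` no longer needs `handle_datetime` is that the datetime cast now happens during coercion. A rough sketch of that idea on Python 3, using only the standard `bz2` module and NumPy (the variable names are illustrative, and the `np.int64` view mirrors what `to_buffer` does in this PR):

```python
import bz2
import numpy as np

times = np.array(['2018-11-09', '2018-11-10', '2018-11-11'], dtype='M8[D]')

# datetime64 arrays do not export a buffer directly, so view them as int64
# and flatten before handing them to the compressor.
as_ints = times.view(np.int64).reshape(-1, order='A')

compressed = bz2.compress(as_ints, 1)

# Round trip: decompress, reinterpret as int64, then view as datetime64 again.
restored = np.frombuffer(bz2.decompress(compressed), dtype=np.int64).view('M8[D]')
```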
6 changes: 4 additions & 2 deletions numcodecs/categorize.py
@@ -53,7 +53,8 @@ def __init__(self, labels, dtype, astype='u1'):
def encode(self, buf):

# view input as ndarray
arr = ndarray_from_buffer(buf, self.dtype)
arr = ndarray_from_buffer(buf)
arr = arr.view(self.dtype).reshape(-1, order='A')

# setup output array
enc = np.zeros_like(arr, dtype=self.astype)
@@ -67,7 +68,8 @@ def encode(self, buf):
def decode(self, buf, out=None):

# view encoded data as ndarray
enc = ndarray_from_buffer(buf, self.astype)
enc = ndarray_from_buffer(buf)
enc = enc.view(self.astype).reshape(-1, order='A')

# setup output
if isinstance(out, np.ndarray):
6 changes: 4 additions & 2 deletions numcodecs/checksum32.py
@@ -17,15 +17,17 @@ class Checksum32(Codec):
def encode(self, buf):
if isinstance(buf, np.ndarray) and buf.dtype == object:
raise ValueError('cannot encode object array')
arr = ndarray_from_buffer(buf, dtype='u1')
arr = ndarray_from_buffer(buf)
arr = arr.view('u1').reshape(-1, order='A')
checksum = self.checksum(arr) & 0xffffffff
enc = np.empty(arr.nbytes + 4, dtype='u1')
enc[:4].view('<u4')[0] = checksum
enc[4:] = arr
return enc

def decode(self, buf, out=None):
arr = ndarray_from_buffer(buf, dtype='u1')
arr = ndarray_from_buffer(buf)
arr = arr.view('u1').reshape(-1, order='A')
expect = arr[:4].view('<u4')[0]
checksum = self.checksum(arr[4:]) & 0xffffffff
if expect != checksum:
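
The framing used by this codec (a 4-byte little-endian checksum followed by the payload) can be sketched on its own. The example below is an illustration under assumptions: it uses `zlib.crc32` as the checksum, `np.frombuffer` in place of `ndarray_from_buffer`, and a `RuntimeError` on mismatch, since the hunk is cut off before the error handling. The function names are hypothetical.

```python
import zlib
import numpy as np


def encode_with_crc32(buf):
    # View the input as a flat uint8 array without copying.
    arr = np.frombuffer(buf, dtype='u1')
    checksum = zlib.crc32(arr) & 0xffffffff
    # Output is the 4-byte checksum followed by the payload bytes.
    enc = np.empty(arr.nbytes + 4, dtype='u1')
    enc[:4].view('<u4')[0] = checksum
    enc[4:] = arr
    return enc


def decode_with_crc32(buf):
    arr = np.frombuffer(buf, dtype='u1')
    expect = arr[:4].view('<u4')[0]
    checksum = zlib.crc32(arr[4:]) & 0xffffffff
    if expect != checksum:
        # Assumed error type; the diff does not show what the codec raises.
        raise RuntimeError('checksum failed')
    return arr[4:]


payload = b'hello world'
framed = encode_with_crc32(payload)
recovered = decode_with_crc32(framed)
```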
103 changes: 50 additions & 53 deletions numcodecs/compat.py
@@ -27,16 +27,19 @@
from functools import reduce


def buffer_tobytes(v):
"""Obtain a sequence of bytes for the memory buffer used by `v`."""
if isinstance(v, binary_type):
return v
elif isinstance(v, np.ndarray):
return v.tobytes(order='A')
elif PY2 and isinstance(v, array.array): # pragma: py3 no cover
return v.tostring()
else:
return memoryview(v).tobytes()
def to_buffer(v):
"""Obtain a `buffer` or `memoryview` for `v`."""

b = ndarray_from_buffer(v)

if b.dtype.kind == 'O':
raise ValueError('cannot encode object array')
elif b.dtype.kind in 'Mm':
b = b.view(np.int64)

b = b.reshape(-1, order='A')

return b


def buffer_copy(buf, out=None):
@@ -46,50 +49,50 @@ def buffer_copy(buf, out=None):
# no-op
return buf

# handle ndarray destination
if isinstance(out, np.ndarray):

# view source as destination dtype
if isinstance(buf, np.ndarray):
buf = buf.view(dtype=out.dtype).reshape(-1, order='A')
else:
buf = np.frombuffer(buf, dtype=out.dtype)

# ensure shapes are compatible
if buf.shape != out.shape:
if out.flags.f_contiguous:
order = 'F'
else:
order = 'C'
buf = buf.reshape(out.shape, order=order)

# copy via numpy
np.copyto(out, buf)

# handle generic buffer destination
else:
# coerce to ndarrays
buf = ndarray_from_buffer(buf)
out = ndarray_from_buffer(out)

# obtain memoryview of destination
dest = memoryview(out)
# view source as destination dtype
if out.dtype.kind != 'O':
buf = buf.view(out.dtype)

# ensure source is 1D
if isinstance(buf, np.ndarray):
buf = buf.reshape(-1, order='A')
# try to match itemsize
dtype = 'u%s' % dest.itemsize
buf = buf.view(dtype=dtype)
# ensure shapes are compatible
buf = buf.reshape(-1, order='A')
if buf.shape != out.shape:
if out.flags.f_contiguous:
order = 'F'
else:
order = 'C'
buf = buf.reshape(out.shape, order=order)

# try to copy via memoryview
dest[:] = buf
# copy via numpy
np.copyto(out, buf)

return out


def ndarray_from_buffer(buf, dtype):
if isinstance(buf, np.ndarray):
arr = buf.reshape(-1, order='A').view(dtype)
else:
arr = np.frombuffer(buf, dtype=dtype)
def ndarray_from_buffer(buf):
arr = buf
if not isinstance(arr, np.ndarray):
try:
arr = memoryview(arr)
except TypeError:
if PY2: # pragma: py3 no cover
arr = np.getbuffer(arr)
else:
raise
else: # pragma: py2 no cover
if isinstance(buf, array.array) and buf.typecode == 'u':
arr = arr.cast('B').cast(np.dtype('u%i' % buf.itemsize).char)

arr = np.array(arr, copy=False)
if PY2 and isinstance(buf, array.array): # pragma: py3 no cover
if buf.typecode == 'u':
arr = arr.view('u%i' % buf.itemsize)
else:
arr = arr.view(buf.typecode)

return arr


@@ -98,9 +101,3 @@ def ensure_text(l, encoding='utf-8'):
return l
else: # pragma: py3 no cover
return text_type(l, encoding=encoding)


def handle_datetime(buf):
if hasattr(buf, 'dtype') and buf.dtype.kind in 'Mm':
return buf.view('i8')
return buf
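
To make the coercion rules in this file easier to follow, here is a simplified Python 3 only sketch of what `to_buffer` and `ndarray_from_buffer` do together. It is not the PR's code: `to_flat_ndarray` is a hypothetical name, the Python 2 and unicode `array.array` branches are omitted, and `np.asarray` is used in place of `np.array(arr, copy=False)`.

```python
import numpy as np


def to_flat_ndarray(v):
    """Coerce a buffer-like object to a flat ndarray without copying if possible."""
    # Non-ndarray inputs go through memoryview; objects that do not
    # support the buffer protocol raise TypeError here.
    arr = v if isinstance(v, np.ndarray) else np.asarray(memoryview(v))

    if arr.dtype.kind == 'O':
        raise ValueError('cannot encode object array')
    if arr.dtype.kind in 'Mm':
        # datetime64/timedelta64 do not export buffers; view as int64.
        arr = arr.view(np.int64)

    return arr.reshape(-1, order='A')


backing = bytearray(b'abcd')
flat = to_flat_ndarray(backing)

# The result is a view: writes to the bytearray are visible through it.
backing[0] = ord('z')
first_byte = int(flat[0])
```

Comparing `dtype.kind` with `==` rather than `is` avoids relying on string interning, which is an implementation detail of CPython.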
6 changes: 4 additions & 2 deletions numcodecs/delta.py
@@ -57,7 +57,8 @@ def __init__(self, dtype, astype=None):
def encode(self, buf):

# view input data as 1D array
arr = ndarray_from_buffer(buf, self.dtype)
arr = ndarray_from_buffer(buf)
arr = arr.view(self.dtype).reshape(-1, order='A')

# setup encoded output
enc = np.empty_like(arr, dtype=self.astype)
@@ -73,7 +74,8 @@ def encode(self, buf):
def decode(self, buf, out=None):

# view encoded data as 1D array
enc = ndarray_from_buffer(buf, self.astype)
enc = ndarray_from_buffer(buf)
enc = enc.view(self.astype).reshape(-1, order='A')

# setup decoded output
if isinstance(out, np.ndarray):
6 changes: 4 additions & 2 deletions numcodecs/fixedscaleoffset.py
@@ -88,7 +88,8 @@ def __init__(self, offset, scale, dtype, astype=None):
def encode(self, buf):

# interpret buffer as 1D array
arr = ndarray_from_buffer(buf, self.dtype)
arr = ndarray_from_buffer(buf)
arr = arr.view(self.dtype).reshape(-1, order='A')

# compute scale offset
enc = (arr - self.offset) * self.scale
@@ -104,7 +105,8 @@ def encode(self, buf):
def decode(self, buf, out=None):

# interpret buffer as 1D array
enc = ndarray_from_buffer(buf, self.astype)
enc = ndarray_from_buffer(buf)
enc = enc.view(self.astype).reshape(-1, order='A')

# decode scale offset
dec = (enc / self.scale) + self.offset
26 changes: 3 additions & 23 deletions numcodecs/gzip.py
@@ -4,11 +4,8 @@
import io


import numpy as np


from .abc import Codec
from .compat import buffer_copy, handle_datetime, buffer_tobytes, PY2
from .compat import buffer_copy, to_buffer


class GZip(Codec):
@@ -28,22 +25,7 @@ def __init__(self, level=1):

def encode(self, buf):

# deal with lack of buffer support for datetime64 and timedelta64
buf = handle_datetime(buf)

if isinstance(buf, np.ndarray):

# cannot compress object array
if buf.dtype == object:
raise ValueError('cannot encode object array')

# if numpy array, can only handle C contiguous directly
if not buf.flags.c_contiguous:
buf = buf.tobytes(order='A')

if PY2: # pragma: py3 no cover
# ensure bytes, PY2 cannot handle things like bytearray
buf = buffer_tobytes(buf)
buf = memoryview(to_buffer(buf))

# do compression
compressed = io.BytesIO()
@@ -58,9 +40,7 @@ def encode(self, buf):
# noinspection PyMethodMayBeStatic
def decode(self, buf, out=None):

if PY2: # pragma: py3 no cover
# ensure bytes, PY2 cannot handle things like bytearray
buf = buffer_tobytes(buf)
buf = to_buffer(buf)

# do decompression
buf = io.BytesIO(buf)
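
The `io.BytesIO` plus `gzip.GzipFile` round trip around the coerced buffer can be sketched as follows. This is an illustration for Python 3 only; `gzip_encode` and `gzip_decode` are hypothetical names, and the explicit `uint8` view before `memoryview` is a cautious choice of this sketch rather than something the diff shows.

```python
import gzip
import io
import numpy as np


def gzip_encode(arr, level=1):
    # Flatten, view as raw bytes, and wrap in a memoryview for writing.
    data = memoryview(arr.reshape(-1, order='A').view(np.uint8))
    compressed = io.BytesIO()
    with gzip.GzipFile(fileobj=compressed, mode='wb', compresslevel=level) as f:
        f.write(data)
    return compressed.getvalue()


def gzip_decode(buf):
    # Wrap the compressed bytes in a file-like object and read it all back.
    with gzip.GzipFile(fileobj=io.BytesIO(buf), mode='rb') as f:
        return f.read()


values = np.arange(100, dtype=np.int32)
encoded = gzip_encode(values)
decoded = np.frombuffer(gzip_decode(encoded), dtype=np.int32)
```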
6 changes: 3 additions & 3 deletions numcodecs/json.py
@@ -8,7 +8,7 @@


from .abc import Codec
from .compat import buffer_tobytes
from .compat import to_buffer


class JSON(Codec):
@@ -63,7 +63,7 @@ def encode(self, buf):
return self._encoder.encode(items).encode(self._text_encoding)

def decode(self, buf, out=None):
buf = buffer_tobytes(buf)
buf = to_buffer(buf).tobytes()
items = self._decoder.decode(buf.decode(self._text_encoding))
dec = np.empty(items[-1], dtype=items[-2])
dec[:] = items[:-2]
@@ -125,7 +125,7 @@ def encode(self, buf):
return self._encoder.encode(items).encode(self._text_encoding)

def decode(self, buf, out=None):
buf = buffer_tobytes(buf)
buf = to_buffer(buf).tobytes()
items = self._decoder.decode(buf.decode(self._text_encoding))
dec = np.array(items[:-1], dtype=items[-1])
if out is not None:
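
Unlike the compression codecs, the JSON decoders need real `bytes` so they can call `.decode(...)`, which is why `.tobytes()` is appended to the coerced array. A small sketch of the second decode path on Python 3, assuming the payload layout shown in the hunk above (items followed by the dtype string as the last element) and using `np.asarray(memoryview(...))` as a stand-in for `to_buffer`:

```python
import json
import numpy as np

# An encoded payload arriving as a mutable, non-bytes buffer.
encoded = bytearray(json.dumps([1, 2, 3, '<i8']).encode('utf-8'))

# Coerce to an ndarray view, then materialize as bytes for text decoding.
raw = np.asarray(memoryview(encoded)).tobytes()

items = json.loads(raw.decode('utf-8'))
dec = np.array(items[:-1], dtype=items[-1])
```

The `.tobytes()` call does copy, so this path trades the zero-copy behavior for compatibility with the text decoder.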
18 changes: 4 additions & 14 deletions numcodecs/lzma.py
@@ -14,9 +14,8 @@

if _lzma:

import numpy as np
from .abc import Codec
from .compat import buffer_copy, handle_datetime
from .compat import buffer_copy, to_buffer

# noinspection PyShadowingBuiltins
class LZMA(Codec):
@@ -48,25 +47,16 @@ def __init__(self, format=1, check=-1, preset=None, filters=None):

def encode(self, buf):

# deal with lack of buffer support for datetime64 and timedelta64
buf = handle_datetime(buf)

if isinstance(buf, np.ndarray):

# cannot compress object array
if buf.dtype == object:
raise ValueError('cannot encode object array')

# if numpy array, can only handle C contiguous directly
if not buf.flags.c_contiguous:
buf = buf.tobytes(order='A')
buf = to_buffer(buf)

# do compression
return _lzma.compress(buf, format=self.format, check=self.check,
preset=self.preset, filters=self.filters)

def decode(self, buf, out=None):

buf = to_buffer(buf)

# do decompression
dec = _lzma.decompress(buf, format=self.format,
filters=self.filters)