Skip to content

Conditionally use indexed_gzip #552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Sep 25, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
9e60eeb
NF: Add ability for ArrayProxy to keep its ImageOpener open, instead of
pauldmccarthy Sep 3, 2017
3d170be
NF: Use indexed_gzip, if it is present, when opening .gz files in rb …
pauldmccarthy Sep 3, 2017
32e3b05
BF: fileslice.fileslice and read_segments functions optionally accept a
pauldmccarthy Sep 3, 2017
572ab00
BF: Making all existing unit tests happy
pauldmccarthy Sep 3, 2017
7cbbab2
TEST: Unit tests for ArrayProxy keep_file_open flag, and handling of …
pauldmccarthy Sep 3, 2017
052ec60
TEST: Unit test to check thread-safety of fileslice.read_segments
pauldmccarthy Sep 4, 2017
172bb61
DOC: Clarification in read_segments doc, and inline comments in
pauldmccarthy Sep 4, 2017
bd39417
TEST: Unit test to make sure that GzipFile/SafeIndexedGzipFile classe…
pauldmccarthy Sep 4, 2017
cc73dd6
BF: Typo in _gzip_open was preventing max_read_chunk from having any …
pauldmccarthy Sep 4, 2017
9ceaf7f
TEST: Another fix to existing unit test
pauldmccarthy Sep 5, 2017
bf2ad0d
TEST: Compatibility for numpy 1.7. Also some style fixes to make flak…
pauldmccarthy Sep 6, 2017
388daae
BF: Typo
pauldmccarthy Sep 6, 2017
382dcf8
RF: Make _gzip_open a bit cleaner and simpler. Move the DummyLock out of
pauldmccarthy Sep 6, 2017
89a2d06
BF: ArrayProxy was closing pre-opened file handle when keep_file_open…
pauldmccarthy Sep 6, 2017
a6972c0
TEST: Unit test to make sure that keep_file_open works for pre-opened…
pauldmccarthy Sep 6, 2017
0a111ca
RF: Adjust the default behaviour of the ArrayProxy keep_file_open fla…
pauldmccarthy Sep 7, 2017
961f882
TEST: New test which checks default behaviour of the ArrayProxy keep_…
pauldmccarthy Sep 7, 2017
cd65af4
RF: Changed default keep_file_open value to 'auto', to make it a bit …
pauldmccarthy Sep 7, 2017
bfc0013
TEST: Test explicitly passing in value for ArrayProxy keep_file_open…
pauldmccarthy Sep 7, 2017
d9a1ebd
TEST: Make list element value change thread safe in fileslice/thread …
pauldmccarthy Sep 8, 2017
093ae6e
RF: Indexed_gzip import test performed once at top of openers.py. Add…
pauldmccarthy Sep 8, 2017
d7c7cac
DOC: Fixed docstrings referring to keep_file_open
pauldmccarthy Sep 8, 2017
39dea17
TEST: Update ArrayProxy/Opener tests to work with new HAVE_INDEXED_GZ…
pauldmccarthy Sep 8, 2017
82e7c84
NF+RF: MGHImage from_file_map/from_filename methods passes keep_file_…
pauldmccarthy Sep 10, 2017
191eb5c
TEST: Test invalid values for ArrayProxy keep_file_open flag. Add ind…
pauldmccarthy Sep 10, 2017
16f191e
RF: Explicitly catch ImportError, instead of catch-all
pauldmccarthy Sep 20, 2017
aae3417
RF: Default keep_file_open value is now False, and can be controlled …
pauldmccarthy Sep 23, 2017
9008696
TEST: Adjusted keep_file_open tests, and test changes to the
pauldmccarthy Sep 23, 2017
ed55620
DOC: Moved attribute docstring
pauldmccarthy Sep 23, 2017
0b41c49
TEST: Fixed indentation in arrayproxy tests. Not that it was having a…
pauldmccarthy Sep 23, 2017
dc07879
DOC+RF: Adjust docs for ArrayProxy keep_file_open flag, and for files…
pauldmccarthy Sep 24, 2017
67405b0
TEST: Small re-arrangements and code clean-up
pauldmccarthy Sep 24, 2017
790c037
RF: openers._gzip_open does not bother checking whether it is given o…
pauldmccarthy Sep 24, 2017
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 9 additions & 1 deletion .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ cache:
env:
global:
- DEPENDS="six numpy scipy matplotlib h5py pillow"
- OPTIONAL_DEPENDS=""
- PYDICOM=1
- INSTALL_TYPE="setup"
- EXTRA_WHEELS="https://5cf40426d9f06eb7461d-6fe47d9331aba7cd62fc36c7196769e4.ssl.cf2.rackcdn.com"
Expand Down Expand Up @@ -84,6 +85,13 @@ matrix:
- python: 3.5
env:
- DOC_DOC_TEST=1
# Run tests with indexed_gzip present
- python: 2.7
env:
- OPTIONAL_DEPENDS="indexed_gzip"
- python: 3.5
env:
- OPTIONAL_DEPENDS="indexed_gzip"
before_install:
- source tools/travis_tools.sh
- python -m pip install --upgrade pip
Expand All @@ -93,7 +101,7 @@ before_install:
- python --version # just to check
- pip install -U pip wheel # needed at one point
- retry pip install nose flake8 mock # always
- pip install $EXTRA_PIP_FLAGS $DEPENDS
- pip install $EXTRA_PIP_FLAGS $DEPENDS $OPTIONAL_DEPENDS
# pydicom <= 0.9.8 doesn't install on python 3
- if [ "${TRAVIS_PYTHON_VERSION:0:1}" == "2" ]; then
if [ "$PYDICOM" == "1" ]; then
Expand Down
39 changes: 33 additions & 6 deletions nibabel/analyze.py
Original file line number Diff line number Diff line change
Expand Up @@ -934,8 +934,8 @@ def set_data_dtype(self, dtype):

@classmethod
@kw_only_meth(1)
def from_file_map(klass, file_map, mmap=True):
''' class method to create image from mapping in `file_map ``
def from_file_map(klass, file_map, mmap=True, keep_file_open=None):
'''class method to create image from mapping in `file_map ``

Parameters
----------
Expand All @@ -950,6 +950,19 @@ def from_file_map(klass, file_map, mmap=True):
`mmap` value of True gives the same behavior as ``mmap='c'``. If
image data file cannot be memory-mapped, ignore `mmap` value and
read array from file.
keep_file_open : { None, 'auto', True, False }, optional, keyword only
`keep_file_open` controls whether a new file handle is created
every time the image is accessed, or a single file handle is
created and used for the lifetime of this ``ArrayProxy``. If
``True``, a single file handle is created and used. If ``False``,
a new file handle is created every time the image is accessed. If
``'auto'``, and the optional ``indexed_gzip`` dependency is
present, a single file handle is created and persisted. If
``indexed_gzip`` is not available, behaviour is the same as if
``keep_file_open is False``. If ``file_map`` refers to an open
file handle, this setting has no effect. The default value
(``None``) will result in the value of
``nibabel.arrayproxy.KEEP_FILE_OPEN_DEFAULT`` being used.

Returns
-------
Expand All @@ -964,7 +977,8 @@ def from_file_map(klass, file_map, mmap=True):
imgf = img_fh.fileobj
if imgf is None:
imgf = img_fh.filename
data = klass.ImageArrayProxy(imgf, hdr_copy, mmap=mmap)
data = klass.ImageArrayProxy(imgf, hdr_copy, mmap=mmap,
keep_file_open=keep_file_open)
# Initialize without affine to allow header to pass through unmodified
img = klass(data, None, header, file_map=file_map)
# set affine from header though
Expand All @@ -976,8 +990,8 @@ def from_file_map(klass, file_map, mmap=True):

@classmethod
@kw_only_meth(1)
def from_filename(klass, filename, mmap=True):
''' class method to create image from filename `filename`
def from_filename(klass, filename, mmap=True, keep_file_open=None):
'''class method to create image from filename `filename`

Parameters
----------
Expand All @@ -990,6 +1004,18 @@ def from_filename(klass, filename, mmap=True):
`mmap` value of True gives the same behavior as ``mmap='c'``. If
image data file cannot be memory-mapped, ignore `mmap` value and
read array from file.
keep_file_open : { None, 'auto', True, False }, optional, keyword only
`keep_file_open` controls whether a new file handle is created
every time the image is accessed, or a single file handle is
created and used for the lifetime of this ``ArrayProxy``. If
``True``, a single file handle is created and used. If ``False``,
a new file handle is created every time the image is accessed. If
``'auto'``, and the optional ``indexed_gzip`` dependency is
present, a single file handle is created and persisted. If
``indexed_gzip`` is not available, behaviour is the same as if
``keep_file_open is False``. The default value (``None``) will
result in the value of
``nibabel.arrayproxy.KEEP_FILE_OPEN_DEFAULT`` being used.

Returns
-------
Expand All @@ -998,7 +1024,8 @@ def from_filename(klass, filename, mmap=True):
if mmap not in (True, False, 'c', 'r'):
raise ValueError("mmap should be one of {True, False, 'c', 'r'}")
file_map = klass.filespec_to_file_map(filename)
return klass.from_file_map(file_map, mmap=mmap)
return klass.from_file_map(file_map, mmap=mmap,
keep_file_open=keep_file_open)

load = from_filename

Expand Down
139 changes: 128 additions & 11 deletions nibabel/arrayproxy.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,34 @@

See :mod:`nibabel.tests.test_proxy_api` for proxy API conformance checks.
"""
from contextlib import contextmanager
from threading import RLock

import numpy as np

from .deprecated import deprecate_with_version
from .volumeutils import array_from_file, apply_read_scaling
from .fileslice import fileslice
from .keywordonly import kw_only_meth
from .openers import ImageOpener
from .openers import ImageOpener, HAVE_INDEXED_GZIP


"""This flag controls whether a new file handle is created every time an image
is accessed through an ``ArrayProxy``, or a single file handle is created and
used for the lifetime of the ``ArrayProxy``. It should be set to one of
``True``, ``False``, or ``'auto'``.

If ``True``, a single file handle is created and used. If ``False``, a new
file handle is created every time the image is accessed. If ``'auto'``, and
the optional ``indexed_gzip`` dependency is present, a single file handle is
created and persisted. If ``indexed_gzip`` is not available, behaviour is the
same as if ``keep_file_open is False``.

If this is set to any other value, attempts to create an ``ArrayProxy`` without
specifying the ``keep_file_open`` flag will result in a ``ValueError`` being
raised.
"""
KEEP_FILE_OPEN_DEFAULT = False


class ArrayProxy(object):
Expand Down Expand Up @@ -69,8 +90,8 @@ class ArrayProxy(object):
_header = None

@kw_only_meth(2)
def __init__(self, file_like, spec, mmap=True):
""" Initialize array proxy instance
def __init__(self, file_like, spec, mmap=True, keep_file_open=None):
"""Initialize array proxy instance

Parameters
----------
Expand Down Expand Up @@ -99,8 +120,18 @@ def __init__(self, file_like, spec, mmap=True):
True gives the same behavior as ``mmap='c'``. If `file_like`
cannot be memory-mapped, ignore `mmap` value and read array from
file.
scaling : {'fp', 'dv'}, optional, keyword only
Type of scaling to use - see header ``get_data_scaling`` method.
keep_file_open : { None, 'auto', True, False }, optional, keyword only
`keep_file_open` controls whether a new file handle is created
every time the image is accessed, or a single file handle is
created and used for the lifetime of this ``ArrayProxy``. If
``True``, a single file handle is created and used. If ``False``,
a new file handle is created every time the image is accessed. If
``'auto'``, and the optional ``indexed_gzip`` dependency is
present, a single file handle is created and persisted. If
``indexed_gzip`` is not available, behaviour is the same as if
``keep_file_open is False``. If ``file_like`` is an open file
handle, this setting has no effect. The default value (``None``)
will result in the value of ``KEEP_FILE_OPEN_DEFAULT`` being used.
"""
if mmap not in (True, False, 'c', 'r'):
raise ValueError("mmap should be one of {True, False, 'c', 'r'}")
Expand All @@ -125,6 +156,70 @@ def __init__(self, file_like, spec, mmap=True):
# Permit any specifier that can be interpreted as a numpy dtype
self._dtype = np.dtype(self._dtype)
self._mmap = mmap
self._keep_file_open = self._should_keep_file_open(file_like,
keep_file_open)
self._lock = RLock()

def __del__(self):
"""If this ``ArrayProxy`` was created with ``keep_file_open=True``,
the open file object is closed if necessary.
"""
if hasattr(self, '_opener') and not self._opener.closed:
self._opener.close_if_mine()
self._opener = None

def __getstate__(self):
"""Returns the state of this ``ArrayProxy`` during pickling. """
state = self.__dict__.copy()
state.pop('_lock', None)
return state

def __setstate__(self, state):
"""Sets the state of this ``ArrayProxy`` during unpickling. """
self.__dict__.update(state)
self._lock = RLock()

def _should_keep_file_open(self, file_like, keep_file_open):
"""Called by ``__init__``, and used to determine the final value of
``keep_file_open``.

The return value is derived from these rules:

- If ``file_like`` is a file(-like) object, ``False`` is returned.
Otherwise, ``file_like`` is assumed to be a file name.
- if ``file_like`` ends with ``'gz'``, and the ``indexed_gzip``
library is available, ``True`` is returned.
- Otherwise, ``False`` is returned.

Parameters
----------

file_like : object
File-like object or filename, as passed to ``__init__``.
keep_file_open : { 'auto', True, False }
Flag as passed to ``__init__``.

Returns
-------

The value of ``keep_file_open`` that will be used by this
``ArrayProxy``.
"""
if keep_file_open is None:
keep_file_open = KEEP_FILE_OPEN_DEFAULT
# if keep_file_open is True/False, we do what the user wants us to do
if isinstance(keep_file_open, bool):
return keep_file_open
if keep_file_open != 'auto':
raise ValueError('keep_file_open should be one of {None, '
'\'auto\', True, False}')

# file_like is a handle - keep_file_open is irrelevant
if hasattr(file_like, 'read') and hasattr(file_like, 'seek'):
return False
# Otherwise, if file_like is gzipped, and we have_indexed_gzip, we set
# keep_file_open to True, else we set it to False
return HAVE_INDEXED_GZIP and file_like.endswith('gz')

@property
@deprecate_with_version('ArrayProxy.header deprecated', '2.2', '3.0')
Expand Down Expand Up @@ -155,12 +250,33 @@ def inter(self):
def is_proxy(self):
return True

@contextmanager
def _get_fileobj(self):
"""Create and return a new ``ImageOpener``, or return an existing one.

The specific behaviour depends on the value of the ``keep_file_open``
flag that was passed to ``__init__``.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra blank line.

Yields
------
ImageOpener
A newly created ``ImageOpener`` instance, or an existing one,
which provides access to the file.
"""
if self._keep_file_open:
if not hasattr(self, '_opener'):
self._opener = ImageOpener(self.file_like)
yield self._opener
else:
with ImageOpener(self.file_like) as opener:
yield opener

def get_unscaled(self):
''' Read of data from file
""" Read of data from file

This is an optional part of the proxy API
'''
with ImageOpener(self.file_like) as fileobj:
"""
with self._get_fileobj() as fileobj, self._lock:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking array_from_file would take a lock and block its critical section. Do you have a sense over what kind of difference in slowdown could result from blocking the whole call instead? If it's trivial, then this is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, given that the array_from_file function reads an entire image, so where the lock occurs is not going to make much of a difference to performance. In fact, I am wondering whether this locking should occur in the ArrayProxy at all, or if it should be left to the responsibility of calling code.

This situation is a bit different to the locking needed in the read_segments function, which can't easily be managed by calling code ... unless we delegate responsibilty entirely to the calling code, and allow a Lock to be passed into ArrayProxy.__init__?

raw_data = array_from_file(self._shape,
self._dtype,
fileobj,
Expand All @@ -175,18 +291,19 @@ def __array__(self):
return apply_read_scaling(raw_data, self._slope, self._inter)

def __getitem__(self, slicer):
with ImageOpener(self.file_like) as fileobj:
with self._get_fileobj() as fileobj:
raw_data = fileslice(fileobj,
slicer,
self._shape,
self._dtype,
self._offset,
order=self.order)
order=self.order,
lock=self._lock)
# Upcast as necessary for big slopes, intercepts
return apply_read_scaling(raw_data, self._slope, self._inter)

def reshape(self, shape):
''' Return an ArrayProxy with a new shape, without modifying data '''
""" Return an ArrayProxy with a new shape, without modifying data """
size = np.prod(self._shape)

# Calculate new shape if not fully specified
Expand Down
Loading