FIX memoryview leaks and retrofit memory-manager as context-managers #33

Open
wants to merge 19 commits into
base: master
2347f3d
feat(enc): test with unicode tmpdir
ankostis Oct 23, 2016
883217b
fix(leaks): attempt to plug the leaks & filter dead regions
ankostis Oct 24, 2016
43c5f23
chore(ver): bump 2.0.1-->2.1.0.dev0
ankostis Oct 24, 2016
f10196f
fix(regs): fix/rename scream_if_closed()-->collect_closed_regions()
ankostis Oct 24, 2016
88e2769
chore(ver): bump 2.0.1-->2.1.0.dev1
ankostis Oct 24, 2016
133dd1c
style(listuple): pep8, literals for empty lists/tuples
ankostis Oct 24, 2016
7c1eac7
refact(region): rename offset `_b --> _ofs`
ankostis Oct 24, 2016
bba086a
refact(minor): use region.priv-func, close fd on same condition
ankostis Oct 25, 2016
a2bc2d2
feat(mman): BREAKING API `mman` as context-manager to release regions
ankostis Oct 25, 2016
01df7f3
chore(ver): bump 2.1.0.dev1-->2.1.0.dev3
ankostis Oct 25, 2016
bf68f77
feat(mman-contxt): opt-out not to scream if mman not entered
ankostis Oct 25, 2016
4598966
fix(leaks): attempt to gc-collect before region-collect
ankostis Oct 25, 2016
d0bd74e
fix(mman): exit log-msg were missing left-overs arg, log as debug
ankostis Oct 25, 2016
e33235a
refact(buf): also use SlidingWindowMapBuffer as optional context-manager
ankostis Oct 25, 2016
33f12e6
refact(TCs): unittestize assertions
ankostis Oct 25, 2016
8489c31
refact(buf): simplify API - no begin/end after construct
ankostis Oct 25, 2016
d81dc1d
style(mman): move managed_mmaps() closer to 2 mmans
ankostis Oct 25, 2016
9ba1649
fix(leaks): FIX memoryview leak in Windows
ankostis Oct 27, 2016
144891b
chore(ver): bump 2.1.0.dev3-->2.1.0.dev4
ankostis Oct 27, 2016
feat(mman): BREAKING API mman as context-manager to release regions
+ Add PY3 compat utilities
+ doc(changes, tutorial): update on mman usage
ankostis committed Oct 25, 2016
commit a2bc2d2e983e34b576f87127a531fdf978ecc322
36 changes: 24 additions & 12 deletions doc/source/changes.rst
Expand Up @@ -2,34 +2,46 @@
Changelog
#########

**********
2.1.0
======

* **BREAKING API:** retrofit ``git.util.mman`` as context-manager,
to release memory-mapped regions held.

The *mmap-manager(s)* are re-entrant, but not thread-safe, **context-manager(s)**,
to be used within a ``with ...:`` block, ensuring any left-over cursors are cleaned up.
If not entered, :meth:`StaticWindowMapManager.make_cursor()` and/or
:meth:`WindowCursor.use_region()` will raise a ``ValueError``.

Get them from ``smmap.managed_mmaps()``.

v0.9.0
**********
========
- Fixed issue with resources never being freed as mmaps were never closed.
- Client counting is now done manually, instead of relying on python's reference count

**********

v0.8.5
**********
========
- Fixed Python 3.0-3.3 regression, which also causes smmap to become about 3 times slower depending on the code path. It's related to this bug (http://bugs.python.org/issue15958), which was fixed in python 3.4

**********

v0.8.4
**********
========
- Fixed Python 3 performance regression

**********

v0.8.3
**********
========
- Cleaned up code and assured it works sufficiently well with python 3

**********

v0.8.1
**********
========
- A single bugfix

**********

v0.8.0
**********
========

- Initial Release
148 changes: 85 additions & 63 deletions doc/source/tutorial.rst
Expand Up @@ -5,91 +5,111 @@ Usage Guide
###########
This text briefly introduces you to the basic design decisions and accompanying classes.

******
Design
******
Per application, there is *MemoryManager* which is held as static instance and used throughout the application. It can be configured to keep your resources within certain limits.
======
Per application, there must be a *MemoryManager* to be used throughout the application.
It can be configured to keep your resources within certain limits.

To access mapped regions, you require a cursor. Cursors point to exactly one file and serve as handles into it. As long as it exists, the respective memory region will remain available.
To access mapped regions, you require a cursor. Cursors point to exactly one file and serve as handles into it.
As long as it exists, the respective memory region will remain available.

For convenience, a buffer implementation is provided which handles cursors and resource allocation
behind its simple buffer like interface.

For convenience, a buffer implementation is provided which handles cursors and resource allocation behind its simple buffer like interface.

***************
Memory Managers
***************
There are two types of memory managers, one uses *static* windows, the other one uses *sliding* windows. A window is a region of a file mapped into memory. Although the names might be somewhat misleading as technically windows are always static, the *sliding* version will allocate relatively small windows whereas the *static* version will always map the whole file.
================
There are two types of memory managers, one uses *static* windows, the other one uses *sliding* windows.
A window is a region of a file mapped into memory. Although the names might be somewhat misleading,
as technically windows are always static, the *sliding* version will allocate relatively small windows
whereas the *static* version will always map the whole file.
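
To make the distinction concrete, here is a minimal sketch that uses only the standard-library ``mmap`` module rather than smmap itself. The helper names ``map_whole_file`` and ``map_window`` are hypothetical, and the sketch assumes that a window's offset has to be a multiple of ``mmap.ALLOCATIONGRANULARITY``:

```python
import mmap
import os
import tempfile


def map_whole_file(fd):
    """'Static' style: one read-only window covering the entire file."""
    return mmap.mmap(fd, 0, access=mmap.ACCESS_READ)


def map_window(fd, offset, size):
    """'Sliding' style: a small read-only window somewhere inside the file.

    The offset handed to mmap is rounded down to the allocation granularity;
    `delta` tells the caller where `offset` lies inside the returned window.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // gran) * gran
    delta = offset - aligned
    file_size = os.fstat(fd).st_size
    length = min(size + delta, file_size - aligned)
    window = mmap.mmap(fd, length, access=mmap.ACCESS_READ, offset=aligned)
    return window, delta


def demo():
    gran = mmap.ALLOCATIONGRANULARITY
    data = bytes(i % 251 for i in range(3 * gran + 100))
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    fd = os.open(path, os.O_RDONLY)
    try:
        with map_whole_file(fd) as whole:
            whole_size = len(whole)          # the map spans the complete file
        window, delta = map_window(fd, gran + 10, 50)
        with window:
            window_bytes = window[delta:delta + 50]
    finally:
        os.close(fd)
        os.remove(path)
    return whole_size, window_bytes, data[gran + 10:gran + 60]
```

The whole-file map needs address space for the complete file, which is what limits it on 32-bit systems; the window only needs room for the requested bytes plus the alignment slack.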

The *static* memory-manager does nothing more than keep a client count on the respective memory maps,
which always map the whole file. This permits some assumptions that can simplify
data access and increase performance, but reduces compatibility with 32-bit systems or giant files.

The *sliding* memory-manager therefore should be the default manager when preparing an application
for handling huge amounts of data on 32-bit and 64-bit platforms.

The *static* manager does nothing more than keeping a client count on the respective memory maps which always map the whole file, which allows to make some assumptions that can lead to simplified data access and increased performance, but reduces the compatibility to 32 bit systems or giant files.
.. Note::
The *mmap-manager(s)* are re-entrant, but not thread-safe, **context-manager(s)**,
to be used within a ``with ...:`` block, ensuring any left-over cursors are cleaned up.
If not entered, :meth:`StaticWindowMapManager.make_cursor()` and/or
:meth:`WindowCursor.use_region()` will raise a ``ValueError``.

The *sliding* memory manager therefore should be the default manager when preparing an application for handling huge amounts of data on 32 bit and 64 bit platforms::

Use :func:`smmap.managed_mmaps()` to take care of all this::

import smmap
# This instance should be globally available in your application
# It is configured to be well suitable for 32-bit or 64 bit applications.
mman = smmap.SlidingWindowMapManager()
with smmap.managed_mmaps() as mman:

# the manager provides much useful information about its current state
# like the amount of open file handles or the amount of mapped memory
mman.num_file_handles()
mman.mapped_memory_size()
# and many more ...
# the manager provides much useful information about its current state
# like the amount of open file handles or the amount of mapped memory
mman.num_file_handles()
mman.mapped_memory_size()
# and many more ...


Cursors
*******
========
*Cursors* are handles that point onto a window, i.e. a region of a file mapped into memory. From them you may obtain a buffer through which the data of that window can actually be accessed::

import smmap.test.lib
fc = smmap.test.lib.FileCreator(1024*1024*8, "test_file")

# obtain a cursor to access some file.
c = mman.make_cursor(fc.path)

# the cursor is now associated with the file, but not yet usable
assert c.is_associated()
assert not c.is_valid()

# before you can use the cursor, you have to specify a window you want to
# access. The following just says you want as much data as possible starting
# from offset 0.
# To be sure your region could be mapped, query for validity
assert c.use_region().is_valid() # use_region returns self

# once a region was mapped, you must query its dimension regularly
# to assure you don't try to access its buffer out of its bounds
assert c.size()
c.buffer()[0] # first byte
c.buffer()[1:10] # 9 bytes starting at offset 1
c.buffer()[c.size()-1] # last byte

# it's recommended not to create big slices when feeding the buffer
# into consumers (e.g. struct or zlib).
# Instead, either give the buffer directly, or use Python's buffer command.
buffer(c.buffer(), 1, 9) # 9 bytes starting at offset 1, without copying them

# you can query absolute offsets, and check whether an offset is included
# in the cursor's data.
assert c.ofs_begin() < c.ofs_end()
assert c.includes_ofs(100)

# If one of your region requests is out of bounds, the
# cursor will become invalid. It cannot be used in that state
assert not c.use_region(fc.size, 100).is_valid()
# map as much as possible after skipping the first 100 bytes
assert c.use_region(100).is_valid()

# You can explicitly free cursor resources by unusing the cursor's region
c.unuse_region()
assert not c.is_valid()

with smmap.managed_mmaps() as mman:
fc = smmap.test.lib.FileCreator(1024*1024*8, "test_file")

# obtain a cursor to access some file.
c = mman.make_cursor(fc.path)

# the cursor is now associated with the file, but not yet usable
assert c.is_associated()
assert not c.is_valid()

# before you can use the cursor, you have to specify a window you want to
# access. The following just says you want as much data as possible starting
# from offset 0.
# To be sure your region could be mapped, query for validity
assert c.use_region().is_valid() # use_region returns self

# once a region was mapped, you must query its dimension regularly
# to assure you don't try to access its buffer out of its bounds
assert c.size()
c.buffer()[0] # first byte
c.buffer()[1:10] # 9 bytes starting at offset 1
c.buffer()[c.size()-1] # last byte

# it's recommended not to create big slices when feeding the buffer
# into consumers (e.g. struct or zlib).
# Instead, either give the buffer directly, or use Python's buffer command.
buffer(c.buffer(), 1, 9) # 9 bytes starting at offset 1, without copying them

# you can query absolute offsets, and check whether an offset is included
# in the cursor's data.
assert c.ofs_begin() < c.ofs_end()
assert c.includes_ofs(100)

# If one of your region requests is out of bounds, the
# cursor will become invalid. It cannot be used in that state
assert not c.use_region(fc.size, 100).is_valid()
# map as much as possible after skipping the first 100 bytes
assert c.use_region(100).is_valid()

# You can explicitly free cursor resources by unusing the cursor's region
c.unuse_region()
assert not c.is_valid()
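
The ``buffer()`` builtin is only available on Python 2. As a hedged sketch of the same idea on Python 3, the following uses a plain anonymous ``mmap`` in place of an smmap cursor (``zero_copy_demo`` is a made-up name for illustration). ``memoryview`` slices give zero-copy access, but every view has to be released before the map itself can be closed, which is the kind of leak this pull request is concerned with:

```python
import mmap


def zero_copy_demo():
    m = mmap.mmap(-1, 1024)          # anonymous map, stands in for a mapped region
    m[:10] = b"0123456789"

    view = memoryview(m)
    part = view[1:10]                # 9 bytes starting at offset 1, no copy made
    copied = bytes(part)             # a copy is made only here, on request

    try:
        m.close()                    # refused while views are still exported
        close_refused = False
    except BufferError:
        close_refused = True

    part.release()                   # drop the views explicitly ...
    view.release()
    m.close()                        # ... and now the map can be closed
    return copied, close_refused, m.closed
```

Relying on garbage collection to drop such views is fragile, so releasing them explicitly (or using them in a ``with`` block) keeps the lifetime of the map predictable.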


Now you would have to write your algorithms around this interface to properly slide through huge amounts of data.
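
As an illustration of what such an algorithm might look like, the sketch below slides a fixed-size window over a file and feeds each window to ``zlib.crc32``. It deliberately uses the standard-library ``mmap`` module instead of smmap cursors; ``iter_windows``, ``crc_by_windows`` and the window size are assumptions chosen for the example:

```python
import mmap
import os
import tempfile
import zlib


def iter_windows(fd, window_size):
    """Yield successive read-only maps that together cover the whole file."""
    file_size = os.fstat(fd).st_size
    offset = 0
    while offset < file_size:
        length = min(window_size, file_size - offset)
        window = mmap.mmap(fd, length, access=mmap.ACCESS_READ, offset=offset)
        try:
            yield window
        finally:
            window.close()           # unmap this window before mapping the next
        offset += length


def crc_by_windows(path, granules=1):
    """CRC-32 of a file, computed one mapped window at a time."""
    # window sizes are kept a multiple of the granularity so offsets stay aligned
    window_size = granules * mmap.ALLOCATIONGRANULARITY
    crc = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        for window in iter_windows(fd, window_size):
            crc = zlib.crc32(window, crc)
    finally:
        os.close(fd)
    return crc & 0xFFFFFFFF


def demo():
    data = bytes(i % 251 for i in range(2 * mmap.ALLOCATIONGRANULARITY + 123))
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        return crc_by_windows(path), zlib.crc32(data) & 0xFFFFFFFF
    finally:
        os.remove(path)
```

Only one window is mapped at any time, so memory use stays bounded by the window size regardless of how large the file is.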

Alternatively you can use a convenience interface.

*******

========
Buffers
*******
========
To make first use easier, at the expense of performance, there is a Buffer implementation which uses a cursor underneath.

With it, you can access all data in a possibly huge file without having to take care of setting the cursor to different regions yourself::
Expand All @@ -112,7 +132,9 @@ With it, you can access all data in a possibly huge file without having to take

# it will stop using resources automatically once it goes out of scope

Disadvantages
*************
Buffers cannot be used in place of strings or maps, hence you have to slice them to have valid input for the sorts of struct and zlib. A slice means a lot of data handling overhead which makes buffers slower compared to using cursors directly.
Disadvantages
--------------
Buffers cannot be used in place of strings or maps, hence you have to slice them to have valid
input for the likes of struct and zlib.
A slice means a lot of data handling overhead which makes buffers slower compared to using cursors directly.
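
The cost of slicing can be pictured with standard-library objects alone. In this sketch an anonymous ``mmap`` stands in for mapped data, and ``RECORD`` plus the two reader functions are hypothetical: the sliced reader mimics what a Buffer forces on you, while the direct reader mimics handing a cursor's buffer straight to ``struct``:

```python
import mmap
import struct

# hypothetical record layout: uint32 followed by uint16, little-endian
RECORD = struct.Struct("<IH")


def read_record_sliced(buf, index):
    start = index * RECORD.size
    chunk = buf[start:start + RECORD.size]   # slicing an mmap copies into new bytes
    return RECORD.unpack(chunk)


def read_record_direct(buf, index):
    # unpack_from reads at the given offset without an intermediate copy
    return RECORD.unpack_from(buf, index * RECORD.size)


def demo():
    m = mmap.mmap(-1, 4096)
    try:
        for i in range(3):
            RECORD.pack_into(m, i * RECORD.size, i * 1000, i)
        sliced = [read_record_sliced(m, i) for i in range(3)]
        direct = [read_record_direct(m, i) for i in range(3)]
        return sliced, direct
    finally:
        m.close()
```

Both readers return the same values; the difference is only the temporary ``bytes`` object created per record in the sliced variant.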

84 changes: 71 additions & 13 deletions smmap/mman.py
@@ -1,5 +1,10 @@
"""Module containing a memory manager which provides a sliding window on a number of memory mapped files"""
from functools import reduce
import logging
import sys

from .util import (
PY3,
MapWindow,
MapRegion,
MapRegionList,
Expand All @@ -8,15 +13,29 @@
buffer,
)

import sys
from functools import reduce

__all__ = ["StaticWindowMapManager", "SlidingWindowMapManager", "WindowCursor"]
__all__ = ['managed_mmaps', "StaticWindowMapManager", "SlidingWindowMapManager", "WindowCursor"]
#{ Utilities

log = logging.getLogger(__name__)
#}END utilities


def managed_mmaps():
    """Makes a memory-map context-manager instance for the correct python-version.

    :return: either :class:`SlidingWindowMapManager` or :class:`StaticWindowMapManager` (if PY2)

    If you want to change the default parameters of these classes, use them directly.

    .. Tip::
        Use it in a ``with ...:`` block, to free cached (and unused) resources.

    """
    mman = SlidingWindowMapManager if PY3 else StaticWindowMapManager

    return mman()


class WindowCursor(object):

"""
Expand All @@ -25,9 +44,15 @@ class WindowCursor(object):

Cursors should not be created manually, but are instead returned by the SlidingWindowMapManager

**Note:**: The current implementation is suited for static and sliding window managers, but it also means
that it must be suited for the somewhat quite different sliding manager. It could be improved, but
I see no real need to do so."""
.. Tip::
This is a re-entrant, but not thread-safe, context-manager, to be used within a ``with ...:`` block,
to ensure any left-over cursors are cleaned up. If not entered, :meth:`use_region()`
will raise a ``ValueError``.

.. Note::
The current implementation is suited for static and sliding window managers,
but it also means that it must be suited for the somewhat quite different sliding manager.
It could be improved, but I see no real need to do so."""
__slots__ = (
'_manager', # the manager keeping all file regions
'_rlist', # a regions list with regions for our file
Expand Down Expand Up @@ -110,6 +135,10 @@ def use_region(self, offset=0, size=0, flags=0):

**Note:**: The size actually mapped may be smaller than the given size. If that is the case,
either the file has reached its end, or the map was created between two existing regions"""
if self._manager._entered <= 0:
raise ValueError('Context-manager %s not entered for %s!' %
(self._manager, self))

need_region = True
man = self._manager
fsize = self._rlist.file_size()
Expand Down Expand Up @@ -243,15 +272,23 @@ class StaticWindowMapManager(object):
These clients would have to use a SlidingWindowMapBuffer to hide this fact.

This type will always use a maximum window size, and optimize certain methods to
accommodate this fact"""
accommodate this fact

.. Tip::
The *memory-managers* are re-entrant, but not thread-safe, context-manager(s),
to be used within a ``with ...:`` block, ensuring any left-over cursors are cleaned up.
If not entered, :meth:`make_cursor()` and/or :meth:`WindowCursor.use_region()` will raise a ``ValueError``.

"""

__slots__ = [
'_fdict', # mapping of path -> StorageHelper (of some kind)
'_window_size', # maximum size of a window
'_max_memory_size', # maximum amount of memory we may allocate
'_max_handle_count', # maximum amount of handles to keep open
'_memory_size', # currently allocated memory size
'_fdict', # mapping of path -> StorageHelper (of some kind)
'_window_size', # maximum size of a window
'_max_memory_size', # maximum amount of memory we may allocate
'_max_handle_count', # maximum amount of handles to keep open
'_memory_size', # currently allocated memory size
'_handle_count', # amount of currently allocated file handles
'_entered', # updated on enter/exit, when 0, `close()`
]

#{ Configuration
Expand Down Expand Up @@ -280,6 +317,7 @@ def __init__(self, window_size=0, max_memory_size=0, max_open_handles=sys.maxsiz
self._max_handle_count = max_open_handles
self._memory_size = 0
self._handle_count = 0
self._entered = 0

if window_size < 0:
coeff = 64
Expand All @@ -297,6 +335,23 @@ def __init__(self, window_size=0, max_memory_size=0, max_open_handles=sys.maxsiz
self._max_memory_size = coeff * self._MB_in_bytes
# END handle max memory size

    def __enter__(self):
        assert self._entered >= 0, self._entered
        self._entered += 1

        return self

    def __exit__(self, exc_type, exc_value, traceback):
        assert self._entered > 0, self._entered
        self._entered -= 1
        if self._entered == 0:
            # outermost exit: release whatever regions are still cached
            left_overs = self.collect()
            if left_overs:
                log.warning("Cleaned up %s left-over mmap-regions.", left_overs)

    def close(self):
        self.collect()

#{ Internal Methods

def _collect_lru_region(self, size):
Expand Down Expand Up @@ -399,6 +454,9 @@ def make_cursor(self, path_or_fd):

**Note:** Using file descriptors directly is faster once new windows are mapped as it
prevents the file from being opened again just for the purpose of mapping it."""
if self._entered <= 0:
raise ValueError('Context-manager %s not entered!' % self)

regions = self._fdict.get(path_or_fd)
if regions:
assert not regions.collect_closed_regions(), regions.collect_closed_regions()
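
The enter/exit bookkeeping introduced in this commit can be illustrated in isolation. The following is a simplified sketch and not smmap's code: ``CountingManager`` and its ``acquire`` method are hypothetical stand-ins, and ``collect()`` merely empties a list instead of unmapping regions:

```python
class CountingManager(object):
    """Re-entrant (but not thread-safe) context manager with an entry counter."""

    def __init__(self):
        self._entered = 0            # nesting depth of active ``with`` blocks
        self._resources = []         # stand-in for mapped regions
        self.collected = 0           # total number of resources cleaned up so far

    def __enter__(self):
        self._entered += 1
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._entered -= 1
        if self._entered == 0:       # only the outermost exit cleans up
            self.collect()

    def acquire(self, name):
        # same guard idea as make_cursor()/use_region() in the diff
        if self._entered <= 0:
            raise ValueError('Context-manager %s not entered!' % self)
        self._resources.append(name)
        return name

    def collect(self):
        freed = len(self._resources)
        del self._resources[:]
        self.collected += freed
        return freed


def demo():
    man = CountingManager()
    try:
        man.acquire("too-early")     # not inside a ``with`` block yet
        refused = False
    except ValueError:
        refused = True

    with man:
        man.acquire("a")
        with man:                    # re-entering the same manager is allowed
            man.acquire("b")
        after_inner = man.collected  # inner exit must not have collected anything
    return refused, after_inner, man.collected
```

Because the counter is a plain integer updated without locking, two threads entering and leaving concurrently could corrupt it, which is why the docstrings describe the managers as re-entrant but not thread-safe.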