bpo-17852: Maintain a list of BufferedWriter objects. Flush them on exit. #1908

Merged (5 commits) on Sep 5, 2017

@nascheme nascheme commented Jun 1, 2017

In Python 3, the buffer and the underlying file object are separate
and so the order in which objects are finalized matters. This is
unlike Python 2 where the file and buffer were a single object and
finalization was done for both at the same time. In Python 3, if
the file is finalized and closed before the buffer then the data in
the buffer is lost.
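
To illustrate the separation described above, here is a minimal sketch (not part of the patch) that writes through an `io.BufferedWriter` wrapping an `io.FileIO`. The function name `demo_buffer_layers` and the temporary file are assumptions made for illustration, and the sketch assumes the payload is smaller than the buffer size, so nothing should reach the file until `flush()` is called.

```python
import io
import os
import tempfile


def demo_buffer_layers():
    """Return the on-disk file size before and after flushing the buffer."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        raw = io.FileIO(path, "w")       # the underlying file object
        buf = io.BufferedWriter(raw)     # the separate buffer object
        buf.write(b"hello")              # small write: held in the buffer
        size_before = os.path.getsize(path)
        buf.flush()                      # pushes the bytes down to raw
        size_after = os.path.getsize(path)
        buf.close()                      # also closes raw
        return size_before, size_after
    finally:
        os.remove(path)
```

If `raw` were closed before `buf` was flushed, the five buffered bytes would have nowhere to go, which is the loss this pull request tries to prevent at shutdown.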

This change adds a doubly linked list of open file buffers. An atexit
hook ensures they are flushed before proceeding with interpreter
shutdown. This addition does not remove the need to properly close
files as there are other reasons why buffered data could get lost during
finalization.

Initial patch by Armin Rigo.

https://bugs.python.org/issue17852
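
The approach in the description can be sketched in pure Python roughly as follows. This is an illustration only, not the patch itself: the C implementation uses a linked list, and the names `_open_writers`, `TrackedWriter` and `flush_open_writers` are hypothetical.

```python
import atexit
import io
import weakref

# Hypothetical registry of live writers. Weak references let writers
# be garbage collected normally while they are tracked.
_open_writers = weakref.WeakSet()


class TrackedWriter(io.BufferedWriter):
    """A BufferedWriter that registers itself for flushing at exit."""

    def __init__(self, raw, *args, **kwargs):
        super().__init__(raw, *args, **kwargs)
        _open_writers.add(self)


def flush_open_writers():
    """Flush every tracked writer that is still open; return the count."""
    flushed = 0
    for writer in list(_open_writers):
        if not writer.closed:
            writer.flush()
            flushed += 1
    return flushed


# Run the flush before the interpreter starts tearing objects down.
atexit.register(flush_open_writers)
```

As the description notes, a hook like this is a safety net and does not replace closing files explicitly.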

Lib/_pyio.py Outdated

import atexit, weakref

_all_writers = weakref.WeakKeyDictionary()

We have WeakSet now :-)
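
A small sketch of why `weakref.WeakSet` fits here: with `WeakKeyDictionary` each entry needs a placeholder value, while a `WeakSet` stores only the objects. The `Writer` class and `registry_sizes` function are hypothetical stand-ins, and the sketch assumes the object is reclaimed once its last strong reference is dropped (`gc.collect()` is called for good measure).

```python
import gc
import weakref


class Writer:
    """Hypothetical stand-in for a buffered writer object."""


def registry_sizes():
    """Return registry sizes before and after the writer is collected."""
    as_dict = weakref.WeakKeyDictionary()
    as_set = weakref.WeakSet()

    w = Writer()
    as_dict[w] = None   # the value is a meaningless placeholder
    as_set.add(w)       # the set needs no placeholder
    before = (len(as_dict), len(as_set))

    del w               # drop the only strong reference
    gc.collect()
    after = (len(as_dict), len(as_set))
    return before, after
```

Both containers drop the entry once the writer is gone; the set simply expresses the intent more directly.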


/* a doubly-linked chained list of "buffered" objects that need to
be flushed when the process exits */
struct doubly_linked_s buffered_writers_list;

The nested struct makes for quirky pointer calculation in _PyIO_atexit_flush(). Why not simply:

typedef struct _buffered {
    // ...
    struct _buffered *next, *prev;
} buffered;

offsetof(buffered, buffered_writers_list));
remove_from_linked_list(buf);
buffered_flush(buf, NULL);
PyErr_Clear();

Rather than silence errors, they can be displayed using PyErr_WriteUnraisable.
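
As a rough Python analogue of this suggestion (the C code under review would call `PyErr_WriteUnraisable`; the helper below is a hypothetical illustration, not that API), a flush failure can be reported on a stream instead of being discarded:

```python
import io
import sys


def flush_and_report(writer, stream=None):
    """Flush `writer`; on failure, report the error instead of hiding it.

    Returns True if the flush succeeded, False otherwise.
    """
    stream = sys.stderr if stream is None else stream
    try:
        writer.flush()
    except Exception as exc:
        # Mimic the "Exception ignored in: ..." style of message.
        print(f"Exception ignored in: {writer!r}", file=stream)
        print(f"{type(exc).__name__}: {exc}", file=stream)
        return False
    return True
```

The trade-off is that reporting makes previously silent failures visible, which can surface as unexpected output on stderr.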

nascheme commented Jun 5, 2017

Antoine, thanks for taking the time for the review. I have modified the code as you suggest.

nascheme commented Jun 5, 2017

I have left PyErr_Clear(). Using PyErr_WriteUnraisable might be better except that we could get some mysterious errors where previously there were none. The unit test suite generates extra errors if we remove the PyErr_Clear() call.

```
$ ./python  ../Lib/test/regrtest.py test_io
Run tests sequentially
0:00:00 load avg: 0.18 [1/1] test_io
Exception ignored in: <_io.BufferedWriter name='<stderr>'>
Exception ignored in: <_io.BufferedWriter name='<stdout>'>
test test_io failed -- Traceback (most recent call last):
  File "/home/nas/src/cpython/Lib/test/test_io.py", line 3232, in test_create_at_shutdown_with_encoding
    self.assertFalse(err)
AssertionError: b"Exception ignored in: <_io.BufferedWriter name='<stderr>'>\nException ignored in: <_io.BufferedWriter name='<stdout>'>" is not false

test_io failed in 35 sec
```

Lib/_pyio.py Outdated
# finalized before the buffered writer wrapping it then any buffered
# data will be lost.
for w in _all_writers:
    w.flush()

Perhaps this should ignore errors, as the C version does?

@nascheme nascheme Jun 6, 2017


I think that would be wise. flush() probably should not raise an error, but if it does, not catching it would be bad. Do you think a bare try/except that just swallows the error is right? I think that would be okay because we really don't expect flush() to raise an error, and if it does, there is not much we can do about it.


Common case for a flush error is a broken pipe or disk full. Try writing to /dev/full. It would be nice if this indicated an error, but I suspect it would be hard to do (related: https://bugs.python.org/issue5319).
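
The thread above can be made concrete with a sketch that simulates a full device and flushes several writers while ignoring errors. `FullDevice` and `flush_all_ignoring_errors` are hypothetical names; the raw stream imitates `/dev/full` by raising `ENOSPC` rather than touching a real device, and `except Exception` is used here instead of a bare `except`.

```python
import errno
import io


class FullDevice(io.RawIOBase):
    """Hypothetical raw stream that rejects writes while `full` is set."""

    def __init__(self):
        super().__init__()
        self.full = True
        self.received = bytearray()

    def writable(self):
        return True

    def write(self, data):
        if self.full:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.received += bytes(data)
        return len(data)


def flush_all_ignoring_errors(writers):
    """Flush each writer; one failure does not stop the others.

    Returns the writers whose flush raised, so the caller can decide
    whether to report them.
    """
    failed = []
    for writer in writers:
        try:
            writer.flush()
        except Exception:
            failed.append(writer)
    return failed
```

Swallowing the error keeps shutdown going, but, as noted above, the user gets no indication that data was lost unless the failures are reported somewhere.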

@nascheme nascheme merged commit e38d12e into python:master Sep 5, 2017
nascheme added a commit that referenced this pull request Sep 5, 2017
nascheme added a commit that referenced this pull request Sep 5, 2017
jimmylai pushed a commit to jimmylai/cpython that referenced this pull request Sep 5, 2017
* 'master' of https://github.com/python/cpython: (32 commits)
  Conceptually, roots is a set.  Also searching it as a set is a tiny bit faster (python#3338)
  bpo-31343: Include sys/sysmacros.h (python#3318)
  bpo-30102: Call OPENSSL_add_all_algorithms_noconf (python#3112)
  Prevent a few make suspicious warnings. (python#3341)
  Include additional changes to support blurbified NEWS (python#3340)
  Simplify NEWS entry to prevent suspicious warnings. (python#3339)
  bpo-31347: _PyObject_FastCall_Prepend: do not call memcpy if args might not be null (python#3329)
  Revert "bpo-17852: Maintain a list of BufferedWriter objects.  Flush them on exit. (python#1908)" (python#3337)
  bpo-17852: Maintain a list of BufferedWriter objects.  Flush them on exit. (python#1908)
  Fix terminology in comment and add more design rationale. (python#3335)
  Add comment to explain the implications of not sorting keywords (python#3331)
  bpo-31170: Update libexpat from 2.2.3 to 2.2.4 (python#3315)
  bpo-28411: Remove "modules" field from Py_InterpreterState. (python#1638)
  random_triangular:  sqrt() is more accurate than **0.5 (python#3317)
  Travis: use ccache (python#3307)
  remove IRIX support (closes bpo-31341) (python#3310)
  Code clean-up.  Remove unnecessary pre-increment before the loop starts. (python#3312)
  Regen Moduls/clinic/_ssl.c.h (pythonGH-3320)
  bpo-30502: Fix handling of long oids in ssl. (python#2909)
  Cache externals, depending on changes to PCbuild (python#3308)
  ...
GadgetSteve pushed a commit to GadgetSteve/cpython that referenced this pull request Sep 10, 2017
…xit. (python#1908)

* Maintain a list of BufferedWriter objects.  Flush them on exit.

In Python 3, the buffer and the underlying file object are separate
and so the order in which objects are finalized matters.  This is
unlike Python 2 where the file and buffer were a single object and
finalization was done for both at the same time.  In Python 3, if
the file is finalized and closed before the buffer then the data in
the buffer is lost.

This change adds a doubly linked list of open file buffers.  An atexit
hook ensures they are flushed before proceeding with interpreter
shutdown.  This addition does not remove the need to properly close
files as there are other reasons why buffered data could get lost during
finalization.

Initial patch by Armin Rigo.

* Use weakref.WeakSet instead of WeakKeyDictionary.

* Simplify buffered double-linked list types.

* In _flush_all_writers(), suppress errors from flush().

* Remove NEWS entry, use blurb.
GadgetSteve pushed a commit to GadgetSteve/cpython that referenced this pull request Sep 10, 2017