Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error_already_set::what() is now constructed lazily #1895

Merged
merged 136 commits into from
Jun 2, 2022

Conversation

superbobry
Copy link
Contributor

@superbobry superbobry commented Aug 28, 2019

The original motivation for this PR was a performance optimization. See the original PR description below.

After nearly 3 years, work on this PR has led to a few related but distinct behavior changes:

This PR:

  • Secondary errors during normalization of the Python error are clearly reported (see test_flaky_exception_failure_point_init in test_exceptions.py).
  • Secondary errors obtaining the error message are clearly reported (see test_flaky_exception_failure_point_str in test_exceptions.py).
  • Calling error_already_set::restore() multiple times is clearly reported (preempts accidentally clearing an error, or accidentally restoring the same error twice; see test_error_already_set_double_restore in test_exceptions.py).
  • The error_already_set copy constructor is no longer racy.
  • detail::error_string() no longer restores the error message. This fixes (in practice inconsequential) bugs in detail/class.h.

The performance gain for exceptions with long tracebacks can be very substantial. See #1895 (comment) for concrete numbers.

A review of the error_already_set development history is under #1895 (comment).


Original PR description below

Prior to this PR throwing error_already_set was expensive due to the eager construction of the error string. The PR changes error_already_set::what to construct the error string on first call.

Before:

1.7 µs ± 9.03 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

After:

1.14 µs ± 1.93 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Benchmark:

const py::int_ foo() {
    py::dict d;
    try {
        return d["foo"];
    } catch (const py::error_already_set &) {
        return py::int_(42);
    }    
}

PYBIND11_MODULE(foo, m) {
  m.def("foo", &foo);
}

Note that the running time of the benchmark is dominated by stack unwinding, therefore the speedup is only ~30%.

Suggested changelog entry:

``error_already_set`` is now safer and more performant, especially for exceptions with long tracebacks.

Prior to this commit throwing error_already_set was expensive due to the
eager construction of the error string (which required traversing the
Python stack). See pybind#1853 for more context and an alternative take on the
issue.

Note that error_already_set no longer inherits from std::runtime_error
because the latter has no default constructor.
@wjakob
Copy link
Member

wjakob commented Aug 28, 2019

This looks like a really nice idea! For posterity, can you discuss a little bit what your use case was? (Presumably code that eagerly throws lots of exceptions, etc.) What kind of speedup does this patch yield?

I'll let you look into the TODO that is mentioned in your code, that seems potentially a dangerous aspect. Note also that your patch doesn't seem compatible with PyPy in its current iteration.

@wjakob
Copy link
Member

wjakob commented Aug 28, 2019

Note that I also like the solution in #1853. Are you suggesting that only one of those two PRs is merged?

This is not supported on PyPy-2.7 5.8.0.
@mosra
Copy link
Contributor

mosra commented Aug 28, 2019

Are you suggesting that only one of those two PRs is merged?

Why not both? :) This one could provide a nice "do the same, but faster" solution that doesn't require much effort from the user (only switching to a combination of PyError_SetString() + throw py::error_already_set instead of throwing a concrete exception), while #1853 requires deeper changes (but with more significant perf impact) where this really matters. But, check the benchmarks below.

The remaining unsolved problem with #1853 is that functions have to either return a holder type holding nullptr (instead of the held type directly, which is a minor inconvenience) or, in case of builtin types, return either the type py::cast()'d or a default-constructed py::object (which might require deeper changes in user code). For builtin types that means the reported signature loses its return type information, which isn't great for documentation. @wjakob, I'm pretty new to the internals so maybe there's a solution that doesn't involve returning a typeless py::object?

I tried this PR and here's the continuation of benchmarks from #1853 -- first, throwing the concrete exception (original numbers from that PR):

Vector3()[0] # doesn't throw                                         0.97913 µs
Vector3()[3] # throws                                                5.78696 µs
raise IndexError() # throws                                          0.15339 µs

Then, using py::error_already_set, with current master:

Vector3()[0] # doesn't throw                                         0.68927 µs
Vector3()[3] # throws                                                3.07559 µs
raise IndexError() # throws                                          0.12918 µs

With this PR:

Vector3()[0] # doesn't throw                                         0.69361 µs
Vector3()[3] # throws                                                2.93022 µs
raise IndexError() # throws                                          0.12839 µs

And with #1853, for comparison (original numbers from that PR):

Vector3()[0] # doesn't throw                                         0.68562 µs
Vector3()[3] # throws                                                0.84481 µs
raise IndexError() # throws                                          0.17830 µs

As I see it, this PR introduces a measurable speed improvement, but it's rather insignificant compared to the impact of #1853 ... not sure what's to blame, I guess the overhead of C++'s throw really is that big? Or maybe my case is special, @superbobry could you benchmark as well?

@superbobry
Copy link
Contributor Author

superbobry commented Aug 28, 2019

For posterity, can you discuss a little bit what your use case was?

Just to make the intention here clear, the use-case I'm optimizing for is handling CPython API exceptions in C++, e.g.

try {
    v = dict["foo"];
catch (const py::error_already_set&) {
    v = 42;
}

which was unnecessarily expensive prior to this PR due to the detail::error_string call.

Unlike #1853 this PR does not aim optimize the translation of a C++ exception to a Python one. In fact, any speedup there is either noise or accidental.

@sizmailov
Copy link
Contributor

Note that error_already_set no longer inherits from std::runtime_error

Isn't it a breaking change?

@superbobry
Copy link
Contributor Author

I did a quick flamegraph of a variation of the above example at this PR, and the conclusion is that the time of test::foo (see below) is dominated by stack unwinding when an exception is thrown.

namespace test {
  void foo() {
    py::dict d{};
    py::int_ v;
    while (true) {
      try { v = py::cast<py::int_>(d["foo"]); }
      catch (const py::error_already_set&) { v = py::int_(42); }
    }
  }
}

image

This makes me wonder if there is a possible design change we could do to get rid of exceptions entirely (see #1703) as they lead to inefficient code (and nobody wants their C++ code to go slow, especially due to exceptions)? The simplest (and the most invasive) thing which comes to mind is to move the exception to the return type in the spirit of std::expected (or Either from Haskell). Wdyt @wjakob?

@superbobry
Copy link
Contributor Author

@sizmailov yes, that change is breaking. I am actually not sure if it's possible to lazy-init the error and keep std::runtime_error as the base class. Additionally, I'm not sure how pybind11 would handle std::exception subclass if it bubbles up.

@superbobry
Copy link
Contributor Author

For comparison, here's a flamegraph for the same snippet at current master; detail::error_string takes ~24% of the time.

image

@sizmailov
Copy link
Contributor

sizmailov commented Aug 28, 2019

@superbobry std::runtime_error can be constructed with dummy string (e.g. "Unknown internal error occurred" or "") to avoid base class change. You override what() anyway, so the what_arg of std::runtime_error will be unused.

Lazyness of what() effectively makes it non-nonexcept, and sudden std::terminate will happen on exception. An extra try-catch block should save from this extremely rare cases:

virtual const char* what() const noexcept {
    try{
        if (m_lazy_what.empty()) {
            if (m_type || m_value || m_trace)
               PyErr_NormalizeException(&m_type.ptr(), &m_value.ptr(), &m_trace.ptr());
            m_lazy_what = detail::error_string(m_type.ptr(), m_value.ptr(), m_trace.ptr());
        }
        return m_lazy_what.c_str();
    } catch(...){
        return "Unknown internal error occurred during exception handling"; 
    }
}

This is faster than dynamically looking up __name__ via GetAttrString.
Note though that the runtime of the code throwing an error_already_set
will be dominated by stack unwinding so the improvement will not be
noticeable.

Before:

396 ns ± 0.913 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

After:

277 ns ± 0.549 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Benchmark:

const std::string foo() {
    PyErr_SetString(PyExc_KeyError, "");
    const std::string &s = py::detail::error_string();
    PyErr_Clear();
    return s;
}

PYBIND11_MODULE(foo, m) {
    m.def("foo", &::foo);
}
@superbobry
Copy link
Contributor Author

Thanks @sizmailov. I've updated the PR with your suggestions.

The implementation of __name__ is slightly more complex than that.
It handles the module name prefix, and heap-allocated types. We could
port it to pybind11 later on but for now it seems like an overkill.

This reverts commit f1435c7.
@YannickJadoul
Copy link
Collaborator

YannickJadoul commented Aug 29, 2019

Nice! Good idea, @superbobry!

One thing I've noticed: since you're now calling PyErr_NormalizeException during what (or restore), the trio m_type, m_value, m_trace is potentially different depending on whether you've called what or not. Is it an expensive call? Otherwise, I'd argue to put it in the constructor?

Answering your TODO:

// TODO(superbobry): is it safe to assume that exception has been
// normalized by the caller?

I'd argue it is, if you write a line of documentation for the function?

And just for the fun of stringing C++ keyword, you can still append override to virtual const char* what() const noexcept? I didn't realize that that you could and were overriding std::runtime_error::what, at first.

Apart from that, is there no way we can nót initialize/store the string in the std::runtime_error base class? That's a minor detail, ofc, but it feels a waste to initialize and store that string without using it.

@superbobry
Copy link
Contributor Author

Thanks for the review @YannickJadoul!

Looking at the implementation of PyErr_NormalizeException, it could be expensive, although I suspect that such cases are rare. I still prefer to have it in ::what() because we only normalize the exception to construct a sensible error string. If the exception ends up bubbling up to CPython it will be normalized by the interpreter anyway.

I've also tried passing a reference to m_lazy_what to the std::runtime_exception constructor and then modifying m_lazy_what in-place in ::what(). Is that what you were referring to in your comment? It does look a bit fragile (and impl dependent) and in fact crashed various pybind11 tests on my machine.

Note that detail::error_string() no longer calls PyException_SetTraceback
as it is unncessary for pretty-printing the exception.
@YannickJadoul
Copy link
Collaborator

YannickJadoul commented Aug 29, 2019

Looking at the implementation of PyErr_NormalizeException, it could be expensive, although I suspect that such cases are rare. I still prefer to have it in ::what() because we only normalize the exception to construct a sensible error string. If the exception ends up bubbling up to CPython it will be normalized by the interpreter anyway.

Hmmm, indeed; more expensive than I expected, probably/possibly. However, I still don't like the idea that auto x = e.value(); e.what(); auto y = e.value(); might result in x and y being different

Is there a way to also do this lazily (or maybe evaluate what() in the type, value, trace getters), or would that add so much bloat that it would defeat the purpose of the PR?

I've also tried passing a reference to m_lazy_what to the std::runtime_exception constructor and then modifying m_lazy_what in-place in ::what(). Is that what you were referring to in your comment? It does look a bit fragile (and impl dependent) and in fact crashed various pybind11 tests on my machine.

Myeah, I don't know if this is possible. I just noticed that there are now two strings inside of std::exception while we really only need one.

EDIT:

Hmmm, indeed; more expensive than I expected, probably/possibly.

But if we're profiling anyway, how can we actually trigger this function to do its work and measure the overhead?

@superbobry
Copy link
Contributor Author

On my machine the cost of calling PyErr_NormalizeException with an unnormalized value is roughly ~100ns.

>>> %timeit bar.noop(42)
111 ns ± 0.215 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %timeit bar.normalize(42)
205 ns ± 0.988 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Where bar is

void noop(const py::int_& v) {
}

void normalize(const py::int_& v) {
    PyObject *type = PyExc_ValueError;
    PyObject *value = v.ptr();
    PyObject *trace = nullptr;
    PyErr_NormalizeException(&type, &value, &trace);
}

PYBIND11_MODULE(bar, m) {
    m.def("noop", &noop);
    m.def("normalize", &normalize);
}

This seems ~insignificant compared to other overheads, so I've moved normalization to the constructor. Another option is to make normalization an impl detail of detail::error_string, but it's a bit more subtle because PyErr_NormalizeException decrefs the type/value and currently neither error_already_set nor error_scope incref them.

rwgk pushed a commit to rwgk/rwgk_tbx that referenced this pull request Jun 2, 2022
…1#1895

```
Running tests in directory "/usr/local/google/home/rwgk/forked/pybind11/tests":
=========================================================== test session starts ============================================================
platform linux -- Python 3.9.12, pytest-6.2.3, py-1.10.0, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /usr/local/google/home/rwgk/forked/pybind11/tests, configfile: pytest.ini
collected 1 item

test_shared_ptr_deleter_gil_acquire.py::test_merry_go_round
Ant 0 1
Bug 0 2
Cat 0 3
Bug 1 3
Ant 1 3
Bug 2 3
Cat 1 3
Bug 3 3
Ant 2 3
Bug 4 3
Bug 5 3
Ant 3 3
Cat done.
Bug 6 2
Bug 7 2
Ant done.
Bug 8 1
Bug 9 1
Bug 10 1
Bug 11 1
Bug done.
About to: delete raw_ptr;
All done: 1.204
PASSED

============================================================ 1 passed in 1.21s =============================================================
```
@rwgk
Copy link
Collaborator

rwgk commented Jun 2, 2022

@jbms I think I figured it out, based on this small experiment: rwgk/rwgk_tbx@e1e2010

Things got significantly simpler, which is usually a good sign. In particular, the custom error_already_set dtor, copy ctor, and move ctor are gone!

All clang sanitizers and valgrind here in the GHA CI are happy. I'll run the Google-global testing with the current state (at commit 53e29f4).

@jbms
Copy link
Contributor

jbms commented Jun 2, 2022

I'm still not sure I understand the motivation for using shared_ptr here. I think it is intended to solve two issues with the copy constrictor:

  • need to acquire the gil to increment the reference counts of the Python objects, which could result in failure in the case of finalization
  • need to copy the cached what message with possibility of failure due to memory exhaustion

However, I expect the copy constructor to only be used in rare cases (calling std::current_exception on Windows) and I think there is a simpler solution:

  • I don't think it is helpful to try to avoid failure from acquiring the gil when copying, since that failure will still occur upon destruction.

  • the cached what string doesn't need to be copied; it could just be recomputed when what is called on the copy.

@rwgk
Copy link
Collaborator

rwgk commented Jun 2, 2022

I'm still not sure I understand the motivation for using shared_ptr here.

It's a bit involved, the primary motivation is to enable this (newly added in commit 80dfaea yesterday):

def test_error_already_set_double_restore():
    m.test_error_already_set_double_restore(True)  # dry_run
    with pytest.raises(RuntimeError) as excinfo:
        m.test_error_already_set_double_restore(False)
    assert str(excinfo.value) == (
        "Internal error: pybind11::detail::error_fetch_and_normalize::restore()"
        " called a second time. ORIGINAL ERROR: ValueError: Random error."
    )

For background, current master does this:

    void restore() {
        PyErr_Restore(m_type.release().ptr(), m_value.release().ptr(), m_trace.release().ptr());
    }

Two lines of thought from here:

1. Aspect

In a 2nd call of restore(), it boils down to PyErr_Restore(nullptr, nullptr, nullptr). According to the Python documentation (confirmed with a quick experiment):

If the objects are NULL, the error indicator is cleared.

My thinking: If restore() is called multiple times, that's almost certainly a coding error (bug). The current behavior masks that. Therefore I decided to throw an exception in my version.

2. Aspect

I.e. it releases the Python references, but that conflicts with the lazy-what feature this PR is about. An earlier version of this PR (at commit e9a2e6d) has this:

    void restore() {
        what(); // Force-build `.what()`.
        if (m_lazy_what.empty()) {
            pybind11_fail("Critical error building lazy error_string().");
        }
        if (m_type) {
            PyErr_Restore(
                m_type.release().ptr(), m_value.release().ptr(), m_trace.release().ptr());
        }
    }

The what() there undoes the performance gain (original motivation for this PR) for a very common (most common? idk) situation.

To not lose the performance gain, I decided to hold on the the original references, landing here (current commit 53e29f4):

    void restore() {
        if (m_restore_called) {
            pybind11_fail("Internal error: pybind11::detail::error_fetch_and_normalize::restore() "
                          "called a second time. ORIGINAL ERROR: "
                          + error_string());
        }
        PyErr_Restore(m_type.inc_ref().ptr(), m_value.inc_ref().ptr(), m_trace.inc_ref().ptr());
        m_restore_called = true;
    }

With that

  • we are safer than before (clear error message if restore() is called multiple times),
  • and we can build what() only when actually needed (rare) even after restore().

For this to work we need the shared_ptr.

Further considerations:

  • I trust that shared_ptr is 100% reliable in combination with threading, and extremely well optimized in general.
  • The error_already_set implementation turns out simpler than before, with the core functionality delegated to a helper struct error_fetch_and_normalize that is neither copyable nor movable, for safety.

@jbms wrote:

need to acquire the gil to increment the reference counts of the Python objects, which could result in failure in the case of finalization

Yes, that was a consideration. Even if rare, I have to think millions of calls per second, 24/7. It will happen. Note that throw e; does use the copy constructor. I see that being used in the wild.

need to copy the cached what message with possibility of failure due to memory exhaustion

That wasn't a consideration: std::string "copy" is simply incrementing a reference count. std::string implements copy-on-write.

@rwgk
Copy link
Collaborator

rwgk commented Jun 2, 2022

I forgot to mention:

I don't think it is helpful to try to avoid failure from acquiring the gil when copying, since that failure will still occur upon destruction.

Yes, that's something we still need to solve (I believe the last hole that needs plugging).

It's nice that we don't have to worry about it anymore for the copy ctor, but that was not the motivation for using shared_ptr.

@@ -12,6 +12,9 @@
#include "detail/common.h"
#include "buffer_info.h"

#include <cstdio>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is stdio still necessary?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short answer: no, thanks for pointing it out!

Long answer: commit message of 8e145c0

```
iwyu version: include-what-you-use 0.17 based on Debian clang version 13.0.1-3+build2
```

Command used:

```
iwyu -c -std=c++17 -DPYBIND11_TEST_BOOST -Iinclude/pybind11 -I/usr/include/python3.9 -I/usr/include/eigen3 include/pybind11/pytypes.cpp
```

pytypes.cpp is a temporary file: `#include "pytypes.h"`

The raw output is very long and noisy.

I decided to use `#include <cstddef>` instead of `#include <cstdio>` for `std::size_t` (iwyu sticks to the manual choice).

I ignored all iwyu suggestions that are indirectly covered by `#include <Python.h>`.

I manually verified that all added includes are actually needed.
@rwgk
Copy link
Collaborator

rwgk commented Jun 2, 2022

Google-global testing passed! (For the state at commit 53e29f4, locally merged into the smart_holder branch; the internal test ID is OCL:448346887:BASE:452610608:1654202021385:c5fceac8).

Thanks for the approval @Skylion007! I'll wait for the CI to finish before merging.

In the meantime I asked in an internal chatroom about shared_ptr thread safety. I learned that atomic is used under the hood, but no mutex. Therefore I believe that we don't have to worry about deadlocks when we acquire the GIL in the custom deleter.

@jbms
Copy link
Contributor

jbms commented Jun 2, 2022

std::string is only copy-on-write in pre-c++11 standard library implementations; that is prohibited by c++11. To handle that change glibc has two std::string implementations, but I believe the COW version is not normally used by default anymore.

Regarding restore, I agree that the previous behavior where it was called with nullptr the second time was probably not ideal, though I suspect it didn't cause any problems in practice.

For restore to retain the objects means the destructor is slightly less efficient, but l think it will anyway be negligible compared to the cost of throwing an exception.

If restore does drop the object references, I don't think it is a problem that what doesn't produce a useful message after calling restore, as long as it is documented, since I would expect in that case the exception will almost always be immediately destroyed.

In general it seems to me more natural from an API perspective to either drop the references or retain them, but not add extra logic to prevent calling restore twice, since this exception is already a more advanced interface. I would say if restore drops the references, then if it is called twice, the second call should set some other error rather than clearing the error.

Rethrowing an exception with throw e rather than throw is usually considered bad practice since it may slice the exception type.

@rwgk
Copy link
Collaborator

rwgk commented Jun 2, 2022

std::string is only copy-on-write in pre-c++11 standard library implementations; that is prohibited by c++11. To handle that change glibc has two std::string implementations, but I believe the COW version is not normally used by default anymore.

Thanks, I didn't know about that change pre/post C++11. I'm not sure I like it... :-) But bet there were good reasons.

Regarding restore, I agree that the previous behavior where it was called with nullptr the second time was probably not ideal, though I suspect it didn't cause any problems in practice.

I think of it as a trap that people will fall into at some small-ish rate, wondering how their error got lost, while still developing. I don't think such a bug will survive long in production code, but getting it earlier can save a lot of debugging time.

For restore to retain the objects means the destructor is slightly less efficient, but l think it will anyway be negligible compared to the cost of throwing an exception.

Yes, and it's only a difference between paying sooner (at the time of the restore() call) rather than later (dtor).

If restore does drop the object references, I don't think it is a problem that what doesn't produce a useful message after calling restore, as long as it is documented, since I would expect in that case the exception will almost always be immediately destroyed.

I'm ambiguous about this one: fundamentally I agree, producing something like

The ValueError was processed already (either via restore() or discard_as_unraisable()).

We could easily even keep track of restore() vs discard_as_unraisable().

But it would be an additional behavior change. I think the current PR is a meaningful and fairly large step in a good direction. We can revisit that behavior change separately. (safe gil seems more important)

In general it seems to me more natural from an API perspective to either drop the references or retain them, but not add extra logic to prevent calling restore twice, since this exception is already a more advanced interface. I would say if restore drops the references, then if it is called twice, the second call should set some other error rather than clearing the error.

Without the shared_ptr we cannot reliably detect the multiple restore() situation: there could be copies. I.e. there are actually two issues: 1. accidentally clearing an error, 2. accidentally restoring the same error twice. This implementation prevents both.

Rethrowing an exception with throw e rather than throw is usually considered bad practice since it may slice the exception type.

Absolutely, but our tooling (internal & clang-tidy here) doesn't catch it, therefore it would be a whack-a-mole game trying to squash that bad practice. Even if it's bad practice, I wouldn't want to leave a hole: reference count corruption in large applications tends to be extremely difficult to reproduce and debug.

Sorry, way too many buts here! But if we also fold in safe gil, I believe we'll be in the rock solid state that is most important, then we can refine from there.

@rwgk rwgk merged commit a05bc3d into pybind:master Jun 2, 2022
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Jun 2, 2022
@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Jul 9, 2022
@rwgk rwgk mentioned this pull request Apr 27, 2023
EricCousineau-TRI added a commit to EricCousineau-TRI/pybind11 that referenced this pull request Oct 13, 2023
)"

This reverts commit a05bc3d.

# Conflicts:
#	include/pybind11/pytypes.h
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.