Skip to content

connections: do not modify selector during iteration #313

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 24, 2020

Conversation

liamstask
Copy link
Contributor

@liamstask liamstask commented Aug 13, 2020

❓ What kind of change does this PR introduce?

  • 🐞 bug fix
  • 🐣 feature
  • πŸ“‹ docs update
  • πŸ“‹ tests/coverage improvement
  • πŸ“‹ refactoring
  • πŸ’₯ other

πŸ“‹ What is the related issue number (starting with #)

Fixes #312

❓ What is the current behavior? (You can also link to an open issue here)

The selector map is modified during iteration, raising an error.

❓ What is the new behavior (if this is a feature change)?

Create a temporary list of the items to remove, then remove them after iterating the selector map.

πŸ“‹ Other information:

πŸ“‹ Checklist:

  • I think the code is well written
  • I wrote good commit messages
  • I have squashed related commits together after the changes have been approved
  • Unit tests for the changes exist
  • Integration tests for the changes exist (if applicable)
  • I used the same coding conventions as the rest of the project
  • The new code doesn't generate linter offenses
  • Documentation reflects the changes
  • The PR relates to only one subject with a clear title
    and description in grammatically correct, complete sentences

This change is Reviewable

@webknjaz
Copy link
Member

@Safihre mind confirming that this change fixes your issue?

@webknjaz
Copy link
Member

@Safihre also, it'd be great if we could come up with a stable reproducer that could be integrated into the test suite as a regression test.

@Safihre
Copy link
Contributor

Safihre commented Aug 13, 2020

It did not work:

Traceback (most recent call last):
  File "C:\Users\safv\Documents\GitHub\sabnzbd3\venv\lib\site-packages\cheroot\server.py", line 1797, in serve
    self.tick()
  File "C:\Users\safv\Documents\GitHub\sabnzbd3\venv\lib\site-packages\cheroot\server.py", line 2032, in tick
    self.connections.expire()
  File "C:\Users\safv\Documents\GitHub\sabnzbd3\venv\lib\site-packages\cheroot\connections.py", line 109, in expire
    for sock_fd, conn in timed_out_connections:
  File "C:\Users\safv\Documents\GitHub\sabnzbd3\venv\lib\site-packages\cheroot\connections.py", line 103, in <genexpr>
    timed_out_connections = (
  File "C:\Users\safv\AppData\Local\Programs\Python\Python38\lib\_collections_abc.py", line 743, in __iter__
    for key in self._mapping:
RuntimeError: dictionary changed size during iteration

I double checked the files, and it does have the copy.copy changes.
The reproducer is simply clicking around a bit and wait for the periodic check. Will see if I can make cheroot only reproducer.

@Safihre
Copy link
Contributor

Safihre commented Aug 13, 2020

I can reproduce with:

from cheroot import wsgi
def my_crazy_app(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type','text/plain')]
    start_response(status, response_headers)
    return [b'Hello world!']

addr = '0.0.0.0', 8070
server = wsgi.Server(addr, my_crazy_app)
server.start()

I open localhost:8070 in the browser, wait a few seconds and then it happens.

@webknjaz
Copy link
Member

@Safihre is it windows-specific?

@webknjaz
Copy link
Member

@Safihre it'd be great to have something in the form of a test that pytest would run. But since the CIs currently cannot reproduce this, I think there may be some extra preconditions to it.

@OneCDOnly
Copy link

OneCDOnly commented Aug 13, 2020

@Safihre is it windows-specific?

@webknjaz no, seeing the same thing in Linux:

2020-08-13 12:04:08,114::ERROR::[_cplogging:213] [13/Aug/2020:12:04:08] ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
  File "/opt/lib/python3.8/site-packages/cheroot/server.py", line 1795, in serve
    self.tick()
  File "/opt/lib/python3.8/site-packages/cheroot/server.py", line 2030, in tick
    self.connections.expire()
  File "/opt/lib/python3.8/site-packages/cheroot/connections.py", line 107, in expire
    for sock_fd, conn in timed_out_connections:
  File "/opt/lib/python3.8/site-packages/cheroot/connections.py", line 102, in <genexpr>
    timed_out_connections = (
  File "/opt/lib/python3.8/_collections_abc.py", line 743, in __iter__
RuntimeError: dictionary changed size during iteration

@liamstask
Copy link
Contributor Author

if you're able to test again, i've pushed a change that does not copy the map, but instead creates a temporary list of the items to be removed, such that the selector is not modified during iteration.

@Safihre
Copy link
Contributor

Safihre commented Aug 13, 2020

Tested, and this one works! πŸ‘
Have not been able to verify if it was Windows only today.

@webknjaz
Copy link
Member

@liamstask weird, it seems like a band-aid because the list creation still iterates over the items... It just does it faster.

@webknjaz
Copy link
Member

Oh, you've also found where you modify the map. That's better. So I guess it's reasonable to revert it to use a generator comprehension, then.

@webknjaz
Copy link
Member

@liamstask do you think you could come up with a regression test for this?

@liamstask
Copy link
Contributor Author

i cannot commit to a specific timeline, but i can try to take a look

@liamstask liamstask changed the title connections: copy selector map before modifying connections: do not modify selector during iteration Aug 13, 2020
@Safihre
Copy link
Contributor

Safihre commented Aug 14, 2020

I also noticed a new traceback with the new code when exiting the server with some requests still going:

2020-08-14 14:44:01,587::INFO::[_cplogging:213] [14/Aug/2020:14:44:01] ENGINE Bus STOPPING
2020-08-14 14:44:01,801::INFO::[_cplogging:213] [14/Aug/2020:14:44:01] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('127.0.0.1', 8081)) shut down
2020-08-14 14:44:01,801::INFO::[_cplogging:213] [14/Aug/2020:14:44:01] ENGINE Bus STOPPED
2020-08-14 14:44:01,801::INFO::[_cplogging:213] [14/Aug/2020:14:44:01] ENGINE Bus EXITING
2020-08-14 14:44:01,801::INFO::[_cplogging:213] [14/Aug/2020:14:44:01] ENGINE Bus EXITED
2020-08-14 14:44:01,805::ERROR::[_cplogging:213] [14/Aug/2020:14:44:01] ENGINE TypeError("object of type 'NoneType' has no len()")
Traceback (most recent call last):
  File "c:\users\safv\documents\github\sabnzbd3\venv\lib\site-packages\cheroot\server.py", line 1283, in communicate
    req.respond()
  File "c:\users\safv\documents\github\sabnzbd3\venv\lib\site-packages\cheroot\server.py", line 1073, in respond
    self.server.gateway(self).respond()
  File "c:\users\safv\documents\github\sabnzbd3\venv\lib\site-packages\cheroot\wsgi.py", line 145, in respond
    self.write(chunk)
  File "c:\users\safv\documents\github\sabnzbd3\venv\lib\site-packages\cheroot\wsgi.py", line 229, in write
    self.req.ensure_headers_sent()
  File "c:\users\safv\documents\github\sabnzbd3\venv\lib\site-packages\cheroot\server.py", line 1120, in ensure_headers_sent
    self.send_headers()
  File "c:\users\safv\documents\github\sabnzbd3\venv\lib\site-packages\cheroot\server.py", line 1165, in send_headers
    can_keep = self.server.connections.can_add_keepalive_connection
  File "c:\users\safv\documents\github\sabnzbd3\venv\lib\site-packages\cheroot\connections.py", line 292, in can_add_keepalive_connection
    return ka_limit is None or self._num_connections < ka_limit
  File "c:\users\safv\documents\github\sabnzbd3\venv\lib\site-packages\cheroot\connections.py", line 286, in _num_connections
    return len(self._readable_conns) + len(self._selector.get_map()) - 1
TypeError: object of type 'NoneType' has no len()

@liamstask
Copy link
Contributor Author

that appears to be due to the server using connections after it has already been closed - pushed a small change to address (would also benefit from a regression test case).

@Safihre
Copy link
Contributor

Safihre commented Aug 15, 2020

@liamstask Tested the new version and it seems to not show any tracebacks.

@Safihre
Copy link
Contributor

Safihre commented Aug 16, 2020

Just got a new one with the latest code, during regular serving:

Ignore!

@webknjaz
Copy link
Member

@Safihre this traceback looks like #313 (comment) but there's no generator expression in that place anymore. Are you sure you've checked out the latest incarnation of this branch?

P.S. Pro tip: you can mark your code blocks with python-traceback and they'll show up highlighted.

@webknjaz
Copy link
Member

webknjaz commented Aug 16, 2020

@liamstask I think I'll wait until there's some tests added. No rush here: I'll be on vacation this week and will probably be unable to do proper reviews anyway. Also, the CI is half-red so that needs to be fixed too. It looks like these failures were caused by the updates around pytest, this should be fixed now.

@Safihre
Copy link
Contributor

Safihre commented Aug 16, 2020

@webknjaz You seem to be right! Seems my pip-install went wrong.

@@ -1162,7 +1162,10 @@ def send_headers(self): # noqa: C901 # FIXME
# Override the decision to not close the connection if the connection
# manager doesn't have space for it.
if not self.close_connection:
can_keep = self.server.connections.can_add_keepalive_connection
can_keep = (
self.server.ready
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe embed this into connections.can_add_keepalive_connection property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in c94eaab i instead made connections a 'private' member of server, since it should never be accessed after server.stop() has been called.

@webknjaz
Copy link
Member

@liamstask should I expect you to add the test this weekend? If not, I'll just merge this right away and hope that the test gets added in some follow-up PR.

@saidgadjiev
Copy link

When it will be fixed?

@webknjaz
Copy link
Member

When it will be fixed?

Once this PR is merged and a new tag is cut.

@cyraxjoe
Copy link
Contributor

My understanding is that the current preferences is to copy the list instead of using a lock, shouldn't be preferred to make it thread safe explicitly with a lock? Is it because of performance reasons?

The class is directly manipulated from the worker threads. Is not a matter of internal consistency, the ConnectionManager is concurrently used by all the worker threads.

@cyraxjoe
Copy link
Contributor

Well... after a brief read of the selectors module implementation, it seems that indeed, copying is the alternative, because register and unregister calls modify the map. Any call of those methods while looping over get_map().items() is a potential problem.

@webknjaz webknjaz added the bug Something is broken label Aug 22, 2020
cyraxjoe added a commit to cyraxjoe/cheroot that referenced this pull request Aug 22, 2020
The new wrappers keeps an additional copy of the selected
sockets, the external copy allow is to concurrently
iterate and modify the list of the selected sockets.

In addition, fixes an issue when the server is never notified
that is not serving the the process gets interrupted inside
a `safe_start` call and `stop` gets stuck in a while loop,
waiting to be ready to exit gracefully.

Fixes issue cherrypy#313.
@liamstask
Copy link
Contributor Author

@webknjaz i looked a bit into adding a test, and part of the issue is that tracebacks like this are not caught in the existing tests because server.serve() catches Exception and just writes it to error_log() which the tests have no visibility into.

i experimented briefly with allowing server.error_log() to write to an io.StringIO() which can be set by testing.cheroot_server() which creates the server used in many tests. this successfully captures the errors, but that's not really the right place to be asserting on test-specific behavior, so would likely need further plumbing to be done in an acceptable way.

can you comment on how you'd like to expose exceptions thrown in server.serve() to tests so these errors aren't swallowed silently in the future?

cyraxjoe added a commit to cyraxjoe/cheroot that referenced this pull request Aug 23, 2020
…tests.

Using the new error_log mock class inspect the value during the keep
alive tests, previously ignoring the log caused the regression of
issue cherrypy#313.
@cyraxjoe
Copy link
Contributor

cyraxjoe commented Aug 23, 2020

@liamstask I implemented something to verify the error log calls in master...cyraxjoe:verify-server-error-log-test-fix
In addition I have an alternative fix for the problem at cyraxjoe@518eace, but I'm not very convinced that we are properly using the selectors module, it feels like it's going against this type of use case.

Those two branches are also integrated in other branch: master...cyraxjoe:full-fix-for-issue-313

Also from the performance perspective, did we standardized on something to profile the performance? Given that these changes were originally made with performance in mind.

Any thoughts @jaraco @webknjaz ?

@webknjaz
Copy link
Member

@liamstask I currently don't have any strong opinion on this. Ideally, logging across the whole project should be refactored to use stdlib logging and then, it'd be easier to assert thing with pytest's caplog, for example. But for right now, Joel's implementation should be enough.

@cyraxjoe I think this PR fixes the immediate problem that is unrelated to threading: changing the dict in a for-loop while iterating it. And so I'd like to merge it as-is and explore any possible improvements later. atm I'm not sure if locking is necessary and I'd like to see the suggested changes as PRs because it's easier to add inline comments to diffs this way.

So my plan is as follows:

  1. Merge this PR + release it
  2. Wait for the PR with tests (by Joel?) and merge it
  3. Iterate on any other improvement PRs

cyraxjoe added a commit to cyraxjoe/cheroot that referenced this pull request Aug 23, 2020
…tests.

Using the new error_log mock class inspect the value during the keep
alive tests, previously ignoring the log caused the regression of
issue cherrypy#313.
@cyraxjoe cyraxjoe mentioned this pull request Aug 23, 2020
15 tasks
@cyraxjoe
Copy link
Contributor

PR with the regression tests #316

@liamstask
Copy link
Contributor Author

@cyraxjoe thanks, that makes sense to me.

@webknjaz i added one more commit to address one of your previous comments, related to protecting access of server.connections - i generalized that a bit and made it internal to server altogether. if that looks ok to you, i can squash all these commits prior to merging.

@webknjaz
Copy link
Member

It's alright to keep the commits separate. In fact, I was about to merge this yesterday but got distracted.

@webknjaz webknjaz merged commit 7644616 into cherrypy:master Aug 24, 2020
@liamstask liamstask deleted the issue312-map-iter branch August 24, 2020 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is broken
Projects
None yet
Development

Successfully merging this pull request may close these issues.

timed_out_connections - dictionary changed size during iteration
6 participants