"Connection refused" when using keep-alive with gthread #1698
My guess: [NOT VALID - see comment below]
…th keep-alive Fixes benoitc#1698
NOTE: I couldn't reproduce the issue after applying the patch: #1699
I'm reading this and trying to reproduce and I wonder if there is not actually a bug here. The code for I'm okay with making a change, but I want to check that our expectations are aligned. I don't think anything we do can guarantee that requests will not close with this error. Finally, I have not yet reproduced the issue locally, but I think your analysis is correct.
@tilgovi you are right, Gunicorn may not eliminate the possibility that a client sends a request just before This is a very rare error: in our prod we want to replace CherryPy with Gunicorn, we have around ten services, we run stress tests each night, and it happens only a few times per month. Our tests are configured to handle each error (in prod the request will be repeated and the error wouldn't be visible). Still, this is a bug in gthread and should be fixed. Maybe
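To illustrate the "request will be repeated" handling mentioned above, here is a minimal sketch of a client-side retry on a dropped keep-alive connection. It is not taken from the attached test files: the helper name get_with_retry and its retries parameter are hypothetical, and it uses the standard-library http.client rather than the client used in the original test. Repeating a request like this is only reasonable for idempotent requests such as GET.

```python
import http.client


def get_with_retry(conn, path, retries=1):
    """GET `path` on `conn`, reconnecting and repeating the request if the
    (possibly reused) connection turns out to have been closed by the server.

    Hypothetical helper for illustration; only suitable for idempotent requests.
    """
    for attempt in range(retries + 1):
        try:
            conn.request("GET", path)
            response = conn.getresponse()
            return response.status, response.read()
        except ConnectionError:
            # Covers "Connection reset by peer", "Connection refused" and
            # http.client.RemoteDisconnected. close() resets the connection
            # state, so the next request() opens a fresh socket.
            conn.close()
            if attempt == retries:
                raise
```

A caller would create one http.client.HTTPConnection per session and pass it in for every request, so the keep-alive connection is reused until the server drops it.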
I doubt that the keepalive setting will have any impact there until the OS is instructed to do it though. In HTTP keepalive it's expected to do the next request really quick. Gunicorn doesn't do any TCP keep alive (per design as it makes it hard to detect client disconnections/reset). I'm not quite sure the timing of events you describe is true though. A connection is added or removed from
I didn't say anything about TCP keep alive; unfortunately both are named the same way. I'm always referring to HTTP connection keep-alive. My test executes a series of queries without waiting for anything, and it should not receive 'Connection refused' within less than 30 seconds... I've analyzed a bit deeper and probably have to confirm your doubts, as the only function that is called from a thread other than MainThread is
I am using your script to test, and I saw the large keepalive. Again, I don't disagree with your analysis at all, I just want to set expectations for anyone reading this thread that such errors cannot be completely eliminated. I will look at the code some more and figure out whether your patch or some other change makes sense.
Is this operator used anywhere?
Good find. I was testing on Python 3. I will look later today.
@tilgovi this test takes ~30 seconds; for that test I can use 60 seconds if you wish. I've found the root cause, please read my previous comment. If you want to reproduce it you must use python2 (this bug happens only on python2).
Note: I will also run a test using a client other than requests, like curl; for me ab with the
…th keep-alive (benoitc#1699) Fixes benoitc#1698
Gunicorn version: 19.7.1 (also tried master branch)
This bug can be reproduced with the attached files.
The test case starts the command:
with logs redirected to /tmp/_test_gunicorn.out, and then creates separate threads that each open an HTTP session and send 1000 requests within it.
In my case the gunicorn server received request /request/1096 and then reset the connection (see tcp.stream eq 10 in the attached gunicorn_reset_keep_alived_connection.pcapng).
NOTE: this is a race condition, so it may happen that all requests finish successfully (see myapp.py: without time.sleep it almost never fails), but with that time.sleep it usually fails, and then the script should output a line similar to:
[1] Failed request id=1096 with ('Connection aborted.', error(104, 'Connection reset by peer'))
This means that the thread sent request GET /request/1096 and received Connection reset by peer (so the last request from that session that succeeded is GET /request/1095).
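The attached test files are not reproduced in this thread, so the following is only a rough sketch of the kind of client described above: several threads, each sending a run of sequential requests over one keep-alive connection and reporting the first failure. The function names run_session and run_stress, the argument names, and the timeout are assumptions for illustration, and it uses the standard-library http.client instead of requests.

```python
import http.client
import threading


def run_session(host, port, session_id, first_id, count, failures):
    """Send `count` sequential GETs over a single keep-alive connection.

    Stops at the first error and records it in the shared `failures` list.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        for request_id in range(first_id, first_id + count):
            try:
                conn.request("GET", "/request/%d" % request_id)
                # Drain the body so the same socket can carry the next request.
                conn.getresponse().read()
            except (OSError, http.client.HTTPException) as exc:
                failures.append((session_id, request_id, exc))
                print("[%d] Failed request id=%d with %r"
                      % (session_id, request_id, exc))
                return
    finally:
        conn.close()


def run_stress(host, port, sessions=4, requests_per_session=1000):
    """Run `sessions` keep-alive sessions in parallel threads.

    Returns a list of (session_id, request_id, exception) tuples, one per
    failed session; an empty list means every request succeeded.
    """
    failures = []  # list.append is atomic in CPython, so no lock is used here
    threads = []
    for session_id in range(sessions):
        thread = threading.Thread(
            target=run_session,
            args=(host, port, session_id,
                  session_id * requests_per_session,
                  requests_per_session, failures),
        )
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return failures
```

Calling run_stress with the host and port of a running gunicorn gthread worker would print one line per failed session, in the same shape as the output quoted above. Because this is a race condition, an empty result from a single run does not show that the bug is absent.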