Skip to content

retry on 500s #111

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

klimburg
Copy link

@klimburg klimburg commented Nov 2, 2023

We noticed that occasionally we would see database deadlocks on report_session_end if a send_keepalive was also sent at roughly the same time. This results in the report_session_end post raising on a 500.

This adds server_error or status code 500 to the backslash client's retry list.

@ayalash
Copy link
Collaborator

ayalash commented Nov 5, 2023

@klimburg Do you have more information.
500 (internal server error) should not be retried, it should be fixed in server side.

@klimburg
Copy link
Author

klimburg commented Nov 6, 2023

Yea so agree this needs to be fixed server side. If you want I can open an issue over in https://github.com/getslash/backslash to discuss options server-side.

I think (but definitely not sure) what is happening here is that both send_keepalive and report_session_end UPDATE the session table so if the background send_keepalive POST is sent at roughly the same time as report_session_end and the two requests are picked up by different worker processes of the app you can get a deadlock because they both are trying to modify the same row of the session table.

I think one option would be to just retry the db transaction server side when deadlocks are detected.

My rationale for retrying on the client side was just how painful it can be to have a slash process fail even though the test session itself passed. Given that not all the api requests are idempotent I get the desire not to generally retry on 500s.

Logs

current-nginx-1               | 172.31.15.10 - - [06/Nov/2023:07:59:49 +0000] "POST /api/report_test_end HTTP/1.1" 200 16 0.737 0.738 .
current-python-backend-api-1  | [2023-11-06 07:59:51 +0000] [29] [ERROR] Error handling request /api/report_session_end
current-python-backend-api-1  | Traceback (most recent call last):
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1235, in _execute_context
current-python-backend-api-1  |     self.dialect.do_execute(
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
current-python-backend-api-1  |     cursor.execute(statement, parameters)
current-python-backend-api-1  | psycopg2.errors.DeadlockDetected: deadlock detected
current-python-backend-api-1  | DETAIL:  Process 467 waits for ShareLock on transaction 13479356; blocked by process 466.
current-python-backend-api-1  | Process 466 waits for ShareLock on transaction 13479357; blocked by process 467.
current-python-backend-api-1  | HINT:  See server log for query details.
current-python-backend-api-1  | CONTEXT:  while updating tuple (6962,38) in relation "session"
current-python-backend-api-1  | The above exception was the direct cause of the following exception:
current-python-backend-api-1  | Traceback (most recent call last):
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/gunicorn/workers/sync.py", line 136, in handle
current-python-backend-api-1  |     self.handle_request(listener, req, client, addr)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/gunicorn/workers/sync.py", line 179, in handle_request
current-python-backend-api-1  |     respiter = self.wsgi(environ, resp.start_response)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/flask/app.py", line 2464, in __call__
current-python-backend-api-1  |     return self.wsgi_app(environ, start_response)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/raven/middleware.py", line 100, in __call__
current-python-backend-api-1  |     iterable = self.application(environ, start_response)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/werkzeug/middleware/proxy_fix.py", line 169, in __call__
current-python-backend-api-1  |     return self.app(environ, start_response)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/flask/app.py", line 2450, in wsgi_app
current-python-backend-api-1  |     response = self.handle_exception(e)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/flask_restful/__init__.py", line 271, in error_router
current-python-backend-api-1  |     return original_handler(e)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/flask/app.py", line 1867, in handle_exception
current-python-backend-api-1  |     reraise(exc_type, exc_value, tb)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/flask/_compat.py", line 39, in reraise
current-python-backend-api-1  |     raise value
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/flask/app.py", line 2447, in wsgi_app
current-python-backend-api-1  |     response = self.full_dispatch_request()
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/flask/app.py", line 1952, in full_dispatch_request
current-python-backend-api-1  |     rv = self.handle_user_exception(e)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/flask_restful/__init__.py", line 271, in error_router
current-python-backend-api-1  |     return original_handler(e)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/flask/app.py", line 1821, in handle_user_exception
current-python-backend-api-1  |     reraise(exc_type, exc_value, tb)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/flask/_compat.py", line 39, in reraise
current-python-backend-api-1  |     raise value
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/flask/app.py", line 1950, in full_dispatch_request
current-python-backend-api-1  |     rv = self.dispatch_request()
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/flask/app.py", line 1936, in dispatch_request
current-python-backend-api-1  |     return self.view_functions[rule.endpoint](**req.view_args)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/flask_simple_api/simple_api.py", line 44, in new_func
current-python-backend-api-1  |     returned = func(**kwargs)
current-python-backend-api-1  |   File "/src/flask_app/utils/api_utils.py", line 56, in new_func
current-python-backend-api-1  |     return func(*args, **kwargs)
current-python-backend-api-1  |   File "/src/flask_app/activity.py", line 25, in new_func
current-python-backend-api-1  |     return func(*args, **kwargs)
current-python-backend-api-1  |   File "/src/flask_app/utils/api_utils.py", line 20, in new_func
current-python-backend-api-1  |     returned = func(*args, **kwargs)
current-python-backend-api-1  |   File "/src/flask_app/blueprints/api/sessions.py", line 138, in report_session_end
current-python-backend-api-1  |     db.session.commit()
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/orm/scoping.py", line 162, in do
current-python-backend-api-1  |     return getattr(self.registry(), name)(*args, **kwargs)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1023, in commit
current-python-backend-api-1  |     self.transaction.commit()
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 487, in commit
current-python-backend-api-1  |     self._prepare_impl()
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 466, in _prepare_impl
current-python-backend-api-1  |     self.session.flush()
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 2446, in flush
current-python-backend-api-1  |     self._flush(objects)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 2584, in _flush
current-python-backend-api-1  |     transaction.rollback(_capture_exception=True)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 67, in __exit__
current-python-backend-api-1  |     compat.reraise(exc_type, exc_value, exc_tb)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 277, in reraise
current-python-backend-api-1  |     raise value
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 2544, in _flush
current-python-backend-api-1  |     flush_context.execute()
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 416, in execute
current-python-backend-api-1  |     rec.execute(self)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 580, in execute
current-python-backend-api-1  |     persistence.save_obj(
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 230, in save_obj
current-python-backend-api-1  |     _emit_update_statements(
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 975, in _emit_update_statements
current-python-backend-api-1  |     c = cached_connections[connection].execute(
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 980, in execute
current-python-backend-api-1  |     return meth(self, multiparams, params)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 273, in _execute_on_connection
current-python-backend-api-1  |     return connection._execute_clauseelement(self, multiparams, params)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1093, in _execute_clauseelement
current-python-backend-api-1  |     ret = self._execute_context(
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1239, in _execute_context
current-nginx-1               | 172.31.15.10 - - [06/Nov/2023:07:59:51 +0000] "POST /api/report_session_end HTTP/1.1" 500 141 1.033 1.033 .
current-python-backend-api-1  |     self._handle_dbapi_exception(
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1458, in _handle_dbapi_exception
current-python-backend-api-1  |     util.raise_from_cause(sqlalchemy_exception, exc_info)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 296, in raise_from_cause
current-python-backend-api-1  |     reraise(type(exception), exception, tb=exc_tb, cause=cause)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 276, in reraise
current-python-backend-api-1  |     raise value.with_traceback(tb)
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1235, in _execute_context
current-python-backend-api-1  |     self.dialect.do_execute(
current-python-backend-api-1  |   File "/root/.local/share/virtualenvs/src-NVTF7jWz/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
current-python-backend-api-1  |     cursor.execute(statement, parameters)
current-python-backend-api-1  | sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
current-python-backend-api-1  | DETAIL:  Process 467 waits for ShareLock on transaction 13479356; blocked by process 466.
current-python-backend-api-1  | Process 466 waits for ShareLock on transaction 13479357; blocked by process 467.
current-python-backend-api-1  | HINT:  See server log for query details.
current-python-backend-api-1  | CONTEXT:  while updating tuple (6962,38) in relation "session"
current-nginx-1               | 172.31.15.10 - - [06/Nov/2023:07:59:51 +0000] "POST /api/send_keepalive HTTP/1.1" 200 16 1.174 1.174 .
current-python-backend-api-1  |  [SQL: 'UPDATE session SET timespan=%(timespan)s, updated_at=%(updated_at)s, end_time=%(end_time)s, status=%(status)s, delete_at=%(delete_at)s WHERE session.id = %(session_id)s'] [parameters: {'timespan': DateTimeTZRange(datetime.datetime(2023, 11, 6, 7, 52, 26, 75104), datetime.datetime(2023, 11, 6, 7, 59, 50, 376080), '[)'), 'updated_at': datetime.datetime(2023, 11, 6, 7, 59, 50, 379782), 'end_time': 1699257590.3760798, 'status': 'SUCCESS', 'delete_at': 1701849590.376117, 'session_id': 219544}] (Background on this error at: http://sqlalche.me/e/e3q8)

@ayalash
Copy link
Collaborator

ayalash commented Nov 10, 2023

@klimburg I understand the reason that you opened it here, but you should not add 500 (internal server error) to retry. It should be fixed in server side.
I've opened an issue on the server repository (linked above), And I'm rejecting this PR.
Sorry and thanks for reporting on this bug.

@ayalash ayalash closed this Nov 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants