Correctly import eventlet to prevent threads blocking each other #131

Merged (3 commits), Jun 13, 2024

@lathiat lathiat commented Jun 11, 2024

Currently, slow-running OpenStack API requests (either stuck connecting
or still waiting for the actual response) from the periodic DataGatherer
task block HTTPServer connections from being processed. Blocked
HTTPServer connections in turn block both other connections and the
DataGatherer task.

Observed Symptoms:

  • Slow or failed Prometheus requests
  • Statistics not being updated as often as expected
  • HTTP 500 responses and BrokenPipeError tracebacks being logged when
    the exporter later tries to respond to Prometheus clients that have
    already timed out and closed the socket
  • Hitting the forked process limit

This happens because the current code intends to use the eventlet
library for asynchronous non-blocking I/O, but does not use it
correctly. All code within the main application and all imported
dependencies must use the special eventlet "green" versions of many
Python libraries (e.g. socket, time, threading, SimpleHTTPServer, etc.),
which yield to other green threads when they would otherwise block
waiting for I/O or sleeping. Currently this does not always happen.
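
The difference between a blocking call and its green counterpart can be seen in a small sketch. It assumes eventlet is installed. The `run_workers` helper and the worker names are made up for illustration and are not taken from the exporter.

```python
import time


def run_workers(sleep_fn):
    """Run two green threads that record progress and then sleep.

    Returns the order in which the workers made progress.
    """
    # Imported lazily so this sketch does not require eventlet at import time.
    import eventlet

    order = []

    def worker(name):
        for step in range(2):
            order.append((name, step))
            sleep_fn(0.01)

    pool = eventlet.GreenPool()
    pool.spawn(worker, "gatherer")
    pool.spawn(worker, "http")
    pool.waitall()
    return order


if __name__ == "__main__":
    import eventlet

    # The unpatched time.sleep never yields to the eventlet hub, so the
    # first worker runs to completion before the second one starts.
    print(run_workers(time.sleep))
    # eventlet.sleep yields to the hub, so the two workers take turns.
    print(run_workers(eventlet.sleep))
```

Any blocking call that has not been replaced by its green version behaves like `time.sleep` here: every other green thread in the process waits until it returns.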

Fix this by importing eventlet and calling eventlet.patcher.monkey_patch()
before importing any other modules. This replaces the relevant standard
library modules with their green versions, so that all later imports
(including those inside dependencies) pick them up automatically.
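
The ordering requirement can be shown with a small sketch that uses only the standard library (Python 3.9+ for `ast.unparse`). `GOOD` shows the intended import order and `BAD` shows the problematic one. The `patches_before_other_imports` checker is a hypothetical helper, not part of this change, and it is only a rough top-level check.

```python
import ast

# Intended pattern: patch first, then import everything else.
GOOD = """\
import eventlet
eventlet.monkey_patch()

import socket
import time
"""

# Problematic pattern: socket is imported before patching.
BAD = """\
import socket
import eventlet
eventlet.monkey_patch()
"""


def patches_before_other_imports(source):
    """Return True if monkey_patch() is called before any non-eventlet import.

    Only top-level statements are inspected.
    """
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        elif (
            isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Call)
            and ast.unparse(node.value.func).endswith("monkey_patch")
        ):
            # Reached the patch call without seeing another import first.
            return True
        else:
            continue
        if any(name.split(".")[0] != "eventlet" for name in names):
            return False
    # No monkey_patch() call found at module level.
    return False


if __name__ == "__main__":
    print(patches_before_other_imports(GOOD))
    print(patches_before_other_imports(BAD))
```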

Documentation on correctly importing eventlet can be found here:
https://eventlet.readthedocs.io/en/latest/patching.html

A detailed analysis of the issue and of multiple previous attempts to
fix it can be found in Issue #130. If you intend to make further related
changes to the use of eventlet, threads or forked processes, please read
the detailed history available there first.

aieri previously approved these changes Jun 11, 2024
@aieri aieri left a comment

LGTM, thanks for the contribution!

@samuelallan72 samuelallan72 left a comment

This looks great, thanks; a well-researched patch. I've just flagged an issue with the lint fix, and asked some confirmation questions about the changes throughout.

@samuelallan72 samuelallan72 left a comment

Thanks for the fix and extra information about the changes! Looks good to me. 😁

Newer versions of pycodestyle started erroring on comparisons of
type(obj); switch the syntax to avoid the error.
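
For context, pycodestyle reports this kind of comparison as E721. The commit message does not say which replacement was chosen, so the sketch below shows the two common alternatives as an assumption. The function names are illustrative only.

```python
from collections import OrderedDict


def is_exact_dict(obj):
    # Flagged form (E721 in newer pycodestyle): type(obj) == dict
    # Exact type check: subclasses of dict do not match.
    return type(obj) is dict


def is_dict_like(obj):
    # Instance check: subclasses of dict match as well.
    return isinstance(obj, dict)


if __name__ == "__main__":
    print(is_exact_dict({}), is_exact_dict(OrderedDict()))
    print(is_dict_like({}), is_dict_like(OrderedDict()))
```

The two forms are not interchangeable: `isinstance()` accepts subclasses, while the `is` comparison does not, so the right choice depends on what the original comparison was meant to allow.
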
Add additional debug-level logging for the progress of the statistics
gathering, so we can see each time the refresh starts and finishes as
well as how long each specific check took.
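
A minimal sketch of this kind of timing log, using only the standard library. The logger name, the `timed` helper and the `refresh` function are assumptions for illustration and do not reproduce the exporter's actual code.

```python
import logging
import time
from contextlib import contextmanager

# Hypothetical logger name, chosen for this sketch.
log = logging.getLogger("exporter-sketch")


@contextmanager
def timed(name):
    """Log at debug level when a step starts, finishes, and how long it took."""
    start = time.monotonic()
    log.debug("Starting %s", name)
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        log.debug("Finished %s in %.3f seconds", name, elapsed)


def refresh(checks):
    """Run every check, timing the whole refresh and each individual check."""
    with timed("refresh"):
        for name, check in checks.items():
            with timed(name):
                check()


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    refresh({"nova": lambda: time.sleep(0.1), "neutron": lambda: None})
```

Using `time.monotonic()` rather than `time.time()` keeps the measured durations unaffected by system clock adjustments.
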

Fixes: canonical#130, canonical#126, canonical#124, canonical#116, canonical#115, canonical#112

lathiat commented Jun 12, 2024

Pushed new, otherwise unchanged, commits which are signed.

@samuelallan72 samuelallan72 requested a review from aieri June 12, 2024 06:55
@samuelallan72 samuelallan72 merged commit 76500ab into canonical:main Jun 13, 2024