Skip to content
This repository has been archived by the owner on Dec 8, 2022. It is now read-only.

Add keepalive websocket ping #29

Merged
merged 1 commit into from
Apr 8, 2019
Merged

Conversation

esevan
Copy link
Contributor

@esevan esevan commented Apr 4, 2019

When nb2kg server is behind the proxy closing idle stream connection such as nginx and envoy websocket is closed and reconnected.
This behavior updates kernel last_activity so that it keeps kernel_gateway from culling kernels.

This change is from WebsocketMixin in jupyter/notebook.

Signed-off-by: Eunsoo Park esevan.park@gmail.com

Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution. I think you're running with an older Tornado. Please update and retest.

def open(self, kernel_id, *args, **kwargs):
"""Handle web socket connection open to notebook server and delegate to gateway web socket handler """
loop = IOLoop.current()

self.ping_callback = PeriodicCallback(self.send_ping, WS_PING_INTERVAL_MS, io_loop=loop)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the io_loop parameter. It was removed in Tornado 5.0 and deprecated in 4.1. It currently produces the following repeatedly on systems running Tornado >= 5.0...

[E 07:53:09.523 NotebookApp] Uncaught exception GET /api/kernels/e26308d6-517d-40b3-a5ef-3da204c73923/channels?session_id=70fb168f96ef449a80de44728d1f906d (::1)
    HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/kernels/e26308d6-517d-40b3-a5ef-3da204c73923/channels?session_id=70fb168f96ef449a80de44728d1f906d', version='HTTP/1.1', remote_ip='::1')
    Traceback (most recent call last):
      File "/opt/anaconda3/envs/nb2kg-dev/lib/python3.7/site-packages/tornado/websocket.py", line 952, in _accept_connection
        open_result = handler.open(*handler.open_args, **handler.open_kwargs)
      File "/opt/anaconda3/envs/nb2kg-dev/lib/python3.7/site-packages/nb2kg/handlers.py", line 106, in open
        self.ping_callback = PeriodicCallback(self.send_ping, WS_PING_INTERVAL_MS, io_loop=loop)
    TypeError: __init__() got an unexpected keyword argument 'io_loop'

Once removed, I see expected behavior for startup (although I didn't test behind a proxy).

@@ -86,8 +89,23 @@ def get(self, kernel_id, *args, **kwargs):
self.kernel_id = cast_unicode(kernel_id, 'ascii')
yield gen.maybe_future(super(WebSocketChannelsHandler, self).get(kernel_id=kernel_id, *args, **kwargs))

def send_ping(self):
if self.stream.closed() and self.ping_callback is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing the following every 30 seconds. Since the callback is never stopped, this error gets logged every 30 seconds even after the kernel is shutdown...

[E 08:11:10.512 NotebookApp] Exception in callback <bound method WebSocketChannelsHandler.send_ping of <nb2kg.handlers.WebSocketChannelsHandler object at 0x1038c29b0>>
    Traceback (most recent call last):
      File "/opt/anaconda3/envs/nb2kg-dev/lib/python3.7/site-packages/tornado/ioloop.py", line 907, in _run
        return self.callback()
      File "/opt/anaconda3/envs/nb2kg-dev/lib/python3.7/site-packages/nb2kg/handlers.py", line 93, in send_ping
        if self.stream.closed() and self.ping_callback is not None:
    AttributeError: 'NoneType' object has no attribute 'closed'

It may be trickier than just testing self.stream for None since the connection code uses 'detach' and I'm not certain how that actually works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be an issue with the move to Tornado 6. See this update in notebook: jupyter/notebook@5828300#diff-808f61143410aa59f6ac185458495d66

However, I suspect there's a bug there in that self.ws_connection is really not accessible to the WebSocketMixin class or I don't understand how multiple inheritance works in Python. This member variable is defined in WebSocketHandler, which is the sibling super-class to ZMQStreamHandler.

That said, since NB2KG doesn't use a mixin here, you're probably fine using self.ws_connection in the condition.

@kevin-bates
Copy link
Member

This brings up an interesting issue. How to dictate dependency versions? As a result, I think we should probably bump the minimal Notebook version to 5.7.8.

@esevan
Copy link
Contributor Author

esevan commented Apr 5, 2019

@kevin-bates Oh.. sorry. I tested this only in my env. My fault..T^T. I deliberately changed tornado version dependency for compatibility with other server extensions using older version of tornado.
I'm using notebook==5.7.2, tornado==4.5.3. I'm going to rebase this by testing with tornado >= 5.0.

Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. I had a couple more comments/questions.

@@ -40,6 +40,9 @@
KG_CONNECT_TIMEOUT = float(os.getenv('KG_CONNECT_TIMEOUT', 20.0))
KG_REQUEST_TIMEOUT = float(os.getenv('KG_REQUEST_TIMEOUT', 20.0))

# Keepalive ping interval (30 seconds)
WS_PING_INTERVAL_MS = 30000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this value a function of a seconds-based environment variable (e.g., KG_WS_PING_INTERVAL_SECS) that defaults to 30? I'd like the option of being able to change this w/o building.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix this as you recommended. BTW do you have any plan to move those settings from Environment Variable to Configuration File such as ini, yaml, or even .py dealing with traits?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update.

No plans on creating formal config values for these since NB2KG is embedded in Notebook (in master tip) where these have been promoted to formal config values (with env support). Once NB 6 is released, there won't be a need to deal with a separate NB2KG server extension (although that approach would still work provided once doesn't start NB with --gateway-url).

self.ping(b'')

def on_pong(self, data):
self.log.debug('Client has sent pong')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the Notebook code implements a timeout mechanism where on_pong captures the last pong time. Is that not necessary here (between Notebook and Gateway)? If not, can we remove this method of comment out the debug statement and use pass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need it because KG already have logic for culling idle kernel. It will cover culling websocket of no response.

When nb2kg server is behind the proxy closing idle stream connection such as nginx and envoy websocket is closed and reconnected.
This behavior updates kernel `last_activity` so that it keeps `kernel_gateway` from culling kernels.

This change is from `WebsocketMixin` in jupyter/notebook.

Signed-off-by: Eunsoo Park <esevan.park@gmail.com>
Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good Eunsoo - thank you.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants