Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reliable list_running_servers(runtime_dir) design #5716

Open
wants to merge 18 commits into
base: 6.4.x
Choose a base branch
from

Conversation

abaelhe
Copy link

@abaelhe abaelhe commented Aug 31, 2020

active check whether that process is really available via real HTTP request;
process pid maybe recycle and reused by host Operating System that i encountered;
and we must use real HTTP request other than the past check_pid(info['pid']).

Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this pull request. I understand why this is being done, although I'm curious how often you're finding the PIDs getting recycled (I know some OSs recycle much more frequently than others) and wonder why you're encountering "stale" nbserver files in the first place. How are you terminating the Notebook server normally? Is there some interaction that is using SIGKILL instead of SIGTERM or a larger problem where the server is improperly terminating? I suppose it takes just one bad apple (i.e., stale PID) and things can go wrong, and, I agree, this approach helps mitigate that - I'm just not sure about the trade-off.

I've provided a few comments and observations that I hope you find helpful. I didn't comment on the test code and it appears some socket-based tests are failing which rely on the list and stop functionality.

Returns True if the server was stopped by any means, False if stopping it
failed (on Windows).
"""
def kernel_request(server_info, path='/login', method='GET', body=None, headers=None, timeout=5, log=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should be renamed since its unrelated to kernel. Perhaps server_request?

Also, please remove the default value from path since that should be explicitly provided by the caller (which is already the case anyway).

Copy link
Author

@abaelhe abaelhe Sep 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should be renamed since its unrelated to kernel. Perhaps server_request?

we creator have this naming right;
with respect to Guido name Python; and their first creator name IPython and Jupyter also Anaconda;
i am working on my IPython Swift Kernel, kernel in my brain;
don't mind, different think lead sparks.

Also, please remove the default value from path since that should be explicitly provided by the caller (which is already the case anyway).

caller will provide their; this default value does not conflict to different path;
we use '/login' as 'default value' to remind caller authentication first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method needs to change and the default for path either removed or updated to /api

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree with @kevin-bates here. This method needs to be renamed to something like api_request before we can accept it. It's not requesting a response from a "kernel" (as we define it in the Jupyter ecosystem). It's just a general server API ping.

I also agree, the default path should probably be /api.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who create new who own naming rights.
answered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name matters here. It obscures this library's API if function names are not clear.

kernel_request is the wrong name for this function. In the Jupyter ecosystem, kernel refers specifically to the underlying language kernel being executed by calls to the server's kernel API. This method is not making a kernel request. It's making a generic request the server's REST API. Let's generalize this function name to something like server_request or api_request.

particularly because Jupyter has a notion of kernels that is specific.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm setting this conversation to unresolved, since this change is necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def kernel_request(server_info, path='/login', method='GET', body=None, headers=None, timeout=5, log=None):
def api_request(server_info, path='/api', method='GET', body=None, headers=None, timeout=5, log=None):

notebook/notebookapp.py Outdated Show resolved Hide resolved
notebook/notebookapp.py Outdated Show resolved Hide resolved
notebook/notebookapp.py Outdated Show resolved Hide resolved
notebook/notebookapp.py Outdated Show resolved Hide resolved
@abaelhe
Copy link
Author

abaelhe commented Sep 1, 2020

Thanks for opening this pull request. I understand why this is being done, although I'm curious how often you're finding the PIDs getting recycled (I know some OSs recycle much more frequently than others) and wonder why you're encountering "stale" nbserver files in the first place. How are you terminating the Notebook server normally? Is there some interaction that is using SIGKILL instead of SIGTERM or a larger problem where the server is improperly terminating? I suppose it takes just one bad apple (i.e., stale PID) and things can go wrong, and, I agree, this approach helps mitigate that - I'm just not sure about the trade-off.

i am building a Dockerfile on RedHat's quay.io, to deploy my Cloud based Amazon.com Crawler on Mac/Linux/...;
in the meantime; god or the Operating System selected my dear "jupyter-notebook" process;
meanwhile, i am making my IPython Kernel for Apple Swift Language(Python/C++/Swift/LLVM/CLang/LLDB... );
and i choose this reliable and better design and implement as my first milestone;

I've provided a few comments and observations that I hope you find helpful. I didn't comment on the test code and it appears some socket-based tests are failing which rely on the list and stop functionality.

your sparks are welcome.

abaelhe and others added 3 commits September 2, 2020 05:05
Accepted, Thanks.

Co-authored-by: Kevin Bates <kbates4@gmail.com>
Accepted.

Co-authored-by: Kevin Bates <kbates4@gmail.com>
@kevin-bates
Copy link
Member

I just now found that Lab returns yet a different title. This implies any front-end extension will have a different "title", making this approach too fragile. Since those applications could also rely on jupyter notebook list and jupyter notebook stop <port/socket> functionality, I believe a different approach should be adopted and, again, question the utility when, in practice, PID reuse would be rare in normal circumstances (i.e., when the notebook server is stopped gracefully).

@abaelhe abaelhe closed this Sep 2, 2020
@abaelhe
Copy link
Author

abaelhe commented Sep 2, 2020

reopen

@abaelhe abaelhe reopened this Sep 2, 2020
@Zsailer Zsailer self-requested a review September 2, 2020 15:34
Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @abaelhe - this is getting much closer. I'm still slightly concerned about requiring the use of the same notebook version, but I'm not sure that's a very big deal. It's certainly better than terminating the wrong process! To that point, and as I noted in the review, we should probably eliminate the PID-based termination altogether for this PR.

notebook/notebookapp.py Show resolved Hide resolved
notebook/notebookapp.py Outdated Show resolved Hide resolved
notebook/notebookapp.py Show resolved Hide resolved
notebook/notebookapp.py Show resolved Hide resolved
notebook/notebookapp.py Outdated Show resolved Hide resolved
notebook/notebookapp.py Outdated Show resolved Hide resolved
notebook/notebookapp.py Outdated Show resolved Hide resolved
notebook/notebookapp.py Show resolved Hide resolved
notebook/notebookapp.py Show resolved Hide resolved
abaelhe and others added 2 commits September 3, 2020 08:05
Co-authored-by: Kevin Bates <kbates4@gmail.com>
Copy link
Author

@abaelhe abaelhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @abaelhe - this is getting much closer. I'm still slightly concerned about requiring the use of the same notebook version, but I'm not sure that's a very big deal. It's certainly better than terminating the wrong process! To that point, and as I noted in the review, we should probably eliminate the PID-based termination altogether for this PR.

really useful key point;
i will change the "/api" protocol, define a better policy for jupyter determination;
and i still invite you as the reviewer.

notebook/notebookapp.py Show resolved Hide resolved
notebook/notebookapp.py Outdated Show resolved Hide resolved
@abaelhe
Copy link
Author

abaelhe commented Sep 3, 2020

Thanks @abaelhe - this is getting much closer. I'm still slightly concerned about requiring the use of the same notebook version, but I'm not sure that's a very big deal. It's certainly better than terminating the wrong process! To that point, and as I noted in the review, we should probably eliminate the PID-based termination altogether for this PR.

really useful key point;
i will change the "/api" protocol, define a better policy for jupyter determination;
and i still invite you as the reviewer.

commit 978d883
Author: git abaelhe@icloud.com
Date: Thu Sep 3 21:19:26 2020 +0800

Multi-instance and multi-version of Jupyter Notebook

commit 07570c6
Author: Abael He abaelhe@icloud.com
Date: Thu Sep 3 08:11:22 2020 +0800

@abaelhe abaelhe closed this Sep 3, 2020
@abaelhe abaelhe reopened this Sep 3, 2020
@kevin-bates
Copy link
Member

Hi @abaelhe - thank you for your patience.

I believe this is getting too complicated for the original goals.

  1. List running servers without using an HTTP request instead of PID.
  2. Do not terminate any process that may have reused the PID.

Instead of adding a version to the /api request and force upgrades, could we just check that the response succeeded? I think we can change to use /api/status later so useful information can be displayed when listing the servers, but for now, I think we just need to check the success status.

Let's try to keep these changes as simple as possible. Does that work for you?

@abaelhe
Copy link
Author

abaelhe commented Sep 4, 2020

Hi @abaelhe - thank you for your patience.

I believe this is getting too complicated for the original goals.

  1. List running servers without using an HTTP request instead of PID.

we are on the way to build a reliable ipython/jupyter; a right way requires a bit efforts;

  1. Do not terminate any process that may have reused the PID.

no matter reused process is processing a billions transaction or just hello world;
we MUST respect to user's ownner rights.
especially ipython/jupyter are used to process valuable massive data at most time;

Instead of adding a version to the /api request and force upgrades, could we just check that the response succeeded? I think we can change to use /api/status later so useful information can be displayed when listing the servers, but for now, I think we just need to check the success status.
Let's try to keep these changes as simple as possible. Does that work for you?

the response status is success if response.body marked with JUPYTER_NOTEBOOK_TAG;
we already did that in a reliable and fast "/api" way;
i can't achieve it without you.

Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @abaelhe. There is no need to alter the REST API for this change. Let's just use the fact that a response was successful and issue a message if the version strings don't match.

I still ask that the name and signature of kernel_request() be changed. Thank you.

notebook/base/handlers.py Show resolved Hide resolved
notebook/notebookapp.py Show resolved Hide resolved
Returns True if the server was stopped by any means, False if stopping it
failed (on Windows).
"""
def kernel_request(server_info, path='/login', method='GET', body=None, headers=None, timeout=5, log=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method needs to change and the default for path either removed or updated to /api

notebook/notebookapp.py Show resolved Hide resolved
notebook/notebookapp.py Show resolved Hide resolved
notebook/notebookapp.py Show resolved Hide resolved
notebook/notebookapp.py Outdated Show resolved Hide resolved
@Zsailer
Copy link
Member

Zsailer commented Sep 4, 2020

kernel_request() MUST retrieve Jupyter Notebook identification tag before or after authentication.
and "api/status" is an AUTHENTICATED ONLY call.

Ah, I see. You're right. That is an issue. I agree with you, the /api endpoint should probably return more specific information about the application (namely, Jupyter Notebook).

Here's the problem... changing the response model is a significant REST API breaking change. Such changes take longer to review for us (at Jupyter). As I mentioned before, this type of change affects multiple other applications. In order to accept it, we would have to get buy-in from the larger community. That could take weeks (or even months).

What you're trying to achieve in this PR is a rather small change (though much appreciate!!). To move forward, I see two possible choices:

  1. Drop the API breaking change to the response model in this PR and merge the api_request function that pings the server to verify it's running when list_running_servers is called. Then, sometime in a later PR, we can lobby to change the /api endpoint and improve this function further.
  2. Pause any further review on this PR, propose the change to the REST API through the Jupyter Enhancement Proposals repository, wait for approval there, then return to this PR once everything is approved.

Unfortunately, I (and likely @kevin-bates) don't have the bandwidth to shepherd (2) through the process, but I'd be happy to review a proposal over there. In full disclosure, (2) will likely take much longer to happen.

If you decide (1) is the direction you want to go, I'm happy to merge this after the final review.

Thanks again, @abaelhe, for all your work here. It's great stuff!

Returns True if the server was stopped by any means, False if stopping it
failed (on Windows).
"""
def kernel_request(server_info, path='/login', method='GET', body=None, headers=None, timeout=5, log=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def kernel_request(server_info, path='/login', method='GET', body=None, headers=None, timeout=5, log=None):
def api_request(server_info, path='/api', method='GET', body=None, headers=None, timeout=5, log=None):

Copy link
Author

@abaelhe abaelhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kernel_request() MUST retrieve Jupyter Notebook identification tag before or after authentication.
and "api/status" is an AUTHENTICATED ONLY call.

Ah, I see. You're right. That is an issue. I agree with you, the /api endpoint should probably return more specific information about the application (namely, Jupyter Notebook).

Here's the problem... changing the response model is a significant REST API breaking change. Such changes take longer to review for us (at Jupyter). As I mentioned before, this type of change affects multiple other applications. In order to accept it, we would have to get buy-in from the larger community. That could take weeks (or even months).

add one JUPYTER_NOTEBOOK_TAG feature is transparent to those who using "verson" field from "/api";
i don't see that is a break to REST API response model; they are compatible.

What you're trying to achieve in this PR is a rather small change (though much appreciate!!). To move forward, I see two possible choices:

less code, better product; more user.so,
focus on our main road;

  1. Drop the API breaking change to the response model in this PR and merge the api_request function that pings the server to verify it's running when list_running_servers is called. Then, sometime in a later PR, we can lobby to change the /api endpoint and improve this function further.
  2. Pause any further review on this PR, propose the change to the REST API through the Jupyter Enhancement Proposals repository, wait for approval there, then return to this PR once everything is approved.

Unfortunately, I (and likely @kevin-bates) don't have the bandwidth to shepherd (2) through the process, but I'd be happy to review a proposal over there. In full disclosure, (2) will likely take much longer to happen.

If you decide (1) is the direction you want to go, I'm happy to merge this after the final review.

Thanks again, @abaelhe, for all your work here. It's great stuff!

thanks to customers tolerantly using Jupyter Notebook with previous check_pid().

systematic design principle to flexible and diverse methods; shield to spear; they are all unity.
Apple's Logo keeps one bite in history;
why not place JUPYTER_NOTEBOOK_TAG at "/api" or somewhere required; for us, for users.

we break DOS into Windows; RPC into REST API; and many things.
we keep break REST API if that is reasonable;

merge into master branch first;
JUST DO IT!

notebook/notebookapp.py Show resolved Hide resolved
@abaelhe abaelhe requested a review from Zsailer September 5, 2020 07:26
@Zsailer
Copy link
Member

Zsailer commented Sep 8, 2020

we break DOS into Windows; RPC into REST API; and many things.
we keep break REST API if that is reasonable;

merge into master branch first;
JUST DO IT!

This isn't how we do things in the Jupyter ecosystem, for better or worse. Jupyter's REST API is an old protocol that many applications depend on. Changes require formal review through the enhancement proposal process.

In this case, we need to follow Jupyter's process. If you want to propose such a change, we'd appreciate if you'd open the discussion on the enhancement-proposals repo. I am guessing you'll get plenty of support for this idea and we'll be able to move forward fairly quickly. (I just don't have the bandwidth to champion this change myself.)

As this PR stands, we can't accept it until that process is complete or we drop the changes to the REST API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants