-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
protect raylet against bad messages #4003
protect raylet against bad messages #4003
Conversation
Test PASSed. |
src/ray/common/client_connection.cc
Outdated
@@ -242,7 +244,11 @@ void ClientConnection<T>::ProcessMessageHeader(const boost::system::error_code & | |||
} | |||
|
|||
// If there was no error, make sure the protocol version matches. | |||
RAY_CHECK(read_version_ == RayConfig::instance().ray_protocol_version()); | |||
if (!CheckProtocolVersion()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be slightly clearer if we just have CheckProtocolVersion()
here (not returning anything), and then inside of CheckProtocolVersion we call ServerConnection<T>::Close();
where the comment says "and stop processing the connection".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to skip the rest of this function if CheckProtocolVersion
fails, thus it still needs to return a value.
Thanks, I left some comments. Let's also change the ray_protocol_version to |
@pcmoritz I don't think we should rely on |
If we get rid of it in the future, we need to find a different mechanism to do it then. We need a magic cookie to send before the raylet tries to interpret the next int64_t as a size of the next message (and potentially crash). |
Why don’t we do that in this PR because I think it’s very likely well
remove the protocol version and forget to do this.
…On Sat, Feb 9, 2019 at 2:37 PM Philipp Moritz ***@***.***> wrote:
If we get rid of it in the future, we need a different mechanism to do
this. We need a magic cookie to send before the raylet tries to interpret
the next int64_t as a size of the next message (and potentially crash).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4003 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAPOrZMDBmgdQUYO0_rDO4mG6DYzkAQVks5vL025gaJpZM4aye67>
.
|
Let's just rename the protocol_version to ray_cookie. |
updated according to comments, and also renamed |
Test FAILed. |
Thanks, I added a python end-to-end test and fixed the linting. |
Test PASSed. |
664e627
to
636a5a7
Compare
Test PASSed. |
The Travis failure is not related to this PR. I will merge it. |
This is the PR for issue #3915. This fixes the raylet crashes caused by malformed messages.
This change updates node manager so that when it discovers a new node manager, it first sends a
ConnectClient
message to it with its client ID, before sending any other messages, so that the remote node manager knows that further messages from this connection come from a valid source.This change then modifies client_connection, so that when receiving a malformed message (with incorrect protocol version), we'll check if it's from a valid source that has previously sent us a
ConnectClient
message, if it is then it's likely to be a real bug, otherwise we can know it's not from a legitimate source, and we just print a message and don't further process this connection.