-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raylet memory corruption fixes #2591
raylet memory corruption fixes #2591
Conversation
Test PASSed. |
src/ray/common/client_connection.cc
Outdated
@@ -84,7 +84,7 @@ ray::Status ServerConnection<T>::WriteMessage(int64_t type, int64_t length, | |||
boost::system::error_code error; | |||
WriteBuffer(message_buffers, error); | |||
if (error) { | |||
return ray::Status::IOError(error.message()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks strange why error.message() is corrupted. May we could print error.value() as well since that's valid? That might also give some hints on what kind of errors encountered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it'd be preferable to do strerror(ec.value())
as in #2548 so that we have the error message, even despite the fact that this is probably not platform independent.
We can address this by factoring out all of the calls to strerror(ec.value())
in a separate function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not certain that ec.value()
is consistent with the errno numbers expected by strerror
. There may be another way that's specific to boost. I am looking into it.
@atumanov so are you saying that it is pretty certain that the bug is in boost (as opposed to our code)? |
@robertnishihara , I am not 100% sure, but:
From that, I have some degree of confidence that it's not our code, but not 100% confidence. I added a utility function to convert boost error codes to ray status objects and cleaned up the code in several places to use this utility function in this PR. |
this PR now subsumes #2548 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks :)
Test PASSed. |
Test PASSed. |
Test FAILed. |
jenkins retest this please |
Test PASSed. |
What do these changes do?
There are two memory corruption fixes here:
boost::system::error_code
message() call returns a corrupted string, even immediately after initializing the error as follows:boost::system::error_code error;
The (temp?) fix is just to replace it with a generic message indicating failure to write message to the socket.
erase
is called on an unordered_map iterator, which is then subsequently accessed.Iterators to the erased element are invalidated:
https://stackoverflow.com/questions/38854265/unordered-map-element-being-deleted
Valgrind was reporting a bad read, which is fixed by this PR.
Related issue number
fixes : #2584
subsumes: #2548