-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the Plasma store ready for Arrow integration #579
Conversation
206b928
to
9604673
Compare
Build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
src/plasma/status.h
Outdated
bool IsTypeError() const { return code() == StatusCode::TypeError; } | ||
bool IsUnknownError() const { return code() == StatusCode::UnknownError; } | ||
bool IsNotImplemented() const { return code() == StatusCode::NotImplemented; } | ||
bool IsPlasmaObjectExists() const { return code() == StatusCode::PlasmaObjectExists; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly are the semantics of IsPlasmaObjectExists
and IsPlasmaStoreFull
? I suppose they just return whatever the last status code that they got from the store, right?
This is a very minor point, but it is possible to add a large object, then IsPlasmaStoreFull
will return true. Then add a small object and IsPlasmaStoreFull
will return false. Also, it is possible for the store to be literally full, but for `IsPlasmaStoreFull to return false. I guess all I'm saying is that the naming is a little unintuitive.
Note, I'm not actually suggesting that you change the name, maybe add a comment though.
src/plasma/plasma_io.h
Outdated
|
||
#include "status.h" | ||
|
||
#define RAY_PROTOCOL_VERSION 0x0000000000000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to replace this with PLASMA_PROTOCOL_VERSION
?
} | ||
}; | ||
|
||
typedef UniqueID ObjectID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm so happy to finally make these classes (it's especially great to have the hash
method). If we go ahead and make ObjectID
a subclass would that break anything? It'd be nice to get rid of the typedef
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't work right now but can be made to work; I'd like to merge the PR first and do it as part of a subsequent PR, because I'm also working on the arrow integration PR right now and would like to keep the changes that are needed to backport minimal.
src/plasma/plasma_common.h
Outdated
const uint8_t *data() const; | ||
uint8_t *mutable_data(); | ||
std::string binary() const; | ||
std::string sha1() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about a more generic name like hash
or hex
? Some of these IDs are just generated randomly and it's possible we'd change the hash function later.
@@ -0,0 +1,210 @@ | |||
// Copyright (c) 2011 The LevelDB Authors. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it matters, but some of this code does not come from the LevelDB authors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is actually from arrow (which presumably took it from leveldb) and will be removed from our codebase after the arrow merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but the copyright seems inaccurate regardless of which codebase it is in, right?
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test FAILed. |
Test FAILed. |
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this looks great, I left a few comments! At a high level, the code for the manager and store seem to be very tightly coupled. What is your plan for decoupling them?
@@ -45,7 +45,6 @@ if(HAS_PLASMA) | |||
include_directories("${CMAKE_CURRENT_LIST_DIR}/../plasma") | |||
include_directories("${CMAKE_CURRENT_LIST_DIR}/../common") | |||
include_directories("${CMAKE_CURRENT_LIST_DIR}/../common/thirdparty") | |||
set(COMMON_EXTENSION ../common/lib/python/common_extension.cc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this change, note that we still use the variable COMMON_EXTENSION
later in this file.
#ifdef HAS_PLASMA | ||
#include "plasma_common.h" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no reason to have a space here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the space is not there, clang-format tries to reorder the includes to sort them alphabetically, but this include needs to come first, so I think we do need the space here.
src/plasma/CMakeLists.txt
Outdated
|
||
add_dependencies(plasma protocol_fbs) | ||
|
||
add_executable(plasma_manager | ||
plasma_manager.cc) | ||
add_executable(plasma_manager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually want these extra spaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
return 0; | ||
} | ||
if (errno == EPIPE || errno == EBADF || errno == ECONNRESET) { | ||
ARROW_LOG(WARNING) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it really make sense to convert the plasma manager to use arrow macros for logging? Are you planning on moving plasma_manager.cc
into Arrow? Other than for checking status codes for the manager's interaction with the store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to use ARROW_LOG like macros throughout the codebase eventually. We can copy logging.h over and rename it to RAY_LOG.
src/plasma/plasma_store.h
Outdated
/// present in the store. In this case, the client should not call | ||
/// plasma_release. | ||
/// - PlasmaError_OutOfMemory, if the store is out of memory and | ||
/// cannot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix indentation here.
@@ -0,0 +1,210 @@ | |||
// Copyright (c) 2011 The LevelDB Authors. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but the copyright seems inaccurate regardless of which codebase it is in, right?
@@ -163,7 +163,7 @@ def check_components_alive(self, component_type, check_component_alive): | |||
if check_component_alive: | |||
self.assertTrue(component.poll() is None) | |||
else: | |||
self.assertTrue(component.poll() <= 0) | |||
self.assertTrue(not component.poll() is None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plasma store now returns 1 if an error occurs (before that the status was negative). This is because of the arrow status checking macros.
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
return std::memcmp(data(), rhs.data(), kUniqueIDSize) == 0; | ||
} | ||
|
||
Status plasma_error_status(int plasma_error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed, but this gives a compiler warning on MacOS.
/ray/src/plasma/plasma_manager.cc:59:1: warning: control may reach end of non-void function [-Wreturn-type]
This PR disentangles the plasma store from the rest of Ray and prepares the merge into Apache Arrow.
Going forward there are two possibilities: Put the store into arrow and then create a new PR that integrates it into Ray again, or get this PR into a mergeable state, merge it into Ray and then put the store into arrow. Any opinions on this? [Right now there is a bunch of code duplication that will go away]
Note this makes a bunch of progress on #406.