Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the Plasma store ready for Arrow integration #579

Merged
merged 20 commits into from
May 31, 2017

Conversation

pcmoritz
Copy link
Contributor

@pcmoritz pcmoritz commented May 21, 2017

This PR disentangles the plasma store from the rest of Ray and prepares the merge into Apache Arrow.

Going forward there are two possibilities: Put the store into arrow and then create a new PR that integrates it into Ray again, or get this PR into a mergeable state, merge it into Ray and then put the store into arrow. Any opinions on this? [Right now there is a bunch of code duplication that will go away]

Note this makes a bunch of progress on #406.

@pcmoritz pcmoritz force-pushed the arrow-2 branch 7 times, most recently from 206b928 to 9604673 Compare May 23, 2017 16:33
@AmplabJenkins
Copy link

Build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/903/
Test PASSed.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/904/
Test PASSed.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/905/
Test PASSed.

bool IsTypeError() const { return code() == StatusCode::TypeError; }
bool IsUnknownError() const { return code() == StatusCode::UnknownError; }
bool IsNotImplemented() const { return code() == StatusCode::NotImplemented; }
bool IsPlasmaObjectExists() const { return code() == StatusCode::PlasmaObjectExists; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly are the semantics of IsPlasmaObjectExists and IsPlasmaStoreFull? I suppose they just return whatever the last status code that they got from the store, right?

This is a very minor point, but it is possible to add a large object, then IsPlasmaStoreFull will return true. Then add a small object and IsPlasmaStoreFull will return false. Also, it is possible for the store to be literally full, but for `IsPlasmaStoreFull to return false. I guess all I'm saying is that the naming is a little unintuitive.

Note, I'm not actually suggesting that you change the name, maybe add a comment though.


#include "status.h"

#define RAY_PROTOCOL_VERSION 0x0000000000000000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to replace this with PLASMA_PROTOCOL_VERSION?

}
};

typedef UniqueID ObjectID;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm so happy to finally make these classes (it's especially great to have the hash method). If we go ahead and make ObjectID a subclass would that break anything? It'd be nice to get rid of the typedef.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't work right now but can be made to work; I'd like to merge the PR first and do it as part of a subsequent PR, because I'm also working on the arrow integration PR right now and would like to keep the changes that are needed to backport minimal.

const uint8_t *data() const;
uint8_t *mutable_data();
std::string binary() const;
std::string sha1() const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a more generic name like hash or hex? Some of these IDs are just generated randomly and it's possible we'd change the hash function later.

@@ -0,0 +1,210 @@
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it matters, but some of this code does not come from the LevelDB authors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is actually from arrow (which presumably took it from leveldb) and will be removed from our codebase after the arrow merge

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but the copyright seems inaccurate regardless of which codebase it is in, right?

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/911/
Test PASSed.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/912/
Test PASSed.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/913/
Test FAILed.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/914/
Test PASSed.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/915/
Test PASSed.

@pcmoritz pcmoritz changed the title [WIP] Make the Plasma store ready for Arrow integration Make the Plasma store ready for Arrow integration May 31, 2017
Copy link
Collaborator

@robertnishihara robertnishihara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this looks great, I left a few comments! At a high level, the code for the manager and store seem to be very tightly coupled. What is your plan for decoupling them?

@@ -45,7 +45,6 @@ if(HAS_PLASMA)
include_directories("${CMAKE_CURRENT_LIST_DIR}/../plasma")
include_directories("${CMAKE_CURRENT_LIST_DIR}/../common")
include_directories("${CMAKE_CURRENT_LIST_DIR}/../common/thirdparty")
set(COMMON_EXTENSION ../common/lib/python/common_extension.cc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this change, note that we still use the variable COMMON_EXTENSION later in this file.

#ifdef HAS_PLASMA
#include "plasma_common.h"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no reason to have a space here, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the space is not there, clang-format tries to reorder the includes to sort them alphabetically, but this include needs to come first, so I think we do need the space here.


add_dependencies(plasma protocol_fbs)

add_executable(plasma_manager
plasma_manager.cc)
add_executable(plasma_manager
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually want these extra spaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

return 0;
}
if (errno == EPIPE || errno == EBADF || errno == ECONNRESET) {
ARROW_LOG(WARNING)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it really make sense to convert the plasma manager to use arrow macros for logging? Are you planning on moving plasma_manager.cc into Arrow? Other than for checking status codes for the manager's interaction with the store?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to use ARROW_LOG like macros throughout the codebase eventually. We can copy logging.h over and rename it to RAY_LOG.

/// present in the store. In this case, the client should not call
/// plasma_release.
/// - PlasmaError_OutOfMemory, if the store is out of memory and
/// cannot
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix indentation here.

@@ -0,0 +1,210 @@
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but the copyright seems inaccurate regardless of which codebase it is in, right?

@@ -163,7 +163,7 @@ def check_components_alive(self, component_type, check_component_alive):
if check_component_alive:
self.assertTrue(component.poll() is None)
else:
self.assertTrue(component.poll() <= 0)
self.assertTrue(not component.poll() is None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plasma store now returns 1 if an error occurs (before that the status was negative). This is because of the arrow status checking macros.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/917/
Test PASSed.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/918/
Test PASSed.

@robertnishihara robertnishihara merged commit b94b4a3 into ray-project:master May 31, 2017
@robertnishihara robertnishihara deleted the arrow-2 branch May 31, 2017 23:24
return std::memcmp(data(), rhs.data(), kUniqueIDSize) == 0;
}

Status plasma_error_status(int plasma_error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed, but this gives a compiler warning on MacOS.

/ray/src/plasma/plasma_manager.cc:59:1: warning: control may reach end of non-void function [-Wreturn-type]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants