-
Notifications
You must be signed in to change notification settings - Fork 6.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[XRay] Raylet node and object manager unification/backend redesign. #1640
Conversation
cc @imzhenyu |
Test PASSed. |
Test PASSed. |
src/ray/gcs/tables.h
Outdated
/// Get a client's information from the cache. | ||
/// | ||
/// @param client The client to get information about. | ||
const ClientInformation &GetClientInformation(ClientID client); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const ClientID& client
@@ -0,0 +1,134 @@ | |||
#ifndef RAY_OBJECTDIRECTORY_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RAY_OBJECT_DIRECTORY_H
private: | ||
|
||
/// Reference to the gcs client. | ||
std::shared_ptr<GcsClient> gcs_client; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gcs_client_
ObjectDirectory::~ObjectDirectory() = default; | ||
|
||
ObjectDirectory::ObjectDirectory(std::shared_ptr<GcsClient> gcs_client){ | ||
this->gcs_client = gcs_client; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after gcs_client -> gcs_client_, can remove the this-> everywhere.
/// \param object_id The object id that was put into the store. | ||
/// \param client_id The client id corresponding to this node. | ||
/// \return Status of whether this method succeeded. | ||
virtual ray::Status ObjectAdded(const ObjectID &object_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe call this ReportObjectAdded?
shared_ptr<ray::GcsClient> gcs_client) | ||
: od(new ObjectDirectory(gcs_client)), | ||
work_(io_service_) { | ||
this->store_client_ = unique_ptr<ObjectStoreClient>(new ObjectStoreClient(io_service, config.store_socket_name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the this->
src/ray/raylet/actor.cc
Outdated
|
||
ActorInformation::~ActorInformation() {} | ||
|
||
const ActorID& ActorInformation::GetActorId() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be consistent with Id and ID
src/ray/raylet/node_manager.h
Outdated
|
||
namespace ray { | ||
|
||
class NodeManager : public ClientManager<boost::asio::local::stream_protocol> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename this to LocalScheduler? And "Raylet" is what we used to call the "NodeManager".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think this was the way we originally had it but I'm not sure now what the reasoning was for changing it. @atumanov?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1) The notion of "local" scheduler is being outdated with this rewrite, because there's no "global" scheduler, which makes the term ambiguous and suggestive of the fact there's something other than the "local" scheduler.
(2) I think we might simply be used to the term and, naturally, have affinity for calling this component as it was named before. If we take a step back, I think this component is doing more than just task scheduling. Scheduling is only part of the NodeManager's responsibility. In fact, the scheduler is a component inside the NodeManager, periodically invoked to make scheduling decisions.
I felt like the NodeManager is more appropriate, as a more general and descriptive term capturing the fact that this component does more than just scheduling. It performs functions related to the management of the local node. Raylet is the name of the whole process that includes node management, object management, etc.
Let me know if this works for you, @pcmoritz .
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test PASSed. |
ray::Status status_code = ray::Status::OK(); | ||
if (existing_requests_.count(object_id) == 0) { | ||
existing_requests_[object_id] = ODCallbacks({success_cb, fail_cb}); | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this semicolon do anything?
/// becomes available. | ||
/// @param done_callback Callback to be called when subscription is installed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor, but "becomes available" should be unindented (other places in this file as well)
|
||
ray::Status ObjectDirectory::GetInformation(const ClientID &client_id, | ||
const InfoSuccessCallback &success_cb, | ||
const InfoFailureCallback &fail_cb) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer success_callback
and fail_callback
here (and other places)
|
||
ray::Status ObjectDirectory::GetLocationsComplete( | ||
const ray::Status &status, const ObjectID &object_id, | ||
const std::vector<RemoteConnectionInfo> &v) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this something other than v
?
namespace ray { | ||
|
||
struct RemoteConnectionInfo { | ||
RemoteConnectionInfo(const ClientID &id, const std::string &ipaddr, uint16_t portnum) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using ip_address
in most places n the codebase. Also portnum
-> port_num
src/ray/raylet/mock_gcs_client.cc
Outdated
} | ||
|
||
ray::Status ClientTable::Add(const ClientID &client_id, const std::string &ip, | ||
uint16_t port, DoneCallback done) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done -> done_callback
src/ray/raylet/raylet.cc
Outdated
Raylet::Raylet(boost::asio::io_service& io_service, | ||
const std::string &socket_name, | ||
const ResourceSet &resource_config, | ||
const ObjectManagerConfig &om_config, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
om_config -> object_manager_config
bool SubtractResources(const ResourceSet &other); | ||
|
||
/// Return the capacity value associated with the specified resource. | ||
/// \param[in] resource_name: Resource name for which capacity is requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we aren't using \param[in]
in very many places, what does it mean?
/// Create a task execution specification. | ||
/// | ||
/// \param execution_dependencies The task's dependencies, determined at | ||
/// execution time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unindent (also other places in this file)
src/ray/raylet/task_spec.h
Outdated
#include <cstddef> | ||
#include <string> | ||
#include <vector> | ||
#include <unordered_map> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u
comes before v
:)
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
What do these changes do?
This PR is the first pass at redesigning and refactoring the Ray backend responsible for task, actor, and object management. We unify this functionality in a single new component, called Raylet, which is intended to combine the functionality of the current local scheduler and the object manager. The rewrite aims to follow object-oriented design, taking the opportunity to convert the backend to pure C++.
Significant changes:
Specific functionality in this PR:
Planned functionality not in this PR: