Skip to content

Conversation

Kunchd
Copy link
Contributor

@Kunchd Kunchd commented Oct 9, 2025

Why are these changes needed?

As discovered in the PR to better define the interface for reference counter, plasma store provider and memory store both share thin dependencies on reference counter that can be refactored out. This will reduce entanglement in our code base and improve maintainability.

The main logic changes are located in

  • src/ray/core_worker/store_provider/plasma_store_provider.cc, where reference counter related logic is refactor into core worker
  • src/ray/core_worker/core_worker.cc, where factored out reference counter logic is resolved
  • src/ray/core_worker/store_provider/memory_store/memory_store.cc, where logic related to reference counter has either been removed due to the fact that it is tech debt or refactored into caller functions.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run pre-commit jobs to lint the changes in this PR. (pre-commit setup)
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

}

Status CoreWorkerPlasmaStoreProvider::Get(
const absl::flat_hash_set<ObjectID> &object_ids,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic below lifts the owner address resolution logic into the caller of Get.

}
}

absl::flat_hash_map<ObjectID, rpc::Address> CoreWorker::GetObjectIdToOwnerAddressMap(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function represents the owner address resolution logic lifted out of plasma store.

for (auto &get_request : get_requests) {
get_request->Set(object_id, object_entry);
// If ref counting is enabled, override the removal behaviour.
if (get_request->ShouldRemoveObjects() && ref_counter_ == nullptr) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove after get flag is getting removed since it stems from the dark ages when reference counting wasn't a thing.

const ray::RayObject &object, const ObjectID &object_id)> object_allocator)
: io_context_(io_context),
ref_counter_(counter),
reference_counting_(reference_counting),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All reference counter null checks for whether reference counting is enabled is lifted into the caller.

void CoreWorkerMemoryStore::Put(const RayObject &object, const ObjectID &object_id) {
void CoreWorkerMemoryStore::Put(const RayObject &object,
const ObjectID &object_id,
const bool has_reference) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory store only uses reference counter for checking if it has reference to an object or if reference counting is enabled. Both is lifted into caller.

@Kunchd Kunchd marked this pull request as ready for review October 9, 2025 19:59
@Kunchd Kunchd requested review from a team, SongGuyang, kfstorm and raulchen as code owners October 9, 2025 19:59
@Kunchd Kunchd requested a review from israbbani October 9, 2025 19:59
@Kunchd Kunchd added the go add ONLY when ready to merge, run all tests label Oct 9, 2025
explicit DefaultCoreWorkerMemoryStoreWithThread(
std::unique_ptr<InstrumentedIOContextWithThread> io_context)
: CoreWorkerMemoryStore(io_context->GetIoService()),
: CoreWorkerMemoryStore(io_context->GetIoService(), /*reference_counting=*/false),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
: CoreWorkerMemoryStore(io_context->GetIoService(), /*reference_counting=*/false),
: CoreWorkerMemoryStore(io_context->GetIoService(), /*reference_counting=*/true),

There's no reason for the default core worker memory store to be created with reference counting disabled.

@ray-gardener ray-gardener bot added the core Issues that should be addressed in Ray Core label Oct 10, 2025
Copy link
Contributor

@dayshah dayshah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't looked at this too hard yet, but at first glance it looks like having address in that vector will make wait/get perf on large lists even worse than it already is bc the impact of the vector set copying madness will be even worse

Do we need address to be in there??

davik added 2 commits October 10, 2025 23:56
@Kunchd
Copy link
Contributor Author

Kunchd commented Oct 10, 2025

haven't looked at this too hard yet, but at first glance it looks like having address in that vector will make wait/get perf on large lists even worse than it already is bc the impact of the vector set copying madness will be even worse

Do we need address to be in there??

Nope, definitely doesn't need to be in the same data structure. I've moved it out for performance.

explicit DefaultCoreWorkerMemoryStoreWithThread(
std::unique_ptr<InstrumentedIOContextWithThread> io_context)
: CoreWorkerMemoryStore(io_context->GetIoService()),
: CoreWorkerMemoryStore(io_context->GetIoService(), /*reference_counting=*/false),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Mock Initialization Mismatch Causes Test Inconsistency

The DefaultCoreWorkerMemoryStoreWithThread mock is initialized with reference_counting=false. PR discussions suggested this should be true, as there's no reason to disable it. This inconsistency can cause the mock to behave differently from production code, potentially masking reference counting bugs in tests.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Issues that should be addressed in Ray Core go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants