
Conversation

@robertnishihara (Contributor) commented Nov 30, 2018

There is an issue here where Release was never being called in the plasma TF operator.

Note that I also changed the release delay in the plasma operator to 0.
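
For context, here is a minimal sketch of the Get/Release pairing this fix restores. It is illustrative only, not the op's actual code: the socket path, object ID, and exact Connect/Get signatures are assumptions based on the plasma C++ client of that era.

#include <vector>
#include <plasma/client.h>        // PlasmaClient, ObjectID, ObjectBuffer (assumed header)
#include "arrow/util/logging.h"   // ARROW_CHECK_OK

// Sketch: every successful Get must be paired with a Release, otherwise the
// store believes the client still holds a reference and can never evict the
// object -- which is the leak this PR fixes in the TF op.
void GetAndRelease(const plasma::ObjectID& object_id) {
  plasma::PlasmaClient client;
  // release_delay = 0: Release() hands the object back to the store right
  // away instead of keeping it in a client-side release cache.
  ARROW_CHECK_OK(client.Connect("/tmp/plasma", "", /*release_delay=*/0));

  std::vector<plasma::ObjectBuffer> buffers;
  ARROW_CHECK_OK(client.Get({object_id}, /*timeout_ms=*/-1, &buffers));
  // ... read from buffers[0].data ...

  ARROW_CHECK_OK(client.Release(object_id));
  ARROW_CHECK_OK(client.Disconnect());
}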

@robertnishihara changed the title from "ARROW-: [plasma] Fix reference counting in custom tensorflow plasma operator." to "ARROW-3920: [plasma] Fix reference counting in custom tensorflow plasma operator." on Nov 30, 2018
@concretevitamin

Thoughts:

  • It'd be a good idea to update the Plasma documentation, both in the source code and in the external docs, to reflect this usage pattern.
  • Question: is the omission of Release() considered a bug? If so, can you add a test case that fails before this change and passes after it? Or is this an internal detail?
  • I would avoid doing renaming changes in a bug-fix PR.

@robertnishihara (Contributor Author)

Thanks @concretevitamin!

  • I agree, though there's no change in the usage here.
  • It's a bug. I added a test (and might add another).
  • The renaming is somewhat necessary because there was another bug that changed the behavior of the plasma op (it started using Arrow tensors instead of ndarrays), and I needed to fix both to get it working.

@pcmoritz force-pushed the extrareleaseinplasmaop branch from a4ff374 to 4836342 on December 1, 2018 03:08
ARROW_CHECK_OK(client_.Release(object_id));
auto orig_stream = context->op_device_context()->stream();
auto stream_executor = orig_stream->parent();
CHECK(stream_executor->HostMemoryUnregister(static_cast<void*>(data)));
@robertnishihara (Contributor Author) commented on the diff:

We probably only want this to happen in the GPU case, right?

@robertnishihara (Contributor Author):

Same with the other op.

Contributor:

Yeah, I updated it (it actually doesn't compile for CPU-only TensorFlow).
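
For readers following along, the guard being discussed looks roughly like the sketch below. GOOGLE_CUDA is TensorFlow's usual compile-time switch for CUDA-only code paths; this is an illustration of the idea, not the exact change that was pushed.

#if GOOGLE_CUDA
  // Only compiled into GPU builds; stream executors and pinned host-memory
  // registration are unavailable in CPU-only TensorFlow.
  auto orig_stream = context->op_device_context()->stream();
  auto stream_executor = orig_stream->parent();
  CHECK(stream_executor->HostMemoryUnregister(static_cast<void*>(data)));
#endif  // GOOGLE_CUDA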

@codecov-io

Codecov Report

Merging #3061 into master will decrease coverage by <.01%.
The diff coverage is 93.33%.


@@            Coverage Diff             @@
##           master    #3061      +/-   ##
==========================================
- Coverage   87.07%   87.07%   -0.01%     
==========================================
  Files         489      489              
  Lines       69160    69161       +1     
==========================================
- Hits        60222    60221       -1     
- Misses       8837     8839       +2     
  Partials      101      101
Impacted Files Coverage Δ
cpp/src/arrow/python/serialize.h 0% <ø> (ø) ⬆️
cpp/src/arrow/python/serialize.cc 89.95% <100%> (ø) ⬆️
python/pyarrow/tests/test_plasma_tf_op.py 97.91% <100%> (+0.04%) ⬆️
cpp/src/arrow/python/deserialize.cc 91.7% <87.5%> (ø) ⬆️
cpp/src/plasma/thirdparty/ae/ae.c 72.03% <0%> (-0.95%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2bc4d95...c109566.

@pcmoritz closed this in a667fca on Dec 1, 2018
@robertnishihara deleted the extrareleaseinplasmaop branch on December 1, 2018 20:17
robertnishihara pushed a commit to ray-project/ray that referenced this pull request Dec 2, 2018
This includes a fix so the TensorFlow op releases memory properly (apache/arrow#3061) and the possibility to store arrow data structures in plasma (apache/arrow#2832).

#3404