
Conversation

@robertnishihara (Contributor) commented Nov 30, 2018

There is an issue here where Release was never being called in the plasma TF operator.

Note that I also changed the release delay in the plasma operator to 0.
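
For context, here is a minimal sketch of the Get/Release pairing this fix restores. It is illustrative only, not the op's actual code: the socket path, object ID, and exact Connect/Get signatures are assumptions based on the plasma C++ client of that era.

#include <vector>
#include <plasma/client.h>        // PlasmaClient, ObjectID, ObjectBuffer (assumed header)
#include "arrow/util/logging.h"   // ARROW_CHECK_OK

// Sketch: every successful Get must be paired with a Release, otherwise the
// store believes the client still holds a reference and can never evict the
// object -- which is the leak this PR fixes in the TF op.
void GetAndRelease(const plasma::ObjectID& object_id) {
  plasma::PlasmaClient client;
  // release_delay = 0: Release() hands the object back to the store right
  // away instead of keeping it in a client-side release cache.
  ARROW_CHECK_OK(client.Connect("/tmp/plasma", "", /*release_delay=*/0));

  std::vector<plasma::ObjectBuffer> buffers;
  ARROW_CHECK_OK(client.Get({object_id}, /*timeout_ms=*/-1, &buffers));
  // ... read from buffers[0].data ...

  ARROW_CHECK_OK(client.Release(object_id));
  ARROW_CHECK_OK(client.Disconnect());
}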

@robertnishihara changed the title from "ARROW-: [plasma] Fix reference counting in custom tensorflow plasma operator." to "ARROW-3920: [plasma] Fix reference counting in custom tensorflow plasma operator." on Nov 30, 2018
@concretevitamin

Thoughts:

  • It'd be a good idea to update the Plasma documentation, both in the source code and in the external docs, to reflect this usage pattern.
  • Question: is the omission of Release() considered a bug? If so, can you add a test case that fails before this change and passes after it? Or is this an internal detail?
  • I would avoid doing renaming changes in a bug-fix PR.

@robertnishihara (Contributor Author)

Thanks @concretevitamin!

  • I agree, though there's no change in the usage here.
  • It's a bug. I added a test (and might add another).
  • The renaming is somewhat necessary because there was another bug that changed the behavior of the plasma op (it started using Arrow tensors instead of ndarrays), and I needed to fix both to get it working.

@pcmoritz force-pushed the extrareleaseinplasmaop branch from a4ff374 to 4836342 on December 1, 2018 03:08
ARROW_CHECK_OK(client_.Release(object_id));
auto orig_stream = context->op_device_context()->stream();
auto stream_executor = orig_stream->parent();
CHECK(stream_executor->HostMemoryUnregister(static_cast<void*>(data)));
@robertnishihara (Contributor Author) commented on the diff:

We probably only want this to happen in the GPU case, right?

@robertnishihara (Contributor Author):

Same with the other op.

Contributor:

Yeah, I updated it (it actually doesn't compile for CPU-only TensorFlow).
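
For readers following along, the guard being discussed looks roughly like the sketch below. GOOGLE_CUDA is TensorFlow's usual compile-time switch for CUDA-only code paths; this is an illustration of the idea, not the exact change that was pushed.

#if GOOGLE_CUDA
  // Only compiled into GPU builds; stream executors and pinned host-memory
  // registration are unavailable in CPU-only TensorFlow.
  auto orig_stream = context->op_device_context()->stream();
  auto stream_executor = orig_stream->parent();
  CHECK(stream_executor->HostMemoryUnregister(static_cast<void*>(data)));
#endif  // GOOGLE_CUDA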

@codecov-io

Codecov Report

Merging #3061 into master will decrease coverage by <.01%.
The diff coverage is 93.33%.


@@            Coverage Diff             @@
##           master    #3061      +/-   ##
==========================================
- Coverage   87.07%   87.07%   -0.01%     
==========================================
  Files         489      489              
  Lines       69160    69161       +1     
==========================================
- Hits        60222    60221       -1     
- Misses       8837     8839       +2     
  Partials      101      101
Impacted Files Coverage Δ
cpp/src/arrow/python/serialize.h 0% <ø> (ø) ⬆️
cpp/src/arrow/python/serialize.cc 89.95% <100%> (ø) ⬆️
python/pyarrow/tests/test_plasma_tf_op.py 97.91% <100%> (+0.04%) ⬆️
cpp/src/arrow/python/deserialize.cc 91.7% <87.5%> (ø) ⬆️
cpp/src/plasma/thirdparty/ae/ae.c 72.03% <0%> (-0.95%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2bc4d95...c109566.

@pcmoritz closed this in a667fca on Dec 1, 2018
@robertnishihara deleted the extrareleaseinplasmaop branch on December 1, 2018 20:17
robertnishihara pushed a commit to ray-project/ray that referenced this pull request Dec 2, 2018
This includes a fix so the TensorFlow op releases memory properly (apache/arrow#3061) and the possibility to store arrow data structures in plasma (apache/arrow#2832).

#3404