-
Notifications
You must be signed in to change notification settings - Fork 3.9k
ARROW-4983: [Plasma] Unmap memory upon destruction of the PlasmaClient #4001
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This PR is used to fix: ray-project/ray#4385 (comment) . |
Multiple issues here:
|
I also note that Client uses |
@fsaintjacques Thanks! I fixed the issues you mentioned and made ClientMmapTableEntry into a C++ class that uses RAII to keep track of the mmaps/munmaps. For 3., the file descriptor is closed as soon as it is received from the store, in the constructor of ClientMmapTableEntry. The memory of |
/// is a hash table mapping a file descriptor to a struct containing the | ||
/// address of the corresponding memory-mapped file. | ||
std::unordered_map<int, ClientMmapTableEntry> mmap_table_; | ||
std::unordered_map<int, std::unique_ptr<ClientMmapTableEntry>> mmap_table_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why unique_ptr
? If you do std::unordered_map<int, ClientMmapTableEntry>
, then when you delete an entry from mmap_table_
, its destructor will run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh do you need that in order to compile when using ARROW_DISALLOW_COPY_AND_ASSIGN
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, without the unique_ptr it would not be correct (upon rehashing). The constructor will run with unique_ptr as well.
class ClientMmapTableEntry { | ||
public: | ||
ClientMmapTableEntry(int fd, int64_t map_size) | ||
: fd_(fd), pointer_(nullptr), length_(0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: fd_(fd), pointer_(nullptr), length_(0) { | |
: fd_(fd) { |
there's no point to those initializations, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see @fsaintjacques suggested this. That's fine I guess, but what's the reason for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if we remove the copy constructor, then move can swap in un-initialized values.
The problem with mutable buffer is that they do not extend the lifetime of the mmap-ed pointer. To change this, you could have a custom MutableBuffer that has a |
@fsaintjacques I introduced PlasmaMutableBuffer to behave similar to the PlasmaBuffer class now, by holding a shared pointer to a PlasmaClient and keeping the client alive this way. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
I know this is now merged, but I think it would have been better to link the Buffer to the mmap entry instead of the client, this way you get eager reclamation. Otherwise, one buffer holds all mmap. |
@fsaintjacques You are right, however since #3490, most buffers are allocated into a single memory mapped file [*], so it wouldn't help in practice. Unfortunately we still need the mechanism to support multiple memory mapped files since most malloc implementation won't be able to guarantee that there is actually only a single mmaped file, in practice it can happen that there is the large one and then another small one for example. [*] This has two advantages: 1) it will use the whole memory available. Before, it had a doubling strategy that would lead to only half of the memory being used in the worst case and 2) it avoids that files are mapped and unmapped, which led to performance overheads. |
Author: Philipp Moritz <pcmoritz@gmail.com> Author: Robert Nishihara <robertnishihara@gmail.com> Closes #4001 from pcmoritz/plasma-client-unmap and squashes the following commits: 893bcdc <Robert Nishihara> Update client.cc 784f9af <Philipp Moritz> add documentation 7988a85 <Philipp Moritz> add PlasmaMutableBuffer class 0a6203b <Philipp Moritz> fixes af51c7a <Philipp Moritz> add documentation 53e4b1d <Philipp Moritz> unmap files in plasma client destructor
No description provided.