Skip to content

Conversation

pcmoritz
Copy link
Contributor

No description provided.

@guoyuhong
Copy link

This PR is used to fix: ray-project/ray#4385 (comment) .

@fsaintjacques
Copy link
Contributor

Multiple issues here:

  1. The object supports copy and assign, you probably don't want this (use ARROW_DISALLOW_COPY_AND_ASSIGN).
  2. The fields are not initialized by default, make uint8_t *ptr = nullptr; and size_t length = 0.
  3. What about the file descriptor?
  4. mmap in one class and munmap in the destructor of another class, seems like bad design and an afterthought.

@fsaintjacques
Copy link
Contributor

I also note that Client uses LookupOrMmap to create MutableBuffer in PlasmaClient::Impl::Create, all said buffers needs to be reclaimed too.

@pcmoritz
Copy link
Contributor Author

pcmoritz commented Mar 21, 2019

@fsaintjacques Thanks! I fixed the issues you mentioned and made ClientMmapTableEntry into a C++ class that uses RAII to keep track of the mmaps/munmaps. For 3., the file descriptor is closed as soon as it is received from the store, in the constructor of ClientMmapTableEntry. The memory of MutableBuffers are reclaimed by the same mechanism, they also create ClientMmapTableEntrys that will do the munmaps on destruction (there are in general many buffers per memory mapped region).

/// is a hash table mapping a file descriptor to a struct containing the
/// address of the corresponding memory-mapped file.
std::unordered_map<int, ClientMmapTableEntry> mmap_table_;
std::unordered_map<int, std::unique_ptr<ClientMmapTableEntry>> mmap_table_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why unique_ptr? If you do std::unordered_map<int, ClientMmapTableEntry>, then when you delete an entry from mmap_table_, its destructor will run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh do you need that in order to compile when using ARROW_DISALLOW_COPY_AND_ASSIGN?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, without the unique_ptr it would not be correct (upon rehashing). The constructor will run with unique_ptr as well.

class ClientMmapTableEntry {
public:
ClientMmapTableEntry(int fd, int64_t map_size)
: fd_(fd), pointer_(nullptr), length_(0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
: fd_(fd), pointer_(nullptr), length_(0) {
: fd_(fd) {

there's no point to those initializations, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see @fsaintjacques suggested this. That's fine I guess, but what's the reason for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because if we remove the copy constructor, then move can swap in un-initialized values.

@fsaintjacques
Copy link
Contributor

The problem with mutable buffer is that they do not extend the lifetime of the mmap-ed pointer. To change this, you could have a custom MutableBuffer that has a shared_ptr<ClientMmapTableEntry> (and also modify the Client class to use shared_ptr instead of unique_ptr`.

@pcmoritz
Copy link
Contributor Author

pcmoritz commented Mar 21, 2019

@fsaintjacques I introduced PlasmaMutableBuffer to behave similar to the PlasmaBuffer class now, by holding a shared pointer to a PlasmaClient and keeping the client alive this way.

Copy link
Contributor

@robertnishihara robertnishihara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@fsaintjacques
Copy link
Contributor

I know this is now merged, but I think it would have been better to link the Buffer to the mmap entry instead of the client, this way you get eager reclamation. Otherwise, one buffer holds all mmap.

robertnishihara pushed a commit to ray-project/arrow-build that referenced this pull request Mar 22, 2019
@pcmoritz
Copy link
Contributor Author

@fsaintjacques You are right, however since #3490, most buffers are allocated into a single memory mapped file [*], so it wouldn't help in practice. Unfortunately we still need the mechanism to support multiple memory mapped files since most malloc implementation won't be able to guarantee that there is actually only a single mmaped file, in practice it can happen that there is the large one and then another small one for example.

[*] This has two advantages: 1) it will use the whole memory available. Before, it had a doubling strategy that would lead to only half of the memory being used in the worst case and 2) it avoids that files are mapped and unmapped, which led to performance overheads.

kszucs pushed a commit that referenced this pull request Mar 24, 2019
Author: Philipp Moritz <pcmoritz@gmail.com>
Author: Robert Nishihara <robertnishihara@gmail.com>

Closes #4001 from pcmoritz/plasma-client-unmap and squashes the following commits:

893bcdc <Robert Nishihara> Update client.cc
784f9af <Philipp Moritz> add documentation
7988a85 <Philipp Moritz> add PlasmaMutableBuffer class
0a6203b <Philipp Moritz> fixes
af51c7a <Philipp Moritz> add documentation
53e4b1d <Philipp Moritz> unmap files in plasma client destructor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants