out_of_core configuration and documentation #3845

Peji-moghimi · 2021-12-11T18:08:59Z

System information

OS Distribution: CentOS Linux 7
Memory: 376 GB DDR4
Modin version: 0.9.1
Ray version: 1.1.0
Python version: 3.5.1

I'm still unsure as to what the documentation is suggesting here.

Does the line below, as it stands, disable out-of-core, or does it only disable out-of-core when _plasma_directory=None? If it does disable out-of-core as it stands, how does one specify a desired directory for spilling, instead of the default spilling directory (as I cannot use the default)?

ray.init(_plasma_directory="/tmp") # setting to disable out of core in Ray

Currently, the following is my setup:

import os
import ray

def ray_init():
    # Start Ray with custom temp and object store directories.
    ray.init(_temp_dir="/some/specific/path/ray/tmp/",
             _plasma_directory="/some/specific/path/ray/",
             _memory=3000000000000,
             object_store_memory=3000000000000)
    # Configure Modin before modin.pandas is imported.
    os.environ["MODIN_ENGINE"] = "ray"
    os.environ["MODIN_OUT_OF_CORE"] = "true"
    os.environ["MODIN_MEMORY"] = "365000000000"

ray_init()

import modin.pandas as pd
from modin.config import ProgressBar
ProgressBar.enable()

df_128gb = pd.read_csv('df_128gb.csv', low_memory=True, memory_map=True)

I just want to know whether this is the most memory-efficient setup, i.e. one that prevents my program from running out of RAM by spilling to disk, no matter how large the dataframe is (within the limits of my disk space). Furthermore, does that extend to very expensive operations such as merge?

I would really appreciate it if you could settle this for me.

Thanks!
Pej

Originally posted by @Peji-moghimi in #3705 (comment)

devin-petersohn · 2021-12-13T19:29:38Z

Hi @Peji-moghimi, thanks for the email. You are right, we should put more clarity in the docs.

Setting _plasma_directory at all disables Ray's built-in out-of-core support. The object store is still a memory-mapped file, though, so it can still use the disk: the operating system pages data between memory and disk, which works fine. We should make the docs clearer on this.
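
To make the two disk-backed modes discussed in this thread concrete, here is a minimal sketch of how the keyword arguments for ray.init could be assembled. The helper names (ray_init_kwargs, start_ray) and the example paths are hypothetical. The `object_spilling_config` entry follows the form shown in the Ray 1.x object spilling docs; whether a given Ray release (for example 1.1.0) accepts it is an assumption to check against that version's documentation.

```python
import json


def ray_init_kwargs(mmap_dir=None, spill_dir=None):
    """Build keyword arguments for ray.init() (hypothetical helper).

    mmap_dir:  directory for the memory-mapped object store file, passed
               as _plasma_directory. Per this thread, setting it turns
               off Ray's built-in out-of-core and relies on OS paging.
    spill_dir: directory for Ray's own object spilling. Assumption: the
               installed Ray accepts `object_spilling_config`.
    """
    if mmap_dir and spill_dir:
        raise ValueError("choose either mmap_dir or spill_dir, not both")
    kwargs = {}
    if mmap_dir:
        kwargs["_plasma_directory"] = mmap_dir
    elif spill_dir:
        kwargs["_system_config"] = {
            "object_spilling_config": json.dumps(
                {"type": "filesystem", "params": {"directory_path": spill_dir}}
            )
        }
    return kwargs


def start_ray(**options):
    """Import Ray lazily and start it with the chosen options."""
    import ray  # imported here so the helper above works without Ray

    return ray.init(**ray_init_kwargs(**options))
```

For example, start_ray(mmap_dir="/some/specific/path/ray/") would mirror the setup in the question, while start_ray(spill_dir=...) leaves _plasma_directory unset so that Ray's own spilling stays in charge.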

I think your ray.init looks fine. Are you running into any specific problem?

anmyachev added the documentation 📜 Updates and issues with the documentation label Apr 21, 2022

vnlitvinov added Needs more information ❔ Issues that require more information from the reporter P3 Very minor bugs, or features we can hopefully add some day. labels Aug 26, 2022

anmyachev added the External Pull requests and issues from people who do not regularly contribute to modin label Apr 19, 2023
