[Core] Add cached memory to available memory #10020

Merged
merged 3 commits into from
Aug 11, 2020

Conversation

lixin-wei
Contributor

Why are these changes needed?

Cached memory should also be counted as available memory.

When I start a Ray cluster on a machine that has low available memory but high buff/cache memory, a RayOutOfMemoryError is raised:

2020-08-10 16:26:03,954 ERROR worker.py:1085 -- Possible unhandled error from worker: ray::JobWorker.__init__() (pid=42130, ip=11.166.237.22)
  File "python/ray/_raylet.pyx", line 435, in ray._raylet.execute_task
  File "/home/admin/ray/python/ray/memory_monitor.py", line 128, in raise_if_low_memory
    self.error_threshold))
ray.memory_monitor.RayOutOfMemoryError: More than 95% of the memory on node corgiunit-eu95-5.rz00b.stable.alipay.net is used (15.59 / 16.0 GB).

$ free -h
              total        used        free      shared  buff/cache   available
Mem:            16G        5.9G        980M        3.6G        9.1G        980M
Swap:            0B          0B          0B
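
For illustration, here is a minimal sketch (not the code in this PR) of how the two accounting policies differ. It parses `/proc/meminfo`-style text and computes the used fraction with and without buffers and page cache counted as available. The function names and the sample figures are assumptions chosen to loosely resemble the numbers above; they were not captured from the machine in question. The 0.95 threshold mirrors the "More than 95%" wording of the error message.

```python
# Illustrative sketch only; names and sample values are hypothetical.
SAMPLE_MEMINFO = """\
MemTotal:       16777216 kB
MemFree:          429916 kB
Buffers:          104858 kB
Cached:          9437184 kB
"""

ERROR_THRESHOLD = 0.95  # taken from the "More than 95%" error text


def parse_meminfo(text):
    """Parse '/proc/meminfo'-style lines into a dict of byte counts."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0]) * 1024  # kB -> bytes
    return fields


def used_fraction(fields, count_cache_as_available):
    """Return the fraction of total memory treated as in use."""
    available = fields["MemFree"]
    if count_cache_as_available:
        # Assumption: buffers and page cache can be reclaimed on demand.
        available += fields["Buffers"] + fields["Cached"]
    return (fields["MemTotal"] - available) / fields["MemTotal"]


if __name__ == "__main__":
    info = parse_meminfo(SAMPLE_MEMINFO)
    for policy in (False, True):
        frac = used_fraction(info, policy)
        status = "over threshold" if frac > ERROR_THRESHOLD else "ok"
        print(f"count_cache_as_available={policy}: {frac:.1%} ({status})")
```

With these sample values, only the free-memory-only policy exceeds the threshold. Note that treating all cached memory as reclaimable is a simplification: shared memory, for example, is reported as cache but cannot simply be dropped, so a kernel-provided estimate such as `MemAvailable` is likely a better source where it exists.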

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/latest/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested (please justify below)

@lixin-wei lixin-wei requested a review from ericl August 10, 2020 08:36
@AmplabJenkins

Can one of the admins verify this patch?

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/29656/

Member

@suquark suquark left a comment

.available/(1024**3)

@lixin-wei
Contributor Author

.available/(1024**3)

My mistake, fixed, thanks a lot.
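
The diff is not shown here, so the following is only a guess at what the review note concerns: a byte count has to be converted to GiB before it is combined with a value already expressed in GiB. A minimal sketch, with hypothetical function and parameter names:

```python
GIB = 1024 ** 3  # bytes per GiB


def used_gib(total_gib, available_bytes):
    """Return used memory in GiB.

    total_gib is already in GiB; available_bytes is a raw byte count,
    so it is divided by 1024**3 before the subtraction.
    """
    return total_gib - available_bytes / GIB


if __name__ == "__main__":
    # Hypothetical inputs: 16 GiB total, 8 GiB available.
    print(used_gib(16.0, 8 * GIB))
```

Subtracting the raw byte count without the division would mix units and produce a large negative number instead of a GiB value.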

@lixin-wei
Contributor Author

lixin-wei commented Aug 11, 2020

@suquark Are there any other problems? Could you please approve this PR so that it can be merged?

@lixin-wei lixin-wei requested a review from suquark August 11, 2020 05:12
@ray-project ray-project deleted a comment from AmplabJenkins Aug 11, 2020
@ericl ericl merged commit 71d2bde into ray-project:master Aug 11, 2020
rkooo567 added a commit to rkooo567/ray that referenced this pull request Aug 12, 2020
edoakes pushed a commit that referenced this pull request Aug 12, 2020