Prevent hasher from running out of memory on large files #2451

Merged: ericl merged 5 commits into ray-project:master on Jul 29, 2018

Conversation

sseveran
Contributor

What do these changes do?

When the autoscaler calculates the hash, it can run out of memory on large files (3 GB in my case). To fix this, I added incremental hashing for files larger than 1 GB. I picked this threshold arbitrarily; we could lower it if desired.
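
For reference, here is a minimal sketch of the incremental-hashing approach described above. It is not the merged code. The function name `update_hasher_with_file` and the constant names are hypothetical. The 1000000000-byte threshold and the 8192-byte chunk size match the values visible in the CI output in this thread. Hexlifying each chunk inside the loop is an assumption, since the loop body is not shown, and SHA-1 is used in the usage note purely for illustration.

```python
import binascii
import hashlib
import os

# Hypothetical constant names; the values mirror those in the CI output.
LARGE_FILE_THRESHOLD = 1000000000  # bytes
CHUNK_SIZE = 8192  # bytes


def update_hasher_with_file(hasher, path,
                            threshold=LARGE_FILE_THRESHOLD,
                            chunk_size=CHUNK_SIZE):
    """Feed the contents of `path` into `hasher` without loading
    files at or above `threshold` bytes fully into memory."""
    with open(path, "rb") as f:
        if os.path.getsize(path) < threshold:
            # Small file: read it in one go.
            hasher.update(binascii.hexlify(f.read()))
        else:
            # Large file: read fixed-size chunks until f.read()
            # returns b"" at end of file.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                # Assumption: each chunk is hexlified before hashing.
                hasher.update(binascii.hexlify(chunk))
```

Because `binascii.hexlify` encodes byte by byte and `hasher.update` is incremental, both branches should yield the same digest for the same file, so changing the threshold would not change the resulting hash. A caller might use it as `h = hashlib.sha1(); update_hasher_with_file(h, some_path); h.hexdigest()`, where `some_path` is any file path.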

Related issue number

I did not open an issue.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6764/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6777/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6779/

Contributor

@ericl ericl left a comment

Looks good, just one comment on the printing.

ip = self.local_scheduler_id_to_ip_map.get(client_id)
if ip:
    self.load_metrics.update(ip, static_resources, dynamic_resources)
else:
    print("Warning: could not find ip for client {}."
          .format(client_id))
    for keys, values in self.local_scheduler_id_to_ip_map.items():
        print(keys)
        print(values)

Contributor

Should this be a single print of the entire map?

Contributor Author

That is my bad. This was some later debugging I was doing. Didn't know it would automatically end up here. I will remove it.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6841/

@sseveran
Contributor Author

@ericl I don't understand why the test is failing

@ericl
Contributor

ericl commented Jul 27, 2018

The command "cd .." exited with 0.
travis_time:start:047f9cbc
$ flake8 --exclude=python/ray/core/src/common/flatbuffers_ep-prefix/,python/ray/core/generated/,src/common/format/,doc/source/conf.py,python/ray/cloudpickle/
./python/ray/autoscaler/autoscaler.py:653:80: E501 line too long (85 > 79 characters)

travis_time:end:047f9cbc:start=1532559737674351176,finish=1532559750107885280,duration=12433534104
The command "flake8 --exclude=python/ray/core/src/common/flatbuffers_ep-prefix/,python/ray/core/generated/,src/common/format/,doc/source/conf.py,python/ray/cloudpickle/" exited with 1.
travis_time:start:004128c0
$ .travis/yapf.sh --all
From https://github.com/ray-project/ray
 * branch master -> FETCH_HEAD
 * [new branch] master -> upstream/master
--- python/ray/autoscaler/autoscaler.py (original)
+++ python/ray/autoscaler/autoscaler.py (reformatted)
@@ -650,7 +650,8 @@
                 for name in filenames:
                     hasher.update(name.encode("utf-8"))
                     with open(os.path.join(dirpath, name), "rb") as f:
-                        if os.path.getsize(os.path.join(dirpath, name)) < 1000000000:
+                        if os.path.getsize(os.path.join(dirpath,
+                                                        name)) < 1000000000:
                             hasher.update(binascii.hexlify(f.read()))
                         else:
                             for chunk in iter(lambda: f.read(8192), b''):

travis_time:end:004128c0:start=1532559750112843161,finish=1532559783434014143,duration=33321170982
The command ".travis/yapf.sh --all" exited with 1.

Done. Your build exited with 1.

@ericl
Contributor

ericl commented Jul 27, 2018

Some lint errors, other tests look unrelated.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6906/

@sseveran
Contributor Author

Lint error fixed.

@ericl
Contributor

ericl commented Jul 29, 2018

Looks good, thanks!

@ericl ericl merged commit f1b4ea6 into ray-project:master Jul 29, 2018