-
Notifications
You must be signed in to change notification settings - Fork 6.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix log monitor read error #5221
fix log monitor read error #5221
Conversation
|
Test PASSed. |
Test PASSed. |
|
python/ray/log_monitor.py
Outdated
@@ -93,7 +101,9 @@ def update_log_filenames(self): | |||
|
|||
for log_filename in log_filenames: | |||
full_path = os.path.join(self.logs_dir, log_filename) | |||
if full_path not in self.log_filenames: | |||
filename = full_path.split("/")[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.path.basename?
python/ray/log_monitor.py
Outdated
@@ -93,7 +101,9 @@ def update_log_filenames(self): | |||
|
|||
for log_filename in log_filenames: | |||
full_path = os.path.join(self.logs_dir, log_filename) | |||
if full_path not in self.log_filenames: | |||
filename = full_path.split("/")[-1] | |||
if (file_should_monitor(filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this won't even run right? self.file_should_monitor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use glob instead?
thanks @richardliaw for the comments. Is there any way to running the UT tests in locally? I have run the
|
Test PASSed. |
if full_path not in self.log_filenames: | ||
self.log_filenames.add(full_path) | ||
# we only monior worker log files | ||
log_file_paths = glob.glob("{}/worker*[.out|.err]".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't there other log files that the log_monitor tracks? like the monitor log, the raylet log?
@@ -195,7 +199,7 @@ def check_log_files_and_publish_updates(self): | |||
# Record the current position in the file. | |||
file_info.file_position = file_info.file_handle.tell() | |||
|
|||
if len(lines_to_publish) > 0 and is_worker: | |||
if len(lines_to_publish) > 0: | |||
self.redis_client.publish( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardliaw we only publish worker log file, not sure why. Please correct me if I am not right.
@ConeyLiu, thanks for submitting the PR. You're right that we currently only stream the worker logs back to the driver. However, in the future I think it would make sense to collect all of the process logs and store them in some central database or something like that. The PR looks good to me. |
thanks a lot @richardliaw @robertnishihara. |
What do these changes do?
Currently, we will encounter some errors when monitoring all the file under logs. And we also just need the worker logfile from the code. In this patch, we only read those file which matches a worker log file format.
Related issue number
Closes #5220
Linter
scripts/format.sh
to lint the changes in this PR.