-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Ray Log] remove glog dependency #16077
[Ray Log] remove glog dependency #16077
Conversation
@qicosmos Can you try to send SIGABRT to raylet and paste the stack trace logs with and without glog here? |
We should delete this and all glog-related patches. Lines 184 to 194 in 498da13
|
src/ray/util/logging.cc
Outdated
local_stack.resize(50); | ||
absl::GetStackTrace(local_stack.data(), 50, 1); | ||
static constexpr size_t buf_size = 1 << 14; | ||
std::unique_ptr<char[]> buf(new char[buf_size]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need unique_ptr here?
src/ray/util/logging.cc
Outdated
std::vector<void *> local_stack; | ||
local_stack.resize(50); | ||
absl::GetStackTrace(local_stack.data(), 50, 1); | ||
static constexpr size_t buf_size = 1 << 14; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static constexpr size_t buf_size = 1 << 14; | |
static constexpr size_t buf_size = 16 * 1024; |
src/ray/util/logging.cc
Outdated
google::InitGoogleLogging(app_name_.c_str()); | ||
int level = GetMappedSeverity(static_cast<RayLogLevel>(severity_threshold_)); | ||
#endif | ||
#ifdef RAY_USE_SPDLOG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove RAY_USE_SPDLOG
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#ifdef RAY_USE_SPDLOG typedef ray::SpdLogMessage LoggingProvider; #else typedef ray::CerrLog LoggingProvider; #endif
If we don't define RAY_USE_SPDLOG, ray::CerrLog will be used. If we remove RAY_USE_SPDLOG, how to provide a default logging provider?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think CerrLog
is being used. We can delete it as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am okay with just deleting glog to make iteration faster btw. (and we can remove Cerrlog in the separate PR)
return; | ||
} | ||
google::InstallFailureSignalHandler(); | ||
google::InstallFailureWriter(&WriteFailureMessage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we remove this, who will call WriteFailureMessage
? I see that you removed all the signal handler-related code, but if we don't install the signal handler, how would we print stack trace on receiving a signal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it should not be removed, i will test signal handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to rewrite singalhandler if these functions of glog should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, i will deal with signal handler later.
src/ray/util/logging.h
Outdated
/// have added a patch to allow glog to return the current call stack information | ||
/// through the internal interface. This function `GetCallTrace` is a wrapper | ||
/// providing a new detection function for debug or something like that. | ||
/// This function return the current call stack information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// This function return the current call stack information. | |
/// This function returns the current call stack information. |
I will do a test. |
…ve_glog_dependency
Can you also post some of stacktrace example? (e.g., show me how stacktrace looks like when SIGBUS happens) |
@qicosmos can you post how the stacktrace looks like? Also does it make sense to write a python test to see if it prints logs on some signals? (especially for SIGSEGV, SIGBART, and SIGBUS) |
signal_test will print the stacktrace, like this: [ RUN ] SignalTest.SendBusSignalTest |
@qicosmos Could you also provide the output if we use glog? One thing I'm concerned about is that the thread ID is missing in your pasted output, so we cannot identify which thread receives/sends the signal. Thread ID is quite useful when we encounter SIGSEGV. BTW, there's no assertion logic in @qicosmos Could you run a simple Ray application and then kill the Raylet and take the stack trace logs from the Raylet log files? |
bazel/ray_deps_setup.bzl
Outdated
"query": split_on_query[1].split("&") if len(split_on_query) > 1 else None, | ||
"fragment": split_on_anchor[1] if len(split_on_anchor) > 1 else None, | ||
} | ||
"scheme" : split_on_scheme[0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert unrelated changes in this file
src/ray/util/logging_test.cc
Outdated
@@ -116,10 +116,13 @@ std::string TestFunctionLevel2() { | |||
#ifndef _WIN32 | |||
TEST(PrintLogTest, CallstackTraceTest) { | |||
auto ret0 = TestFunctionLevel0(); | |||
std::cout << ret0 << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple last comments!
std::string output; | ||
for (auto &stack : local_stack) { | ||
if (absl::Symbolize(stack, buf, buf_size)) { | ||
output.append(" ").append(buf).append("\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey is there any good way to add line information using addr2line here? https://linux.die.net/man/1/addr2line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just curious if that's possible since it looks like we can access each stack trace
src/ray/util/logging.cc
Outdated
google::InitGoogleLogging(app_name_.c_str()); | ||
int level = GetMappedSeverity(static_cast<RayLogLevel>(severity_threshold_)); | ||
#endif | ||
#ifdef RAY_USE_SPDLOG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am okay with just deleting glog to make iteration faster btw. (and we can remove Cerrlog in the separate PR)
src/ray/util/logging_test.cc
Outdated
EXPECT_TRUE(ret1.find("TestFunctionLevel1") != std::string::npos); | ||
auto ret2 = TestFunctionLevel2(); | ||
std::cout << ret2 << std::endl; | ||
EXPECT_TRUE(ret2.find("TestFunctionLevel2") != std::string::npos); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a python test? Maybe doing like
Start a raylet -> send a signal -> check log file -> make sure stacktrace is in the .err file
This PR logging out:
The glog logging out:
|
os.kill(pid, SIGKILL) | ||
wait_for_pid_to_exit(pid) | ||
thread = check_thread(raylet_out_path) | ||
thread.start() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to join the thread.
Q1: Why do we need a thread here?
Q2: Will the exception thrown in the thread be populated to the main thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: #16077 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still didn't answer Q1 and Q2.
class check_thread(threading.Thread): | ||
def __init__(self, raylet_out_path): | ||
threading.Thread.__init__(self) | ||
self._daemonic = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why set this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _daemonic make the thread detach.
If we use thread.join() here, the log of SIGTERM can not be flushed, and we do not have a log flush interface, use detach thread can avoid the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in this case, the test will pass without waiting for the thread to finish. How do you ensure the signal is in the log file?
s = f.read() | ||
assert len(s) > 0 | ||
print(s) | ||
assert "SIGTERM" in s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For non-Windows environments, you send a SIGKILL to Raylet. Why do we always assert SIGTERM here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows no SIGABRT, but has SIGTERM, use SIGTERM to support windows and linux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Linux and macOS, you send SIGKILL according to signal.SIGKILL if sys.platform != "win32" else signal.SIGTERM
, right? But you assert SIGTERM
here. So I suppose the test will fail on Linux and macOS.
@rkooo567 Any more comments? |
Why are these changes needed?
We use spdlog for logging now, however we still rely on glog, because we need its GetCallTrace method, actually it can be replaced with absl.stacktrace, so this pr will remove glog dependency.
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.