Skip to content

[lldb-dap] Addressing the order of events during disconnect to flush output. #128583

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 24, 2025

Conversation

ashgti
Copy link
Contributor

@ashgti ashgti commented Feb 24, 2025

The TestDAP_ouput test is flaky due to the order of events during shutdown. We were stopping the output and error handle redirection after we finished the disconnect request, which can cause us to miss output events due to timing. Moving when we stop the redirection to ensure we have consistent output prior to disconnect responding.

Fixes #128567

… we shutdown the debugger output handle and error handle redirection before disconnect completes.

The TestDAP_ouput test is flaky due to the order of events during shutdown.  We were stopping the output and error handle redirection after we finished the disconnect request, which can cause us to miss output events due to timing. Moving when we stop the redirection to ensure we have consistent output prior to disconnect responding.
@ashgti ashgti marked this pull request as ready for review February 24, 2025 21:49
@ashgti ashgti requested a review from JDevlieghere as a code owner February 24, 2025 21:49
@llvmbot
Copy link
Member

llvmbot commented Feb 24, 2025

@llvm/pr-subscribers-lldb

Author: John Harrison (ashgti)

Changes

The TestDAP_ouput test is flaky due to the order of events during shutdown. We were stopping the output and error handle redirection after we finished the disconnect request, which can cause us to miss output events due to timing. Moving when we stop the redirection to ensure we have consistent output prior to disconnect responding.

Fixes #128567


Full diff: https://github.com/llvm/llvm-project/pull/128583.diff

3 Files Affected:

  • (modified) lldb/test/API/tools/lldb-dap/output/TestDAP_output.py (+4)
  • (modified) lldb/tools/lldb-dap/DAP.cpp (+6-5)
  • (modified) lldb/tools/lldb-dap/DAP.h (+2-2)
diff --git a/lldb/test/API/tools/lldb-dap/output/TestDAP_output.py b/lldb/test/API/tools/lldb-dap/output/TestDAP_output.py
index eae7629409844..fe54511d1e21f 100644
--- a/lldb/test/API/tools/lldb-dap/output/TestDAP_output.py
+++ b/lldb/test/API/tools/lldb-dap/output/TestDAP_output.py
@@ -16,6 +16,7 @@ def test_output(self):
         program = self.getBuildArtifact("a.out")
         self.build_and_launch(
             program,
+            disconnectAutomatically=False,
             exitCommands=[
                 # Ensure that output produced by lldb itself is not consumed by the OutputRedirector.
                 "?script print('out\\0\\0', end='\\r\\n', file=sys.stdout)",
@@ -33,6 +34,9 @@ def test_output(self):
 
         self.continue_to_exit()
 
+        # Disconnecting from the server to ensure any pending IO is flushed.
+        self.dap_server.request_disconnect()
+
         output += self.get_stdout(timeout=lldbdap_testcase.DAPTestCaseBase.timeoutval)
         self.assertTrue(output and len(output) > 0, "expect program stdout")
         self.assertIn(
diff --git a/lldb/tools/lldb-dap/DAP.cpp b/lldb/tools/lldb-dap/DAP.cpp
index 9b22b60a68d94..7a5cd39535617 100644
--- a/lldb/tools/lldb-dap/DAP.cpp
+++ b/lldb/tools/lldb-dap/DAP.cpp
@@ -223,10 +223,7 @@ llvm::Error DAP::ConfigureIO(std::FILE *overrideOut, std::FILE *overrideErr) {
   return llvm::Error::success();
 }
 
-void DAP::StopIO() {
-  out.Stop();
-  err.Stop();
-
+void DAP::StopEventHandlers() {
   if (event_thread.joinable()) {
     broadcaster.BroadcastEventByType(eBroadcastBitStopEventThread);
     event_thread.join();
@@ -853,13 +850,17 @@ lldb::SBError DAP::Disconnect(bool terminateDebuggee) {
 
   SendTerminatedEvent();
 
+  // Stop forwarding the debugger output and error handles.
+  out.Stop();
+  err.Stop();
+
   disconnecting = true;
 
   return error;
 }
 
 llvm::Error DAP::Loop() {
-  auto stop_io = llvm::make_scope_exit([this]() { StopIO(); });
+  auto cleanup = llvm::make_scope_exit([this]() { StopEventHandlers(); });
   while (!disconnecting) {
     llvm::json::Object object;
     lldb_dap::PacketStatus status = GetNextObject(object);
diff --git a/lldb/tools/lldb-dap/DAP.h b/lldb/tools/lldb-dap/DAP.h
index 18de39838f218..d0e1217aaeb8e 100644
--- a/lldb/tools/lldb-dap/DAP.h
+++ b/lldb/tools/lldb-dap/DAP.h
@@ -224,8 +224,8 @@ struct DAP {
   llvm::Error ConfigureIO(std::FILE *overrideOut = nullptr,
                           std::FILE *overrideErr = nullptr);
 
-  /// Stop the redirected IO threads and associated pipes.
-  void StopIO();
+  /// Stop event handler threads.
+  void StopEventHandlers();
 
   // Serialize the JSON value into a string and send the JSON packet to
   // the "out" stream.

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ashgti ashgti merged commit e8f1623 into llvm:main Feb 24, 2025
14 checks passed
@ashgti ashgti deleted the lldb-dap-output-test branch February 24, 2025 22:43
@labath
Copy link
Collaborator

labath commented Feb 25, 2025

It looks like it's still flaky: https://lab.llvm.org/buildbot/#/builders/18/builds/11957

I think there's still at least one possible way this can race (return without reading all data):

  1. read thread returns from the read call (reads some random data) and gets suspended
  2. we begin shutting down: set the flag, and write the sentinel to the pipe
  3. read thread wakes up, checks the flag and decides to exit (without reading our sentinel, or any other remaining data)

I think that one way to improve this would be to do a non-blocking read to drain the pipe before returning from the read thread.. which reminds me that I wanted to fix the problem which prevented you from using the Pipe class ~~> #128719

SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request Mar 8, 2025
…output. (llvm#128583)

The TestDAP_ouput test is flaky due to the order of events during
shutdown. We were stopping the output and error handle redirection after
we finished the disconnect request, which can cause us to miss output
events due to timing. Moving when we stop the redirection to ensure we
have consistent output prior to disconnect responding.

Fixes llvm#128567
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request Mar 20, 2025
…output. (llvm#128583)

The TestDAP_ouput test is flaky due to the order of events during
shutdown. We were stopping the output and error handle redirection after
we finished the disconnect request, which can cause us to miss output
events due to timing. Moving when we stop the redirection to ensure we
have consistent output prior to disconnect responding.

Fixes llvm#128567
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request Apr 2, 2025
…output. (llvm#128583)

The TestDAP_ouput test is flaky due to the order of events during
shutdown. We were stopping the output and error handle redirection after
we finished the disconnect request, which can cause us to miss output
events due to timing. Moving when we stop the redirection to ensure we
have consistent output prior to disconnect responding.

Fixes llvm#128567
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request Apr 17, 2025
…output. (llvm#128583)

The TestDAP_ouput test is flaky due to the order of events during
shutdown. We were stopping the output and error handle redirection after
we finished the disconnect request, which can cause us to miss output
events due to timing. Moving when we stop the redirection to ensure we
have consistent output prior to disconnect responding.

Fixes llvm#128567
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request Apr 30, 2025
…output. (llvm#128583)

The TestDAP_ouput test is flaky due to the order of events during
shutdown. We were stopping the output and error handle redirection after
we finished the disconnect request, which can cause us to miss output
events due to timing. Moving when we stop the redirection to ensure we
have consistent output prior to disconnect responding.

Fixes llvm#128567
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request May 15, 2025
…output. (llvm#128583)

The TestDAP_ouput test is flaky due to the order of events during
shutdown. We were stopping the output and error handle redirection after
we finished the disconnect request, which can cause us to miss output
events due to timing. Moving when we stop the redirection to ensure we
have consistent output prior to disconnect responding.

Fixes llvm#128567
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request May 29, 2025
…output. (llvm#128583)

The TestDAP_ouput test is flaky due to the order of events during
shutdown. We were stopping the output and error handle redirection after
we finished the disconnect request, which can cause us to miss output
events due to timing. Moving when we stop the redirection to ensure we
have consistent output prior to disconnect responding.

Fixes llvm#128567
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request Jun 13, 2025
…output. (llvm#128583)

The TestDAP_ouput test is flaky due to the order of events during
shutdown. We were stopping the output and error handle redirection after
we finished the disconnect request, which can cause us to miss output
events due to timing. Moving when we stop the redirection to ensure we
have consistent output prior to disconnect responding.

Fixes llvm#128567
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[lldb-dap] TestDAP_output test is flaky
4 participants