[LLDB][Minidump]Update MinidumpFileBuilder to read and write in chunks #129307

Merged

Jlalond merged 11 commits into llvm:main on Apr 8, 2025

Conversation

@Jlalond Jlalond commented Feb 28, 2025

I recently received an internal error report that LLDB was OOM'ing when creating a Minidump. In my 64b refactor we decided to acquire buffers the size of the largest memory region so we could read all of its contents in one call. This made error handling very simple (and the coding simpler for me!), but it had the trade-off of large allocations when huge pages were enabled.

This patch is one I've had on the back burner for a while: we can read and write the Minidump memory sections in discrete chunks, which we already do when flushing to disk.

I had to refactor the error handling a bit, but the behavior remains the same: we make a best-effort attempt to read as much of the memory region as possible, but fail immediately if we receive an error writing to disk. I did not add new tests for this because our existing test suite is quite good, but I did manually verify that a few Minidumps couldn't read beyond the red_zone.

(lldb) reg read $sp
     rsp = 0x00007fffffffc3b0
(lldb) p/x 0x00007fffffffc3b0 - 128
(long) 0x00007fffffffc330
(lldb) memory read 0x00007fffffffc330
0x7fffffffc330: 60 c3 ff ff ff 7f 00 00 60 cd ff ff ff 7f 00 00  `.......`.......
0x7fffffffc340: 60 c3 ff ff ff 7f 00 00 65 e6 26 00 00 00 00 00  `.......e.&.....
(lldb) memory read 0x00007fffffffc329
error: could not parse memory info (Success!)

I'm not sure how to quantify the memory improvement other than to note that we previously allocated a buffer the size of the largest region regardless of how much we actually read. So a 2 GB unreadable region would cause a 2 GB allocation even if we were only reading 4096 KB. Now we allocate the smaller of the range size and the 128 MB max chunk size.
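
As a rough, illustrative sketch of the sizing change (the 128 MB value matches the MAX_WRITE_CHUNK_SIZE constant in the patch; the helper names here are hypothetical, not code from the PR):

#include <algorithm>
#include <cstdint>

constexpr uint64_t kMaxWriteChunkSize = 128ull * 1024 * 1024; // 128 MB

// Before: one buffer sized to the largest region in the core file.
uint64_t OldAllocationSize(uint64_t largest_region_size) {
  return largest_region_size; // e.g. 2 GB for a 2 GB region
}

// After: at most one chunk's worth per allocation.
uint64_t NewAllocationSize(uint64_t region_size) {
  return std::min(region_size, kMaxWriteChunkSize); // e.g. min(2 GB, 128 MB)
}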

llvmbot commented Feb 28, 2025

@llvm/pr-subscribers-lldb

Author: Jacob Lalonde (Jlalond)

Full diff: https://github.com/llvm/llvm-project/pull/129307.diff

2 Files Affected:

  • (modified) lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp (+90-40)
  • (modified) lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.h (+7)
diff --git a/lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp b/lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp
index c5013ea5e3be4..e88b606f279cd 100644
--- a/lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp
+++ b/lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp
@@ -969,12 +969,82 @@ Status MinidumpFileBuilder::DumpDirectories() const {
   return error;
 }
 
-static uint64_t
-GetLargestRangeSize(const std::vector<CoreFileMemoryRange> &ranges) {
-  uint64_t max_size = 0;
-  for (const auto &core_range : ranges)
-    max_size = std::max(max_size, core_range.range.size());
-  return max_size;
+Status MinidumpFileBuilder::ReadWriteMemoryInChunks(
+    const lldb_private::CoreFileMemoryRange &range, uint64_t *bytes_read) {
+  Log *log = GetLog(LLDBLog::Object);
+  lldb::addr_t addr = range.range.start();
+  lldb::addr_t size = range.range.size();
+  // First we set the byte tally to 0, so if we do exit gracefully
+  // the caller doesn't think the random garbage on the stack is a
+  // success.
+  if (bytes_read)
+    *bytes_read = 0;
+
+  uint64_t bytes_remaining = size;
+  uint64_t total_bytes_read = 0;
+  auto data_up = std::make_unique<DataBufferHeap>(
+      std::min(bytes_remaining, MAX_WRITE_CHUNK_SIZE), 0);
+  Status error;
+  while (bytes_remaining > 0) {
+    // Get the next read chunk size as the minimum of the remaining bytes and
+    // the write chunk max size.
+    const size_t bytes_to_read =
+        std::min(bytes_remaining, MAX_WRITE_CHUNK_SIZE);
+    const size_t bytes_read_for_chunk =
+        m_process_sp->ReadMemory(range.range.start() + total_bytes_read,
+                                 data_up->GetBytes(), bytes_to_read, error);
+    if (error.Fail() || bytes_read_for_chunk == 0) {
+      LLDB_LOGF(log,
+                "Failed to read memory region at: %" PRIx64
+                ". Bytes read: %zu, error: %s",
+                addr, bytes_read_for_chunk, error.AsCString());
+      // If we haven't read any bytes yet, we can just give up and return
+      if (total_bytes_read == 0)
+        return Status();
+
+      // If we've read some bytes, we stop trying to read more and return
+      // this best effort attempt
+      bytes_remaining = 0;
+    } else if (bytes_read_for_chunk != bytes_to_read) {
+      LLDB_LOGF(
+          log, "Memory region at: %" PRIx64 " failed to read %" PRIx64 " bytes",
+          addr, size);
+
+      // If we've read some bytes, we stop trying to read more and return
+      // this best effort attempt
+      bytes_remaining = 0;
+    }
+
+    // Write the chunk to the minidump file, potentially flushing to disk.
+    // This is the only place we want to return a true error, so that we can
+    // fail. If we get an error writing to disk we can't easily guarantee
+    // that we won't corrupt the minidump.
+    error = AddData(data_up->GetBytes(), bytes_read_for_chunk);
+    if (error.Fail())
+      return error;
+
+    // This check is so we don't underflow when the error path above sets
+    // bytes_remaining to 0 (the graceful exit condition).
+    if (bytes_remaining > 0)
+      bytes_remaining -= bytes_read_for_chunk;
+
+    total_bytes_read += bytes_read_for_chunk;
+    // If the caller wants a tally back of the bytes_read, update it as we
+    // write. We do this in the loop so if we encounter an error we can
+    // report the accurate total.
+    if (bytes_read)
+      *bytes_read += bytes_read_for_chunk;
+
+    // We clear the buffer on each iteration, without checking whether we
+    // read the expected number of bytes; this is so we never allocate
+    // more than MAX_WRITE_CHUNK_SIZE. But we do check whether clearing
+    // is even worthwhile before we return and destruct the buffer.
+    if (bytes_remaining > 0)
+      data_up->Clear();
+  }
+
+  return error;
 }
 
 Status
@@ -987,8 +1057,6 @@ MinidumpFileBuilder::AddMemoryList_32(std::vector<CoreFileMemoryRange> &ranges,
 
   Log *log = GetLog(LLDBLog::Object);
   size_t region_index = 0;
-  auto data_up =
-      std::make_unique<DataBufferHeap>(GetLargestRangeSize(ranges), 0);
   for (const auto &core_range : ranges) {
     // Take the offset before we write.
     const offset_t offset_for_data = GetCurrentDataEndOffset();
@@ -1003,18 +1071,15 @@ MinidumpFileBuilder::AddMemoryList_32(std::vector<CoreFileMemoryRange> &ranges,
     ++region_index;
 
     progress.Increment(1, "Adding Memory Range " + core_range.Dump());
-    const size_t bytes_read =
-        m_process_sp->ReadMemory(addr, data_up->GetBytes(), size, error);
-    if (error.Fail() || bytes_read == 0) {
-      LLDB_LOGF(log, "Failed to read memory region. Bytes read: %zu, error: %s",
-                bytes_read, error.AsCString());
-      // Just skip sections with errors or zero bytes in 32b mode
+    uint64_t bytes_read;
+    error = ReadWriteMemoryInChunks(core_range, &bytes_read);
+    if (error.Fail())
+      return error;
+
+    // If we completely failed to read this range,
+    // we can just omit all of the bookkeeping.
+    if (bytes_read == 0)
       continue;
-    } else if (bytes_read != size) {
-      LLDB_LOGF(
-          log, "Memory region at: %" PRIx64 " failed to read %" PRIx64 " bytes",
-          addr, size);
-    }
 
     MemoryDescriptor descriptor;
     descriptor.StartOfMemoryRange =
@@ -1026,11 +1091,6 @@ MinidumpFileBuilder::AddMemoryList_32(std::vector<CoreFileMemoryRange> &ranges,
     descriptors.push_back(descriptor);
     if (m_thread_by_range_end.count(end) > 0)
       m_thread_by_range_end[end].Stack = descriptor;
-
-    // Add the data to the buffer, flush as needed.
-    error = AddData(data_up->GetBytes(), bytes_read);
-    if (error.Fail())
-      return error;
   }
 
   // Add a directory that references this list
@@ -1106,8 +1166,6 @@ MinidumpFileBuilder::AddMemoryList_64(std::vector<CoreFileMemoryRange> &ranges,
 
   Log *log = GetLog(LLDBLog::Object);
   size_t region_index = 0;
-  auto data_up =
-      std::make_unique<DataBufferHeap>(GetLargestRangeSize(ranges), 0);
   for (const auto &core_range : ranges) {
     const addr_t addr = core_range.range.start();
     const addr_t size = core_range.range.size();
@@ -1120,27 +1178,19 @@ MinidumpFileBuilder::AddMemoryList_64(std::vector<CoreFileMemoryRange> &ranges,
     ++region_index;
 
     progress.Increment(1, "Adding Memory Range " + core_range.Dump());
-    const size_t bytes_read =
-        m_process_sp->ReadMemory(addr, data_up->GetBytes(), size, error);
-    if (error.Fail()) {
-      LLDB_LOGF(log, "Failed to read memory region. Bytes read: %zu, error: %s",
-                bytes_read, error.AsCString());
-      error.Clear();
+    uint64_t bytes_read;
+    error = ReadWriteMemoryInChunks(core_range, &bytes_read);
+    if (error.Fail())
+      return error;
+
+    if (bytes_read == 0) {
       cleanup_required = true;
       descriptors[region_index].DataSize = 0;
     }
     if (bytes_read != size) {
-      LLDB_LOGF(
-          log, "Memory region at: %" PRIx64 " failed to read %" PRIx64 " bytes",
-          addr, size);
       cleanup_required = true;
       descriptors[region_index].DataSize = bytes_read;
     }
-
-    // Add the data to the buffer, flush as needed.
-    error = AddData(data_up->GetBytes(), bytes_read);
-    if (error.Fail())
-      return error;
   }
 
   // Early return if there is no cleanup needed.
diff --git a/lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.h b/lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.h
index 48293ee1bf5e5..b9ebb0d9a28b7 100644
--- a/lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.h
+++ b/lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.h
@@ -142,6 +142,13 @@ class MinidumpFileBuilder {
   lldb_private::Status AddDirectory(llvm::minidump::StreamType type,
                                     uint64_t stream_size);
   lldb::offset_t GetCurrentDataEndOffset() const;
+
+  // Read a memory region from the process and write it to the file
+  // in fixed size chunks.
+  lldb_private::Status
+  ReadWriteMemoryInChunks(const lldb_private::CoreFileMemoryRange &range,
+                          uint64_t *bytes_read);
+
   // Stores directories to fill in later
   std::vector<llvm::minidump::Directory> m_directories;
   // When we write off the threads for the first time, we need to clean them up

@Jlalond Jlalond changed the title Update MinidumpFileBuilder to read and write in chunks [LLDB][Minidump]Update MinidumpFileBuilder to read and write in chunks Feb 28, 2025
@Jlalond Jlalond requested review from labath, clayborg, mbucko and jeffreytan81 and removed request for JDevlieghere February 28, 2025 21:34
Jlalond commented Feb 28, 2025

@labath / @clayborg: to preempt some expected feedback, I intend to expose this chunk size as a setting in a future patch

@Jlalond Jlalond force-pushed the minidump-file-builder-read-in-chunks branch from 1b6444a to 4d55b06 Compare March 3, 2025 19:58
Comment on lines 992 to 993
const size_t bytes_to_read =
std::min(bytes_remaining, MAX_WRITE_CHUNK_SIZE);
Contributor:

I am not a big fan of this -- you are trying to ensure the reading won't exceed the data_up buffer size, so you should use data_up->GetSize() here instead of MAX_WRITE_CHUNK_SIZE. What if data_up is smaller than MAX_WRITE_CHUNK_SIZE? Then you are relying on the fact that the size from the range bounds the data_up size. You can see there is too much coupled logic between caller and callee here.

I would change it to std::min(bytes_remaining, data_up->GetSize()). That's the purpose of passing data_up into this function anyway; otherwise, we should just pass data_up->GetBytes() into this function.

Contributor Author:

Correct, this was a missed change on my part: I moved the buffer declaration back up into the AddMemory_ methods, and we should've used the upper bound of the buffer here, not MAX_WRITE_CHUNK_SIZE. I have fixed it accordingly.


// If we've read some bytes, we stop trying to read more and return
// this best effort attempt
bytes_remaining = 0;
Contributor (jeffreytan81):

Why not simply return from here? This logic relies on the indirect condition that the loop terminates with "bytes_remaining = 0", which can easily break during future refactoring.

Contributor (jeffreytan81):

Also, the later code still executes AddData(data_up->GetBytes(), bytes_read_for_chunk) after error.Fail(), and then overwrites the error object. Another reason we should return early here.

Contributor Author:

Hey @jeffreytan81, my thought process here is that if we managed to read any bytes, we should append those to the buffer and return the bytes read up to that point.

So in my hypothetical we try to read 1000 bytes from an address and only read 500; I still want to save off those 500. Are you against this proposal? It's why, in error conditions other than AddData(), I just set the loop to not continue.

Contributor Author:

I'm okay with another exit statement; the bytes_remaining == 0 condition does feel quite brittle. But my primary concern here is whether we include or exclude partial reads. In the existing system we would just skip them, so there is precedent for excluding them, but I think that by including the extra bytes we can improve the user experience.

I will update the check so that if bytes_read_for_chunk == 0 we also just return early.

Contributor Author:

Updated the error case to be an early but non-fatal exit, and added some comments on why it's non-fatal while an AddData() failure is. I added an explicit case for partial reads, where we set a flag that will terminate the loop: we still add the data and update the bytes_read count, but discontinue any further reading.

I also changed the overflow check to an error, which I'm much happier with.

Comment on lines 1013 to 1014
log, "Memory region at: %" PRIx64 " failed to read %" PRIx64 " bytes",
addr, size);
Contributor:

Log the fact that this was a partial read, and the bytes_read_for_chunk variable, instead of simply saying "failed to read".

Contributor Author:

Updated

for (const auto &core_range : ranges)
max_size = std::max(max_size, core_range.range.size());
return max_size;
Status MinidumpFileBuilder::ReadWriteMemoryInChunks(
Collaborator:

Can we build this into PostMortemProcess? It would be nice to not have to replicate this functionality in every core file implementation.

Contributor Author:

Not opposed, but because this patch has been open for quite some time, can I make that a follow-up patch?

max_size = std::max(max_size, core_range.range.size());
return max_size;
Status MinidumpFileBuilder::ReadWriteMemoryInChunks(
const lldb_private::CoreFileMemoryRange &range, uint64_t *bytes_read) {
Collaborator:

change uint64_t *bytes_read to uint64_t &bytes_read

@@ -969,6 +969,83 @@ Status MinidumpFileBuilder::DumpDirectories() const {
return error;
}

Status MinidumpFileBuilder::ReadWriteMemoryInChunks(
const std::unique_ptr<lldb_private::DataBufferHeap> &data_up,
const lldb_private::CoreFileMemoryRange &range, uint64_t *bytes_read) {
Collaborator:

Change this to a uint64_t &bytes_read

@@ -969,6 +969,83 @@ Status MinidumpFileBuilder::DumpDirectories() const {
return error;
}

Status MinidumpFileBuilder::ReadWriteMemoryInChunks(
const std::unique_ptr<lldb_private::DataBufferHeap> &data_up,
Collaborator:

Change this to a lldb_private::DataBufferHeap &data. Unless we are doing something with the unique_ptr here that makes sense, there is no reason to use a unique pointer in the args.

Comment on lines 1067 to 1050
auto data_up = std::make_unique<DataBufferHeap>(
std::min(GetLargestRangeSize(ranges), MAX_WRITE_CHUNK_SIZE), 0);
for (const auto &core_range : ranges) {
Collaborator:

No need to create a unique pointer here. Just create a local object:

DataBufferHeap data(std::min(GetLargestRangeSize(ranges), MAX_WRITE_CHUNK_SIZE), 0);

LLDB_LOGF(log, "Failed to read memory region. Bytes read: %zu, error: %s",
bytes_read, error.AsCString());
// Just skip sections with errors or zero bytes in 32b mode
uint64_t bytes_read;
Collaborator:

Init with zero:

uint64_t bytes_read = 0;

Comment on lines 1160 to 1142
auto data_up = std::make_unique<DataBufferHeap>(
std::min(GetLargestRangeSize(ranges), MAX_WRITE_CHUNK_SIZE), 0);
Collaborator:

Don't create a unique pointer, just create an instance like above

Comment on lines 145 to 152

// Read a memory region from the process and write it to the file
// in fixed size chunks.
lldb_private::Status ReadWriteMemoryInChunks(
const std::unique_ptr<lldb_private::DataBufferHeap> &data_up,
const lldb_private::CoreFileMemoryRange &range, uint64_t *bytes_read);

Collaborator (clayborg):

Move this into the PostMortemProcess.h/.cpp and have it take a lambda callback:

lldb_private::Status ReadMemoryInChunks(
    lldb_private::CoreFileMemoryRange &range, 
    uint64_t chunk_size, 
    lldb_private::DataBufferHeap &data, // Re-use between calls if needed, might expand to chunk_size
    const std::function<Status(const uint8_t *bytes, uint64_t length)> &chunk_callback,
    uint64_t &bytes_read); 

Then chunk_callback gets called for each chunk that is read and your ProcessMinidump can pass in a lambda that just calls AddData(...)

There is no need to create a const std::unique_ptr<lldb_private::DataBufferHeap> &data_up, just create a local lldb_private::DataBufferHeap heap(...). The unique pointer does nothing useful here

Contributor Author:

Talked with Greg offline. I want to add this to Process, but the effort to unit test it would delay this patch more than I would like, so I'll follow this patch up with a refactor of this behavior into Process.

@Jlalond Jlalond force-pushed the minidump-file-builder-read-in-chunks branch from a3ef677 to 63ddb80 Compare March 14, 2025 21:40
@Jlalond Jlalond force-pushed the minidump-file-builder-read-in-chunks branch from 63ddb80 to 42e84ed Compare April 1, 2025 16:43
Jlalond commented Apr 1, 2025

Update

After discussing with Greg, I've centralized the read memory loop logic in Process, and defined a callback that has all the relevant information from the last memory read. I've refactored the Minidump method to capture the fatal error in the outer scope and let the underlying process implementation handle the looping and reading.

I also found we're using ReadMemory (not just in Minidump) when we should be using ReadMemoryFromInferior, so I'll have several follow-up patches to fix this.
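
A rough sketch of the resulting call site, pieced together from the fragments reviewed below (names and parameter order are assumptions and may differ from the final commit):

// Sketch only: the Minidump builder captures the fatal write error in the
// outer scope, while the process implementation drives the read loop.
Status addDataError;
Process::ReadMemoryChunkCallback callback =
    [&](Status &error, lldb::addr_t current_addr, const void *buf,
        uint64_t bytes_read) -> lldb_private::IterationAction {
  // Read failures are non-fatal; keep whatever was read so far.
  if (error.Fail() || bytes_read == 0)
    return lldb_private::IterationAction::Stop;
  // A disk-write failure is fatal, so surface it through addDataError.
  addDataError = AddData(buf, bytes_read);
  if (addDataError.Fail())
    return lldb_private::IterationAction::Stop;
  return lldb_private::IterationAction::Continue;
};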

@@ -2233,6 +2233,40 @@ size_t Process::ReadMemoryFromInferior(addr_t addr, void *buf, size_t size,
return bytes_read;
}

size_t Process::ReadMemoryInChunks(lldb::addr_t vm_addr, DataBufferHeap &data,
Collaborator:

There is no need to pass a DataBufferHeap into this API. The API should just take the chunk size as a parameter:

size_t Process::ReadMemoryInChunks(lldb::addr_t vm_addr, lldb::addr_t vm_size, lldb::addr_t chunk_size, ReadMemoryChunkCallback callback);

Then you can create a local DataBufferHeap object in this function outside of the loop that reads the chunks, so the same buffer gets re-used.
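
A minimal sketch of that suggestion, assuming the callback shape settled on later in this review (a Status, the chunk's address, the bytes, and the byte count, returning an IterationAction); this is not the committed implementation:

size_t Process::ReadMemoryInChunks(lldb::addr_t vm_addr, lldb::addr_t vm_size,
                                   lldb::addr_t chunk_size,
                                   ReadMemoryChunkCallback callback) {
  // One buffer, allocated once outside the loop and re-used for every chunk.
  DataBufferHeap data(std::min(vm_size, chunk_size), 0);
  size_t bytes_read = 0;
  while (bytes_read < vm_size) {
    const lldb::addr_t bytes_to_read =
        std::min<lldb::addr_t>(vm_size - bytes_read, chunk_size);
    Status error;
    const size_t bytes_read_for_chunk = ReadMemoryFromInferior(
        vm_addr + bytes_read, data.GetBytes(), bytes_to_read, error);
    // The caller decides whether an error or a short read is fatal.
    if (callback(error, vm_addr + bytes_read, data.GetBytes(),
                 bytes_read_for_chunk) == IterationAction::Stop)
      break;
    if (bytes_read_for_chunk == 0)
      break; // Avoid spinning forever on an unreadable region.
    bytes_read += bytes_read_for_chunk;
  }
  return bytes_read;
}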

Comment on lines 1601 to 1602
typedef std::function<IterationAction(lldb_private::Status &,
lldb_private::DataBufferHeap &,
lldb::addr_t, uint64_t, uint64_t)>
Collaborator:

We should just pass the bytes without requiring someone to use a DataBufferHeap. I suggest a change below that passes the chunk size into the Process::ReadMemoryInChunks(...) call, so we don't need the bytes_to_read argument. So this callback can be:

typedef std::function<IterationAction(
  Status &error, /// The error for the current memory chunk that was read
  lldb::addr_t bytes_addr, /// The address corresponding to the bytes supplied
  const uint8_t *bytes, /// The bytes for the current chunk
  uint64_t bytes_size /// The number of bytes in the bytes argument
)>

size_t ReadMemoryInChunks(lldb::addr_t vm_addr, DataBufferHeap &data,
size_t size, ReadMemoryChunkCallback callback);
Collaborator:

No need to have people give us a DataBufferHeap; we just need the chunk size:

size_t ReadMemoryInChunks(lldb::addr_t vm_addr, lldb::offset_t vm_size, lldb::offset_t chunk_size, ReadMemoryChunkCallback callback);

The ReadMemoryInChunks function can create a local DataBufferHeap outside of the loop.

Comment on lines 979 to 981
[&](Status &error, DataBufferHeap &data, lldb::addr_t current_addr,
uint64_t bytes_to_read,
uint64_t bytes_read_for_chunk) -> lldb_private::IterationAction {
Collaborator:

Update to suggested callback prototype above.

if (addDataError.Fail())
return lldb_private::IterationAction::Stop;

if (bytes_read_for_chunk != bytes_to_read) {
Collaborator:

You will pass in a chunk_size, which can be captured in the lambda, and you can compare bytes_size to your chunk_size.

Contributor Author:

We're already passing bytes_read_for_chunk and bytes_to_read, which is the expected size, so I didn't need to change the lambda here.

Collaborator:

We don't want bytes_to_read in the callback; it isn't needed. Use data_buffer->GetByteSize() instead; you are already capturing anything needed by reference in your lambda.

@Jlalond Jlalond force-pushed the minidump-file-builder-read-in-chunks branch from 5af0047 to b35e42f Compare April 7, 2025 22:35
@Jlalond Jlalond force-pushed the minidump-file-builder-read-in-chunks branch from b35e42f to f01e8c1 Compare April 7, 2025 22:36
@Jlalond Jlalond force-pushed the minidump-file-builder-read-in-chunks branch from f01e8c1 to 13581a6 Compare April 7, 2025 22:45

Status addDataError;
Process::ReadMemoryChunkCallback callback =
[&](Status &error, lldb::addr_t current_addr, const void *buf,
uint64_t bytes_read_for_chunk) -> lldb_private::IterationAction {
Collaborator:

We can now rename bytes_read_for_chunk to bytes_read

if (bytes_read_for_chunk != data_buffer.GetByteSize() &&
current_addr + bytes_read_for_chunk != size) {
LLDB_LOGF(log,
"Memory region at: %" PRIx64 " partiall read %" PRIx64
Collaborator:

s/partiall/partial/

You also want all addresses to be prefixed with 0x here, so the format string should be:

"Memory region at: 0x%" PRIx64 " partiall read 0x%" PRIx64

Same for format string below.

uint64_t bytes_read_for_chunk) -> lldb_private::IterationAction {
if (error.Fail() || bytes_read_for_chunk == 0) {
LLDB_LOGF(log,
"Failed to read memory region at: %" PRIx64
Collaborator:

You want addresses to have 0x prefix, so:

"Failed to read memory region at: 0x%" PRIx64

/// vm_addr, \a buf, and \a size updated appropriately. Zero is
/// returned in the case of an error.
size_t ReadMemoryInChunks(lldb::addr_t vm_addr, void *buf,
lldb::addr_t chunk_size, size_t total_size,
Collaborator:

Don't use size_t; it is 32 bits on 32-bit systems. Use lldb::offset_t or lldb::addr_t, which are 64 bits all the time.
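
To illustrate the concern (hypothetical checks, not from the patch): lldb::offset_t and lldb::addr_t are typedefs for uint64_t on every platform, while size_t shrinks to 32 bits on 32-bit hosts:

#include "lldb/lldb-types.h"

static_assert(sizeof(lldb::offset_t) == 8, "offset_t is 64-bit everywhere");
static_assert(sizeof(lldb::addr_t) == 8, "addr_t is 64-bit everywhere");
// sizeof(size_t) == 4 on a 32-bit host, so a >4 GB size would truncate there.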

Comment on lines 986 to 987
". Bytes read: %zu, error: %s",
current_addr, bytes_read_for_chunk, error.AsCString());
Collaborator:

Don't use size_t, use lldb::offset_t:

". Bytes read: %" PRIu64 ", error: %s",
                current_addr, bytes_read_for_chunk, error.AsCString());

/// size, then this function will get called again with \a
/// vm_addr, \a buf, and \a size updated appropriately. Zero is
/// returned in the case of an error.
size_t ReadMemoryInChunks(lldb::addr_t vm_addr, void *buf,
Collaborator:

Don't use size_t; it is 32 bits on 32-bit architectures. Use lldb::offset_t, which is always 64 bits.

// the write chunk max size.
const size_t bytes_to_read = std::min(bytes_remaining, chunk_size);
const lldb::addr_t current_addr = vm_addr + bytes_read;
const size_t bytes_read_for_chunk =
Collaborator:

Don't use size_t

@Jlalond Jlalond force-pushed the minidump-file-builder-read-in-chunks branch 2 times, most recently from fea4d05 to cab1efa Compare April 8, 2025 00:13

github-actions bot commented Apr 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Jlalond Jlalond force-pushed the minidump-file-builder-read-in-chunks branch from cab1efa to 4aed212 Compare April 8, 2025 00:24
Comment on lines 1594 to 1597
// Status, the status returned from ReadMemoryFromInferior
// void*, pointer to the bytes read
// addr_t, the bytes_addr, start + bytes read so far.
// bytes_size, the count of bytes read for this chunk
Collaborator:

Re-order comments here to match new args ordering

…e read chunk size. Also don't log if a truncated read is the final read
@Jlalond Jlalond force-pushed the minidump-file-builder-read-in-chunks branch from 4aed212 to a85c33e Compare April 8, 2025 00:29
Jlalond commented Apr 8, 2025

Did some additional validation offline, where I saved a jumbo Minidump (50 GB). We have improved disk IO and a flat memory profile.

@Jlalond Jlalond merged commit f869d6e into llvm:main Apr 8, 2025
10 checks passed
@Jlalond Jlalond deleted the minidump-file-builder-read-in-chunks branch April 8, 2025 18:17
Jlalond added a commit that referenced this pull request May 29, 2025
In #129307, we introduced reading and writing in chunks, and during the
final revision of the PR I changed the behavior for 64b memory regions
and did not test an actual 64b memory range.

This caused LLDB to crash whenever we generated a 64b memory region.

64b regions have been a problem in testing for some time, as it's a
waste of test resources to generate a 5 GB+ Minidump. I will work with
@clayborg and @labath to come up with a way to specify creating a 64b
list instead of a 32b list (likely via the yamilizer).
svkeerthy pushed a commit that referenced this pull request May 29, 2025
In #129307, we introduced reading and writing in chunks, and during the
final revision of the PR I changed the behavior for 64b memory regions
and did not test an actual 64b memory range.

This caused LLDB to crash whenever we generated a 64b memory region.

64b regions have been a problem in testing for some time, as it's a
waste of test resources to generate a 5 GB+ Minidump. I will work with
@clayborg and @labath to come up with a way to specify creating a 64b
list instead of a 32b list (likely via the yamilizer).
google-yfyang pushed a commit to google-yfyang/llvm-project that referenced this pull request May 29, 2025

In llvm#129307, we introduced reading and writing in chunks, and during
the final revision of the PR I changed the behavior for 64b memory
regions and did not test an actual 64b memory range.

This caused LLDB to crash whenever we generated a 64b memory region.

64b regions have been a problem in testing for some time, as it's a
waste of test resources to generate a 5 GB+ Minidump. I will work with
@clayborg and @labath to come up with a way to specify creating a 64b
list instead of a 32b list (likely via the yamilizer).
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Jun 3, 2025

In llvm#129307, we introduced reading and writing in chunks, and during
the final revision of the PR I changed the behavior for 64b memory
regions and did not test an actual 64b memory range.

This caused LLDB to crash whenever we generated a 64b memory region.

64b regions have been a problem in testing for some time, as it's a
waste of test resources to generate a 5 GB+ Minidump. I will work with
@clayborg and @labath to come up with a way to specify creating a 64b
list instead of a 32b list (likely via the yamilizer).