
Improve DataBlock memory consumption and validation #1089

Conversation

@cltnschlosser (Contributor) commented Dec 14, 2022

What and why?

Details in #1091

This leads me to 3 possibilities:

  1. Datadog is generating a huge buffer.
  2. Datadog buffers are being leaked somehow (or just need a tighter autorelease policy).
  3. Something in my application is using a lot of memory, and Datadog is just the last straw.

NOTE: I just realized we didn't start seeing this until after the 1.12.0 upgrade.
After digging into the 1.12.0 changes, I'm realizing that a bug causing the first possibility is the most likely. The data is already in memory at that point and is just being copied into a buffer; the crash occurs during that buffer creation. It's likely that the length being provided is incorrect.
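
To illustrate why an incorrect length would crash at buffer creation, here is a sketch of the read path (my reconstruction, not the SDK's exact code; `readBytes(count:)` is a hypothetical stand-in for the TLV reader):

```swift
import Foundation

// Hypothetical stand-in for the SDK's TLV reader; sketch only.
func readBytes(count: Int) throws -> Data { fatalError("sketch only") }

// How a corrupted TLV length field can trigger a huge allocation:
let lengthField = try readBytes(count: 4)
// Assemble the 4-byte length (byte order here is illustrative).
let blockSize: UInt32 = lengthField.reduce(0) { ($0 << 8) | UInt32($1) }
// Nothing bounds blockSize yet: a corrupted file can claim up to
// UInt32.max (4GB), and this allocation tries to reserve all of it at once.
let buffer = Data(repeating: 0, count: Int(blockSize))
```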

How?

I address each of these (a rough sketch follows the list):

  1. Add a 10MB safety check (elsewhere it looks like file size should be limited to 4MB). Currently the only check is Int(exactly:), which could still allow a huge value: BlockSize is a UInt32, so it can represent up to 4GB.
  2. Give the queues that reference these loaded buffers a strict autorelease policy. (This wouldn't fix a leak, but I'm hoping there isn't one for now.)
  3. I don't think this is likely, given that I'm not seeing other low-memory issues. Additionally, 90th-percentile peak memory usage hasn't increased.
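
A rough sketch of points 1 and 2 (the error case, names, and queue label are illustrative assumptions, not necessarily what this PR ships):

```swift
import Foundation

// Hypothetical error case for this sketch; the real DataBlockError has its own cases.
enum DataBlockError: Error {
    case bytesLengthExceedsLimit
}

// Point 1: bound the block length before allocating. BlockSize is a UInt32, so
// without this check a corrupted length field can claim up to 4GB.
let maxBlockLength: UInt64 = 10 * 1_024 * 1_024 // 10MB safety limit

func validatedLength(of blockSize: UInt32) throws -> Int {
    guard UInt64(blockSize) <= maxBlockLength, let length = Int(exactly: blockSize) else {
        throw DataBlockError.bytesLengthExceedsLimit
    }
    return length
}

// Point 2: drain the autorelease pool after every work item so buffers loaded
// on this queue are released promptly instead of accumulating.
let readQueue = DispatchQueue(
    label: "com.datadoghq.datablock-read", // illustrative label
    autoreleaseFrequency: .workItem
)
```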

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • Add CHANGELOG entry for user facing changes

Custom CI job configuration (optional)

  • Run unit tests
  • Run integration tests
  • Run smoke tests

@cltnschlosser cltnschlosser requested a review from a team as a code owner December 14, 2022 21:14
@cltnschlosser cltnschlosser changed the title from "More strict memory management for data read, write, and upload" to "Improve DataBlock memory consumption and validation" Dec 15, 2022
@ncreated (Collaborator)

Hello @cltnschlosser 👋. Thank you for opening this PR. At first glance it looks OK, and it indeed makes sense to control the size of a buffer beforehand. Given that we already have similar logic controlling the "write size", we need some time within the team to discuss the best strategy for incorporating this change. Stay tuned, we'll get back to you.

Comment on lines 135 to 159:

```diff
-    var bytes = [UInt8](repeating: 0, count: length)
-    let count = stream.read(&bytes, maxLength: length)
     guard length > 0 else {
         return Data()
     }

+    // Load from stream to data without unnecessary copies
+    var data = Data(repeating: 0, count: length)
+    let count = try data.withUnsafeMutableBytes { (bytes: UnsafeMutableRawBufferPointer) in
+        guard let buffer = bytes.assumingMemoryBound(to: UInt8.self).baseAddress else {
+            throw DataBlockError.dataAllocationFailure
+        }
+        return stream.read(buffer, maxLength: length)
+    }
```
Collaborator
@cltnschlosser We wonder what the reasoning behind this change is. The comment refers to an unnecessary copy, but I don't directly see how that was addressed by changing from [UInt8](repeating: 0, count: length) to Data(repeating: 0, count: length) and accessing raw pointers.

Contributor Author

Before, there were 3 copies of the data:

  1. file.read() read the entire contents into a Data.
  2. bytes, the [UInt8] buffer.
  3. The Data that was initialized with bytes, which is a copy.

Copy 2 was very short-lived, and it's possible that the Swift optimizer removes it, but I don't know for sure.

Copy 1 was solved by using the InputStream API, which loads directly from the file instead of from a Data. Copies 2 and 3 were then merged by allocating Data(repeating: 0, count: length) and reading directly into that Data.

There is probably also a further optimization where you use an uninitialized Data, but I was having some trouble and didn't want to spend too much time on it. I just noticed this API, which is probably better than using repeating: 0 as well.
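
For reference, one shape the uninitialized-Data variant could take (a sketch under that assumption; the error type here is invented for the sketch, while the real code uses DataBlockError):

```swift
import Foundation

// Hypothetical error type for this sketch only.
enum StreamReadError: Error {
    case allocationFailure
    case readFailure
}

// Read straight from the stream into a malloc'd buffer and hand ownership to
// Data, skipping the zero-fill that Data(repeating: 0, count:) performs.
// Assumes `length` was already validated against the safety limit.
func readData(from stream: InputStream, length: Int) throws -> Data {
    guard let buffer = malloc(length) else {
        throw StreamReadError.allocationFailure
    }
    let count = stream.read(buffer.assumingMemoryBound(to: UInt8.self), maxLength: length)
    guard count >= 0 else {
        free(buffer)
        throw StreamReadError.readFailure
    }
    // .free tells Data to call free(3) on the buffer when it is deallocated,
    // so no extra copy is ever made.
    return Data(bytesNoCopy: buffer, count: count, deallocator: .free)
}
```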

Contributor Author

If you're uncomfortable with this, I can revert it. I don't think these extra copies were the cause of my memory issue, since they are temporary, but it probably improves performance slightly because the data no longer has to be copied twice (to [UInt8], then to Data); it goes directly into the Data.

Collaborator

Thanks for the explanation 👍, now it makes sense given this broader picture. I was wrong in thinking that the "copies" related to the local scope of this method.

@maxep (Member) left a comment

Thank you very much @cltnschlosser, this is a very valuable contribution 🙏
It's OK for me to merge it. I will change the target branch so we can replace the MAX_BLOCK_SIZE with the performance preset configuration before merging it to develop. Thanks again!

Sources/Datadog/DatadogCore/Storage/DataBlock.swift (review thread, outdated and resolved)
@maxep maxep changed the base branch from develop to feature/optimize-tlv-read December 20, 2022 09:05
@ncreated (Collaborator) left a comment

Thank you again @cltnschlosser! This PR is in good shape to merge; we will then continue this work on our side 🙌.

@ncreated ncreated merged commit 2d7568e into DataDog:feature/optimize-tlv-read Dec 21, 2022
@cltnschlosser (Contributor Author)

@ncreated Hey, just checking in on your timeline for the updates?

@maxep (Member) commented Jan 10, 2023

Hey @cltnschlosser 👋

I'm currently working on replacing the MAX_BLOCK_SIZE with the performance preset configuration. We can't give you an ETA, but it will be part of the next release.
Thanks again for this contribution 🙏
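
For context, replacing the hard-coded limit with a preset-driven one could look roughly like this (an illustrative sketch; the names are assumptions, not the SDK's final API):

```swift
// Illustrative sketch only: the general shape of moving a hard-coded
// MAX_BLOCK_SIZE into a performance preset consulted by the reader.
protocol StoragePerformancePreset {
    /// Maximum size of a single data block, in bytes.
    var maxObjectSize: UInt64 { get }
}

struct LowPowerPreset: StoragePerformancePreset {
    let maxObjectSize: UInt64 = 256 * 1_024        // 256KB
}

struct DefaultPreset: StoragePerformancePreset {
    let maxObjectSize: UInt64 = 10 * 1_024 * 1_024 // 10MB, this PR's safety limit
}

// The reader checks the preset instead of a global constant.
func validate(blockSize: UInt32, against preset: StoragePerformancePreset) -> Bool {
    UInt64(blockSize) <= preset.maxObjectSize
}
```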
