
Improve DataBlock memory consumption and validation #1089

Conversation

@cltnschlosser (Contributor) commented Dec 14, 2022

What and why?

Details in #1091

This leads me to 3 possibilities:

  1. Datadog is generating a huge buffer.
  2. Datadog buffers are being leaked somehow (or just need a tighter autorelease policy).
  3. Something in my application is using a lot of memory, and Datadog is just the last straw.

NOTE: I just realized we didn't start seeing this until after the 1.12.0 upgrade.
After digging into the 1.12.0 changes, I'm realizing that a bug causing the first possibility is the most likely. The data is already in memory at that point and is just being copied into a buffer; the crash occurs during that buffer creation. It's likely that the length being provided is incorrect.
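
To illustrate why an incorrect length would crash at buffer creation, here is a sketch of the read path (my reconstruction, not the SDK's exact code; `readBytes(count:)` is a hypothetical stand-in for the TLV reader):

```swift
import Foundation

// Hypothetical stand-in for the SDK's TLV reader; sketch only.
func readBytes(count: Int) throws -> Data { fatalError("sketch only") }

// How a corrupted TLV length field can trigger a huge allocation:
let lengthField = try readBytes(count: 4)
// Assemble the 4-byte length (byte order here is illustrative).
let blockSize: UInt32 = lengthField.reduce(0) { ($0 << 8) | UInt32($1) }
// Nothing bounds blockSize yet: a corrupted file can claim up to
// UInt32.max (4GB), and this allocation tries to reserve all of it at once.
let buffer = Data(repeating: 0, count: Int(blockSize))
```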

How?

I address each of these (a rough sketch follows the list):

  1. Add a 10MB safety check (elsewhere it looks like file size should be limited to 4MB). Currently the only check is Int(exactly:), which could still allow a huge value: BlockSize is a UInt32, so it can represent up to 4GB.
  2. Give the queues that reference these loaded buffers a strict autorelease policy. (This wouldn't fix a leak, but I'm hoping there isn't one for now.)
  3. I don't think this is likely, given that I'm not seeing other low-memory issues. Additionally, 90th-percentile peak memory usage hasn't increased.
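
A rough sketch of points 1 and 2 (the error case, names, and queue label are illustrative assumptions, not necessarily what this PR ships):

```swift
import Foundation

// Hypothetical error case for this sketch; the real DataBlockError has its own cases.
enum DataBlockError: Error {
    case bytesLengthExceedsLimit
}

// Point 1: bound the block length before allocating. BlockSize is a UInt32, so
// without this check a corrupted length field can claim up to 4GB.
let maxBlockLength: UInt64 = 10 * 1_024 * 1_024 // 10MB safety limit

func validatedLength(of blockSize: UInt32) throws -> Int {
    guard UInt64(blockSize) <= maxBlockLength, let length = Int(exactly: blockSize) else {
        throw DataBlockError.bytesLengthExceedsLimit
    }
    return length
}

// Point 2: drain the autorelease pool after every work item so buffers loaded
// on this queue are released promptly instead of accumulating.
let readQueue = DispatchQueue(
    label: "com.datadoghq.datablock-read", // illustrative label
    autoreleaseFrequency: .workItem
)
```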

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • Add CHANGELOG entry for user facing changes

Custom CI job configuration (optional)

  • Run unit tests
  • Run integration tests
  • Run smoke tests

@cltnschlosser cltnschlosser requested a review from a team as a code owner December 14, 2022 21:14
@cltnschlosser cltnschlosser changed the title from "More strict memory management for data read, write, and upload" to "Improve DataBlock memory consumption and validation" Dec 15, 2022
@ncreated (Collaborator)

Hello @cltnschlosser 👋. Thank you for opening this PR. At first glance it looks OK, and it indeed makes sense to control the size of a buffer beforehand. Given that we already have similar logic controlling the "write size", we need some time within the team to discuss the best strategy for incorporating this change. Stay tuned, we'll get back to you.

Comment on lines 135 to 159:

```diff
-    var bytes = [UInt8](repeating: 0, count: length)
-    let count = stream.read(&bytes, maxLength: length)
     guard length > 0 else {
         return Data()
     }

+    // Load from stream to data without unnecessary copies
+    var data = Data(repeating: 0, count: length)
+    let count = try data.withUnsafeMutableBytes { (bytes: UnsafeMutableRawBufferPointer) in
+        guard let buffer = bytes.assumingMemoryBound(to: UInt8.self).baseAddress else {
+            throw DataBlockError.dataAllocationFailure
+        }
+        return stream.read(buffer, maxLength: length)
+    }
```
Collaborator
@cltnschlosser We wonder what the reasoning behind this change is. The comment refers to an unnecessary copy, but I don't directly see how that was addressed by changing from [UInt8](repeating: 0, count: length) to Data(repeating: 0, count: length) and accessing raw pointers.

Contributor Author

Before, there were 3 copies of the data:

  1. file.read() read the entire contents into a Data.
  2. bytes, the [UInt8] buffer.
  3. The Data that was initialized with bytes, which is a copy.

Copy 2 was very short-lived, and it's possible that the Swift optimizer removes it, but I don't know for sure.

Copy 1 was solved by using the InputStream API, which loads directly from the file instead of from a Data. Copies 2 and 3 were then merged by allocating Data(repeating: 0, count: length) and reading directly into that Data.

There is probably also a further optimization where you use an uninitialized Data, but I was having some trouble and didn't want to spend too much time on it. I just noticed this API, which is probably better than using repeating: 0 as well.
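
For reference, one shape the uninitialized-Data variant could take (a sketch under that assumption; the error type here is invented for the sketch, while the real code uses DataBlockError):

```swift
import Foundation

// Hypothetical error type for this sketch only.
enum StreamReadError: Error {
    case allocationFailure
    case readFailure
}

// Read straight from the stream into a malloc'd buffer and hand ownership to
// Data, skipping the zero-fill that Data(repeating: 0, count:) performs.
// Assumes `length` was already validated against the safety limit.
func readData(from stream: InputStream, length: Int) throws -> Data {
    guard let buffer = malloc(length) else {
        throw StreamReadError.allocationFailure
    }
    let count = stream.read(buffer.assumingMemoryBound(to: UInt8.self), maxLength: length)
    guard count >= 0 else {
        free(buffer)
        throw StreamReadError.readFailure
    }
    // .free tells Data to call free(3) on the buffer when it is deallocated,
    // so no extra copy is ever made.
    return Data(bytesNoCopy: buffer, count: count, deallocator: .free)
}
```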

Contributor Author

If you're uncomfortable with this, I can revert it. I don't think these extra copies were the cause of my memory issue, since they are temporary, but it probably improves performance slightly because the data no longer has to be copied twice (to [UInt8], then to Data); it goes directly into the Data.

Collaborator

Thanks for the explanation 👍, now it makes sense given this broader picture. I was wrong in thinking that the "copies" related to the local scope of this method.

@maxep (Member) left a comment

Thank you very much @cltnschlosser, this is a very valuable contribution 🙏
It's OK for me to merge it. I will change the target branch so we can replace the MAX_BLOCK_SIZE with the performance preset configuration before merging it to develop. Thanks again!

Sources/Datadog/DatadogCore/Storage/DataBlock.swift (review thread, outdated and resolved)
@maxep maxep changed the base branch from develop to feature/optimize-tlv-read December 20, 2022 09:05
@ncreated (Collaborator) left a comment

Thank you again @cltnschlosser! This PR is in good shape to merge; we will then continue this work on our side 🙌.

@ncreated ncreated merged commit 2d7568e into DataDog:feature/optimize-tlv-read Dec 21, 2022
@cltnschlosser (Contributor Author)

@ncreated Hey, just checking in on your timeline for the updates?

@maxep (Member) commented Jan 10, 2023

Hey @cltnschlosser 👋

I'm currently working on replacing the MAX_BLOCK_SIZE with the performance preset configuration. We can't give you an ETA, but it will be part of the next release.
Thanks again for this contribution 🙏
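
For context, replacing the hard-coded limit with a preset-driven one could look roughly like this (an illustrative sketch; the names are assumptions, not the SDK's final API):

```swift
// Illustrative sketch only: the general shape of moving a hard-coded
// MAX_BLOCK_SIZE into a performance preset consulted by the reader.
protocol StoragePerformancePreset {
    /// Maximum size of a single data block, in bytes.
    var maxObjectSize: UInt64 { get }
}

struct LowPowerPreset: StoragePerformancePreset {
    let maxObjectSize: UInt64 = 256 * 1_024        // 256KB
}

struct DefaultPreset: StoragePerformancePreset {
    let maxObjectSize: UInt64 = 10 * 1_024 * 1_024 // 10MB, this PR's safety limit
}

// The reader checks the preset instead of a global constant.
func validate(blockSize: UInt32, against preset: StoragePerformancePreset) -> Bool {
    UInt64(blockSize) <= preset.maxObjectSize
}
```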
