Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix use-after-free on implicit temporary FileOptions #8571

Closed
wants to merge 2 commits into from

Conversation

pdillinger
Copy link
Contributor

@pdillinger pdillinger commented Jul 21, 2021

Summary: FileOptions has an implicit conversion from EnvOptions and some
internal APIs take const FileOptions& and save the reference, which is
counter to Google C++ guidelines,

Avoid defining functions that require a const reference parameter to outlive the call, because const reference parameters bind to temporaries. Instead, find a way to eliminate the lifetime requirement (for example, by copying the parameter), or pass it by const pointer and document the lifetime and non-null requirements.

This is at least a problem for repair.cc, which passes an EnvOptions to
TableCache(), which would save a reference to the temporary copy as
FileOptions. This was unfortunately only caught as a side effect of
changes in #8544.

This change fixes the repair.cc case and updates the involved internal
APIs that save a reference to use const FileOptions* instead.

Unfortunately, I don't know how to get any of our sanitizers to reliably
report bugs like this, so I can't rule out more existing in our
codebase.

Test Plan: Test that issues seen with #8544 are fixed (can reproduce on
AWS EC2)

Summary: FileOptions has an implicit conversion from EnvOptions and some
internal APIs take `const FileOptions&` and save the reference, which is
counter to Google C++ guidelines,

> Avoid defining functions that require a const reference parameter to outlive the call, because const reference parameters bind to temporaries. Instead, find a way to eliminate the lifetime requirement (for example, by copying the parameter), or pass it by const pointer and document the lifetime and non-null requirements.

This is at least a problem for repair.cc, which passes an EnvOptions to
TableCache(), which would save a reference to the temporary copy as
FileOptions. This was unfortunately only caught as a side effect of
changes in facebook#8544.

This change fixes the repair.cc case and updates the involved internal
APIs that save a reference to use `const FileOptions*` instead.

Unfortunately, I don't know how to get any of our sanitizers to reliably
report bugs like this, so I can't rule out more existing in our
codebase.

Test Plan: Test that issues seen with facebook#8544 are fixed (can reproduce on
AWS EC2)
Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks more or less fine to me, but wonder how much overkill it is. Would you have essentially the same outcome by changing the env_options_ in Repairer to file_options_, without other changes?

@pdillinger
Copy link
Contributor Author

This change looks more or less fine to me, but wonder how much overkill it is. Would you have essentially the same outcome by changing the env_options_ in Repairer to file_options_, without other changes?

It leaves another accident waiting to happen. And it wouldn't be so concerning if we had a predictable way to catch this undefined behavior but it appears we have no means of automatic detection for this. Do you have an alternative idea for avoiding this type of bug?

@pdillinger
Copy link
Contributor Author

In general, changing a couple dozen LoC to follow our prescribed style guide does not seem like overkill.

@mrambacher
Copy link
Contributor

This change looks more or less fine to me, but wonder how much overkill it is. Would you have essentially the same outcome by changing the env_options_ in Repairer to file_options_, without other changes?

It leaves another accident waiting to happen. And it wouldn't be so concerning if we had a predictable way to catch this undefined behavior but it appears we have no means of automatic detection for this. Do you have an alternative idea for avoiding this type of bug?

I don't think there is a good solution to necessarily avoid this type of bug in general terms. For this specific case, you could eliminate the constructor FileOptions(const EnvOptions&) and see what blows up. Eliminating that constructor would prevent the "hidden conversion" of one type to another. If the conversion is needed, you could make an explicit static method -- e.g. static FileOptions FileOptions::FromEnvOptions(const EnvOptions&) -- to do the conversion.

@mrambacher
Copy link
Contributor

In general, changing a couple dozen LoC to follow our prescribed style guide does not seem like overkill.

I do not see where a "const FileOptions&" violates the style guide. From my reading of the style guide, it is preferred over "const FileOptions*" (https://google.github.io/styleguide/cppguide.html#Inputs_and_Outputs).

My concern is that someone will come along later and look at that code and not understand why it is a pointer versus a reference and does not follow the style of the other code and change it, thereby potentially re-introducing this problem.

And I do not think the problem is the pointer versus reference there, it is storing a reference in the class

@pdillinger
Copy link
Contributor Author

We shouldn't just remove public APIs for internal clean-up. I highly suspect (but have not confirmed, @anand1976 #5761) that the implicit conversion was added to ease some API compatibility issues in code using RocksDB. Because we never save *Options objects from public APIs without copying, we do not have to worry about the same kind of bug from unexpected temporary copy when entering via public APIs.

And I think it's a great convention not to save references to const& parameters. (Make them const* if they're going to be saved for use after call.)

@mrambacher
Copy link
Contributor

We shouldn't just remove public APIs for internal clean-up. I highly suspect (but have not confirmed, @anand1976 #5761) that the implicit conversion was added to ease some API compatibility issues in code using RocksDB. Because we never save *Options objects from public APIs without copying, we do not have to worry about the same kind of bug from unexpected temporary copy when entering via public APIs.

And I think it's a great convention not to save references to const& parameters. (Make them const* if they're going to be saved for use after call.)

I agree with both of those statements. Why not change TableCache to store a FileOptions rather than FileOptions& then and avoid most of the rest of these changes?

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I hadn't thought of using pointer argument to make contracts about lifetime safer (from implicit conversions and accidental misuse in general).

I agree with both of those statements. Why not change TableCache to store a FileOptions rather than FileOptions& then and avoid most of the rest of these changes?

The current PR keeps the intended existing behavior of not copying FileOptions into TableCache. I also don't think that behavior needs to be kept, but it's not the purpose of this PR, so am ok with leaving it.

@pdillinger
Copy link
Contributor Author

FIXME: some other memory leaks still reported somehow with #8544 changes

Fixed in #8589

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pdillinger merged this pull request in 74b7c0d.

jaepil pushed a commit to deepsearch-hq/rocksdb-backup that referenced this pull request Aug 1, 2021
Summary:
FileOptions has an implicit conversion from EnvOptions and some
internal APIs take `const FileOptions&` and save the reference, which is
counter to Google C++ guidelines,

> Avoid defining functions that require a const reference parameter to outlive the call, because const reference parameters bind to temporaries. Instead, find a way to eliminate the lifetime requirement (for example, by copying the parameter), or pass it by const pointer and document the lifetime and non-null requirements.

This is at least a problem for repair.cc, which passes an EnvOptions to
TableCache(), which would save a reference to the temporary copy as
FileOptions. This was unfortunately only caught as a side effect of
changes in facebook#8544.

This change fixes the repair.cc case and updates the involved internal
APIs that save a reference to use `const FileOptions*` instead.

Unfortunately, I don't know how to get any of our sanitizers to reliably
report bugs like this, so I can't rule out more existing in our
codebase.

Pull Request resolved: facebook#8571

Test Plan:
Test that issues seen with facebook#8544 are fixed (can reproduce on
AWS EC2)

Reviewed By: ajkr

Differential Revision: D29943890

Pulled By: pdillinger

fbshipit-source-id: 95f9c5251548777b4dc994c1a083dd2add5799c9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants