Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative way to detect AMD bug #48

Merged
merged 2 commits into from
Jun 30, 2019
Merged

Conversation

josephlr
Copy link
Member

@josephlr josephlr commented Jun 29, 2019

This is an alternative way to do the AMD bug detection "fixed" in #43.

@newpavlov feel free to immediately close this if you think this way is overly complex.

Before we checked the return value of RDRAND to detect the bug. However, this will occasionally return a false positive (every (1/2)^63 invocations), so we use the existing retry loop to try again. I also added comments explaining that:

  • The issue here is AMD not setting the CF flag properly (as RDRAND is allowed to fail)
  • Why we perform this check on all targets.

@josephlr josephlr mentioned this pull request Jun 29, 2019
@dhardy
Copy link
Member

dhardy commented Jun 29, 2019

So in practical terms, this change means that anyone with an affected CPU cannot use the RDRAND implementation at all. Is the presence of a sporadic bug (on resume from standby) sufficient justification to not trust RDRAND on those CPUs at all, while still trusting it on other CPUs? @tarcieri thoughts? I haven't read up on the issue, but this seems unnecessary.

Question: is there a reason we can't discard "obviously bad" values and try again? There are three aspects to this question:

  • Can the CPU "recover" and return other values? Edit: no
  • In this case, could the occurrence of the bug reduce security of the result?
  • What is the expected number of false-positives of the guard removed by this PR? (2^-63 times the number of uses of RDRAND via this lib — so unless servers actually make millions of requests the number of false positives is likely to be 0 or very close.)

There is also the question: is it likely other (possibly future) CPUs might exhibit a similar bug?


My view is that the most robust solution would be to trap the !0 value, but retry a couple of times (we're already in a loop, so continue is sufficient).

Copy link
Member

@newpavlov newpavlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think disabling RDRAND completely on affected CPUs is too extreme of a measure. Especially considering that failure can be reliably detected.

Personally I like approach of detecting CPU family and doing value check only for affected CPUs. I think we can write it like this (note the changed rdrand signature). But I understand the complexity concern, so @dhardy's proposal is a viable alternative.

BTW maybe we should use a dedicated error code for this problem instead of Error::UNKNOWN?

src/rdrand.rs Outdated Show resolved Hide resolved
@josephlr
Copy link
Member Author

I updated this PR and the description to incorporate the above suggestions

I think disabling RDRAND completely on affected CPUs is too extreme of a measure. Especially considering that failure can be reliably detected.

I agree. Checking the output value and trying again seems avoid false positives and being overly aggressive.

Personally I like approach of detecting CPU family and doing value check only for affected CPUs. I think we can write it like this (note the changed rdrand signature). But I understand the complexity concern, so @dhardy's proposal is a viable alternative.

The currently version has the check in place for all targets. This bug could be present in the future and performing the check is very cheep (just a lea and cmp for both checks)

There is also the question: is it likely other (possibly future) CPUs might exhibit a similar bug?

This is a good point. Intel's own documentation says that 0x00000000 is returned when RDRAND fails. Now they set the CF flag correctly. However, failing to set a CF flag is a relatively easy failure mode, so I think keeping this check for all implementations is reasonable.

@newpavlov newpavlov merged commit cbc44ee into rust-random:master Jun 30, 2019
@josephlr josephlr deleted the amd branch July 1, 2019 07:32
@josephlr josephlr mentioned this pull request Jul 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants