-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reimplement RandomStringUtils on top of SecureRandom#getInstanceStrong() #1235
Reimplement RandomStringUtils on top of SecureRandom#getInstanceStrong() #1235
Conversation
The previous implementation used ThreadLocalRandom#current()
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1235 +/- ##
============================================
+ Coverage 92.07% 92.52% +0.44%
- Complexity 7587 7757 +170
============================================
Files 200 200
Lines 15844 15894 +50
Branches 2890 2819 -71
============================================
+ Hits 14589 14706 +117
+ Misses 682 650 -32
+ Partials 573 538 -35 ☔ View full report in Codecov by Sentry. |
SecureRandoms take a lot longer to initialize, which will slow down the loading of this class, which may impact performance for users who don't care about cryptographic security of the generated text. Where does the crypto security requirement come from? We already direct users elsewhere for more sophisticated use cases. In [math] now I guess in rng, we used to provide separate impls so users who just wanted random text did not have to pay for the overhead of SecureRandom. |
This came from a conversation Gary and I were having on Slack (which I need to take to dev@). Researching with a colleague in my dayjob (@fabrice102), the number of use cases of RandomStringUtils that are ignoring the documentation of the class appears much higher than the number of use cases that are my original 2000s intention of do testing. Really feels like we need to take the road less desirable here. Having an alternative for the apparant minority who want speed over security feels good too. Something that doesn't kick off the SecureRandom initializing. |
I would expect the number of users who want / need cryptographic security to be tiny. So why make that the default behavior? I have not benchmarked recent JDKs, but it used to be a lot slower. I remember tomcat ripping out some SecureRandom uses for performance reasons. And it also kind of gives me the willies that the instance is initialized (the most expensive part) in a static initializer. |
How is that, @garydgregory? Doesn't getInstanceStrong initialize the instance? I thought it blocked waiting for /dev/random. Blocking in class initialization could be problematic. Or am I misreading the code or missing something? |
I believe @psteitz is correct that the initialisation of SecureRandom can be slow and potentially blocking. The way it is done in Commons RNG and the JDK source code for SplittableRandom (when it is configured for secure seeding via system properties) is to use the default constructor: SecureRandom rng = new SecureRandom(); See The Right Way to Use SecureRandom. A less intrusive modification to this PR would be a similar mechanism to the JDK seeding RNG routine where the source of seeding randomness is chosen based on system properties. But you still have to decide whether the default is secure or non-secure. Note: A StringSampler implementation was discussed under RNG 54. There is a table showing some relative performance of Lang and Text here: RNG-54 comment 2. Currently String sampling is not implemented in RNG although the ticket remains open. For fast random strings it is optimal to use a power of 2 alphabet. But this limits use cases to just outputting random junk text. I presume the case for SecureRandom is for password/salt generation. In this context a power of 2 alphabet is restrictive and a method using a uniform [0, n) choice for each character of alphabet size n is requied. There are samplers in RNG that will outperform use of |
Hello @psteitz |
Using ThreadLocal still has a potential blocking call when a random method is invoked. |
Hi @aherbert, |
Sorry, clarification: It is the first call to ThreadLocal.get() that may block on the instantiation of the SecureRandom (depending on how it is created). After that each thread has a SecureRandom and will be fine. |
I tried to research the risk of blocking when instantiating SecureRandom. TL;DR: On modern OSes and common platforms, blocking is unlikely, unless the program is called very early in the boot process or (maybe but unlikely) just after boot finishes. Even then, blocking would only happen the first time a method from RandomStringUtils is called. Other calls to methods of RandomStringUtils in the same thread or other threads would not block. In all cases, if RandomStringUtils is not used, this PR has no performance impact on any system. On Unix, the default SUN cryptographic providers SUN/NativePRNG (obtained when using new SecureRandom() to instantiate SecureRandom) and SUN/NativePRNGBlocking (obtained when using SecureRandom.getInstanceStrong()) appear to be using /dev/urandom and /dev/random respectively. On modern Linux kernels (5.6 or newer, 5.6 has been released in 2020), the only time /dev/random would block is if the system never got enough entropy to initialize the randomness pool. In particular, only the first thread making calls to RandomStringUtils may block. Furthermore, on most commonly used systems, the CPU Jitter source entropy would initialize /dev/random less than 1s after boot. So on commonly-used modern Linux systems, there should be virtually no performance impact at initialization of SecureRandom. Since at least 2016 (most likely even before that), FreeBSD and macOS seem to have identical /dev/urandom and /dev/random . In addition, /dev/random blocks only on initialization like on modern linux kernels. But I could not find a first-party source confirming this. At least since OpenBSD 6.3 (released in 2018), /dev/urandom and /dev/random are the same and never block. I did not check what happens on Windows in details. I could not find significant complaints about Windows SecureRandom blocking. It is possible it never blocks. |
We used to have a mailing list where we discussed changes like this. Sorry for the snarky comment, but this is a good discussion that will be hard to follow for anyone researching what led to this change. With no threading, we can't really separate the issues either, which are 0) init performance cost and potential to block waiting for entropy (great analysis @fabrice102 , could be no longer an issue for most users) 1) gen time performance cost 2) whether it might be better to create SecureRandomStringUtils or otherwise support fast generation without cyrptographic security. |
This commit improves the performance of RandomStringUtils: * Reduces the number of random bytes generated and the number of calls to the random number generator, by using a cache system `AmortizedRandomBits`. * Optimizes the case of alphanumerical strings, reducing the number of rejections in the rejection sampling. See comments in code for details.
Thanks @psteitz ! I am not sure where is the best place to post the following comment. If you think it’s more appropriate to post it to the mailing list, let me know and I will post it there. To address the point "1) gen time performance cost", I’ve written some optimizations on top of the current PR: garydgregory#2 Benchmarking using JMH with JDK 8 on an AWS c6a.xlarge instance with cryptographic providers SUN/NativePRNG, SUN/SHA1PRNG, ACCP 1.6, and ACCP 2.3, we have the following results. Compared to the current state (with the default java.util.Random/ThreadLocalRandom number generator), when any of the above SecureRandom is used with the optimized code, the time to generate random alphabetic, alphanumeric, and numeric strings of at least 16 characters is less than three times slower (and sometimes even 5x faster). For 4-character strings, we observe a slow-down of at most 8x. If the optimized RandomStringUtils is called concurrently in multiple threads, the SUN/NativePrng SecureRandom implementation (the default on Linux) may be slower as calls are serialized (due to access to/dev/random. This seems an unlikely scenario in practice. In addition, switching to ACCP 2 for SecureRandom would fix the performance issue in that case. |
On Fri, Jun 14, 2024 at 4:13 PM Fabrice Benhamouda ***@***.***> wrote:
Thanks @psteitz <https://github.com/psteitz> ! I am not sure where is the
best place to post the following comment. If you think it’s more
appropriate to post it to the mailing list, let me know and I will post it
there.
To address the point "1) gen time performance cost", I’ve written some
optimizations on top of the current PR: garydgregory#2
<garydgregory#2>
Benchmarking using JMH with JDK 8 on an AWS c6a.xlarge instance with
cryptographic providers SUN/NativePRNG, SUN/SHA1PRNG, ACCP
<https://github.com/corretto/amazon-corretto-crypto-provider> 1.6, and
ACCP <https://github.com/corretto/amazon-corretto-crypto-provider> 2.3,
we have the following results. Compared to the current state (with the
default java.util.Random/ThreadLocalRandom number generator), when any of
the above SecureRandom is used with the optimized code, the time to
generate random alphabetic, alphanumeric, and numeric strings of at least
16 characters is less than three times slower (and sometimes even 5x
faster). For 4-character strings, we observe a slow-down of at most 8x.
If the optimized RandomStringUtils is called concurrently in multiple
threads, the SUN/NativePrng SecureRandom implementation (the default on
Linux) may be slower as calls are serialized (due to access to/dev/random.
This is the use case I would be most concerned with. Many web applications
use this class in multi-threaded app server settings. I would not use the
class in web apps after this change for that reason. If we do make the
change, we should make it clear in the class javadoc that it is using
SecureRandom and prone to occasional stalls.
Since there was no discussion about why this change is being made, I will
ask again why it is being proposed. The only rationale that I can see is
that somehow some users are using the class to generate things that should
be cryptographically secure, like passwords or hash salt, so to protect
these users, we make all other users take a potential performance hit.
Much better, IMO, would be to make the recommended improvements in
performance to the vanilla Random-based impl and add a new class or somehow
shoehorn in a secure impl for users that want / need that.
Phil
This seems an unlikely scenario in practice. In addition, switching to ACCP
2 for SecureRandom would fix the performance issue in that case.
… —
Reply to this email directly, view it on GitHub
<#1235 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AALJJ2W2FHEF7BLURVCM3HDZHN2LFAVCNFSM6AAAAABJJE3MF6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRYHA3DSOJYGM>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Hi Phil,
Please see the Commons security list. Are you not subscribed? You should be
since you are on the PMC. I did not want to copy paste the info Hen Y
brought up with me separately just in case.
Gary
…On Sat, Jun 15, 2024, 3:39 PM Phil Steitz ***@***.***> wrote:
On Fri, Jun 14, 2024 at 4:13 PM Fabrice Benhamouda ***@***.***>
wrote:
> Thanks @psteitz <https://github.com/psteitz> ! I am not sure where is
the
> best place to post the following comment. If you think it’s more
> appropriate to post it to the mailing list, let me know and I will post
it
> there.
>
> To address the point "1) gen time performance cost", I’ve written some
> optimizations on top of the current PR: garydgregory#2
> <garydgregory#2>
>
> Benchmarking using JMH with JDK 8 on an AWS c6a.xlarge instance with
> cryptographic providers SUN/NativePRNG, SUN/SHA1PRNG, ACCP
> <https://github.com/corretto/amazon-corretto-crypto-provider> 1.6, and
> ACCP <https://github.com/corretto/amazon-corretto-crypto-provider> 2.3,
> we have the following results. Compared to the current state (with the
> default java.util.Random/ThreadLocalRandom number generator), when any
of
> the above SecureRandom is used with the optimized code, the time to
> generate random alphabetic, alphanumeric, and numeric strings of at
least
> 16 characters is less than three times slower (and sometimes even 5x
> faster). For 4-character strings, we observe a slow-down of at most 8x.
>
> If the optimized RandomStringUtils is called concurrently in multiple
> threads, the SUN/NativePrng SecureRandom implementation (the default on
> Linux) may be slower as calls are serialized (due to access
to/dev/random.
>
This is the use case I would be most concerned with. Many web applications
use this class in multi-threaded app server settings. I would not use the
class in web apps after this change for that reason. If we do make the
change, we should make it clear in the class javadoc that it is using
SecureRandom and prone to occasional stalls.
Since there was no discussion about why this change is being made, I will
ask again why it is being proposed. The only rationale that I can see is
that somehow some users are using the class to generate things that should
be cryptographically secure, like passwords or hash salt, so to protect
these users, we make all other users take a potential performance hit.
Much better, IMO, would be to make the recommended improvements in
performance to the vanilla Random-based impl and add a new class or
somehow
shoehorn in a secure impl for users that want / need that.
Phil
This seems an unlikely scenario in practice. In addition, switching to
ACCP
2 for SecureRandom would fix the performance issue in that case.
> —
> Reply to this email directly, view it on GitHub
> <
#1235 (comment)>,
> or unsubscribe
> <
https://github.com/notifications/unsubscribe-auth/AALJJ2W2FHEF7BLURVCM3HDZHN2LFAVCNFSM6AAAAABJJE3MF6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRYHA3DSOJYGM>
> .
> You are receiving this because you were mentioned.Message ID:
> ***@***.***>
>
—
Reply to this email directly, view it on GitHub
<#1235 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJB6N3KLMPFPDVOYS4VLFTZHSJ6FAVCNFSM6AAAAABJJE3MF6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZQGU4TGNZYGA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
To clarify, the case where there may be a slow-down is the following: Multiple threads concurrently, at the same time, call Generation of a random 16-alphanumerical-character string takes less than 1us on the test machine (c6a.xlarge). (And with a non-optimized implementation using the default |
* Fix 2 checkstyle errors. * Apply suggestions from review for #2 * Improve comments in new (non-public) class `AmortizedRandomBits` to match comments in other classes.
…manceOptimization Performance optimizations for RandomStringUtils
- Rename package-private class - Whitespace - Add null check - Add serialVersionUID - Remove redunant type cast - Throw IllegalStateException, not RuntimeException - nextBytes() should throw NullPointerException per contract - Javadoc: Use longer lines
Thanks @psteitz for raising issues with this PR. We are using RandomStringUtils to generate a string for a non secure use. We just need unique strings and this change made our code to block /dev/random where there was not enough entropy. Some of our code took forever to run after this change and we had to force downgrade to previous version. I would hope that there would be an implementation that uses a semi-secure or non-secure random to generate a string as that would be our requirement. |
@nvsnvikram |
thanks. we will have to change our code to use that but should be fine. |
I'll experiment with a more friendly API on my end... |
Here is a solution to provide support for both secure and insecure modes : #1250 |
Just fyi: I've seen our builds taking 4-16 hours after this change instead of typical 30min in a Jenkins dedicated Docker env where we heavily use RandomStringUtils to generate arbitrary 100-200 character strings in tests. |
I will push out a new release candidate today hopefully where you can call .insecure() to get the old faster and insecure implementation. Right now you can test with 3.16.0-SNAPSHOT. |
SecureRandom.getInstanceStrong() in RandomUtils while use NativePRNGBlocking default |
I don't think you to change the implementation of Random. If this package is used by other components, it will cause a chain reaction |
Open a discussion about this issue? |
This is currently being discussed on an internal (private) mailing list.
Gary
…On Sun, Aug 18, 2024 at 9:31 PM HelloDhero ***@***.***> wrote:
@garydgregory <https://github.com/garydgregory>
—
Reply to this email directly, view it on GitHub
<#1235 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJB6NYBTE37KJ4CCGSZ623ZSFDIRAVCNFSM6AAAAABJJE3MF6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJVGUYDKMJRGI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@knoobie and @HelloDhero, you reported important slow-down following the changes in this PR. What OS/kernel are your systems using? In particular, do they use pre-5.6 Linux kernel? If this is the case, the issue is most likely that In that case, switching to A temporary workaround may be to add |
Modifying the securerandom.strongAlgorithms configuration in a production environment is not a good practice, and we should avoid this situation at the code level |
@fabrice102 CentOS7 / Kernel 3.10.0-1160.114.2.el7.x86_64 - so your comment about "pre-5.6" Linux Kernel might be correct. Even tho they are probably still widely spread with RHEL 7 and 8 also using pre-5.6 Kernels. |
FYI: You can experiment now with git master and 3.17.0-SNAPSHOT builds to get: Random[String]Utils.secure() now uses SecureRandom() instead of SecureRandom.getInstanceStrong()
|
The previous implementation used ThreadLocalRandom#current()