-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
Reduce lock contention in X509_STORE #28599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Do we have an existing test in perftools that shows a performance improvement? x509storeissuer.c? |
e070d2e to
9a5a9a3
Compare
yes, that's the one I've used for the perf table. |
|
I think this is a good direction and is one of the things I had in mind when the saga around #23224, python/cpython#114572, and CVE-2024-0397 was going on. @bob-beck and I were actually just talking about this on Monday. Alas, there is a compatibility risk here, which is why I suggested in #23224 that you all deprecate Not only does It's paired with this operation here. They want to unload the CRL they previously loaded and then replace it with a new one. (They also don't do this under I believe this PR would break OpenVPN. The silly thing is that they don't actually need to structure their application this way at all. They just want every verify operation to use a specific single CRL list. They can do this by not adding CRLs to the Back when #23244 was going on, I wrote a patch for OpenVPN to do this, but it looks like I forgot to do anything with it. I'll see about upstreaming it this week, though you all may still be concerned about the impact to older OpenVPNs. (There are other issues with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this change. Thank you!
I have given you a terrible idea inline. my apologies in advance.
crypto/x509/x509_local.h
Outdated
| CRYPTO_EX_DATA ex_data; | ||
| CRYPTO_REF_COUNT references; | ||
| CRYPTO_RWLOCK *lock; | ||
| int objs_need_sync; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field desperately needs a comment explaining what it's for, because it's a core part of the lovely magic ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'm going to flesh it out.
|
I quite like the (slight) performance improvement in the single threaded case. Deprecation or removal of I'm leaning towards removing the errant API in 4.0 and spelling out the alternative in the migration guide. @davidben has just about written the text. |
I would be +1 to this. |
|
Going straight to removal seems too aggressive to me and is counter to our policies. Deprecation seems more appropriate. |
|
Something in the middle would be to deprecate but also always return NULL from that function. The problem is that the function is highly problematic for multithreaded usage but also implementing it properly would require the hack @bob-beck is proposing in #28599 (comment) |
|
Assuming we can't immediately deprecate, doing the hack as I suggest is not
too problematic for the code complexity in the short term, the only issue
is really ensuring we keep it thread safe.
That doesn't mean get0_objects needs to be thread safe itself, it is
already a hot mess casting away const, running around the room juggling
scissors, running chainsaws, and loaded revolvers as far as that goes, so I
don't think that needs to be fixed if it is going to be deprecated and is
on it's way out the door, and this is a temporary measure.
We just need to make sure that the hack does not introduce *new* thread
safety issues if you are using the rcu version safely, even if that means
all works great until you call get0 and then everything serializes to
garbage. the point is not to break it, not optimize for it.
…On Fri, Sep 19, 2025 at 7:06 AM Tomáš Mráz ***@***.***> wrote:
*t8m* left a comment (openssl/openssl#28599)
<#28599 (comment)>
Something in the middle would be to deprecate but also always return NULL
from that function. The problem is that the function is highly problematic
for multithreaded usage but also implementing it properly would require the
hack @bob-beck <https://github.com/bob-beck> is proposing in #28599
(comment)
<#28599 (comment)>
—
Reply to this email directly, view it on GitHub
<#28599 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB57M5M4TNJM334QPRKZEK33TP5UBAVCNFSM6AAAAACG3CRSSWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTGMJSGEZDMMZQGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
On Fri, Sep 19, 2025 at 6:08 AM Tomáš Mráz ***@***.***> wrote:
*t8m* left a comment (openssl/openssl#28599)
<#28599 (comment)>
I'm leaning towards removing the errant API in 4.0 and spelling out the
alternative in the migration guide. @davidben
<https://github.com/davidben> has just about written the text.
I would be +1 to this.
As hinted at in the horrible Idea, I am also very much +1 to killing it as
soon as possible. The hack is merely if we want the performance win in
places we can not deprecate it.
… —
Reply to this email directly, view it on GitHub
<#28599 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB57M5JB5BZ77JVUR2G5IE33TPW4DAVCNFSM6AAAAACG3CRSSWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTGMJRHE2DQMZQHE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
There are two ways to load CRLs in OpenSSL. They can be loaded at the X509_STORE, shared across verifications, or loaded per verification at the X509_STORE_CTX. OpenVPN currently does the former. However, it also supports CRL reloading, and tries to reload the CRL file before each connection. OpenSSL does not really have a good way to unload objects from an X509_STORE. OpenVPN currently does it by grabbing the STACK_OF(X509_OBJECT) out of the X509_STORE and manually deleting all the CRLs from it. This mutates an OpenSSL internal object which bumps into problems if OpenSSL ever switches to a more efficient representation. See openssl/openssl#28599 (It's also not thread-safe, though it doesn't look like that impacts OpenVPN? Actually even reading that list doesn't work. See CVE-2024-0397. This OpenSSL API was simply broken.) Additionally, this seems to cause two OpenVPN features to not work together. I gather backend_tls_ctx_reload_crl is trying to clear the CRLs loaded from last time it ran. But tls_ctx_load_ca with a ca_file can also load CRLs. tls_ctx_load_ca with ca_path will also pick up CRLs and backend_tls_ctx_reload_crl actually ends up clobbering some state X509_LOOKUP_hash_dir internally maintains on the X509_STORE. Likewise, tls_verify_crl_missing can get confused between backend_tls_ctx_reload_crl's crl_file-based CRLs and CRLs from tls_ctx_load_ca. Avoid all this by tracking the two CRLs separately. crl_file-based CRLs now go onto a STACK_OF(X509_CRL) tracked on the tls_root_ctx. Now this field can be freely reloaded by OpenVPN without reconfiguring OpenSSL. Instead, pass the current value into OpenSSL at verification time. To do so, we need to use the SSL_CTX_set_cert_verify_callback, which allows swapping out the X509_verify_cert call, and also tweaking the X509_STORE_CTX configuration before starting certificate verification. Context: SSL_CTX_set_cert_verify_callback and the existing verify_callback are not the same. SSL_CTX_set_cert_verify_callback wraps the verification while verify_callback is called multiple times throughout verification. It's too late to reconfigure X509_STORE_CTX in verify_callback. verify_callback is usually not what you want. Sometimes current_cert and error_depth don't quite line up, and cert_hash_remember may end up called multiple times for a single certificate. I suspect some of the other verify_callback logic would also be better done in the new callback, but I've left it alone to keep this change minimal. verify_callback is really only usable for suppressing errors. Application bookkeeping is better down elsewhere. Signed-off-by: David Benjamin <davidben@google.com>
I could live with this. Otherwise seek a policy exception. I'd initiate the exception process but I've no idea how. |
|
I'm going to test another two options without hashmap.
|
|
My gut feel is hashmap will win here because you know and have control of
what you're comparing.
I added stuff recently to the stack sorting thing so it wouldn't get
cleared all the time, and more importantly, would not clear if you inserted
the items in sorted order.
So if you actually do a find_ex before inserting anything, and use the
position from find_ex, you may not ever have to sort.
…On Mon, Sep 22, 2025 at 5:10 AM Nikola Pajkovsky ***@***.***> wrote:
*npajkovsky* left a comment (openssl/openssl#28599)
<#28599 (comment)>
I'm going to test another two options without hashmap.
1. many functions just call sk_X509_OBJECT_sort without even checking
if stack is sorted. I'm going to check if the stack is sorted, and if not,
sort it.
2. Sort the stack on insert only.
—
Reply to this email directly, view it on GitHub
<#28599 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB57M5PO4EJPEMSTDH2Z2N33T7KLPAVCNFSM6AAAAACG3CRSSWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTGMJYGI4TENBSHE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
The 2nd option is on par with hashmap but that's because it does not have write-locks anymore. I still don't know how much insert got slower.
Excelent. That's what I need now. |
Please don't do this. This is an incredibly inefficient way to sort. |
fda6cf5 to
4975b1d
Compare
529c64f to
dfb6b6d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like the apostrophe in store's on line 35 in X509_STORE_get0_param.pod is a UTF-16 char rather than UTF-8 which is breaking pod2man.
Other than that, looks good
dfb6b6d to
3ca20d8
Compare
|
ping @openssl/committers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good I just need a clarification on oldval argument when inserting to hash table. thanks a lot.
crypto/x509/x509_lu.c
Outdated
| HT_SET_KEY_FIELD(&key, xn_canon, xn->canon_enc); | ||
| HT_SET_KEY_FIELD(&key, xn_canon_enclen, xn->canon_enclen); | ||
| val.value = (void *)objs; | ||
| ret = ossl_ht_insert(store->objs_ht, TO_HT_KEY(&key), &val, &oldval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the &oldval argument indicates we are going to replace existing item in hashtable with newly inserted one (&val). I just wonder if we should be doing sk_X509_OBJECT_free(oldval.value)).
I think alternative approach here would be to pass NULL instead of oldval.
I admit I might be wrong (very wrong here) as it is completely new area for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be always get stack and append, or create a new one, but no stack replacement. Therefore, the NULL should be passed to ossl_ht_insert
Signed-off-by: Nikola Pajkovsky <nikolap@openssl.org>
X509_STORE was using STACK_OF(X509_OBJECT) which is not ideal structure. The better solution is to use hashmap. The performance gains come from the fact that sorting was removed and therefore read lock is just enough for looking up objects/cert/crls from hashmap. When X509_STORE_get0_objects() is called, the hashmap converts back to the STACK_OF(X509_OBJECT), and goes back to the original implementation with the performance hit on lookup side because stack is not sorted anymore. Note, hashmap maps X509_NAME to STACK_OF(X509_OBJECT), and the stack is never sorted which may lead to performance impact if stack contains a huge of objects. Before the change | Threads | mean/us | var/us | |---------+-----------+---------| | 1 | 2.434803 | .034190 | | 2 | 3.033588 | .247471 | | 4 | 6.551132 | .150209 | | 6 | 12.548113 | .258445 | | 8 | 17.566257 | .168508 | | 10 | 22.782846 | .182674 | | 12 | 27.928990 | .426779 | | 14 | 32.844572 | .307754 | | 16 | 37.816247 | .660630 | | 18 | 42.662465 | .434926 | After the change | Threads | mean/us | var/us | |---------+----------+---------| | 1 | 2.385398 | .015329 | | 2 | 2.775794 | .172223 | | 4 | 3.071882 | .126400 | | 6 | 3.174147 | .139685 | | 8 | 3.479235 | .297154 | | 10 | 4.206260 | .149006 | | 12 | 5.044039 | .194108 | | 14 | 5.890640 | .185817 | | 16 | 6.447808 | .256179 | | 18 | 7.489261 | .149204 | Resolves: openssl/project#1275 Signed-off-by: Nikola Pajkovsky <nikolap@openssl.org>
Resolves openssl/project#1369 Signed-off-by: Nikola Pajkovsky <nikolap@openssl.org>
3ca20d8 to
73dbf77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me. thanks.
|
This pull request is ready to merge |
|
merged to master, thank you! |
Signed-off-by: Nikola Pajkovsky <nikolap@openssl.org> Reviewed-by: Neil Horman <nhorman@openssl.org> Reviewed-by: Saša Nedvědický <sashan@openssl.org> (Merged from #28599)
X509_STORE was using STACK_OF(X509_OBJECT) which is not ideal structure. The better solution is to use hashmap. The performance gains come from the fact that sorting was removed and therefore read lock is just enough for looking up objects/cert/crls from hashmap. When X509_STORE_get0_objects() is called, the hashmap converts back to the STACK_OF(X509_OBJECT), and goes back to the original implementation with the performance hit on lookup side because stack is not sorted anymore. Note, hashmap maps X509_NAME to STACK_OF(X509_OBJECT), and the stack is never sorted which may lead to performance impact if stack contains a huge of objects. Before the change | Threads | mean/us | var/us | |---------+-----------+---------| | 1 | 2.434803 | .034190 | | 2 | 3.033588 | .247471 | | 4 | 6.551132 | .150209 | | 6 | 12.548113 | .258445 | | 8 | 17.566257 | .168508 | | 10 | 22.782846 | .182674 | | 12 | 27.928990 | .426779 | | 14 | 32.844572 | .307754 | | 16 | 37.816247 | .660630 | | 18 | 42.662465 | .434926 | After the change | Threads | mean/us | var/us | |---------+----------+---------| | 1 | 2.385398 | .015329 | | 2 | 2.775794 | .172223 | | 4 | 3.071882 | .126400 | | 6 | 3.174147 | .139685 | | 8 | 3.479235 | .297154 | | 10 | 4.206260 | .149006 | | 12 | 5.044039 | .194108 | | 14 | 5.890640 | .185817 | | 16 | 6.447808 | .256179 | | 18 | 7.489261 | .149204 | Resolves: openssl/project#1275 Signed-off-by: Nikola Pajkovsky <nikolap@openssl.org> Reviewed-by: Neil Horman <nhorman@openssl.org> Reviewed-by: Saša Nedvědický <sashan@openssl.org> (Merged from #28599)
Resolves openssl/project#1369 Signed-off-by: Nikola Pajkovsky <nikolap@openssl.org> Reviewed-by: Neil Horman <nhorman@openssl.org> Reviewed-by: Saša Nedvědický <sashan@openssl.org> (Merged from #28599)
X509_STORE was using STACK_OF(X509_OBJECT) which is not ideal structure. The
better solution is to use hashmap. The performance gains come from the fact that
sorting was removed and therefore read lock is just enough for looking up
objects/cert/crls from hashmap.
When X509_STORE_get0_objects() is called, the hashmap converts back to
the STACK_OF(X509_OBJECT), and goes back to the original
implementation with the performance hit on lookup side because stack is not
sorted anymore.
Note, hashmap maps X509_NAME to STACK_OF(X509_OBJECT), and the stack is never
sorted which may lead to performace impact if stack contains a huge of objects.
Before the change
After the change