Skip to content

Conversation

@npajkovsky
Copy link

@npajkovsky npajkovsky commented Sep 18, 2025

X509_STORE was using STACK_OF(X509_OBJECT) which is not ideal structure. The
better solution is to use hashmap. The performance gains come from the fact that
sorting was removed and therefore read lock is just enough for looking up
objects/cert/crls from hashmap.

When X509_STORE_get0_objects() is called, the hashmap converts back to
the STACK_OF(X509_OBJECT), and goes back to the original
implementation with the performance hit on lookup side because stack is not
sorted anymore.

Note, hashmap maps X509_NAME to STACK_OF(X509_OBJECT), and the stack is never
sorted which may lead to performace impact if stack contains a huge of objects.

Before the change

| Threads |   mean/us |  var/us |
|---------+-----------+---------|
|       1 |  2.434803 | .034190 |
|       2 |  3.033588 | .247471 |
|       4 |  6.551132 | .150209 |
|       6 | 12.548113 | .258445 |
|       8 | 17.566257 | .168508 |
|      10 | 22.782846 | .182674 |
|      12 | 27.928990 | .426779 |
|      14 | 32.844572 | .307754 |
|      16 | 37.816247 | .660630 |
|      18 | 42.662465 | .434926 |

After the change

| Threads |  mean/us |  var/us |
|---------+----------+---------|
|       1 | 2.385398 | .015329 |
|       2 | 2.775794 | .172223 |
|       4 | 3.071882 | .126400 |
|       6 | 3.174147 | .139685 |
|       8 | 3.479235 | .297154 |
|      10 | 4.206260 | .149006 |
|      12 | 5.044039 | .194108 |
|      14 | 5.890640 | .185817 |
|      16 | 6.447808 | .256179 |
|      18 | 7.489261 | .149204 |

@github-actions github-actions bot added the severity: fips change The pull request changes FIPS provider sources label Sep 18, 2025
@kroeckx
Copy link
Member

kroeckx commented Sep 18, 2025

Do we have an existing test in perftools that shows a performance improvement? x509storeissuer.c?

@npajkovsky
Copy link
Author

Do we have an existing test in perftools that shows a performance improvement? x509storeissuer.c?

yes, that's the one I've used for the perf table.

@davidben
Copy link
Contributor

davidben commented Sep 18, 2025

I think this is a good direction and is one of the things I had in mind when the saga around #23224, python/cpython#114572, and CVE-2024-0397 was going on. @bob-beck and I were actually just talking about this on Monday.

Alas, there is a compatibility risk here, which is why I suggested in #23224 that you all deprecate X509_STORE_get0_objects. Looks like that half slipped through. (I don't feel I can unilaterally make deprecation decisions on behalf of you all, so I usually just point out problematic APIs that need deprecation and rely on OpenSSL to follow through.)

Not only does X509_STORE_get0_objects give visibility into X509_STORE's internal representation, but it also gives mutable access! I am aware of one project, OpenVPN, that mutates the STACK_OF(X509_OBJECT) in an attempt to erase CRLs from the X509_STORE!
https://github.com/OpenVPN/openvpn/blob/06919a60ae61d6d88546b23b52092f742599a8ae/src/openvpn/ssl_openssl.c#L1328-L1341

It's paired with this operation here. They want to unload the CRL they previously loaded and then replace it with a new one.
https://github.com/OpenVPN/openvpn/blob/06919a60ae61d6d88546b23b52092f742599a8ae/src/openvpn/ssl_openssl.c#L1383

(They also don't do this under X509_STORE_lock. As far as I know, no external code uses X509_STORE_get0_objects under X509_STORE_lock. That was an after-the-fact rationalization. When X509_STORE_get0_objects was added, X509_STORE_lock didn't even exist. Folks just didn't realize that API was broken. See #979.)

I believe this PR would break OpenVPN. The silly thing is that they don't actually need to structure their application this way at all. They just want every verify operation to use a specific single CRL list. They can do this by not adding CRLs to the X509_STORE at all and instead using X509_STORE_CTX_set0_crls on a per-X509_STORE_CTX basis. That can be paired with SSL_CTX_set_cert_verify_callback to configure the CRLs at that point.

Back when #23244 was going on, I wrote a patch for OpenVPN to do this, but it looks like I forgot to do anything with it. I'll see about upstreaming it this week, though you all may still be concerned about the impact to older OpenVPNs.

(There are other issues with X509_STORE and CRLs. I made some notes a year or so ago but haven't had time to chase them down yet.)

Copy link
Contributor

@bob-beck bob-beck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this change. Thank you!

I have given you a terrible idea inline. my apologies in advance.

CRYPTO_EX_DATA ex_data;
CRYPTO_REF_COUNT references;
CRYPTO_RWLOCK *lock;
int objs_need_sync;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field desperately needs a comment explaining what it's for, because it's a core part of the lovely magic ;)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'm going to flesh it out.

@paulidale
Copy link
Contributor

I quite like the (slight) performance improvement in the single threaded case.

Deprecation or removal of X509_STORE_get0_objects() seems more than reasonable. @bob-beck's suggestion has a lot of merit but I wonder about the complexity it's introducing. That's not a good thing for an API we desperately want to disappear.

I'm leaning towards removing the errant API in 4.0 and spelling out the alternative in the migration guide. @davidben has just about written the text.

@t8m
Copy link
Member

t8m commented Sep 19, 2025

I'm leaning towards removing the errant API in 4.0 and spelling out the alternative in the migration guide. @davidben has just about written the text.

I would be +1 to this.

@mattcaswell
Copy link
Member

Going straight to removal seems too aggressive to me and is counter to our policies. Deprecation seems more appropriate.

@t8m
Copy link
Member

t8m commented Sep 19, 2025

Something in the middle would be to deprecate but also always return NULL from that function. The problem is that the function is highly problematic for multithreaded usage but also implementing it properly would require the hack @bob-beck is proposing in #28599 (comment)

@bob-beck
Copy link
Contributor

bob-beck commented Sep 19, 2025 via email

@bob-beck
Copy link
Contributor

bob-beck commented Sep 19, 2025 via email

davidben added a commit to davidben/openvpn that referenced this pull request Sep 19, 2025
There are two ways to load CRLs in OpenSSL. They can be loaded at the
X509_STORE, shared across verifications, or loaded per verification at
the X509_STORE_CTX.

OpenVPN currently does the former. However, it also supports CRL
reloading, and tries to reload the CRL file before each connection.
OpenSSL does not really have a good way to unload objects from an
X509_STORE. OpenVPN currently does it by grabbing the
STACK_OF(X509_OBJECT) out of the X509_STORE and manually deleting all
the CRLs from it.

This mutates an OpenSSL internal object which bumps into problems if
OpenSSL ever switches to a more efficient representation. See
openssl/openssl#28599

(It's also not thread-safe, though it doesn't look like that impacts
OpenVPN? Actually even reading that list doesn't work. See
CVE-2024-0397. This OpenSSL API was simply broken.)

Additionally, this seems to cause two OpenVPN features to not work
together. I gather backend_tls_ctx_reload_crl is trying to clear the
CRLs loaded from last time it ran. But tls_ctx_load_ca with a ca_file
can also load CRLs. tls_ctx_load_ca with ca_path will also pick up CRLs
and backend_tls_ctx_reload_crl actually ends up clobbering some state
X509_LOOKUP_hash_dir internally maintains on the X509_STORE. Likewise,
tls_verify_crl_missing can get confused between
backend_tls_ctx_reload_crl's crl_file-based CRLs and CRLs from
tls_ctx_load_ca.

Avoid all this by tracking the two CRLs separately. crl_file-based CRLs
now go onto a STACK_OF(X509_CRL) tracked on the tls_root_ctx. Now this
field can be freely reloaded by OpenVPN without reconfiguring OpenSSL.
Instead, pass the current value into OpenSSL at verification time.  To
do so, we need to use the SSL_CTX_set_cert_verify_callback, which allows
swapping out the X509_verify_cert call, and also tweaking the
X509_STORE_CTX configuration before starting certificate verification.

Context: SSL_CTX_set_cert_verify_callback and the existing
verify_callback are not the same. SSL_CTX_set_cert_verify_callback wraps
the verification while verify_callback is called multiple times
throughout verification. It's too late to reconfigure X509_STORE_CTX in
verify_callback. verify_callback is usually not what you want.
Sometimes current_cert and error_depth don't quite line up, and
cert_hash_remember may end up called multiple times for a single
certificate.

I suspect some of the other verify_callback logic would also be better
done in the new callback, but I've left it alone to keep this change
minimal. verify_callback is really only usable for suppressing errors.
Application bookkeeping is better down elsewhere.

Signed-off-by: David Benjamin <davidben@google.com>
@paulidale
Copy link
Contributor

paulidale commented Sep 21, 2025

Something in the middle would be to deprecate but also always return NULL from that function.

I could live with this. Otherwise seek a policy exception. I'd initiate the exception process but I've no idea how.

@npajkovsky
Copy link
Author

I'm going to test another two options without hashmap.

  1. many functions just call sk_X509_OBJECT_sort without even checking if stack is sorted. I'm going to check if the stack is sorted, and if not, sort it.
  2. Sort the stack on insert only.

@bob-beck
Copy link
Contributor

bob-beck commented Sep 22, 2025 via email

@npajkovsky
Copy link
Author

My gut feel is hashmap will win here because you know and have control of what you're comparing.

The 2nd option is on par with hashmap but that's because it does not have write-locks anymore.

| Threads |  mean/us |  var/us |
|---------+----------+---------|
|       1 | 2.473687 | .023216 |
|       2 | 3.189462 | .172104 |
|       4 | 3.450176 | .108646 |
|       6 | 3.455258 | .073102 |
|       8 | 3.722448 | .148536 |
|      10 | 4.579252 | .141417 |
|      12 | 5.451128 | .138260 |
|      14 | 6.265318 | .207494 |
|      16 | 7.099082 | .217839 |
|      18 | 7.805925 | .196117 |

I still don't know how much insert got slower.

I added stuff recently to the stack sorting thing so it wouldn't get cleared all the time, and more importantly, would not clear if you inserted the items in sorted order. So if you actually do a find_ex before inserting anything, and use the position from find_ex, you may not ever have to sort.

Excelent. That's what I need now.

@paulidale
Copy link
Contributor

So if you actually do a find_ex before inserting anything, and use the position from find_ex, you may not ever have to sort.

Please don't do this. This is an incredibly inefficient way to sort.

@npajkovsky npajkovsky force-pushed the x509-store3 branch 2 times, most recently from 529c64f to dfb6b6d Compare October 3, 2025 11:43
Copy link
Contributor

@nhorman nhorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the apostrophe in store's on line 35 in X509_STORE_get0_param.pod is a UTF-16 char rather than UTF-8 which is breaking pod2man.

Other than that, looks good

@npajkovsky
Copy link
Author

ping @openssl/committers

nhorman
nhorman previously approved these changes Oct 14, 2025
Copy link
Contributor

@Sashan Sashan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good I just need a clarification on oldval argument when inserting to hash table. thanks a lot.

HT_SET_KEY_FIELD(&key, xn_canon, xn->canon_enc);
HT_SET_KEY_FIELD(&key, xn_canon_enclen, xn->canon_enclen);
val.value = (void *)objs;
ret = ossl_ht_insert(store->objs_ht, TO_HT_KEY(&key), &val, &oldval);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the &oldval argument indicates we are going to replace existing item in hashtable with newly inserted one (&val). I just wonder if we should be doing sk_X509_OBJECT_free(oldval.value)).

I think alternative approach here would be to pass NULL instead of oldval.

I admit I might be wrong (very wrong here) as it is completely new area for me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be always get stack and append, or create a new one, but no stack replacement. Therefore, the NULL should be passed to ossl_ht_insert

Signed-off-by: Nikola Pajkovsky <nikolap@openssl.org>
X509_STORE was using STACK_OF(X509_OBJECT) which is not ideal structure. The
better solution is to use hashmap. The performance gains come from the fact that
sorting was removed and therefore read lock is just enough for looking up
objects/cert/crls from hashmap.

When X509_STORE_get0_objects() is called, the hashmap converts back to
the STACK_OF(X509_OBJECT), and goes back to the original
implementation with the performance hit on lookup side because stack is not
sorted anymore.

Note, hashmap maps X509_NAME to STACK_OF(X509_OBJECT), and the stack is never
sorted which may lead to performance impact if stack contains a huge of objects.

Before the change

| Threads |   mean/us |  var/us |
|---------+-----------+---------|
|       1 |  2.434803 | .034190 |
|       2 |  3.033588 | .247471 |
|       4 |  6.551132 | .150209 |
|       6 | 12.548113 | .258445 |
|       8 | 17.566257 | .168508 |
|      10 | 22.782846 | .182674 |
|      12 | 27.928990 | .426779 |
|      14 | 32.844572 | .307754 |
|      16 | 37.816247 | .660630 |
|      18 | 42.662465 | .434926 |

After the change

| Threads |  mean/us |  var/us |
|---------+----------+---------|
|       1 | 2.385398 | .015329 |
|       2 | 2.775794 | .172223 |
|       4 | 3.071882 | .126400 |
|       6 | 3.174147 | .139685 |
|       8 | 3.479235 | .297154 |
|      10 | 4.206260 | .149006 |
|      12 | 5.044039 | .194108 |
|      14 | 5.890640 | .185817 |
|      16 | 6.447808 | .256179 |
|      18 | 7.489261 | .149204 |

Resolves: openssl/project#1275
Signed-off-by: Nikola Pajkovsky <nikolap@openssl.org>
Resolves openssl/project#1369
Signed-off-by: Nikola Pajkovsky <nikolap@openssl.org>
Copy link
Contributor

@Sashan Sashan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me. thanks.

@vavroch2010 vavroch2010 linked an issue Oct 15, 2025 that may be closed by this pull request
@nhorman nhorman added approval: done This pull request has the required number of approvals branch: master Merge to master branch labels Oct 15, 2025
@openssl-machine openssl-machine added approval: ready to merge The 24 hour grace period has passed, ready to merge and removed approval: done This pull request has the required number of approvals labels Oct 16, 2025
@openssl-machine
Copy link
Collaborator

This pull request is ready to merge

@nhorman
Copy link
Contributor

nhorman commented Oct 16, 2025

merged to master, thank you!

@nhorman nhorman closed this Oct 16, 2025
openssl-machine pushed a commit that referenced this pull request Oct 16, 2025
Signed-off-by: Nikola Pajkovsky <nikolap@openssl.org>

Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from #28599)
openssl-machine pushed a commit that referenced this pull request Oct 16, 2025
X509_STORE was using STACK_OF(X509_OBJECT) which is not ideal structure. The
better solution is to use hashmap. The performance gains come from the fact that
sorting was removed and therefore read lock is just enough for looking up
objects/cert/crls from hashmap.

When X509_STORE_get0_objects() is called, the hashmap converts back to
the STACK_OF(X509_OBJECT), and goes back to the original
implementation with the performance hit on lookup side because stack is not
sorted anymore.

Note, hashmap maps X509_NAME to STACK_OF(X509_OBJECT), and the stack is never
sorted which may lead to performance impact if stack contains a huge of objects.

Before the change

| Threads |   mean/us |  var/us |
|---------+-----------+---------|
|       1 |  2.434803 | .034190 |
|       2 |  3.033588 | .247471 |
|       4 |  6.551132 | .150209 |
|       6 | 12.548113 | .258445 |
|       8 | 17.566257 | .168508 |
|      10 | 22.782846 | .182674 |
|      12 | 27.928990 | .426779 |
|      14 | 32.844572 | .307754 |
|      16 | 37.816247 | .660630 |
|      18 | 42.662465 | .434926 |

After the change

| Threads |  mean/us |  var/us |
|---------+----------+---------|
|       1 | 2.385398 | .015329 |
|       2 | 2.775794 | .172223 |
|       4 | 3.071882 | .126400 |
|       6 | 3.174147 | .139685 |
|       8 | 3.479235 | .297154 |
|      10 | 4.206260 | .149006 |
|      12 | 5.044039 | .194108 |
|      14 | 5.890640 | .185817 |
|      16 | 6.447808 | .256179 |
|      18 | 7.489261 | .149204 |

Resolves: openssl/project#1275
Signed-off-by: Nikola Pajkovsky <nikolap@openssl.org>

Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from #28599)
openssl-machine pushed a commit that referenced this pull request Oct 16, 2025
Resolves openssl/project#1369
Signed-off-by: Nikola Pajkovsky <nikolap@openssl.org>

Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from #28599)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval: ready to merge The 24 hour grace period has passed, ready to merge branch: master Merge to master branch severity: fips change The pull request changes FIPS provider sources

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make X509_STORE_lock less contentious