Fix FasterGetOrderForNames and add tests. #4331
Conversation
I'm still confused by the root cause of this bug. I think the fix in this branch was adding the LIMIT 1 to the legacy query (used when features.FasterGetOrderForNames is disabled), but I don't understand, if that was the required fix, how we didn't see this bug prior to this branch. Can you add some more context to the PR description or elsewhere?
Thanks for the reminder to add more context about what failed, and why we didn't catch it! Glad to have that written down in more detail, since it was subtle. I've updated it in the issue and in the PR description (same content). I also filed a new issue for a nicer integration test.
Thanks @jsha!
When I originally made this branch, I wrote a patch on top of #4326, then rebased that patch on top of master. That rebase had conflicts, naturally, and I resolved those conflicts but evidently messed it up. This brings back the missing changes via:

git show -v 9fa3607 | patch -p1

There was one rejected hunk; I manually made the changes from that hunk.
A field name typo is causing CI to fail.
Merging with one 🔍 based on available reviewers.
In #4331 I introduced this new, more efficient query for GetOrderForNames, and commented about why we needed an ORDER BY ... ASC to efficiently use the index. However, the actual query did not match the comment, and it used DESC. This fixes the query. To demonstrate that the index is actually used with the ASC version, here's the EXPLAIN output after filling up the table with a bunch of failed orders:

MariaDB [boulder_sa_integration]> explain select orderID, registrationID FROM orderFqdnSets
    -> WHERE setHash = UNHEX('B60FE34E4A6735D5A575D81C97F4DFED2102DC179B34252E4AA18F6E2A375C98')
    -> AND expires > NOW() ORDER BY EXPIRES ASC LIMIT 1 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orderFqdnSets
         type: range
possible_keys: setHash_expires_idx
          key: setHash_expires_idx
      key_len: 37
          ref: NULL
         rows: 1500
        Extra: Using index condition
1 row in set (0.000 sec)

MariaDB [boulder_sa_integration]> explain select orderID, registrationID FROM orderFqdnSets
    -> WHERE setHash = UNHEX('B60FE34E4A6735D5A575D81C97F4DFED2102DC179B34252E4AA18F6E2A375C98')
    -> AND expires > NOW() ORDER BY EXPIRES DESC LIMIT 1 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orderFqdnSets
         type: range
possible_keys: setHash_expires_idx
          key: setHash_expires_idx
      key_len: 37
          ref: NULL
         rows: 1500
        Extra: Using where
1 row in set (0.000 sec)
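For reference, here is a minimal Go sketch of the shape of the corrected query (ORDER BY expires ASC plus LIMIT 1, matching the setHash_expires_idx index order). The package, function, and struct names are illustrative assumptions rather than Boulder's actual sa code, and it uses database/sql directly instead of gorp.

// Sketch only: illustrates the corrected query shape, not Boulder's code.
package sasketch

import (
	"database/sql"
	"time"
)

type orderFQDNSet struct {
	OrderID        int64
	RegistrationID int64
}

// getOrderForNames selects at most one pending order for a given name-set
// hash: ORDER BY expires ASC walks the (setHash, expires) index in order,
// and LIMIT 1 guarantees a single row even when several unfinalized orders
// exist for the same names.
func getOrderForNames(db *sql.DB, setHash []byte, now time.Time) (*orderFQDNSet, error) {
	var result orderFQDNSet
	err := db.QueryRow(
		`SELECT orderID, registrationID
		   FROM orderFqdnSets
		  WHERE setHash = ?
		    AND expires > ?
		  ORDER BY expires ASC
		  LIMIT 1`,
		setHash, now,
	).Scan(&result.OrderID, &result.RegistrationID)
	if err != nil {
		return nil, err
	}
	return &result, nil
}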
This rolls forward #4326 after it was reverted in #4328.
Resolves #4329
The older query didn't have a LIMIT 1, so it was returning multiple results, but gorp's SelectOne was okay with multiple results when the selection was going into an int64. When I changed this to a struct in #4326, gorp started producing errors.

For this bug to manifest, an account needs to create an order, then fail validation, twice in a row for a given domain name, then create an order once more for the same domain name - that third request will fail because there are multiple orders in the orderFqdnSets table for that domain.
Note that the bug condition doesn't happen when an account does three successful issuances in a row, because finalizing an order (that is, issuing a certificate for it) deletes the row in orderFqdnSets. Failing an authorization does not delete the row in orderFqdnSets. I believe this was an intentional design decision because an authorization can participate in many orders, and those orders can have many other authorizations, so computing the updated state of all those orders would be expensive (remember, order state is not persisted in the DB but is calculated dynamically based on the authorizations it contains).
This wasn't detected in integration tests because we don't have any tests that fail validation for the same domain multiple times. I filed an issue for an integration test that would have incidentally caught this: #4332. There's also a more specific test case in #4331.