-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #224. I believe this is a Postgres only issue #231
Conversation
I'm pausing on this for just a little as I did a little more digging (probably should have checked before coding a fix....) and there are some significant concerns around adding sub-transactions / savepoint for high volume deployments, especially those with replication / failover. While my proposed solution is database agnostic (ignoring the Postgres performance issues referenced above), the tests cannot be. Both SqLite and mySQL work without a testable failure with either my PR or the original SolidQueue implementation. There look to be some non-nested transaction / savepoint implementations, but they all require different syntax for each of the 3 supported databases. I'm open to investigating those alternatives, but it would help to understand if the SolidQueue team would be open to a PR based on database specific syntax and how to organize the code for any PRs with this approach. Open issues that I could use some direction on:
Thanks, Hal |
32758f8
to
6411faa
Compare
I've revamped this PR to avoid nested transaction. Postgres, unlike mysql and sqlite, invalidates the current transaction on
|
Hey @hms, so sorry for the delay here! I've been totally focused on other stuff (and other projects) but I'm finally trying to wrap up everything for v1.0 and would like to include this fix. This last approach looks great to me. I have some suggestions for the implementation, but it's possible that, being so long ago, you no longer have time (or interest!) to implement these. In that case just let me know, happy to add these myself. Thank you so much for figuring out this one 🙏 |
I'd love any feedback / suggestions to make the PR better and more durable for the project. I'm happy to get this wrapped up to releaseable standards. Hal |
@@ -183,7 +183,46 @@ class ConcurrencyControlsTest < ActiveSupport::TestCase | |||
assert job.reload.ready? | |||
end | |||
|
|||
if ActiveRecord::Base.connection.adapter_name == "PostgreSQL" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have TARGET_DB
, I think I'd use that here, like here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is better. I'll submit the change.
SolidQueue::Semaphore.create!(key: key, value: limit - 1, expires_at: expires_at) | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... actually, I'm not completely sure if we need this test at all, since it seems it's testing more something internal of Active Record, rather than something we have in Solid Queue 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this really is more of a test of AR, its value is that we have a confirmation of outcomes with the two different ways to add records to the Semaphore table and we show the two mechanisms "side-by-side". I realized that it was missing an assert to meet that goal. See what you think with the extra line.
In the end, if you still feel strongly about cleaning this test up a little, I'll remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I don't feel strongly about this one 😊
@@ -41,16 +41,30 @@ def signal | |||
end | |||
|
|||
private | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this project I'm following the style of not having a space after private
when the class has public and private methods, so this one was on purpose 😅
app/models/solid_queue/semaphore.rb
Outdated
limit == 1 ? false : attempt_decrement | ||
end | ||
|
||
if ActiveRecord::Base.connection.adapter_name == "PostgreSQL" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems there's a method that checks directly whether the adapter supports unique_by
exposed in Active Record's Abstract adapter, that we could use simply as
connection.supports_insert_conflict_target?
I think this might be more robust than checking the adapter name, because other adapters for PostgreSQL might have a different name, like the PostGIS adapter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So funny and just goes to show a bit of tunnel vision. I spent a fair amount of time digging around for an alternative to the original nested transactions methodology and in that digging I came across this method in the Rails adaptors.
The tunnel vision part was no thinking to use those methods as the much, much better way to test which of the code paths to take.
Thanks for this -- it's a great reminder to slow down and think before coding.
app/models/solid_queue/semaphore.rb
Outdated
if ActiveRecord::Base.connection.adapter_name == "PostgreSQL" | ||
alias attempt_creation attempt_creation_with_insert_on_conflict | ||
else | ||
alias attempt_creation attempt_creation_with_create_and_exception_handling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that this all could perhaps live in the Semaphore
class in a simpler way. Something like this method:
diff --git a/app/models/solid_queue/semaphore.rb b/app/models/solid_queue/semaphore.rb
index 34bf8dc..1448e67 100644
--- a/app/models/solid_queue/semaphore.rb
+++ b/app/models/solid_queue/semaphore.rb
@@ -17,6 +17,17 @@ module SolidQueue
def signal_all(jobs)
Proxy.signal_all(jobs)
end
+
+
+ def create_unique_by(attributes)
+ if connection.supports_insert_conflict_target?
+ insert(attributes, unique_by: :key).any?
+ else
+ create!(attributes)
+ end
+ rescue ActiveRecord::RecordNotUnique
+ false
+ end
end
class Proxy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then that could be used in Proxy
class like this (BTW I might remove that class in the future and leave everything in Semaphore
... it was not to have to pass all the job attributes around, but still... anyway, not the goal of this PR):
def attempt_creation
if Semaphore.create_unique_by(key: key, value: limit - 1, expires_at: expires_at)
true
else
check_limit_or_decrement
end
end
def check_limit_or_decrement
if limit == 1 then false
else
attempt_decrement
end
end
And you wouldn't need to define aliases.
Thank you for your feedback and suggestions. The PR is significantly better because of them. I have a PR ready. How do you feel about rebasing and squishing the older commits that, given the last set of changes, don't add much value from a history perspective? The only downside is I think your comments might get lost in that process and given how important they are to the finished PR, I'm not sure how I feel about that. Hal |
Hey @hms! Thanks a lot for that! 🙏 No need to squash the commits, totally as you prefer. Feel free to push things and keep the original history. |
Postgres, unlike mysql and sqlite, invalidates the current transaction on a database contraint violation such as an attempt to insert a duplicate key into a unique index. SolidQueue uses a pretty standard design pattern of application level conflict detection and resolution via error handling as part of it's concurrency management in Semaphore::Proxy#attempt create (via create! and rescue). For Postgres based implementations when: * enqueing a job inside a transaction * under load such that the race condition triggerng the simultanious conflicted insert the transaction would become unrecoverable at the application level. There are two simple paths to address this issue in Postgres: * Nested transactions * Insert using 'on conflict do nothing' Nested transaction are the easiest to implement and are portable across all 3 of the Rails standard databases. Unfortunately, nested transactions have a known and very signifant performance problem, especially for databases under load and/or replicated databases with long running queries. Each of the big three database inplements the ANSI standard Insert on conflict do nothing in a very different way. Because of this, this PR only replaces the current implementation of application level conflict handling for Postgres. It takes advantage of the Rails standard ActiveRecord::Persistence::Insert method supporting on conflict semantics.
Postgres only Issue: Jobs that meet all of the following conditions will result in the database connection entering an invalid and unrecoverable state, requiring a rollback before any further database requests (reads or writes) can be made: 1. Use SolidQueue's 'limits_concurrency' macro 2. Are enqueued inside an application-level transaction 3. The limits_concurrency key already exists in the Semaphore table (i.e., this is job 2 - N for a given key) SolidQueue uses the following design pattern to implement the conflict detection of the limits_concurrency macro which works 100% correctly, 100% of the time, with MySQL and SQLite: begin Semaphore.create!(...) no_conflict_path() rescue ActiveRecord::RecordNotUnique handle_concurrency_conflict() end The problem is Postgres: It's not possible to rescue and recover from an insert failing due to an conflict on unique index (or any other database constraint). Until a rollback is executed, the database connection is in an invalid state and unusable. Possible Solutions: 1. Nested transactions + Easiest to implement + Portable across all three standard Rails databases - Has significant performance issues, especially for databases under load or replicated databases with long-running queries (Postgres specific) - Requires using Raise and Rescue for flow of control (performance, less than ideal coding practice) - Requires issuing a rollback (performance, additional command that must round trip from the client to database and back) 2. Insert into the Semaphore table using 'INSERT INTO table (..) VALUES (...) ON CONFLICT DO NOTHING' syntax + ANSI standard syntax (not a Postgres 'one off' solution) + Rails natively supports identifying database adaptors that support this syntax, making the implementation portable and maintainable + Supported by Postgres and allows this issue to be addressed + Does not require Raise and Rescue for flow of control + Performant (native, fast path database functionality) Solution: Leverage Rails connection adaptors allowing code to easily identifying supported database features/functionality to implement strategy rails#2 (INSERT ON CONFLICT) for those databases that support it (i.e., Postgres) and leave the original implementation of rescuing RecordNotUnique for those that do not. Note: Although I'd love to take credit for the "quality" of this implementation, all that credit belongs to @rosa who's excellent feedback on an earlier implementation resulted in this significantly better implementation.
096ae0e
to
c817349
Compare
@rosa I could use some help here. The tests run clean in my local environment over multiple runs. Interestingly, at least part of the error output refers to the Postgres transactional error that was the root cause for this PR. I'm a RSpec person, so I'm not nearly as familiar with mini-test as I should be. Does it wrap tests in transactions? If so, then any test that hits a database managed constraint (unique, checks, foreign keys, etc) are all going to be at risk for trigging: PG::InFailedSqlTransaction: ERROR: current transaction is aborted, commands ignored until end of transaction block (ActiveRecord::StatementInvalid) Hal |
Ohhh yes! I think some of those failures are legit and not related to your changes. I've got a similar fix to what you did here in the works in 3d40b1a |
Thanks again for this, @hms, amazing work 🙏 🙇 |
Per all of my earlier discussion, Postgres transactions become invalid upon any database contraint invalidating an operation and make all follow on database interactions invalid until a rollback is executed.
Semephore#attempt_creation uses a rather common database pattern to rely on database locks to reliably syncrhronize based on uniqueness of values. The catch is that Postgresql requires a little more handholding to use this pattern. By wrapping just the Semaphore.create statement in a SAVEPOINT (a pretend nested transaction), we have what appears to be a transparent operation that works across all of the other supported databases without any known consequences AND allows Postgres to continue to work as expected per the existing code.
This change works across the Solid supported databases (at least based on the tests) and I believe the extra overhead for the non-postgres databases is small enough that special casing the code or resorting to the complexity of dependency injection is just not worth it.
I added two tests to the concurrency_controls_test.rb. One to show the error is real and is easy to reproduce. The other to show the fix fixes it.