Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dbal] Use concurrent fetch message approach (no transaction, no pessimistic lock) #613

Merged
merged 7 commits into from
Nov 15, 2018

Conversation

makasim
Copy link
Member

@makasim makasim commented Nov 3, 2018

The consumption logic could be made even simpler and at the same time more performant (still to be proved) cuz there would be fewer round trips, fewer locks.

  • The first select returns an array of 100 message ids. No transaction, no pessimistic lock.
  • Then we walk through the array and try to catch one free message. We can do it with an `UPDATE ... SET deliveryId=ourUniqueId WHERE id=currentMessageId AND deliveryId IS NULL.
  • Check whether we succeeded in previous step or not

fixes #605

@makasim makasim added this to the 0.9 milestone Nov 3, 2018
$this->removeExpiredMessagesLastExecutedAt = microtime(true);
}

if ((microtime(true) - $this->removeExpiredMessagesLastExecutedAt) < 1) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could put that in an elseif so it's not executed needlessly on the first execution

;

if (false == $deliveredMessage) {
throw new \LogicException('There must be a message at all times at this stage but there is no a message.');
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not certain. it could be deleted manually or some other concurrent process between the update and the select. unlikely but possible. why not just return null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not happen. Few lines above the message was updated with deliveryId (which is unique and only known by this consumer) therefor if we cannot select it few lines later something went wrong.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I truncate the table, this is gone. And the consumer would fail for no reason.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why keep the exception? This is not race-condition safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I truncate the table, this is gone. And the consumer would fail for no reason.

It would fail, exit and restarted by supervisord or any other process manager.

If we remove the exception we might have a bug or something still would not know about it cuz it would silently continue fetching new messages without a warning.

Truncating a table while there are consumers working on it is not a valid way to do. I could imagine many cases which would go wrong in this case (for example a message has been processed and a consumer tries to acknowledge it but it has gone).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before it used a write lock and thus it protects against concurrent access causing race conditions. Now this is gone, so you need to make your consumers handle that gracefully. Throwing an exception and relying on the consumer to be restarted is just a workaround for a buggy consumer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tobion
Copy link

Tobion commented Nov 4, 2018

The new approach looks ok. Btw, are you sure the old locking approach only locked the first row of the ordered query? Seems questionable considering https://bugs.mysql.com/bug.php?id=67745 and https://hoopercharles.wordpress.com/2011/11/21/select-for-update-in-what-order-are-the-rows-locked/
Also should it not have used SKIP LOCKED so messages can actually be worked off in parrallel?

@makasim
Copy link
Member Author

makasim commented Nov 4, 2018

The new approach looks ok. Btw, are you sure the old locking approach only locked the first row of the ordered query?

I've not been aware of the issue. There is always a room for some surprises. Thank you for pointing out.

The issue comes in only if there is a filesort takes place, right? When the first version (based on pessimistic lock) was released we performed tests to check that it used indexes. We've not run the tests lately.

Also should it not have used SKIP LOCKED so messages can actually be worked off in parrallel?

Is this feature a part of SQL standard? It is supported by MySQL since version 8 only and I am not sure Doctrine DBAL provides an API for using it. I'll look into it.

$this->getConnection()->createQueryBuilder()
->delete($this->getContext()->getTableName())
->andWhere('(time_to_live IS NOT NULL) AND (time_to_live < :now)')
->andWhere('(redelivered = false OR delivery_id IS NULL)')
->setParameter(':now', (int) time(), Type::BIGINT)
->setParameter('redelivered', false, Type::BOOLEAN)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this param is unused


while (microtime() < $endAt) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are comparing string with float here. maybe you meant to use microtime(true)?

@makasim makasim merged commit 1f96246 into master Nov 15, 2018
@makasim makasim deleted the dbal-concurent-fetch branch November 15, 2018 11:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[dbal] consumption improvements.
2 participants