[REF] Permit lock mechanism having a 0 second timeout and use that in MailJobs to speed up the acquisition of Child jobs #31274

Open: seamuslee001 wants to merge 1 commit into master

seamuslee001
Contributor

Overview

This allows the locking mechanism to be given a 0 second timeout, so it gives up immediately if a lock is already held instead of waiting up to 3 seconds, and sets the mailing job lock timeout to 0 as well.

Before

0 second timeout not permitted by locking mechanism

After

0 second timeout permitted by locking mechanism
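A minimal sketch of the Before/After difference, assuming a lock helper where an omitted timeout falls back to a 3 second default and an explicit 0 means a single, non-waiting attempt. Python's threading.Lock stands in for whatever lock backend CiviCRM is configured with; acquire_with_timeout, claim_free_jobs and DEFAULT_TIMEOUT are hypothetical names for illustration, not CiviCRM APIs.

```python
import threading
import time
from typing import List, Optional, Tuple

# Hypothetical default, mirroring the 3 second wait described in this PR.
DEFAULT_TIMEOUT = 3.0


def acquire_with_timeout(lock: threading.Lock, timeout: Optional[float] = None) -> bool:
    """Try to take lock, waiting at most timeout seconds.

    None means "use the default"; an explicit 0 is honoured as "do not wait"
    instead of being treated as falsy and replaced by the default.
    """
    if timeout is None:
        timeout = DEFAULT_TIMEOUT
    if timeout == 0:
        return lock.acquire(blocking=False)  # single attempt, returns at once
    return lock.acquire(timeout=timeout)  # blocks for up to `timeout` seconds


def claim_free_jobs(job_locks: List[threading.Lock],
                    timeout: Optional[float]) -> Tuple[List[int], float]:
    """One pass over the child jobs, claiming every job whose lock we can get.

    Returns the indices of the claimed jobs and the seconds the pass took.
    Claimed locks are left held, as a worker would hold them while sending.
    """
    start = time.monotonic()
    claimed = [i for i, lock in enumerate(job_locks)
               if acquire_with_timeout(lock, timeout)]
    return claimed, time.monotonic() - start


if __name__ == "__main__":
    locks = [threading.Lock() for _ in range(5)]
    locks[1].acquire()  # pretend other workers already hold jobs 1 and 3
    locks[3].acquire()
    claimed, elapsed = claim_free_jobs(locks, timeout=0)
    print(claimed, f"{elapsed:.3f}s")
```

With a 0 second timeout the pass claims jobs 0, 2 and 4 and skips the two held jobs immediately; with the old 3 second wait, the same pass could spend up to 6 seconds blocked on those two jobs before claiming anything further.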

I believe AUG has been using this in production for a while, right? @andrew-cormick-dockery @johntwyman

ping @JoeMurray @totten

civibot [bot] commented Oct 11, 2024

🤖 Thank you for contributing to CiviCRM! ❤️ We will need to test and review this PR. 👷

Introduction for new contributors...
  • If this is your first PR, an admin will greenlight automated testing with the command "ok to test" or "add to whitelist".
  • A series of tests will automatically run. You can see the results at the bottom of this page (if there are any problems, it will include a link to see what went wrong).
  • A demo site will be built where anyone can try out a version of CiviCRM that includes your changes.
  • If this process needs to be repeated, an admin will issue the command "test this please" to rerun tests and build a new demo site.
  • Before this PR can be merged, it needs to be reviewed. Please keep in mind that reviewers are volunteers, and their response time can vary from a few hours to a few weeks depending on their availability and their knowledge of this particular part of CiviCRM.
  • A great way to speed up this process is to "trade reviews" with someone: find an open PR that you feel able to review, and leave a comment like "I'm reviewing this now, could you please review mine?" (include a link to yours). You don't have to wait for a response to get started (and you don't have to stop at one!); the more you review, the faster this process goes for everyone 😄
  • To ensure that you are credited properly in the final release notes, please add yourself to contributor-key.yml
  • For more information about contributing, see CONTRIBUTING.md.

civibot [bot] added the master label on Oct 11, 2024
@andrew-cormick-dockery
Contributor

This patch is particularly useful in the following situations:

  • Regular mailings to a large number of recipients (>100,000)
  • A business requirement to ensure that such mailing runs happen as quickly as possible (our target = 10,000 emails/minute)
  • A large number of simultaneous processes operating to process these mailings
  • A job size much smaller than the recipient count (ours is set to 1,000)
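The figures in the list above can be turned into a rough back-of-the-envelope calculation. This is a sketch only: the 100,000 recipients, the job size of 1,000, the 10,000 emails/minute target and the old 3 second wait come from this thread, while the "5 contended lock attempts" is an invented number, and the helper names are hypothetical.

```python
def child_job_count(recipients: int, job_size: int) -> int:
    """Child jobs a mailing is split into (ceiling division)."""
    return -(-recipients // job_size)


def jobs_per_minute(target_emails_per_minute: int, job_size: int) -> float:
    """Child jobs that must finish each minute to hit the target rate."""
    return target_emails_per_minute / job_size


def max_lock_wait(contended_attempts: int, timeout_s: float) -> float:
    """Rough upper bound on time a worker spends blocked on jobs it cannot lock."""
    return contended_attempts * timeout_s


if __name__ == "__main__":
    print(child_job_count(100_000, 1_000))   # 100
    print(jobs_per_minute(10_000, 1_000))    # 10.0
    print(max_lock_wait(5, 3.0), max_lock_wait(5, 0.0))  # 15.0 0.0
```

With many processes competing for the same 100 small jobs, contended lock attempts become common, so a per-attempt wait of up to 3 seconds adds up quickly relative to a target of roughly 10 completed jobs per minute.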

@andrew-cormick-dockery
Contributor

We have been running this patch in production for at least 18 months, including during periods of very heavy traffic, successfully and with no known drawbacks.

@JoeMurray
Contributor

Just pinging @eileenmcnaughton in case you have thoughts/concerns.

@andrew-cormick-dockery can you explain a bit more about how the sidecar Docker containers are configured and used? 10k emails/minute is impressive throughput.

@johntwyman
Contributor

It's not fundamentally complicated: we have a bash script that loops continuously, executing this command: cv api3 job.process_mailing. The performance comes from the way we deploy CiviCRM. We deploy to a Kubernetes cluster, which allows us to run multiple instances concurrently. For each instance of CiviCRM, we have a mailing sidecar instance that runs the aforementioned script.

As load increases, the cluster creates more instances of CiviCRM, thus creating more mailing sidecars. So we end up with multiple invocations of job.process_mailing and we've found that they don't really trip over each other in any meaningful way.

And that's about the extent of the trick.
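A toy model of several sidecars running job.process_mailing at once, assuming each child job is guarded by its own lock and that a worker simply skips any job it cannot lock immediately (the 0 second timeout from this PR). Threads stand in for sidecar containers, and run_workers is a hypothetical helper, not CiviCRM code. It illustrates why concurrent workers need not trip over each other: skipping instead of waiting still processes every job exactly once.

```python
import threading
from collections import Counter


def run_workers(num_jobs: int, num_workers: int) -> Counter:
    """Hypothetical model: every worker makes one pass over the same job list."""
    job_locks = [threading.Lock() for _ in range(num_jobs)]
    done = [False] * num_jobs
    times_processed: Counter = Counter()  # job index -> number of times sent
    tally_guard = threading.Lock()

    def worker() -> None:
        for i, lock in enumerate(job_locks):
            # 0 second timeout: if another worker holds this job, move on.
            if not lock.acquire(blocking=False):
                continue
            try:
                if done[i]:
                    continue  # already finished by another worker
                # (sending the batch of emails for job i would happen here)
                done[i] = True
                with tally_guard:
                    times_processed[i] += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return times_processed


if __name__ == "__main__":
    counts = run_workers(num_jobs=100, num_workers=8)
    print(len(counts), set(counts.values()))
```

A job can only be skipped because some other worker currently holds its lock, and that worker marks it done before releasing, so no job is missed; the done check under the lock prevents any job from being sent twice.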

In practice, we observe that small mailings don't trigger the cluster's thresholds for spawning more instances, but larger mailings (> 100k recipients, for example) do. So throughput increases in line with recipient count (which is perfectly fine for us).

Other thoughts

  • We've had Smarty functionality disabled for a long time because it was expensive
  • Custom token evaluation is an area for improvement: we still have to migrate several to the Token Processor pattern
  • Large mailings can generate an influx of opens/clicks; this can slow down our system due to table/row contention in the DB
  • We are nowhere near our throughput limits with Amazon SES

@seamuslee001
Contributor Author

Just flagging that the test failures do seem to be related, at least at a quick glance.

4 participants