
[vcl] don't set TTL to 0 #28927


Merged: 1 commit merged on Jul 29, 2021

Conversation

gquintard
Contributor

Because of request coalescing, setting the TTL to 0s actually causes
backend request queueing, meaning that for one specific object,
Varnish will only allow one request at a time.

For Varnish 4, we set the TTL to 2 minutes, leveraging Hit-for-Pass:
Varnish will transform any hit into a pass for the next 2 minutes,
preventing queueing.

For 5 and 6, the default mechanism is Hit-for-Miss, which works the same
as HfP, with the added bonus that if the response changes and we decide
to cache it, this will override the HfM TTL. So we can set the TTL to
one day.
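
To illustrate the idiom (a sketch, not the exact diff from this PR; 86400s stands in for the one-day TTL mentioned above, and the status check mirrors the existing VCL):

sub vcl_backend_response {
    if (beresp.status != 200 && beresp.status != 404) {
        # hit-for-miss: remember the object as uncacheable so that
        # request collapsing is disabled for it; a later cacheable
        # response overrides this TTL
        set beresp.ttl = 86400s;
        set beresp.uncacheable = true;
        return (deliver);
    }
}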

Contribution checklist (*)

  • Pull request has a meaningful description of its purpose
  • All commits are accompanied by meaningful commit messages
  • All new or changed code is covered with unit/integration tests (if applicable)
  • All automated tests passed successfully (all builds are green)

@m2-assistant

m2-assistant bot commented Jun 30, 2020

Hi @gquintard. Thank you for your contribution!
Here are some useful tips on how you can test your changes using the Magento test environment.
Add one of the following comments under your pull request to deploy a test or vanilla Magento instance:

  • @magento give me test instance - deploy test instance based on PR changes
  • @magento give me 2.4-develop instance - deploy vanilla Magento instance

❗ Automated tests can be triggered manually with an appropriate comment:

  • @magento run all tests - run or re-run all required tests against the PR changes
  • @magento run <test-build(s)> - run or re-run specific test build(s)
    For example: @magento run Unit Tests

<test-build(s)> is a comma-separated list of build names. Allowed build names are:

  1. Database Compare
  2. Functional Tests CE
  3. Functional Tests EE
  4. Functional Tests B2B
  5. Integration Tests
  6. Magento Health Index
  7. Sample Data Tests CE
  8. Sample Data Tests EE
  9. Sample Data Tests B2B
  10. Static Tests
  11. Unit Tests
  12. WebAPI Tests

You can find more information about the builds here

ℹ️ Please run only the needed test builds instead of all of them while developing. Please run all test builds before sending your PR for review.

For more details, please review the Magento Contributor Guide documentation.

@gquintard
Contributor Author

@magento run all tests

@@ -166,7 +166,7 @@ sub vcl_backend_response {
     # cache only successfully responses and 404s
     if (beresp.status != 200 && beresp.status != 404) {
-        set beresp.ttl = 0s;
+        set beresp.ttl = 120s;
Contributor

What is the purpose of caching 500 / 503 / 403?

Contributor Author

It doesn't cache them, because of the set beresp.uncacheable = true; that follows. Instead, Varnish will remember that these objects are uncacheable, and will disable request collapsing for them.
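
For context, the full block after this change (also visible in the test VCL further down) looks like:

if (beresp.status != 200 && beresp.status != 404) {
	set beresp.ttl = 120s;
	set beresp.uncacheable = true;
	return (deliver);
}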

If you keep the TTL at 0, you get into something like this:

  • 20 users request the same object
  • Varnish only goes to the backend once (request coalescing)
  • Varnish realizes it cannot reuse the object, so it only serves it to one client
  • we now have 19 users requesting the same object
  • Varnish only goes to the backend once
  • Varnish realizes ...

Rinse, repeat. Ultimately, the last request has to wait for 20 round-trips to the backend before being delivered.

If you set the TTL to some positive value AND the uncacheable bit to true:

  • 20 users request the same object
  • Varnish only goes to the backend once (request coalescing)
  • Varnish realizes it cannot reuse the object, so it only serves it to one client, but remembers it as uncacheable
  • the 19 remaining requests are woken up, and can all go to the backend in parallel

It's one of the trickiest Varnish gotchas, so please let me know if I wasn't clear.
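
A back-of-envelope way to see the difference, with the 20 clients from the scenarios above and a backend round-trip time of t:

# TTL = 0s, uncacheable:
#   worst-case wait ≈ 20 × t  (fetches are serialized on the waiting list)
#
# TTL > 0s, uncacheable (hit-for-pass / hit-for-miss):
#   worst-case wait ≈ 2 × t   (one coalesced fetch, then 19 parallel fetches)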

Contributor

@gquintard Your explanation sounds fine to me. ✔️

Contributor

@lbajsarowicz left a comment

✔️ After discussing my concerns with the author.
Thank you for your expertise!

@lbajsarowicz
Contributor

@magento run Functional Tests B2B

@magento-engcom-team
Contributor

Hi @lbajsarowicz, thank you for the review.
ENGCOM-7765 has been created to process this Pull Request
✳️ @lbajsarowicz, could you please add one of the following labels to the Pull Request?

Label                      Description
Auto-Tests: Covered        All changes in the Pull Request are covered by auto-tests
Auto-Tests: Not Covered    Changes in the Pull Request require coverage by auto-tests
Auto-Tests: Not Required   Changes in the Pull Request do not require coverage by auto-tests

@engcom-Alfa
Contributor

Hi @lbajsarowicz. Could you add the appropriate test-coverage labels to #28927, #28928 and #28944?
Thanks!

@lbajsarowicz added the Auto-Tests: Not Required label on Jul 24, 2020
@lbajsarowicz
Contributor

Automated Tests are not applicable, as the change is related to infrastructure configuration.

@sdzhepa added the Triage: Dev.Experience label on Aug 11, 2020
@sidolov added the Priority: P3, Risk: low and Severity: S3 labels on Sep 10, 2020
@engcom-Bravo
Contributor

Dev experience is required for testing this PR. Please note that manual testing has not been performed.

@hostep
Contributor

hostep commented Jun 15, 2021

> this PR has low priority and will be processed after all high priority ones

This basically means never, because there will always be higher-priority PRs.
I don't think this approach is good; the duration a PR has been open should be considered as well, regardless of priority, no?
This might be a good approach for issues, but definitely not for PRs.

@hostep
Contributor

hostep commented Jul 28, 2021

See extra arguments in #33604 (comment) around this change. Maybe the priority should be increased here?

@ihor-sviziev added the Area: Perf/Frontend label on Jul 28, 2021
@bartoszkubicki

This should be marked as top priority, as it solves a possible performance issue.

@ihor-sviziev added the Severity: S1 label and removed the Severity: S4 label on Jul 28, 2021
@lbajsarowicz
Contributor

@gquintard I'm no longer a Magento Maintainer, but I'm really sorry you've been waiting for more than a year to get any feedback. As this PR became... popular (https://twitter.com/IvanChepurnyi/status/1420298850833682432), I'm pretty sure Magento Community Engineering has extra motivation to review and merge it.

For those who encounter the issue: there's always a way to generate a Composer patch: https://patch-diff.githubusercontent.com/raw/magento/magento2/pull/28927.diff and apply it to your project directly, without waiting years.
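
For example, a minimal sketch using the cweagans/composer-patches plugin (the target package magento/module-page-cache and the patches/28927.diff path are assumptions; adjust them to wherever the VCL template lives in your installation):

{
    "require": {
        "cweagans/composer-patches": "^1.7"
    },
    "extra": {
        "patches": {
            "magento/module-page-cache": {
                "Varnish: don't set TTL to 0 (magento/magento2#28927)": "patches/28927.diff"
            }
        }
    }
}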

@ThijsFeryn

If tests are required to prove the impact of this PR, here are some varnishtest files:

varnishtest "Testing beresp.ttl=0s vs beresp.ttl=120s"
# each test backend delays 5 seconds before answering,
# to make the serialization visible in the test duration
server s1 {
	rxreq
	delay 5
	txresp -status 201 -body "1\n"
} -start
server s2 {
	rxreq
	delay 5
	txresp -status 201 -body "2\n"
} -start
server s3 {
	rxreq
	delay 5
	txresp -status 201 -body "3\n"
} -start
varnish v1 -vcl+backend {
	sub vcl_backend_fetch {
		if (bereq.http.be == "two") {
			set bereq.backend = s2;
		} else if (bereq.http.be == "three") {
			set bereq.backend = s3;
		}
	}
	sub vcl_backend_response {
		if (beresp.status != 200 && beresp.status != 404) {
			# a TTL of 0s makes the uncacheable marker expire
			# immediately, so waiting-list requests are serialized
			set beresp.ttl = 0s;
			set beresp.uncacheable = true;
			return (deliver);
		}
	}
} -start
client c1 {
	txreq
	rxresp
	expect resp.status == 201
	expect resp.body ~ "^1"
} -start
client c2 {
	txreq -hdr "be: two"
	rxresp
	expect resp.status == 201
	expect resp.body ~ "^2"
} -start
client c3 {
	txreq -hdr "be: three"
	rxresp
	expect resp.status == 201
	expect resp.body ~ "^3"
} -start
client c1 -wait
client c2 -wait
client c3 -wait

By storing this test in test1.vtc and running varnishtest test1.vtc, you'll see how it performs. It will take more than 15 seconds, because the set beresp.ttl = 0s; causes so-called serialization on the waiting list.

The servers contain an artificial delay, which will help us prove our point.
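
A rough breakdown of where the time goes (all three client requests hash to the same object):

# test1.vtc (ttl = 0s):
#   fetch for c1: 5s
#   fetch for c2: starts only after c1's fetch finishes, +5s
#   fetch for c3: starts only after c2's fetch finishes, +5s
#   total: just over 15s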

Here's the version with set beresp.ttl = 120s; :

varnishtest "Testing beresp.ttl=0s vs beresp.ttl=120s"
server s1 {
	rxreq
	delay 5
	txresp -status 201 -body "1\n"
} -start
server s2 {
	rxreq
	delay 5
	txresp -status 201 -body "2\n"
} -start
server s3 {
	rxreq
	delay 5
	txresp -status 201 -body "3\n"
} -start
varnish v1 -vcl+backend {
	sub vcl_backend_fetch {
		if (bereq.http.be == "two") {
			set bereq.backend = s2;
		} else if (bereq.http.be == "three") {
			set bereq.backend = s3;
		}
	}
	sub vcl_backend_response {
		if (beresp.status != 200 && beresp.status != 404) {
			# a TTL of 120s keeps the hit-for-miss marker alive,
			# so waiting-list requests are released in parallel
			set beresp.ttl = 120s;
			set beresp.uncacheable = true;
			return (deliver);
		}
	}
} -start
client c1 {
	txreq
	rxresp
	expect resp.status == 201
	expect resp.body ~ "^1"
} -start
client c2 {
	txreq -hdr "be: two"
	rxresp
	expect resp.status == 201
	expect resp.body ~ "^2"
} -start
client c3 {
	txreq -hdr "be: three"
	rxresp
	expect resp.status == 201
	expect resp.body ~ "^3"
} -start
client c1 -wait
client c2 -wait
client c3 -wait

Store this code in test2.vtc and run varnishtest test2.vtc to see the performance improvement. This test should take a bit over 10 seconds, because 2 out of 3 requests will be handled in parallel.
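
The corresponding breakdown for this version:

# test2.vtc (ttl = 120s):
#   fetch for c1: 5s (creates the hit-for-miss object)
#   fetches for c2 and c3: woken from the waiting list, run in parallel, +5s
#   total: just over 10s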

In a production environment where a lot more concurrency takes place, the performance impact will be more significant than in these isolated test cases.

FYI: the vcl_backend_fetch part was added to the VCL solely to facilitate these tests: it routes each request to its own test backend so that the fetches can be processed in parallel.

@sidolov added the Priority: P2 label and removed the Priority: P4 label on Jul 28, 2021
@sidolov
Contributor

sidolov commented Jul 28, 2021

Changing the priority, since the PR brings a performance improvement for installations using Varnish.

@m2-assistant

m2-assistant bot commented Jul 29, 2021

Hi @gquintard, thank you for your contribution!
Please complete the Contribution Survey; it will take less than a minute.
Your feedback will help us improve the contribution process.

ihor-sviziev referenced this pull request Oct 28, 2022
* Updated Varnish vcl files
* Included X-Magento-Cache-Debug header in all deployment modes
* Added Web-API test
@ihor-sviziev
Contributor

Unfortunately, it got added again in other places in 026e5b2 :(

Labels

Area: Perf/Frontend · Auto-Tests: Not Required · Award: bug fix · Award: category of expertise · Component: PageCache · Priority: P2 · Release Line: 2.4 · Risk: low · Severity: S1 · Triage: Dev.Experience