Prebid Cache Cross Data Center Lookup #1620
Comments
Discussed in PBS committee. PBC does have a read-miss metric, but it doesn't distinguish between different reasons such as a timeout, a bad UUID, or the wrong data center. However, Magnite sees only about a 1% read-miss rate, so this doesn't appear to be a major problem. We don't particularly like any of the available measurement solutions, so at this time we're proposing to adopt a wait-and-see approach. If the community has data that shows a more concrete problem, please post it to this issue.
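As a hedged illustration of what a per-reason breakdown could look like; note this is not PBC's actual metrics code, and the Micrometer usage, metric name, and tag values are assumptions:

```java
// Sketch only: a read-miss counter tagged with the suspected reason, so that
// timeouts, bad UUIDs, and wrong-data-center reads could be told apart.
// Metric and tag names here are invented for illustration.
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class ReadMissMetrics {

    private final MeterRegistry registry;

    public ReadMissMetrics(MeterRegistry registry) {
        this.registry = registry;
    }

    // reason: e.g. "timeout", "bad_uuid", "wrong_datacenter", "expired"
    public void markReadMiss(String reason) {
        Counter.builder("pbc.read.miss")
                .tag("reason", reason)
                .register(registry)
                .increment();
    }

    public static void main(String[] args) {
        ReadMissMetrics metrics = new ReadMissMetrics(new SimpleMeterRegistry());
        metrics.markReadMiss("wrong_datacenter");
    }
}
```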
If you hit the load balancer and go to data center 3 instead of 2, what metric is collected there? None? Publishers have no way to test this; they can't hit the server IP behind the load balancer. The only thing they're going to know is the % discrepancy between the impressions they thought they had vs. what was recorded as "paid" in their back-office systems, and they're just going to "take it on the chin" as a loss. Oh well, a 10%, 20%, 5% difference, what can I do?
We would be seeing cache read misses on datacenter 3. We're not. Are you actually seeing 20% discrepancy between Prebid line items delivered (bids won) and video impressions? If that's the case, then would you be willing to update your ad server creatives to add another parameter?
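For illustration only: in a GAM-style creative, "another parameter" could look something like the hypothetical `ch` parameter below, populated from the hb_cache_host targeting key via the ad server's macro resolution. The host name and parameter name here are made up.

```
https://prebid-cache.example.com/cache?uuid=%%PATTERN:hb_cache_id%%&ch=%%PATTERN:hb_cache_host%%
```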
We still don't have evidence that this is a problem, but I'll move the ball forward by proposing a relatively small feature based on SyntaxNode's first proposal above:
Fwiw, a 1% read miss rate seems like a rather substantial problem to me.
Read misses can come from late or late-and-duplicate requests as well as the wrong data center. Anyhow, appreciate the kick here -- this had dropped off our radar; put it back in the stack of tickets to get done this summer.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This was partly released with PBC-Java 1.13, but there's an outstanding bug where most requests get 'Did not observe any item or terminal signal' errors.
This is a follow-up to #1562 to focus on the situation where a host has multiple Prebid Cache data centers which do not sync with each other and the end user is directed to a different data center for the PUT and GET requests.
Summary
Prebid Cache provides hosts with the ability to configure a variety of backend storage systems. These storage systems may run in isolation or sync with each other. Because most data is retrieved shortly after being written and the chance of a cross data center lookup is low, many hosts, including Xandr and Magnite, do not sync their data center caches. As @bretg mentioned, it would be impossible (or at least prohibitively expensive) to try to replicate caches of this size globally within milliseconds.
We have not seen evidence of widespread issues with this setup, which has been in place for many years, but there are a number of community reports that indicate otherwise. I'd like to begin our investigation by measuring the rate of occurrence to determine whether we need to build a solution.
Proposal
Include a new feature for Prebid Cache to determine if a GET request is for a PUT request handled by a different data center. I see two options:
1. Accept a new query parameter for the GET request which is set to the hb_cache_host targeting key via macro resolution. I believe this would be the cleanest solution, but I recognize it requires action to be taken by publishers. I'm hopeful that publishers suspecting this is an issue would be willing to assist in collecting metrics.
2. Encode the data center into the already automatically generated cache id. Some Prebid Cache calls provide their own cache keys, which obviously wouldn't work, but that use case is likely small enough that we can still collect enough metrics. A rough sketch of this option follows below.
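For illustration only, here's a minimal Java sketch of what encoding the data center into the generated id could look like. The class and method names, the prefix format, and the two-letter data center code are all assumptions, not the actual Prebid Cache implementation:

```java
// Sketch of option 2: prefix the auto-generated UUID with a short data center
// code on PUT, and on GET compare that prefix against the local data center.
// A mismatch would only increment a metric; the lookup itself proceeds as usual.
import java.util.UUID;

public class DataCenterCacheId {

    // Code for the data center handling the PUT, e.g. "e1"; assumed to come
    // from host configuration.
    private static final String DC_CODE = "e1";

    // PUT path: prepend the local data center code to the generated UUID.
    public static String buildCacheId() {
        return DC_CODE + "-" + UUID.randomUUID();
    }

    // GET path: report whether the id appears to have been written elsewhere.
    // Externally supplied cache keys won't carry the two-character prefix and
    // are skipped, so this is a best-effort heuristic for metrics only.
    public static boolean isCrossDataCenterRead(String cacheId, String localDcCode) {
        int sep = cacheId.indexOf('-');
        if (sep != 2) {
            return false; // not an auto-generated id with a prefix; can't tell
        }
        return !cacheId.substring(0, sep).equals(localDcCode);
    }
}
```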
Thoughts?