Prebid Cache Cross Data Center Lookup #1620

Open
SyntaxNode opened this issue Dec 9, 2020 · 8 comments
Labels: Intent to implement (an issue describing a plan for a major feature; intended for community feedback)

@SyntaxNode
Contributor

This is a follow-up to #1562, focused on the situation where a host has multiple Prebid Cache data centers which do not sync with each other and the end user's PUT and GET requests are routed to different data centers.

Summary

Prebid Cache provides hosts with the ability to configure a variety of different backend storage systems. These storage systems may run in an isolated state or sync with each other. Because most data is retrieved shortly after being written and the chance of a cross data center lookup is low, many hosts, including Xandr and Magnite, do not sync their data center caches. As @bretg mentioned, it would be impossible (or at least prohibitively expensive) to try to replicate caches of this size globally within milliseconds.

This setup has been in place for many years and we have not seen evidence of widespread issues, but a number of community reports indicate otherwise. I'd like to begin our investigation by measuring the rate of occurrence to determine whether we need to build a solution.

Proposal

Add a new feature to Prebid Cache to determine whether a GET request is for a PUT request handled by a different data center. I see two options:

  1. Accept a new query parameter for the GET request which is set to the hb_cache_host targeting key via macro resolution. I believe this would be the cleanest solution, but I recognize it requires action to be taken by publishers. I'm hopeful that publishers who suspect this is an issue would be willing to assist in collecting metrics.

  2. Encode the data center into the already automatically generated cache id (a rough sketch follows below). Some Prebid Cache calls provide their own cache keys, which obviously wouldn't work, but that use case is likely small enough that we could still collect enough metrics.
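For illustration only, here is a minimal sketch of what option 2 could look like in Go, assuming a short data-center code is simply prefixed onto the generated key. The function names, the `<dc>-<hex>` key format, and the allow-list of codes are hypothetical, not the actual Prebid Cache implementation:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// knownDataCenters is a hypothetical allow-list of data-center codes;
// a real deployment would take this from host configuration.
var knownDataCenters = map[string]bool{"use1": true, "usw2": true, "euw1": true}

// generateCacheKey prefixes the local data-center code onto a random id so a
// later GET can tell which data center handled the PUT.
func generateCacheKey(dc string) string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return dc + "-" + hex.EncodeToString(b)
}

// dataCenterOf reports which data center wrote the key, if the key carries a
// recognized prefix. Caller-supplied custom keys simply return ok == false,
// which is the small use case mentioned above.
func dataCenterOf(key string) (dc string, ok bool) {
	prefix, _, found := strings.Cut(key, "-")
	if !found || !knownDataCenters[prefix] {
		return "", false
	}
	return prefix, true
}

func main() {
	key := generateCacheKey("usw2")
	dc, ok := dataCenterOf(key)
	fmt.Printf("key=%s writtenBy=%s recognized=%t\n", key, dc, ok)
}
```

On a GET, comparing the extracted code against the local data center's code would yield the cross-data-center metric without any publisher action.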

Thoughts?

@bretg
Contributor

bretg commented Dec 11, 2020

Discussed in PBS committee

PBC does have a read-miss metric, but it doesn't distinguish between different reasons like timeout, bad UUID, or wrong datacenter. However, Magnite sees only about a 1% read-miss rate, so this doesn't appear to be a major problem.

We don't particularly like any of the available measurement solutions, so at this time we're proposing to adopt a wait-and-see approach. If the community has data that shows a more concrete problem, please post it to this issue.

@spormeon

If you hit the LB'er and go to datacentre 3 instead of 2, what metric is collected there? None? Publishers have no way to test this; they can't hit the server IP behind the LB'er. The only thing they're going to see is a percentage discrepancy between the impressions they thought they had versus what was recorded as "paid" in back-end systems, and they're just going to take it on the chin as a loss. Oh well, a 10%, 20%, 5% difference, what can I do?

@bretg
Contributor

bretg commented Dec 12, 2020

If you hit the LB'er and go to datacentre 3 instead of 2, what metric is collected there?

We would be seeing cache read misses on datacenter 3. We're not.

Are you actually seeing 20% discrepancy between Prebid line items delivered (bids won) and video impressions? If that's the case, then would you be willing to update your ad server creatives to add another parameter?

@bretg
Contributor

bretg commented Jan 8, 2021

We still don't have evidence that this is a problem, but I'll move the ball forward by proposing a relatively small feature based on SyntaxNode's first proposal above:

Accept a new query parameter for the GET request which is set to the hb_cache_host targeting key via macro resolution

  1. support a new "ch"(cache host) parameter on the /cache endpoint
    http://HOST_DOMAIN/cache?uuid=%%PATTERN:hb_uuid%%&ch=%%PATTERN:hb_cache_host%%

  2. the hb_cache_host is set by PBS to the actual direct host name of the cache server

             "hb_cache_host": "pg-prebid-server-aws-usw2.rubiconproject.com:443",
    
  3. when PBC receives a request with the 'ch' parameter, it's validated and processed (see the sketch after this list)

    a) if the hostname portion is the local host, then cool, end-of-line. Look up the uuid as normal.
    b) otherwise, verify that the named host is acceptable. We are not an open redirector. e.g. configure a regex in PBC that ensures that all ch values conform to *.hostdomain.com
    c) if the host is ok, proxy the request but remove the ch parameter. One hop only. No chains allowed. Add the other pieces of the URL as needed -- the "https" protocol, the URI path, and the uuid parameter.
    - when the response comes back, log a metric: pbc.proxy.success or pbc.proxy.failure
    - return the value to the client
    d) if the host did not match the regex, just ignore the ch parameter. Look up the uuid as normal.
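A minimal sketch of how a GET handler could implement steps a) through d), assuming the allow-list regex, the local host name, the listen port, and the metric logging all come from host configuration; none of this is the actual PBC code:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
	"regexp"
)

var (
	// localHost is this instance's own cache host name (assumed config value).
	localHost = "pbc-usw2.hostdomain.com:443"
	// allowedHost guards against acting as an open redirector: only *.hostdomain.com is proxied.
	allowedHost = regexp.MustCompile(`^[a-z0-9.-]+\.hostdomain\.com(:\d+)?$`)
)

func handleCacheGet(w http.ResponseWriter, r *http.Request) {
	uuid := r.URL.Query().Get("uuid")
	ch := r.URL.Query().Get("ch")

	// Steps a) and d): no ch, ch naming ourselves, or ch failing the regex
	// means we ignore it and look up the uuid as normal.
	if ch == "" || ch == localHost || !allowedHost.MatchString(ch) {
		lookupLocal(w, uuid)
		return
	}

	// Step c): proxy one hop only. The rebuilt URL carries the https scheme,
	// the /cache path, and the uuid -- but not ch, so no chains are possible.
	remote := url.URL{
		Scheme:   "https",
		Host:     ch,
		Path:     "/cache",
		RawQuery: url.Values{"uuid": {uuid}}.Encode(),
	}
	resp, err := http.Get(remote.String())
	if err != nil {
		log.Println("metric: pbc.proxy.failure") // placeholder for a real metrics client
		http.Error(w, "cache proxy failure", http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	log.Println("metric: pbc.proxy.success") // placeholder for a real metrics client
	w.WriteHeader(resp.StatusCode)
	io.Copy(w, resp.Body) // return the remote value to the client
}

// lookupLocal stands in for the normal backend read; placeholder only.
func lookupLocal(w http.ResponseWriter, uuid string) {
	fmt.Fprintf(w, "local lookup for uuid=%s\n", uuid)
}

func main() {
	http.HandleFunc("/cache", handleCacheGet)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The comparison against the local host could of course be smarter (e.g. matching only the hostname portion, or a set of aliases), but the shape of the flow follows the steps above.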

@patmmccann

Fwiw, a 1% read miss rate seems like a rather substantial problem to me.

@bretg
Contributor

bretg commented Jun 4, 2021

Read misses can come from late or late-and-duplicate requests as well as the wrong datacenter.

Anyhow, appreciate the kick here -- this had dropped off our radar. I've put it back in the stack of tickets to get done this summer.

@stale

stale bot commented Jan 8, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 8, 2022
@SyntaxNode SyntaxNode added the Intent to implement label Jan 10, 2022
@stale stale bot removed the stale label Jan 10, 2022
@bretg
Contributor

bretg commented Dec 12, 2022

This was partly released with PBC-Java 1.13, but there's an outstanding bug where most requests get 'Did not observe any item or terminal signal' errors.

Projects
Status: Ready for Dev
Development

No branches or pull requests

4 participants