Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #930 - Add new dimension to matchmaking search duration in prometheus #1010

Conversation

benjamin-lawson
Copy link
Contributor

Makes #930 possible by adding the queue population dimension to the search time in prometheus.

@@ -19,8 +19,9 @@


class MatchmakerSearchTimer:
def __init__(self, queue_name):
self.queue_name = queue_name
def __init__(self, queue_name: str, players_in_queue: int):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered changing the constructor to store a reference to the queue itself rather than store the number of players when the search starts, but I didn't want to hold open any potential connections.

@benjamin-lawson benjamin-lawson changed the title Issue 930 - Add new dimension to matchmaking search duration in prometheus Issue #930 - Add new dimension to matchmaking search duration in prometheus Mar 6, 2024
@benjamin-lawson
Copy link
Contributor Author

Closing this PR due to discuss in #930 to move this data to the DB rather than prometheus.

@Brutus5000
Copy link
Member

Prometheus seems the right place for analytical data with historical updating. The db should only contain the "current" state.

@benjamin-lawson
Copy link
Contributor Author

Prometheus seems the right place for analytical data with historical updating. The db should only contain the "current" state.

I think this could use more discussion then as @BlackYps was saying that the DB would be the better location.

@Brutus5000
Copy link
Member

Depends on what is supposed to be done with it. Look at a graph to see the development and derive decision from it? -> Prometheus
Use the information at runtime to adapt the matchmaker dynamically -> db

@benjamin-lawson
Copy link
Contributor Author

Depends on what is supposed to be done with it. Look at a graph to see the development and derive decision from it? -> Prometheus Use the information at runtime to adapt the matchmaker dynamically -> db

My thoughts were to put it into Prometeus to build a model. Once we have that model, we can implement the model predictions into the server code to provide a predicted wait time. I'm not super great with statistics so I'm open to suggestions on tackling this a different way.

@BlackYps
Copy link
Collaborator

BlackYps commented Mar 7, 2024

My idea was to store the date in the db to be able to retrieve up-to-date information.
I'll just post my notes about implementation here for you to read:

Implementation idea:
have deques with rating brackets and party sizes for each queue
whenever a search finishes successfully, store a timestamp, the wait time, rating, newbie count, the average number of people in queue(average of the last ~10 iterations) and the total number of people online in the appropriate deque.

Store the deques in the db once a day and load them on server startup so we don't lose the info on a restart

We could in a first step only collect the info and then get a db dump to analyze the data

For estimates:
make a linear fit of all data in a deque
look up where a certain party would end up on the linear fit to get the wait time

data could be visualized in a rating - queue size plane with the wait time of data points coded as color

@benjamin-lawson
Copy link
Contributor Author

I think we'd be better off going for the estimation option rather than trying to worry about accuracy. We know that even with perfect historical data, we are always going to be off to some degree with a prediction. I'd think the linear regression (or some other regression model) would be much faster to implement and have a decent accuracy.

@BlackYps
Copy link
Collaborator

BlackYps commented Mar 8, 2024

Sorry, but I don't understand what exactly you are comparing and arguing against?
I made only one suggestion for implementation, not two. Can you clarify what you mean?

@benjamin-lawson
Copy link
Contributor Author

My bad. I think I misread your comment. I spent some time last night trying to figure out if we can perform this kind of analysis with Prometheus. I'm not super well versed with Prometheus, but I was having a really hard time trying to figure out a query that works. I think @BlackYps may be right here that we should have a dequeue history table in the DB. Here is a structure of the table I had in mind. Any additional columns we could use in this table?

DB Table

@benjamin-lawson
Copy link
Contributor Author

I took a stab at the database approach. I set up the database table as above. I wrote a stored procedure that takes the number of players in queue and the queue ID and returns the wait time in seconds using linear regression. I've limited it to using 1000 entries as the query is not super optimal. We can fiddle with this number as needed. The image below shows the simulated data in red (random points generated in red) and what the server calculated in blue. I can put up a draft PR for this to demonstrate how this works. My concerns with this approach would be load on the database as the query is not optimal. To mitigate this, we could set up a job to cache the results of this query for each queue and update it at a set interval.

points

@BlackYps
Copy link
Collaborator

BlackYps commented Mar 9, 2024

About the columns:
I don't think we need to reference the player. The only relevant info about the player is what the matchmaker uses for matching. And that is the player's rating and if he is considered a newbie.
To make the prediction more accurate I wonder if we should store the average queue activity around the time the match is found instead of the queue size at exactly the round the match happens. So basically an average of how many people are in queue over the last N iterations.
We could also store the total number of people online at the time to be able to scale the expected wait time better.
I don't think it makes sense to store the events where people simply leave the queue. We can't really gather anything from that.
Lastly, maybe it is better to store the end time and the wait time instead of end time and start time, but I haven't fully thought the implications of this through.

@BlackYps
Copy link
Collaborator

BlackYps commented Mar 9, 2024

Yes, I agree that we should cache the db query in some way to not constantly query the db

@benjamin-lawson
Copy link
Contributor Author

These are good suggestions that I want to respond to one at a time to make sure I don't miss any thing.

I don't think we need to reference the player. The only relevant info about the player is what the matchmaker uses for matching. And that is the player's rating and if he is considered a newbie.

I think this is spot on. I was thinking it may be helpful to know the identity of the player, but we really only care about rating of the player. I thought about factoring in average rating of the queue as well into this prediction.

To make the prediction more accurate I wonder if we should store the average queue activity around the time the match is found instead of the queue size at exactly the round the match happens. So basically an average of how many people are in queue over the last N iterations.

I am thinking we probably don't need this information. Really, this is what the linear regression is doing for us. I think storing the event rather than meta information is better as we can still calculate the metadata from the raw events.

We could also store the total number of people online at the time to be able to scale the expected wait time better. I don't think it makes sense to store the events where people simply leave the queue. We can't really gather anything from that.

Yeah, total online players could play a factor, but I would think this could introduce a lot of error. For instance, if we have some event going on where one queue is seeing a lot more traffic than the others, the calculations may confuse larger online player count with shorter queue times. But in reality, this scenario would have the wait times for other queues be pretty much normal.

Lastly, maybe it is better to store the end time and the wait time instead of end time and start time, but I haven't fully thought the implications of this through.

This one I disagree with. Wait time is but a difference of start time and end time. This is the same thing as above where we can calculate the metadata if we need it, but it would be better to store the raw data and calculate from there.

Yes, I agree that we should cache the db query in some way to not constantly query the db.

I think we can do a generalized estimation for queues (given any player, what will be their wait time?) and do a more nuanced estimation once the player joins the queue or something like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants