This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
Time how long it takes us to do backfill processing #13535
We will probably want to adjust the fidelity in the buckets once we have some real data.
I've seen the linearizer lock take 45s and calculating the `likely_domains` take 20s, for example.
Starting the processing time here so we can include the room backfill linearizer lock in the timing.
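As a rough illustration of the comment above, here is a minimal sketch of starting the timer before acquiring a lock so that the wait for the lock is counted in the measured duration. The helper name `timed_with_lock` and the use of `threading.Lock` are assumptions for illustration only; Synapse's own linearizer and metrics code are not shown here.

```python
import threading
import time
from typing import Callable


def timed_with_lock(
    lock: threading.Lock,
    work: Callable[[], None],
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Return the seconds spent waiting for `lock` plus running `work`.

    Hypothetical helper: the timer starts *before* the lock is acquired,
    so any time spent queued behind other holders is included.
    """
    start = clock()  # taken before acquiring the lock
    with lock:
        work()
    return clock() - start  # end minus start, so the result is non-negative


if __name__ == "__main__":
    elapsed = timed_with_lock(threading.Lock(), lambda: time.sleep(0.01))
    print(f"took {elapsed:.3f}s including lock wait")
```

The `clock` parameter is only there so the helper can be exercised with a fake clock; moving `start = clock()` inside the `with` block would exclude the lock wait instead.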
Same problem as #13533 (comment)
I don't think this is working how we expect. All of the buckets just have the same values, which means it's saying every record is taking less than `1.0s`. It's possible. Did we get the `ms` to `s` conversion wrong?
https://prometheus.matrix.org/graph?g0.expr=synapse_federation_backfill_processing_before_time_seconds_bucket&g0.tab=1&g0.stacked=0&g0.show_exemplars=0&g0.range_input=1h&g0.end_input=2022-08-18%2020%3A36%3A02&g0.moment_input=2022-08-18%2020%3A36%3A02
So the graph of the percentiles is just the average of all of the buckets below it, and we get straight lines: https://grafana.matrix.org/d/dYoRgTgVz/messages-timing?orgId=1&from=1660812148100&to=1660855348100&viewPanel=212
The after timing one looks normal:
https://grafana.matrix.org/d/dYoRgTgVz/messages-timing?orgId=1&from=1660812186785&to=1660855386785&viewPanel=213
I don't think so. It gives values like `2.196` when I log it locally. And we do the same thing for existing metrics which look fine, e.g.
synapse/synapse/federation/sender/__init__.py
Line 487 in 2c42673
The values in Prometheus are negative: https://prometheus.matrix.org/graph?g0.expr=synapse_federation_backfill_processing_before_time_seconds_sum&g0.tab=1&g0.stacked=0&g0.show_exemplars=0&g0.range_input=1h&g0.end_input=2022-08-22%2017%3A57%3A40&g0.moment_input=2022-08-22%2017%3A57%3A40
Spotted 🕵️‍♀️ I accidentally did `start - end` 🤦‍♂️ Corrected:
Fixed in #13584
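To make the symptom discussed in this thread concrete, here is a minimal sketch (not the code from this PR or from #13584) of why a `start - end` duration makes every histogram bucket show the same value. Prometheus histogram buckets are cumulative: each one counts observations less than or equal to its upper bound, so a negative observation falls into all of them and drags the `_sum` below zero. The bucket bounds and timestamps below are made-up values for illustration.

```python
# Hypothetical bucket upper bounds, in seconds (not the ones used in Synapse).
BUCKETS = (0.1, 0.5, 1.0, 5.0, 10.0, float("inf"))


def cumulative_counts(observations, buckets=BUCKETS):
    """Count observations <= each upper bound, like cumulative `le` buckets."""
    return [sum(1 for value in observations if value <= le) for le in buckets]


# Made-up timestamps, in seconds, for a run of roughly 2.196s.
start, end = 100.0, 102.196

buggy = [start - end]  # negative duration: the bug
fixed = [end - start]  # positive duration: the fix

print(cumulative_counts(buggy), sum(buggy))  # every bucket counts the sample
print(cumulative_counts(fixed), sum(fixed))  # only buckets >= 5.0s count it
```

With the subtraction reversed, every bucket (including the smallest) counts every sample, which matches the flat bucket values and straight percentile lines described above.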
The after processing can take a while depending on how slow `/state_ids` is, e.g. `_process_pulled_events` taking 83s.