Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Allow configuration of Synapse's cache without using synctl or environment variables #6391

Merged
merged 66 commits into from
May 11, 2020
Merged
Show file tree
Hide file tree
Changes from 32 commits
Commits
Show all changes
66 commits
Select commit Hold shift + click to select a range
99ab65a
fix up logging to use rapidjson
hawkowl Oct 24, 2019
d684ec8
benchmarks
hawkowl Oct 24, 2019
135fdaa
update linting
hawkowl Oct 31, 2019
f36f3ab
Merge remote-tracking branch 'origin/develop' into hawkowl/structured…
hawkowl Oct 31, 2019
73cfdeb
fix
hawkowl Oct 31, 2019
babf4ea
Merge branch 'develop' into hawkowl/structured-logging-perf
hawkowl Nov 1, 2019
e55591d
Merge remote-tracking branch 'origin/develop' into hawkowl/structured…
hawkowl Nov 4, 2019
c580eb3
revert this
hawkowl Nov 4, 2019
f888515
Merge remote-tracking branch 'origin/develop' into hawkowl/structured…
hawkowl Nov 4, 2019
a14831d
Move cache configuration into a homeserver config option, instead of …
hawkowl Nov 20, 2019
50fcb4a
Re-sync every so often, in case caches appear
hawkowl Nov 21, 2019
8439ce5
Merge remote-tracking branch 'origin/develop' into hawkowl/structured…
hawkowl Nov 25, 2019
92d4d13
fixes
hawkowl Dec 2, 2019
d70be18
Merge remote-tracking branch 'origin/develop' into hawkowl/structured…
hawkowl Dec 2, 2019
c11c8ad
more fixes
hawkowl Dec 2, 2019
0e368ee
more fixes
hawkowl Dec 2, 2019
f7ec526
add some LruCache benchmarks
hawkowl Dec 2, 2019
e174b2d
fix style
hawkowl Dec 2, 2019
946650c
Merge remote-tracking branch 'origin/develop' into hawkowl/benchmark-…
hawkowl Dec 3, 2019
9735a08
newsfile
hawkowl Dec 3, 2019
c76a412
Merge remote-tracking branch 'origin/develop' into hawkowl/cache-conf…
hawkowl Jan 16, 2020
0a02b2a
cleanup
hawkowl Jan 16, 2020
a21702f
cleanup so it can refer to late-config
hawkowl Jan 16, 2020
125c5a0
Merge branch 'hawkowl/benchmark-lrucache' into hawkowl/cache-config-w…
hawkowl Jan 16, 2020
0b069b7
make expiring caches resizeable
hawkowl Jan 16, 2020
a96b5d9
Load cache factors from the environment, and add a test.
hawkowl Jan 21, 2020
2f4dbfa
document as well as refactor so that CacheMetric is not nested
hawkowl Jan 21, 2020
ac020de
fix style
hawkowl Jan 21, 2020
18c1dbf
don't need this comment
hawkowl Jan 21, 2020
4aeb6fb
Merge remote-tracking branch 'origin/develop' into hawkowl/cache-conf…
hawkowl Feb 17, 2020
5f508e7
add tests for individual cache sizing, and fix up the individual cach…
hawkowl Feb 17, 2020
2619891
fix
hawkowl Feb 17, 2020
0fc0a7d
fix
hawkowl Feb 27, 2020
965e259
fixes
hawkowl Feb 27, 2020
f004cee
Merge remote-tracking branch 'origin/develop' into hawkowl/cache-conf…
hawkowl Feb 27, 2020
546c23e
Merge branch 'develop' of github.com:matrix-org/synapse into hawkowl/…
anoadragon453 Apr 3, 2020
7117b89
Update synapse/config/cache.py
anoadragon453 Apr 3, 2020
4ed7aa1
Move separate CacheConfig global vars into a single global dict
anoadragon453 Apr 3, 2020
be0737a
Move _DEFAULT_CONFIG to class method
anoadragon453 Apr 3, 2020
14fbf27
Merge branch 'hawkowl/cache-config-without-synctl' of github.com:matr…
anoadragon453 Apr 3, 2020
cc1dd98
lint
anoadragon453 Apr 3, 2020
de25659
Fix typing
anoadragon453 Apr 3, 2020
6e979a1
sample config
anoadragon453 Apr 3, 2020
fec13be
Fix lint
anoadragon453 Apr 6, 2020
905c833
Add some mypy ignores
anoadragon453 Apr 6, 2020
f300c08
Merge branch 'develop' of github.com:matrix-org/synapse into hawkowl/…
anoadragon453 Apr 24, 2020
28f7f59
Re-add default_size_factor multipler. Revert _max_size modification
anoadragon453 Apr 24, 2020
5f3eddc
Refactor globals in CacheConfig
anoadragon453 Apr 24, 2020
fcc9c3a
dict() -> {}
anoadragon453 Apr 24, 2020
5acca12
Some variable cleanups in cache.py
anoadragon453 Apr 27, 2020
fdef3ad
Make add_resizable_cache args clearer in cache tests
anoadragon453 Apr 27, 2020
ca98241
lint
anoadragon453 Apr 27, 2020
48709a9
Pull global cache prefix from environment by default
anoadragon453 Apr 30, 2020
9e3a9ba
Make config follow the guidelines
anoadragon453 May 1, 2020
fa56e16
sample config
anoadragon453 May 1, 2020
83cf583
Allow creating an LruCache that's not affected by cache factor
anoadragon453 May 1, 2020
424215a
Fix cache: config block parsing
anoadragon453 May 1, 2020
c7f3bf6
Environment takes precedence over config values
anoadragon453 May 1, 2020
485f177
fix tests
anoadragon453 May 1, 2020
fe89050
Fix and clarify config comments
anoadragon453 May 1, 2020
ae6070b
Move event_cache_size to CacheConfig
anoadragon453 May 1, 2020
4448633
Follow config file conventions. Move cache config location
anoadragon453 May 4, 2020
f8ed432
Attempt to explain the default global cache factor
anoadragon453 May 4, 2020
c1b339b
Merge branch 'develop' of github.com:matrix-org/synapse into hawkowl/…
anoadragon453 May 4, 2020
06fbfc2
Merge branch 'develop' of github.com:matrix-org/synapse into hawkowl/…
anoadragon453 May 5, 2020
edb04df
Merge branch 'develop' of github.com:matrix-org/synapse into hawkowl/…
anoadragon453 May 11, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/6391.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Synapse's cache factor can now be configured in `homeserver.yaml` by the `caches.global_factor` setting. Additionally, `caches.per_cache_factors` controls the cache factors for individual caches.
1 change: 1 addition & 0 deletions changelog.d/6446.misc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add benchmarks for LruCache.
4 changes: 2 additions & 2 deletions synapse/api/auth.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@
from synapse.api.room_versions import KNOWN_ROOM_VERSIONS
from synapse.config.server import is_threepid_reserved
from synapse.types import StateMap, UserID
from synapse.util.caches import CACHE_SIZE_FACTOR, register_cache
from synapse.util.caches import register_cache
from synapse.util.caches.lrucache import LruCache
from synapse.util.metrics import Measure

Expand Down Expand Up @@ -72,7 +72,7 @@ def __init__(self, hs):
self.store = hs.get_datastore()
self.state = hs.get_state_handler()

self.token_cache = LruCache(CACHE_SIZE_FACTOR * 10000)
self.token_cache = LruCache(10000)
register_cache("cache", "token_cache", self.token_cache)

self._account_validity = hs.config.account_validity
Expand Down
5 changes: 2 additions & 3 deletions synapse/app/homeserver.py
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,6 @@
from synapse.storage import DataStore
from synapse.storage.engines import IncorrectDatabaseSetup
from synapse.storage.prepare_database import UpgradeDatabaseException
from synapse.util.caches import CACHE_SIZE_FACTOR
from synapse.util.httpresourcetree import create_resource_tree
from synapse.util.manhole import manhole
from synapse.util.module_loader import load_module
Expand Down Expand Up @@ -484,8 +483,8 @@ def phone_stats_home(hs, stats, stats_process=_stats_process):

daily_sent_messages = yield hs.get_datastore().count_daily_sent_messages()
stats["daily_sent_messages"] = daily_sent_messages
stats["cache_factor"] = CACHE_SIZE_FACTOR
stats["event_cache_size"] = hs.config.event_cache_size
stats["cache_factor"] = hs.config.caches.global_factor
stats["event_cache_size"] = hs.config.caches.event_cache_size

#
# Performance statistics
Expand Down
112 changes: 112 additions & 0 deletions synapse/config/cache.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
# -*- coding: utf-8 -*-
# Copyright 2019 Matrix.org Foundation C.I.C.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import Dict

from ._base import Config, ConfigError

_CACHES = {}
_CACHE_PREFIX = "SYNAPSE_CACHE_FACTOR"
DEFAULT_CACHE_SIZE_FACTOR = float(os.environ.get(_CACHE_PREFIX, 0.5))

_DEFAULT_CONFIG = """\
anoadragon453 marked this conversation as resolved.
Show resolved Hide resolved
# Cache configuration
#
# 'global_factor' controls the global cache factor. This overrides the
# "SYNAPSE_CACHE_FACTOR" environment variable.
#
# 'per_cache_factors' is a dictionary of cache name to cache factor for that
# individual cache.
#
#caches:
# global_factor: 0.5
# per_cache_factors:
# get_users_who_share_room_with_user: 2
#
"""

# Callback to ensure that all caches are the correct size, registered when the
# configuration has been loaded.
_ENSURE_CORRECT_CACHE_SIZING = None


def add_resizable_cache(cache_name, cache_resize_callback):
_CACHES[cache_name.lower()] = cache_resize_callback
if _ENSURE_CORRECT_CACHE_SIZING:
_ENSURE_CORRECT_CACHE_SIZING()


class CacheConfig(Config):
section = "caches"
_environ = os.environ

@staticmethod
def _reset():
anoadragon453 marked this conversation as resolved.
Show resolved Hide resolved
global DEFAULT_CACHE_SIZE_FACTOR
global _ENSURE_CORRECT_CACHE_SIZING
anoadragon453 marked this conversation as resolved.
Show resolved Hide resolved

DEFAULT_CACHE_SIZE_FACTOR = float(os.environ.get(_CACHE_PREFIX, 0.5))
_ENSURE_CORRECT_CACHE_SIZING = None
_CACHES.clear()

def read_config(self, config, **kwargs):
self.event_cache_size = self.parse_size(config.get("event_cache_size", "10K"))

global DEFAULT_CACHE_SIZE_FACTOR
global _ENSURE_CORRECT_CACHE_SIZING

cache_config = config.get("caches", {})

self.global_factor = cache_config.get(
"global_factor", DEFAULT_CACHE_SIZE_FACTOR
)
if not isinstance(self.global_factor, (int, float)):
raise ConfigError("caches.global_factor must be a number.")

# Set the global one so that it's reflected in new caches
DEFAULT_CACHE_SIZE_FACTOR = self.global_factor

# Load cache factors from the environment, but override them with the
# ones in the config file if they exist
anoadragon453 marked this conversation as resolved.
Show resolved Hide resolved
individual_factors = {
key[len(_CACHE_PREFIX) + 1 :].lower(): float(val)
for key, val in self._environ.items()
if key.startswith(_CACHE_PREFIX + "_")
}

individual_factors_config = cache_config.get("per_cache_factors", {}) or {}
if not isinstance(individual_factors_config, dict):
raise ConfigError("caches.per_cache_factors must be a dictionary")

individual_factors.update(individual_factors_config)

self.cache_factors = dict() # type: Dict[str, float]

for cache, factor in individual_factors.items():
if not isinstance(factor, (int, float)):
raise ConfigError(
"caches.per_cache_factors.%s must be a number" % (cache.lower(),)
)
self.cache_factors[cache.lower()] = factor

# Register the global callback so that the individual cache sizes get set.
def ensure_cache_sizes():
for cache_name, callback in _CACHES.items():
new_factor = self.cache_factors.get(cache_name, self.global_factor)
callback(new_factor)

_ENSURE_CORRECT_CACHE_SIZING = ensure_cache_sizes
_ENSURE_CORRECT_CACHE_SIZING()
2 changes: 0 additions & 2 deletions synapse/config/database.py
Original file line number Diff line number Diff line change
Expand Up @@ -57,8 +57,6 @@ class DatabaseConfig(Config):
section = "database"

def read_config(self, config, **kwargs):
self.event_cache_size = self.parse_size(config.get("event_cache_size", "10K"))

# We *experimentally* support specifying multiple databases via the
# `databases` key. This is a map from a label to database config in the
# same format as the `database` config option, plus an extra
Expand Down
2 changes: 2 additions & 0 deletions synapse/config/homeserver.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
from ._base import RootConfig
from .api import ApiConfig
from .appservice import AppServiceConfig
from .cache import CacheConfig
from .captcha import CaptchaConfig
from .cas import CasConfig
from .consent_config import ConsentConfig
Expand Down Expand Up @@ -80,4 +81,5 @@ class HomeServerConfig(RootConfig):
RoomDirectoryConfig,
ThirdPartyRulesConfig,
TracerConfig,
CacheConfig,
]
4 changes: 2 additions & 2 deletions synapse/http/client.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,6 @@
from synapse.logging.context import make_deferred_yieldable
from synapse.logging.opentracing import set_tag, start_active_span, tags
from synapse.util.async_helpers import timeout_deferred
from synapse.util.caches import CACHE_SIZE_FACTOR

logger = logging.getLogger(__name__)

Expand Down Expand Up @@ -241,7 +240,8 @@ def __getattr__(_self, attr):
# tends to do so in batches, so we need to allow the pool to keep
# lots of idle connections around.
pool = HTTPConnectionPool(self.reactor)
pool.maxPersistentPerHost = max((100 * CACHE_SIZE_FACTOR, 5))
# XXX: Why does this use the cache factor????
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was just a handy way of making it scale for larger instances. The logic went that if you're running a larger instance, you'll bump your cache factor, and will probably want a bigger connection pool too. it's a bit of a hack, but otoh I didn't want to have millions of options that you have to tune.

pool.maxPersistentPerHost = max((100 * hs.config.caches.global_factor, 5))
pool.cachedConnectionTimeout = 2 * 60

# The default context factory in Twisted 14.0.0 (which we require) is
Expand Down
12 changes: 8 additions & 4 deletions synapse/metrics/_exposition.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,8 @@

from twisted.web.resource import Resource

from synapse.util import caches

try:
from prometheus_client.samples import Sample
except ImportError:
Expand Down Expand Up @@ -103,13 +105,15 @@ def nameify_sample(sample):


def generate_latest(registry, emit_help=False):
output = []

for metric in registry.collect():
# Trigger the cache metrics to be rescraped, which updates the common
# metrics but do not produce metrics themselves
for collector in caches.collectors_by_name.values():
collector.collect()

if metric.name.startswith("__unused"):
continue
output = []

for metric in registry.collect():
if not metric.samples:
# No samples, don't bother.
continue
Expand Down
4 changes: 3 additions & 1 deletion synapse/push/bulk_push_rule_evaluator.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@
"cache",
"push_rules_delta_state_cache_metric",
cache=[], # Meaningless size, as this isn't a cache that stores values
resizable=False,
)


Expand All @@ -67,7 +68,8 @@ def __init__(self, hs):
self.room_push_rule_cache_metrics = register_cache(
"cache",
"room_push_rule_cache",
cache=[], # Meaningless size, as this isn't a cache that stores values
cache=[], # Meaningless size, as this isn't a cache that stores values,
resizable=False,
)

@defer.inlineCallbacks
Expand Down
4 changes: 2 additions & 2 deletions synapse/push/push_rule_evaluator.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
from six import string_types

from synapse.types import UserID
from synapse.util.caches import CACHE_SIZE_FACTOR, register_cache
from synapse.util.caches import register_cache
from synapse.util.caches.lrucache import LruCache

logger = logging.getLogger(__name__)
Expand Down Expand Up @@ -149,7 +149,7 @@ def _get_value(self, dotted_key):


# Caches (glob, word_boundary) -> regex for push. See _glob_matches
regex_cache = LruCache(50000 * CACHE_SIZE_FACTOR)
regex_cache = LruCache(50000)
register_cache("cache", "regex_push_cache", regex_cache)


Expand Down
3 changes: 1 addition & 2 deletions synapse/replication/slave/storage/client_ips.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,6 @@

from synapse.storage.data_stores.main.client_ips import LAST_SEEN_GRANULARITY
from synapse.storage.database import Database
from synapse.util.caches import CACHE_SIZE_FACTOR
from synapse.util.caches.descriptors import Cache

from ._base import BaseSlavedStore
Expand All @@ -26,7 +25,7 @@ def __init__(self, database: Database, db_conn, hs):
super(SlavedClientIpStore, self).__init__(database, db_conn, hs)

self.client_ip_last_seen = Cache(
name="client_ip_last_seen", keylen=4, max_entries=50000 * CACHE_SIZE_FACTOR
name="client_ip_last_seen", keylen=4, max_entries=50000
)

def insert_client_ip(self, user_id, access_token, ip, user_agent, device_id):
Expand Down
4 changes: 1 addition & 3 deletions synapse/state/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,6 @@
from synapse.storage.data_stores.main.events_worker import EventRedactBehaviour
from synapse.types import StateMap
from synapse.util.async_helpers import Linearizer
from synapse.util.caches import get_cache_factor_for
from synapse.util.caches.expiringcache import ExpiringCache
from synapse.util.metrics import Measure, measure_func

Expand All @@ -53,7 +52,6 @@
KeyStateTuple = namedtuple("KeyStateTuple", ("context", "type", "state_key"))


SIZE_OF_CACHE = 100000 * get_cache_factor_for("state_cache")
EVICTION_TIMEOUT_SECONDS = 60 * 60


Expand Down Expand Up @@ -447,7 +445,7 @@ def __init__(self, hs):
self._state_cache = ExpiringCache(
cache_name="state_cache",
clock=self.clock,
max_len=SIZE_OF_CACHE,
max_len=100000,
expiry_ms=EVICTION_TIMEOUT_SECONDS * 1000,
iterable=True,
reset_expiry_on_get=True,
Expand Down
3 changes: 1 addition & 2 deletions synapse/storage/data_stores/main/client_ips.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,6 @@
from synapse.metrics.background_process_metrics import wrap_as_background_process
from synapse.storage._base import SQLBaseStore
from synapse.storage.database import Database
from synapse.util.caches import CACHE_SIZE_FACTOR
from synapse.util.caches.descriptors import Cache

logger = logging.getLogger(__name__)
Expand Down Expand Up @@ -367,7 +366,7 @@ class ClientIpStore(ClientIpBackgroundUpdateStore):
def __init__(self, database: Database, db_conn, hs):

self.client_ip_last_seen = Cache(
name="client_ip_last_seen", keylen=4, max_entries=50000 * CACHE_SIZE_FACTOR
name="client_ip_last_seen", keylen=4, max_entries=50000
)

super(ClientIpStore, self).__init__(database, db_conn, hs)
Expand Down
2 changes: 1 addition & 1 deletion synapse/storage/data_stores/main/events_worker.py
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,7 @@ def __init__(self, database: Database, db_conn, hs):
super(EventsWorkerStore, self).__init__(database, db_conn, hs)

self._get_event_cache = Cache(
"*getEvent*", keylen=3, max_entries=hs.config.event_cache_size
"*getEvent*", keylen=3, max_entries=hs.config.caches.event_cache_size
)

self._event_fetch_lock = threading.Condition()
Expand Down
4 changes: 2 additions & 2 deletions synapse/storage/data_stores/main/relations.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@


class RelationsWorkerStore(SQLBaseStore):
@cached(tree=True)
@cached(tree=True, max_entries="event_cache_size")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we're changing this to match the event cache size rather than allowing it to be configured separately? This feels like something that should be considered separately (and we should think how to support things that have already configured these).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the installs I've seen it, the cache for get_relations_for_event and such needs to be closer to the event cache size than zero, since a lot of times in sync or whatever, for each event it will also look for the relations and aggregations for it. It was the major thrashing candidate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can believe that, though on matrix.org I've noticed that that's not true on a lot of workers. I'd be tempted to remove it and we can add it back with a way of overriding this behaviour if you set a per cache factor override.

def get_relations_for_event(
self,
event_id,
Expand Down Expand Up @@ -133,7 +133,7 @@ def _get_recent_references_for_event_txn(txn):
"get_recent_references_for_event", _get_recent_references_for_event_txn
)

@cached(tree=True)
@cached(tree=True, max_entries="event_cache_size")
def get_aggregation_groups_for_event(
self,
event_id,
Expand Down
6 changes: 2 additions & 4 deletions synapse/storage/data_stores/state/store.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,6 @@
from synapse.storage.database import Database
from synapse.storage.state import StateFilter
from synapse.types import StateMap
from synapse.util.caches import get_cache_factor_for
from synapse.util.caches.descriptors import cached
from synapse.util.caches.dictionary_cache import DictionaryCache

Expand Down Expand Up @@ -90,11 +89,10 @@ def __init__(self, database: Database, db_conn, hs):
self._state_group_cache = DictionaryCache(
"*stateGroupCache*",
# TODO: this hasn't been tuned yet
50000 * get_cache_factor_for("stateGroupCache"),
50000,
)
self._state_group_members_cache = DictionaryCache(
"*stateGroupMembersCache*",
500000 * get_cache_factor_for("stateGroupMembersCache"),
"*stateGroupMembersCache*", 500000,
)

@cached(max_entries=10000, iterable=True)
Expand Down
Loading