
Introduce a CacheStrategy for multi-instance data caching #3043

Closed
michaelbromley opened this issue Sep 5, 2024 · 2 comments
Labels: design 📐 This issue deals with high-level design of a feature · type: feature ✨ · @vendure/core

michaelbromley commented Sep 5, 2024

The Problem

There are several places where Vendure core uses caching to significantly improve performance:

  • We have the TtlCache class which is used in several places to cache calculated values in memory for a given time
  • We have the SelfRefreshingCache type which is used to cache all zones, channels and tax rates in memory because they are accessed extremely frequently

The issue with those solutions is that they are in-memory-only, and therefore local to the specific server/worker instance.

Being in-memory has two major downsides:

  1. Duplication: each instance of the server & worker maintains its own cache. On a typical deployment with 2x server & 2x worker processes, this means 4x the work needs to be done to fill up 4 caches. Load-balancing means that they will be filled unevenly, which is inefficient.
  2. Invalidation: we cannot implement an invalidation method which works across multiple instances. E.g. the TaxRate cache: we would like to keep it cached forever until a TaxRate is changed/added/removed. We could naively set up an EventBus subscriber to clear the cache when we detect a change. However, the change event will only be published on one instance. Any other instances will not know about that event, and their cache is now stale.

This second point is why we use TTLs with relatively short durations, which again leads to more work being done against the database.

It is also the reason we do not use caching in other scenarios that could radically improve performance: we cannot reliably invalidate a cache that is not shared by all instances.

Example

This issue was originally motivated by an investigation I am conducting into the performance of the order-related mutations. Using a prototype of this caching approach, I was able to speed up my benchmark by ~2.5x and cut the p(95) response time from 6.98s to 3s.

Proposed Solution

I propose introducing a shared caching mechanism into the core: CacheStrategy. This will be strategy-based, allowing you to decide whether to store the cache in:

  • the database
  • Redis
  • some other key-value storage system

The CacheStrategy would replace all existing caching mechanisms mentioned above, and would unlock the opportunity to make huge performance gains in currently slow areas like:

  • Checking Promotions on large orders
  • Evaluating collection filters

Because the cache is shared, it means as soon as one instance has cached a value, it will be available to all instances.

Design

At the most basic level, the CacheStrategy will implement the typical cache methods: get(), add(), delete().

It should also support key eviction via TTL, configurable per key.

It should be able to store JSON-like data, i.e. any serializable JS data structure, just like we already support with the job queue.
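
As a quick illustration of that constraint (the roundTrip helper here is purely illustrative, not part of the proposal): values must survive a JSON round trip unchanged, so class instances such as Date degrade to plain strings and would not be restored as their original type.

```typescript
// Illustrative only: shows the JSON-serializable constraint a shared cache imposes.
// Anything stored must survive a JSON round trip unchanged.
const roundTrip = <T>(value: T): any => JSON.parse(JSON.stringify(value));

const plain = roundTrip({ sku: 'A1', qty: 2, tags: ['sale'] }); // structure preserved
const date = roundTrip(new Date(0)); // becomes an ISO string, no longer a Date
```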

Here's a sketch of how it would look:

```ts
export interface CacheStrategy extends InjectableStrategy {
  get<T>(key: string): Promise<T | undefined>;

  add<T>(key: string, value: JsonCompatible<T>, options: { ttl?: number }): Promise<void>;

  delete(key: string): Promise<boolean>;

  // We could also include convenience methods to replicate the
  // functionality of the SelfRefreshingCache interface.
}
```
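
To make the shape concrete, here is a minimal in-memory implementation of that interface. The local stubs for InjectableStrategy and JsonCompatible, the optional options argument, and a get() that resolves to undefined on a miss are all assumptions of this sketch, not settled API:

```typescript
// Sketch only: local stand-ins for the real Vendure types so the example is self-contained.
interface InjectableStrategy {}
type JsonCompatible<T> = T;

interface CacheStrategy extends InjectableStrategy {
  get<T>(key: string): Promise<T | undefined>;
  add<T>(key: string, value: JsonCompatible<T>, options?: { ttl?: number }): Promise<void>;
  delete(key: string): Promise<boolean>;
}

// Minimal in-memory strategy: stores each value with an optional absolute expiry
// timestamp (ttl in ms) and evicts lazily on read.
class InMemoryCacheStrategy implements CacheStrategy {
  private store = new Map<string, { value: unknown; expiresAt?: number }>();

  async get<T>(key: string): Promise<T | undefined> {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt !== undefined && entry.expiresAt <= Date.now()) {
      this.store.delete(key); // TTL elapsed: evict and report a miss
      return undefined;
    }
    return entry.value as T;
  }

  async add<T>(key: string, value: JsonCompatible<T>, options?: { ttl?: number }): Promise<void> {
    this.store.set(key, {
      value,
      expiresAt: options?.ttl !== undefined ? Date.now() + options.ttl : undefined,
    });
  }

  async delete(key: string): Promise<boolean> {
    return this.store.delete(key);
  }
}
```

The shared SQL or Redis variants would implement the same interface, differing only in where the entries live.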

Backward Compatibility

The implementation of CacheStrategy needs to be done in a backward-compatible way, so no changes are needed by the user when upgrading.

  • The createSelfRefreshingCache() function and TtlCache class would be deprecated, and internally their usage would be replaced with CacheStrategy
  • By default we will use an InMemoryCacheStrategy which duplicates the current behaviour.
  • We will also ship an SqlCacheStrategy which stores the cache in a key-value table in the database, using a JSON sql type for the value.
  • We will provide a reference Redis-based strategy for those who want to use Redis.
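
To illustrate how the deprecated createSelfRefreshingCache() / TtlCache call sites could map onto the new strategy, here is a hypothetical get-or-compute helper. The helper name getOrCompute and the interface stub are assumptions of this sketch, not a proposed core API:

```typescript
// Stub of the proposed CacheStrategy shape, so this example is self-contained.
interface CacheStrategy {
  get<T>(key: string): Promise<T | undefined>;
  add<T>(key: string, value: T, options?: { ttl?: number }): Promise<void>;
  delete(key: string): Promise<boolean>;
}

// Hypothetical helper: return the cached value if present, otherwise compute,
// cache with the given TTL, and return it.
async function getOrCompute<T>(
  cache: CacheStrategy,
  key: string,
  ttl: number,
  compute: () => Promise<T>,
): Promise<T> {
  const cached = await cache.get<T>(key);
  if (cached !== undefined) return cached;
  const value = await compute();
  await cache.add(key, value, { ttl });
  return value;
}
```

With a shared backing store (SQL, Redis), compute() runs roughly once per TTL window across all instances, rather than once per instance as with the current in-memory caches.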

Summary

This proposal has the following benefits:

  • Consolidates existing caching mechanisms into a single concept
  • Makes it work across horizontally-scaled deployments
  • Enables correct cache invalidation and therefore much more efficient & correct caching
  • Is configurable following our well-used strategy pattern
  • Is 100% backward-compatible
@michaelbromley michaelbromley changed the title Proposal: Introduce a SystemCacheStrategy for multi-instance data caching Proposal: Introduce a CacheStrategy for multi-instance data caching Sep 6, 2024
@dlhck
Collaborator

dlhck commented Sep 6, 2024

We should add support for cache tags. In many scenarios you want to delete cache items for a certain namespace, e.g. delete all cached values for a product or a zone. Pimcore has a neat implementation that we can take some inspiration from: https://pimcore.com/docs/platform/Pimcore/Development_Tools_and_Details/Cache/#overview-of-functionalities

I would also recommend that we take a look at the caching architecture of Symfony as it is a really sophisticated one: https://symfony.com/doc/current/components/cache.html#generic-caching-psr-6

@michaelbromley michaelbromley pinned this issue Sep 10, 2024
michaelbromley added a commit that referenced this issue Sep 10, 2024
Relates to #3043

BREAKING CHANGE: If you are using the `FacetValueChecker` utility class, you should
update your code to get it via the `Injector` rather than directly instantiating it.

Existing code _will_ still work without changes, but by updating you will see improved
performance due to new caching techniques.

```diff
- facetValueChecker = new FacetValueChecker(injector.get(TransactionalConnection));
+ facetValueChecker = injector.get(FacetValueChecker);
```
michaelbromley added a commit that referenced this issue Sep 10, 2024
Relates to #3043. This plugin implements a simple SQL cache strategy to store cache items
in the main database. The implementation needs further testing and potential
performance optimization.
@dlhck dlhck moved this to 📦 Backlog in Vendure OS Roadmap Sep 23, 2024
@dlhck dlhck moved this from 📦 Backlog to 📅 Planned in Vendure OS Roadmap Sep 24, 2024
@dlhck dlhck added this to the v3.1 milestone Sep 24, 2024
@dlhck dlhck added the design 📐 This issue deals with high-level design of a feature label Sep 24, 2024
@michaelbromley
Member Author

michaelbromley commented Sep 24, 2024

Cache Tags

Tags are a mechanism of grouping cache items in order to make it possible to invalidate all items based on tags.

Prior Art

Symfony

https://symfony.com/doc/current/components/cache/cache_invalidation.html#using-cache-tags

```php
// invalidate all items related to `tag_1` or `tag_3`
$cache->invalidateTags(['tag_1', 'tag_3']);

// if you know the cache key, you can also delete the item directly
$cache->delete('cache_key');
```

In the Symfony (& PSR-6 in general) implementation, cache items are wrapped in a CacheItem class, which also allows tags to be set on the item:

```php
// add one or more tags
$item->tag('tag_1');
$item->tag(['tag_2', 'tag_3']);
```

Laravel

https://laravel.com/docs/11.x/cache

Laravel had a tags implementation, but it was recently removed (at least from the documentation).

It looks like their use of tags was badly designed: you could only invalidate by tags when the array of tags exactly matched (explanation).

Drupal

https://www.drupal.org/docs/drupal-apis/cache-api/cache-tags

Any cache backend should implement CacheBackendInterface, so when you set a cache item with the ::set() method, provide the third and fourth arguments, e.g.:

```php
$cache_backend->set(
  $cid, $data, Cache::PERMANENT, ['node:5', 'user:7']
);
```

This stores a cache item with ID $cid permanently (i.e., stored indefinitely), but makes it susceptible to invalidation through either the node:5 or user:7 cache tags.

Redis-tag-cache

A package from Max Stoiber that implements a very simple (single-file) Redis cache with tags. We can use it as inspiration for our Redis version.

https://www.npmjs.com/package/redis-tag-cache

This implements the solution given in this SO answer: keep a separate set of keys for each tag, then read it back with SMEMBERS at invalidation time: https://stackoverflow.com/a/40649819/772859

Implementation

The consensus design is that any cache item can be tagged with one or more string tags; invalidating a tag then invalidates every entry that carries it.

We need to have 3 concrete implementations:

  • in-memory (default)
  • database
  • redis

The common structure for tags is a separate data structure that maps each tag to the set of keys carrying that tag.

For the in-memory store, this can be a Map<string, Set<string>> - a map with the tag as the key, and a set of corresponding cache keys as the value.
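
A sketch of that in-memory tag index alongside the main key/value store, with a corresponding invalidateTags() operation (class and method names here are assumptions, not a settled API):

```typescript
// Sketch: tag -> set of cache keys, maintained alongside the key -> value store.
class InMemoryTagIndex {
  private cache = new Map<string, unknown>();
  private tagIndex = new Map<string, Set<string>>();

  set(key: string, value: unknown, tags: string[] = []): void {
    this.cache.set(key, value);
    for (const tag of tags) {
      let keys = this.tagIndex.get(tag);
      if (!keys) {
        keys = new Set<string>();
        this.tagIndex.set(tag, keys);
      }
      keys.add(key);
    }
  }

  get(key: string): unknown {
    return this.cache.get(key);
  }

  // Invalidate every cache entry carrying any of the given tags.
  invalidateTags(tags: string[]): void {
    for (const tag of tags) {
      const keys = this.tagIndex.get(tag);
      if (!keys) continue;
      for (const key of keys) {
        this.cache.delete(key);
      }
      this.tagIndex.delete(tag);
    }
  }
}
```

The Redis variant follows the same shape: SADD the cache key into a per-tag set on write, then SMEMBERS + DEL on invalidation, as in the redis-tag-cache package above.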

The redis implementation is similar, and can be seen in the Redis-tag-cache package above.

For the database store, we would need a separate table to store entries associating a tag with a single cache key:

```sql
CREATE TABLE cache_tags (
  id SERIAL PRIMARY KEY,
  tag VARCHAR(255) NOT NULL,        -- Tag name
  cache_key VARCHAR(255) NOT NULL,  -- Corresponding cache key
  FOREIGN KEY (cache_key) REFERENCES cache_items(cache_key) ON DELETE CASCADE
);
```

@dlhck dlhck changed the title Proposal: Introduce a CacheStrategy for multi-instance data caching Introduce a CacheStrategy for multi-instance data caching Sep 27, 2024
michaelbromley added a commit that referenced this issue Oct 29, 2024
michaelbromley added a commit that referenced this issue Oct 29, 2024
michaelbromley added a commit that referenced this issue Oct 30, 2024
Relates to #3043. This commit introduces a new DefaultSessionCacheStrategy, which
delegates to the underlying CacheStrategy. This means that now you only need to
think about the CacheStrategy, and the session cache will use whatever mechanism
is defined there.
michaelbromley added a commit that referenced this issue Oct 31, 2024
@michaelbromley michaelbromley moved this from 📅 Planned to 💯 Ready in Vendure OS Roadmap Oct 31, 2024
@michaelbromley michaelbromley self-assigned this Oct 31, 2024
@michaelbromley michaelbromley unpinned this issue Oct 31, 2024