Adds caching for services #6082

Merged · 104 commits · Nov 11, 2022
Conversation

@cannikin (Member) commented Jul 29, 2022

There have always been two hard problems in computer science. After this PR, there is only one.

Yet another Rails-inspired feature! This one lets you easily cache the result of a function in your services.

The intended use is as an LRU cache: don't worry about manually expiring things, just put things in the cache as needed and the cache software itself will eventually evict the least-used/oldest entries when memory fills up, or by whatever other metric it follows internally. This means there's never a need to manually delete something from the cache: leave old entries forever, who cares. It all depends on the key. Everything boils down to two statements:

  1. If it's already cached, give it to me
  2. If it's not cached, cache it

THAT IS ALL.

This cache only uses two operations: get and set. You either get some data back from the cache matching your key with get, or you save data to the cache using set. This is pretty close to the Memcached interface, but miles away from the 500 functions that Redis exposes. I'm trying to keep things ultra simple, and most of this functionality (I'm looking at you, Redis) falls under YAGNI.
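Those two statements are the whole algorithm. A minimal sketch of that get-or-set flow, using a plain Map in place of a real Memcached/Redis client (all names here are illustrative, not Redwood's actual implementation):

```javascript
// A toy client exposing only the two operations: get and set.
// Values are JSON round-tripped, like the real clients.
const store = new Map()

const client = {
  async get(key) {
    return store.has(key) ? JSON.parse(store.get(key)) : undefined
  },
  async set(key, value) {
    store.set(key, JSON.stringify(value))
  },
}

// 1. If it's already cached, give it to me
// 2. If it's not cached, cache it
async function cache(key, fn) {
  const cached = await client.get(key)
  if (cached !== undefined) return cached

  const result = await fn()
  await client.set(key, result)
  return result
}
```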

Why

There are going to be lots of times when you have a query for a large dataset from the database that you want to avoid making if possible, and that doesn't change too frequently. Obviously, if the data is changing every second it may not be a great candidate for caching. Unless you're getting 1,000 requests per second for that data: then maybe it is!

But caching can also be used to avoid large processing tasks, third-party API calls, and lots more. Caching just takes the result of those tasks and sticks it somewhere fast to read from (Memcached and Redis both keep data in memory), which is probably orders of magnitude quicker than redoing the original task. So caching is great, but usually hard to implement well. Not for Redwood users, though!

Clients

This implementation includes two clients out of the box, Memcached and Redis. It's really easy to add your own implementation using a different library or a completely different service: you just create an adapter class that exposes get() and set() functions that do whatever is needed to save a JavaScript object to your cache and retrieve it. The two included clients will JSON.stringify() your resulting data and then JSON.parse() it back out of the cache, so your data needs to survive that round trip intact (which means no caching of functions, for example).
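A custom adapter might look like the sketch below (the class name and the Map backend are hypothetical; only the get()/set() contract comes from the description above). Note how the JSON round trip means things like Date objects come back as strings:

```javascript
// Hypothetical custom cache client: any class with get() and set() works.
class MyCustomClient {
  constructor() {
    this.storage = new Map() // swap for whatever backend you like
  }

  async get(key) {
    const value = this.storage.get(key)
    // Mirror the built-in clients: values are stored as JSON strings
    return value === undefined ? undefined : JSON.parse(value)
  }

  async set(key, value) {
    // A real client would also honor an { expires } option here
    this.storage.set(key, JSON.stringify(value))
  }
}
```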

Setup

There's a setup command (in progress) which creates a file api/src/lib/cache.js. This follows the convention for setting up things like logging and auth. You set up your client and call createCache(), from which you destructure and export two functions:

import { createCache, MemcachedClient } from '@redwoodjs/api/cache'

const client = new MemcachedClient()

export const { cache, cacheFindMany } = createCache(client)

Usage

cache() is the simplest form, where you just give it a key, a function to execute returning the data you want to cache, and an optional object of options (currently only expires):

import { cache, cacheFindMany } from 'src/lib/cache'
import { db } from 'src/lib/db'

export const post = ({ id }) => {
  return cache(`post-${id}`, () =>
    db.post.findUnique({
      where: { id },
    })
  )
}

// or to expire after 30 seconds

export const post = ({ id }) => {
  return cache(`post-${id}`, () =>
    db.post.findUnique({
      where: { id },
    }),
    { expires: 30 }
  )
}

It's up to you to make sure that your key is unique enough that it changes when the data in your object changes and you want to cache the new result instead. Following Rails conventions means that for DB data you include the updatedAt timestamp in the key, since it will only change if some other data in the record changes. In the first example above the cached copy would never be replaced, since the id of the record is never going to change. It may get evicted at some point if it's not frequently accessed, however.
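The Rails-style key described above can be built with a tiny helper (the helper name and key format are just a convention, not a Redwood API):

```javascript
// The key changes whenever the record's updatedAt changes, so stale
// copies are simply abandoned in the cache rather than deleted.
function cacheKeyFor(prefix, record) {
  return `${prefix}-${record.id}-${new Date(record.updatedAt).getTime()}`
}
```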

cacheFindMany() is where things get interesting. This assumes you're executing a findMany() Prisma query and want to cache the results of that query until something in the result set changes. It does this by creating a key that's a composite of the latest record's id and updatedAt time. If any one record in that result set changes, the key will change (because updatedAt will be different), which means a new copy is stored. Using cacheFindMany() requires that you have some unique id and an updatedAt field that is updated any time the data in the record changes. id and updatedAt are the default field names, but you can configure them in the createCache() call if your model has them named something different.

Let's say you wanted to cache the following function:

db.post.findMany({ where: { popular: true } })

You need to transform that function call into an object instead, then give it to cacheFindMany():

export const posts = async () => {
  return await cacheFindMany(
    'posts',
    db.post,
    { conditions: { where: { popular: true } } }
  )
}

This requires a slightly new syntax (a conditions object containing the arguments you would normally send to findMany()) because I need to be able to make a findFirst() call based on the conditions you sent to findMany(), but only for a single record, sorted descending by updatedAt time. So you can't make the findMany() call like normal, because then I wouldn't be able to pull it apart and get just the conditions you gave it.

Internally I'm doing this:

db.post.findFirst({ 
  where: { popular: true }, 
  orderBy: { updatedAt: 'desc' }, 
  select: { id: true, updatedAt: true } 
})

This gets the absolute minimum amount of data needed to determine whether a recordset has newer data than was last cached, and then builds the key posts-123-1653400043430 from the single record that's returned. If that key doesn't exist in the cache then it runs:

db.post.findMany({ where: { popular: true } })

Which is the original query that was intended to be run, caching the result.
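Put together, the flow described above might look like this sketch (client and model are stand-ins for any get()/set() cache client and a Prisma model; this is not Redwood's actual internal code):

```javascript
// Sketch of the cacheFindMany flow: derive a key from the newest
// record, then fall back to the real findMany() on a cache miss.
async function cacheFindMany(client, keyPrefix, model, { conditions = {} } = {}) {
  // Cheapest possible query: just the newest record's id and updatedAt
  const latest = await model.findFirst({
    ...conditions,
    orderBy: { updatedAt: 'desc' },
    select: { id: true, updatedAt: true },
  })

  // Builds a key like `posts-123-1653400043430`
  const key = latest
    ? `${keyPrefix}-${latest.id}-${new Date(latest.updatedAt).getTime()}`
    : keyPrefix

  const cached = await client.get(key)
  if (cached !== undefined) return cached

  // Cache miss: run the original query and store the result
  const result = await model.findMany(conditions)
  await client.set(key, result)
  return result
}
```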

Of course, you could do all of this yourself using cache() but this is a nice abstraction that saves you the work. I wish you could just pass the standard db.post.findMany() call like you do with cache(), but alas that's an actual, executable function that returns results.

Side note: in ActiveRecord for Rails, calling the equivalent of findMany doesn't actually execute anything right away, it's just letting the instance know that you intend to call that at some point. You can pass that around and it doesn't actually execute until you need it for display, or give to a method that expects actual data. So the above can be written, in Rails, as:

posts = Post.all
cache(posts) do 
  posts.map do |post|
    # ...
  end
end

And it can automatically look for the newest record first before actually executing Post.all and getting all records from the DB if it turns out the cache is empty.

Caveats

Right now there's some interesting behavior if the memcached or redis service goes offline after the api side has already connected to it:

Memcached: the next request for the cache hangs and never returns...no error is thrown, nothing. I've tried adding a timeout around the code that makes the request to the memcached client, but it doesn't seem to help. This appears to be a known issue: memcachier/memjs#162 Fixed with our own local timeout!
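A "local timeout" of the kind mentioned above can be implemented generically with Promise.race; a sketch under that assumption, not the exact code merged in this PR:

```javascript
// Reject if the cache client doesn't answer within `ms` milliseconds,
// instead of hanging forever on a dead connection.
function withTimeout(promise, ms) {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Cache operation timed out after ${ms}ms`)),
      ms
    )
  })
  // Whichever settles first wins; always clear the timer afterward
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}
```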

Redis: an error is raised as soon as the Redis server goes away, which appears to crash the api server completely 😬 It seems to happen outside of the caching code so I'm not sure how to catch this:

api | SocketClosedUnexpectedlyError: Socket closed unexpectedly
api |     at Socket.<anonymous> (/Users/rob/Sites/service-caching/node_modules/@redis/client/dist/lib/client/socket.js:182:118)
api |     at Object.onceWrapper (node:events:640:26)
api |     at Socket.emit (node:events:520:28)
api |     at TCP.<anonymous> (node:net:687:12)
api |     at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
api | Emitted 'error' event on Commander instance at:
api |     at RedisSocket.<anonymous> (/Users/rob/Sites/service-caching/node_modules/@redis/client/dist/lib/client/index.js:350:14)
api |     at RedisSocket.emit (node:events:520:28)
api |     at RedisSocket._RedisSocket_onSocketError (/Users/rob/Sites/service-caching/node_modules/@redis/client/dist/lib/client/socket.js:205:10)
api |     at Socket.<anonymous> (/Users/rob/Sites/service-caching/node_modules/@redis/client/dist/lib/client/socket.js:182:107)
api |     at Object.onceWrapper (node:events:640:26)
api |     [... lines matching original stack trace ...]
api |     at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
web | <e> [webpack-dev-server] [HPM] Error occurred while proxying request localhost:8910/auth?method=getToken to http://[::1]:8911/ [ECONNREFUSED] (https://nodejs.org/api/errors.html#errors_common_system_errors)
web | <e> [webpack-dev-server] [HPM] Error occurred while proxying request localhost:8910/graphql to http://[::1]:8911/ [ECONNREFUSED] (https://nodejs.org/api/errors.html#errors_common_system_errors)
web | <e> [webpack-dev-server] [HPM] Error occurred while proxying request localhost:8910/graphql to http://[::1]:8911/ [ECONNREFUSED] (https://nodejs.org/api/errors.html#errors_common_system_errors)
web | <e> [webpack-dev-server] [HPM] Error occurred while proxying request localhost:8910/graphql to http://[::1]:8911/ [ECONNREFUSED] (https://nodejs.org/api/errors.html#errors_common_system_errors)

Any assistance in mitigating these would be greatly appreciated!

Release Notes

Coming soon



@cannikin added the release:feature label Jul 29, 2022
@exzachlyvv (Contributor) commented:
This is awesome! Looking forward to this feature. One thing Laravel does that I always appreciated was to use certain client implementations under the hood automatically based on environment variables. For example, instead of creating and passing in a "MemcachedClient", you set an environment variable such as "CACHE_DRIVER=memcache".

The nice part with this is that no code changes are necessary, and it becomes trivial to use "InMemoryClient" when doing local development (CACHE_DRIVER=local) and Redis in your staging and production environments. Have you thought about this approach? It can apply to more than caching too: logging, database, queues (if supported), etc.

Another small win is that developers can just use imports from "@redwood". There is no cognitive load of cache related files in their projects.

Cons are:

  1. More difficult configuration if customization is needed.
  2. Bundle might need to include all client implementations??? Not sure.

Just an idea! Keep up the awesome work.
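The env-driven selection suggested here might look something like the following sketch (the CACHE_DRIVER variable, the stub client classes, and createClient() are all hypothetical, not a Redwood or Laravel API):

```javascript
// Stubs standing in for real client classes
class InMemoryClient {}
class MemcachedClient {
  constructor(url) {
    this.url = url
  }
}
class RedisClient {
  constructor(url) {
    this.url = url
  }
}

// Pick a client from an env var instead of constructing one in code
function createClient(driver = process.env.CACHE_DRIVER || 'local') {
  switch (driver) {
    case 'memcached':
      return new MemcachedClient(process.env.MEMCACHED_URL)
    case 'redis':
      return new RedisClient(process.env.REDIS_URL)
    default:
      return new InMemoryClient()
  }
}
```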

@cannikin (Member, Author) commented:

  1. More difficult configuration if customization is needed.
  2. Bundle might need to include all client implementations??? Not sure.

Yeah, these are what I was worried about. If someone didn't like the two libs I'm using (memjs and redis) then it makes it more difficult to add your own... do you just have a single custom ENV var setting, and then that somehow lets you do your own? What if you had separate implementations for different environments?

Personally, I would use the same client everywhere (memcached) for parity, including in development. Then your server location(s) are the only ENV var needed:

import { createCache, MemcachedClient } from '@redwoodjs/api/cache'

const client = new MemcachedClient(process.env.MEMCACHED_URL)

export const { cache, cacheLatest } = createCache(client)

@dthyresson (Contributor) commented Jul 29, 2022

@cannikin Did you consider a mechanism to set the cache key with something that includes session/user info?

That way a service that queries “my posts” with a where clause on the author can be cached per user?

See: useResponseCache https://www.envelop.dev/plugins/use-response-cache#cache-based-on-sessionuser

Perhaps define a function that returns a sessionId and the developer can define that as needed?

@exzachlyvv (Contributor) commented:

Yeah, these are what I was worried about...

Makes sense! Abstracting the configuration behind env variables can always be a future enhancement if that is the direction it goes.

@cannikin (Member, Author) commented:

@cannikin Did you consider a mechanism to set the cache key with something that includes session/user info?

@dthyresson Right now the key is just a string, so it's up to you to set it to uniquify the content... context.currentUser.id would work great!

Is there another kind of session identifier we have access to in services besides just context.currentUser?
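A per-user key along these lines is just string construction; a sketch (the helper name and key format below are illustrative, only context.currentUser comes from Redwood):

```javascript
// Scope the cache key to the current user so "my posts" is cached per user
const myPostsKey = (userId) => `user-${userId}-posts`

// In a service, something like:
//   return cache(myPostsKey(context.currentUser.id), () =>
//     db.post.findMany({ where: { authorId: context.currentUser.id } })
//   )
```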

@dac09 (Collaborator) commented Nov 7, 2022

@cannikin good to go I think. Want to take one last look at the docs and merge?

@jtoar (Contributor) left a comment:

Just left some typo fixes/spacing in codeblocks

@cannikin merged commit 34125a5 into main Nov 11, 2022
@cannikin deleted the rc-service-caching branch November 11, 2022 02:16
@redwoodjs-bot added this to the next-release milestone Nov 11, 2022
@redwoodjs-bot commented Nov 11, 2022

🔔 @jtoar, @Tobbe—I couldn't cherry pick this one. If you want it in the next release, you'll have to cherry pick it manually.

jtoar added a commit that referenced this pull request Nov 11, 2022
* cache() and cacheLatest() working

* Move clients to standalone files

* Move to closure style to properly export required functions

* Comments

* Adds RedisClient

* Simplify logic to remove separate init() function

* Refactor for more generic client usage, no more init()

* Adds redis package

* Moves cache clients to devDependencies

* Simplify memcached options on init

* Use logger for messages

* Server connection string must be included

* Adds docs on service caching

* Add timeout for cache calls

* Adds setup command for cache

* Updates templates with new timeout option

* Updates docs for new createCache() client

* Comment

* Move errors to separate file

* Allow renaming of id/updatedAt fields, catch error if model has no id/updatedAt, catch if no records in cacheLatest

* Allow adding a global prefix to cache key

* Adds docs for global prefix, options, array key syntax

* Move formatting of cache key to a standalone function, allow cache key as an array

* cacheLatest -> cacheFindMany, exports some additional functions, updates types

* Start of tests

* Adds InMemoryClient for cache testing

* Create base cache client class to extend from, rename clients for consistency

* Adds cache tests

* Doc updates

* --rebuild-fixture

* Update templates for cacheFindMany

* yarn constraints --fix

* Updates lock file with constraints fix

* Refactor to use TS abstract class

* Types in template

* Fixes `setup cache` CLI command

* Export client defaults as named exports

* InMemoryCache is a default export now

* Fix link

* Doc updates

* More doc updates

* Adds docs for `setup cache` command

* Adds some complex types to inputs and results

Thanks @dac09!

* Adds test for no records found

* Adds spys to check on client calls

* Fix some type issues
Fix bugs in tests

* Remove specific types for cache and cacheFindMany result

* Handle redis disconnect

* Pass logger to RedisClient

* Use logger instead of console in Memcached config

* Adds reconnect() function

* Attempt reconnect on timeout error

* Adds test for reconnect()

* Remove commented mock code

* Update docs/docs/cli-commands.md

Co-authored-by: Daniel Choudhury <dannychoudhury@gmail.com>

* Update packages/api/src/cache/clients/BaseClient.ts

Co-authored-by: Daniel Choudhury <dannychoudhury@gmail.com>

* Update packages/cli/src/commands/setup/cache/cache.js

Co-authored-by: Daniel Choudhury <dannychoudhury@gmail.com>

* Update docs/docs/services.md

Co-authored-by: Daniel Choudhury <dannychoudhury@gmail.com>

* Moves addPackagesTask to shared location

* Add memjs/redis package based on client choice during setup

* Fix type issue in BaseClient

* Refactor to use async imports
Introduce connect/disconnect

* fix: reconnect -> disconnect tests

* Adds testing helpers for contents of InMemoryCache

* Updates cache templates to include testing check, moves host URL to ENV var

* Move cache server connection string to .env

* Move adding env var code to shared helper

* Export client for testing

* Fix merge conflicts

* Use addEnvVarTask from cli helpers

* Use listr2 instead

* WIP(testing): Add custom matcher to check cached values

* Add contents and clear to InMemoryCache

* Unused imports in deploy helpers

* Fix bugs in the cli helper

* Add partialMatch helper

* Updates `toHaveCached()` to accept an optional argument

Can include the key you expect the value to have so that you can verify both

* Update custom matcher, comments

* Support multiple values in partialMatch array |
Disable parse(stringfy())

* Provide options to toHaveCached()

* fix: Check string values after serializing
| update types

* docs: Update testing docs

* docs: small update

* fix: Remove matcher options
Update docs on strict matching

* Update docs/docs/testing.md

* fix(cli): Fix Listr import in cache setup cli

* Update docs/docs/services.md

Co-authored-by: Dominic Saadi <dominiceliassaadi@gmail.com>

* Update docs/docs/services.md

Co-authored-by: Dominic Saadi <dominiceliassaadi@gmail.com>

* Apply suggestions from code review

Co-authored-by: Dominic Saadi <dominiceliassaadi@gmail.com>
Co-authored-by: Rob Cameron <cannikin@fastmail.com>

* Apply suggestions from code review

Co-authored-by: Dominic Saadi <dominiceliassaadi@gmail.com>

* Update docs/docs/services.md

Co-authored-by: Dominic Saadi <dominiceliassaadi@gmail.com>

* Code example changes

Co-authored-by: Daniel Choudhury <dannychoudhury@gmail.com>
Co-authored-by: Dominic Saadi <dominiceliassaadi@gmail.com>
@jtoar modified the milestones: next-release, v3.5.0 Nov 16, 2022
Labels
fixture-ok · release:feature · topic/performance
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[RFC]: Add ability to cache arbitrary classes in Service Cache
7 participants