DOC-4037 Rework client-side caching content #530

Merged
merged 17 commits into from
Sep 27, 2024
Changes from 6 commits
173 changes: 173 additions & 0 deletions content/develop/connect/clients/client-side-caching.md
Member

Do you envision a quick start with the tabbed examples in this file?

Contributor Author

@mortensi I wanted to check exactly what you want in the tabbed samples. Are you thinking of stuff from the Google doc like:

client.set("hola", "mundo");
client.set("hello", "world");
client.mget("hola", "hello"); // read from the server
client.mget("hola", "hello"); // cache hit
client.mget("hello", "hola"); // read from server, the order matters
.
.

...or are there other use cases you want to have samples for? I know the Google doc has quite a few examples for Python and Jedis but I'm not sure if we need them in the doc page. Maybe we could have a concrete example at the bottom of some of the entries in this list for extra clarity? I'm not sure that having tabbed samples for these in each language is necessary either, because the samples don't actually do anything useful in themselves, so people wouldn't normally copy/paste them.

However, if we do decide that tabbed samples are a good thing here then I'm confident we can get them ready in good time for the release.

@@ -0,0 +1,173 @@
---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- kubernetes
- clients
description: Server-assisted, client-side caching in Redis
linkTitle: Client-side caching
title: Client-side caching introduction
weight: 20
---

*Client-side caching* (CSC) is a technique to reduce network traffic between
a Redis client and the server. This generally gives better performance.
See [Client-side caching compatibility with Redis Software and Redis Cloud]({{< relref "operate/rs/references/compatibility/client-side-caching" >}})
for details of the Redis versions that support CSC.

By default, an [application server](https://en.wikipedia.org/wiki/Application_server)
(which sits between the user app and the database) contacts the
Redis database server through the client library for every read request.
The diagram below shows the flow of communication from the user app,
through the application server to the database and back again:

{{< image filename="images/csc/CSCNoCache.drawio.svg" >}}

When you use CSC, the client library
maintains its own local cache of data items as it retrieves them
from the database. When the same items are needed again, the client
can satisfy the read requests from the cache instead of the database:

{{< image filename="images/csc/CSCWithCache.drawio.svg" >}}

Accessing the cache is much faster than communicating with the database over the
network and it reduces network traffic. Also, this technique reduces
the load on the database server, so you may be able to run it using fewer hardware
resources.

As with other forms of [caching](https://en.wikipedia.org/wiki/Cache_(computing)),
CSC works well in the very common use case where a small subset of the data
gets accessed much more frequently than the rest of the data (according
to the [Pareto principle](https://en.wikipedia.org/wiki/Pareto_principle)).

## Updating the cache when the data changes

All caching systems must implement a scheme to update data in the cache
when the corresponding data changes in the main database. Redis uses an
approach called *tracking*.

When CSC is enabled, the Redis server remembers or *tracks* the set of keys
that each client connection has previously read. This includes cases where the client
reads data directly, as with the [`GET`]({{< relref "/commands/get" >}})
command, and also where the server calculates values from the stored data,
as with [`STRLEN`]({{< relref "/commands/strlen" >}}). When any client
Contributor

This paragraph seems a bit long. Maybe try to find a way to break it up?

writes new data to a tracked key, the server sends an invalidation message
to all clients that have accessed that key previously. This message warns
the clients that their cached copies of the data are no longer valid, and the clients
evict the stale data in response. The next time a client reads from
the same key, it accesses the database directly and refreshes its cache
with the updated data.
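
The tracking scheme described above can be sketched as a small in-memory model.
This is only an illustration, not the Redis server implementation: the class,
method, and client names are hypothetical, and the sketch assumes that the
server stops tracking a client for a key once it has sent the invalidation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of key tracking. All names are hypothetical; this is not
// the Redis server or any client library.
final class TrackingSketch {
    private final Map<String, String> data = new HashMap<>();
    // For each key, the IDs of the clients that have read it since the last write.
    private final Map<String, Set<String>> readersByKey = new HashMap<>();
    // Invalidation messages "sent" so far, formatted as "clientId:key".
    final List<String> invalidations = new ArrayList<>();

    // A read returns the value and records that this client now holds a copy.
    String read(String clientId, String key) {
        readersByKey.computeIfAbsent(key, k -> new HashSet<>()).add(clientId);
        return data.get(key);
    }

    // A write stores the value, notifies every tracked reader of the key,
    // and then forgets those readers (an assumption of this sketch).
    void write(String clientId, String key, String value) {
        data.put(key, value);
        Set<String> readers = readersByKey.remove(key);
        if (readers != null) {
            for (String reader : readers) {
                invalidations.add(reader + ":" + key);
            }
        }
    }

    public static void main(String[] args) {
        TrackingSketch server = new TrackingSketch();
        server.write("clientA", "city", "New York"); // No readers are tracked yet
        server.read("clientB", "city");              // clientB is now tracked for "city"
        server.write("clientA", "city", "Boston");   // clientB receives an invalidation
        System.out.println(server.invalidations);
    }
}
```

In this model, a client that receives an invalidation would drop its cached
copy and read the key again, which also registers it for tracking again.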

The sequence diagram below shows how two clients might interact as they
access and update the same key:

{{< image filename="images/csc/CSCSeqDiagram.drawio.svg" >}}

## Which commands can cache data?

All read-only commands (with the `@read`
[ACL category]({{< relref "/operate/oss_and_stack/management/security/acl" >}}))
will use cached data, except for the following:

- Any commands for
Contributor

This would be easier to read if you indent the explanations under the bullet point for the command type.

[probabilistic data types]({{< relref "/develop/data-types/probabilistic" >}}).
These types are designed to be updated frequently, which means that caching
them gives little or no benefit.
- Non-deterministic commands such as [`HGETALL`]({{< relref "/commands/hgetall" >}}),
[`HSCAN`]({{< relref "/commands/hscan" >}}),
and [`ZRANDMEMBER`]({{< relref "/commands/zrandmember" >}}). By design, these commands
Member

I think we can mention HGETALL, being a popular one. @uglide can you validate this?

can give different results each time they are called.
- Search and query commands (with the `FT.*` prefix), such as
[`FT.SEARCH`]({{< baseurl >}}/commands/ft.search).

You can use the [`MONITOR`]({{< relref "/commands/monitor" >}}) command to
check the server's behavior when you are using CSC. Because `MONITOR` only
reports activity from the server, you should find that the first cacheable
access to a key causes a response from the server. However, subsequent
accesses are satisfied by the cache, and so `MONITOR` should report no
further server activity for that key if CSC is working correctly.

## What data gets cached for a command?

Broadly speaking, the data from the *specific response* to a command invocation
gets cached after it is used for the first time. Subsets of that data
or values calculated from it are retrieved from the server as usual and
then cached separately. For example:

- The whole string retrieved by [`GET`]({{< relref "/commands/get" >}})
is added to the cache. Parts of the same string retrieved by
[`SUBSTR`]({{< relref "/commands/substr" >}}) are calculated on the
server the first time and then cached separately from the original
string.
- Using [`GETBIT`]({{< relref "/commands/getbit" >}}) or
[`BITFIELD`]({{< relref "/commands/bitfield" >}}) on a string
caches the returned values separately from the original string.
- For composite data types accessed by keys
([hash]({{< relref "/develop/data-types/hashes" >}}),
[JSON]({{< relref "/develop/data-types/json" >}}),
[set]({{< relref "/develop/data-types/sets" >}}), and
[sorted set]({{< relref "/develop/data-types/sorted-sets" >}})),
the whole object is cached separately from the individual fields.
So the results of `JSON.GET mykey $` and `JSON.GET mykey $.myfield` create
separate entries in the cache.
- Ranges from [lists]({{< relref "/develop/data-types/lists" >}}),
[streams]({{< relref "/develop/data-types/streams" >}}),
and [sorted sets]({{< relref "/develop/data-types/sorted-sets" >}})
are cached separately from the object they form a part of. Likewise,
subsets returned by [`SINTER`]({{< relref "/commands/sinter" >}}) and
[`SDIFF`]({{< relref "/commands/sdiff" >}}) create separate cache entries.
- For multi-key read commands such as [`MGET`]({{< relref "/commands/mget" >}}),
the ordering of the keys is significant. For example, `MGET name:1 name:2` is
cached separately from `MGET name:2 name:1` because the server returns the
values in the order you specify.
- Boolean or numeric values calculated from data types (for example,
[`SISMEMBER`]({{< relref "/commands/sismember" >}}) and
[`LLEN`]({{< relref "/commands/llen" >}})) are cached separately from the
object they refer to.
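
One way to picture the rules in the list above is a cache whose entries are
keyed by the full command, including its name and all arguments in order.
The sketch below is a simplified model under that assumption, not the
internal design of any Redis client library. The class and method names are
hypothetical and the `Supplier` stands in for a round trip to the server.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Toy response cache keyed by the whole command (name plus arguments).
// All names are hypothetical; this is not a Redis client API.
final class CommandCacheSketch {
    private final Map<List<String>, Object> entries = new HashMap<>();
    int serverCalls = 0;

    // Returns the cached reply for this exact command, or asks the
    // "server" (the supplier) once and caches the reply it gives.
    Object run(List<String> command, Supplier<Object> server) {
        if (entries.containsKey(command)) {
            return entries.get(command);
        }
        serverCalls++;
        Object reply = server.get();
        entries.put(command, reply);
        return reply;
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        CommandCacheSketch cache = new CommandCacheSketch();
        // First call goes to the server, second is a cache hit.
        cache.run(List.of("MGET", "name:1", "name:2"), () -> List.of("Ann", "Bob"));
        cache.run(List.of("MGET", "name:1", "name:2"), () -> List.of("Ann", "Bob"));
        // Same keys in a different order: a different command, so a cache miss.
        cache.run(List.of("MGET", "name:2", "name:1"), () -> List.of("Bob", "Ann"));
        System.out.println(cache.serverCalls + " server calls, " + cache.size() + " entries");
    }
}
```

Because list equality in Java depends on element order, the two `MGET` calls
with swapped keys produce two separate entries, which mirrors the behavior
described for multi-key commands.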

## Usage recommendations

Like any caching system, CSC has some limitations:

- The cache has only a limited amount of memory available. When the limit
is reached, the client must *evict* potentially useful items from the
cache to make room for new ones.
- Cache misses, tracking, and invalidation messages always add a slight
performance penalty.
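
To make the first limitation concrete, the sketch below shows a bounded cache
that evicts an entry when it is full. It assumes a least-recently-used (LRU)
policy purely for illustration. The text above does not say which eviction
policy a particular client library uses, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy bounded cache with LRU eviction (an assumed policy, for illustration only).
final class BoundedCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    BoundedCacheSketch(int maxSize) {
        super(16, 0.75f, true); // true = keep entries ordered by most recent access
        this.maxSize = maxSize;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // removes the least recently accessed entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }

    public static void main(String[] args) {
        BoundedCacheSketch<String, String> cache = new BoundedCacheSketch<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // "a" becomes the most recently used entry
        cache.put("c", "3"); // Cache is full, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Here `b` might still have been useful, but it is evicted anyway because the
cache has no room for it.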

Below are some guidelines to help you use CSC efficiently, within these
limitations:

- **Use a separate connection for data that is not cache-friendly**:
Caching gives the most benefit
for keys that are read frequently and updated infrequently. However, you
may also have data such as counters and scoreboards that receive frequent
updates. In cases like this, the performance overhead of the invalidation
messages can be greater than the savings made by caching. Avoid this problem
by using a separate connection *without* CSC for any data that is
not cache-friendly.
- **Estimate how many items you can cache**: The client libraries let you
specify the maximum number of items you want to hold in the cache. You
can calculate an estimate for this number by dividing the
maximum desired size of the
cache in memory by the average size of the items you want to store
(use the [`MEMORY USAGE`]({{< relref "/commands/memory-usage" >}})
command to get the memory footprint of a key). So, if you had
10 MB (or 10485760 bytes) available for the cache, and the average
size of an item was 80 bytes, you could fit approximately
10485760 / 80 = 131072 items in the cache. Monitor memory usage
on your server with a realistic test load to adjust your estimate
up or down.
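
The estimate above is a single integer division, sketched below. The class
and method names are hypothetical, and the 80-byte average is just the figure
from the example, which you would replace with a value measured from your
own data.

```java
// Rough estimate of how many items fit in a cache memory budget.
// Names are hypothetical; the figures come from the example in the text.
final class CacheSizeEstimate {

    static long maxItems(long cacheBytes, long averageItemBytes) {
        if (averageItemBytes <= 0) {
            throw new IllegalArgumentException("averageItemBytes must be positive");
        }
        return cacheBytes / averageItemBytes; // Integer division rounds down
    }

    public static void main(String[] args) {
        long budgetBytes = 10L * 1024 * 1024;           // 10 MB = 10485760 bytes
        System.out.println(maxItems(budgetBytes, 80L)); // prints 131072
    }
}
```

Treat the result as a starting point for the maximum item count you give to
the client library, and adjust it after testing with a realistic load.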

## Reference

The Redis server implements extra features for CSC that are not used by
the main Redis clients, but may be useful for custom clients and other
advanced applications. See
[Client-side caching reference]({{< relref "/develop/reference/client-side-caching" >}})
for a full technical guide to all the options available for CSC.
108 changes: 108 additions & 0 deletions content/develop/connect/clients/java/jedis.md
@@ -196,6 +196,114 @@ public class Main {
}
```

## Connect using client-side caching (CSC)

*Client-side caching* is a technique to reduce network traffic between
the client and server, resulting in better performance. See
[Client-side caching introduction]({{< relref "/develop/connect/clients/client-side-caching" >}})
for more information about how CSC works and how to use it effectively.

To enable CSC, specify the
[RESP3]({{< relref "/develop/reference/protocol-spec#resp-versions" >}})
protocol and pass a cache configuration object during the connection.

The example below shows the simplest CSC connection to the default host and port,
`localhost:6379`.
All of the connection variants described above accept these parameters, so you can
use CSC with a connection pool or a cluster connection in exactly the same way.

```java
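// Imports for this snippet (the CSC classes are assumed to be in
// the `redis.clients.jedis.csc` package).
import redis.clients.jedis.DefaultJedisClientConfig;
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.RedisProtocol;
import redis.clients.jedis.UnifiedJedis;
import redis.clients.jedis.csc.CacheConfig;

HostAndPort endpoint = new HostAndPort("localhost", 6379);

// CSC requires the RESP3 protocol.
DefaultJedisClientConfig config = DefaultJedisClientConfig
    .builder()
    .password("secretPassword")
    .protocol(RedisProtocol.RESP3)
    .build();

// Hold at most 1000 items in the local cache.
CacheConfig cacheConfig = CacheConfig.builder().maxSize(1000).build();

UnifiedJedis client = new UnifiedJedis(endpoint, config, cacheConfig);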
```

Once you have connected, the usual Redis commands will work transparently
with the cache:

```java
client.set("city", "New York");
client.get("city"); // Retrieved from Redis server and cached
client.get("city"); // Retrieved from cache
```

You can see the cache working if you connect to the same Redis database
with [`redis-cli`]({{< relref "/develop/connect/cli" >}}) and run the
[`MONITOR`]({{< relref "/commands/monitor" >}}) command. If you run the
code above but without passing `cacheConfig` during the connection,
you should see the following in the CLI among the output from `MONITOR`:

```
1723109720.268903 [...] "SET" "city" "New York"
1723109720.269681 [...] "GET" "city"
1723109720.270205 [...] "GET" "city"
```

This shows that the server responds to both `get("city")` calls.
If you run the code again with `cacheConfig` passed during the connection, you will see:

```
1723110248.712663 [...] "SET" "city" "New York"
1723110248.713607 [...] "GET" "city"
```

This shows that the first `get("city")` call contacted the server but the second
call was satisfied by the cache.

### Removing items from the cache

You can remove individual keys from the cache with the
`deleteByRedisKey()` method of the cache object. This removes all cached items associated
with each specified key, so all results from multi-key commands (such as
[`MGET`]({{< relref "/commands/mget" >}})) and composite data structures
(such as [hashes]({{< relref "/develop/data-types/hashes" >}})) will be
cleared at once. The example below shows the effect of removing a single
key from the cache:

```java
client.hget("person:1", "name"); // Read from the server
client.hget("person:1", "name"); // Read from the cache

client.hget("person:2", "name"); // Read from the server
client.hget("person:2", "name"); // Read from the cache

Cache myCache = client.getCache();
myCache.deleteByRedisKey("person:1");

client.hget("person:1", "name"); // Read from the server
client.hget("person:1", "name"); // Read from the cache

client.hget("person:2", "name"); // Still read from the cache
```

You can also clear all cached items using the `flush()`
method:

```java
client.hget("person:1", "name"); // Read from the server
client.hget("person:1", "name"); // Read from the cache

client.hget("person:2", "name"); // Read from the server
client.hget("person:2", "name"); // Read from the cache

Cache myCache = client.getCache();
myCache.flush();

client.hget("person:1", "name"); // Read from the server
client.hget("person:1", "name"); // Read from the cache

client.hget("person:2", "name"); // Read from the server
client.hget("person:2", "name"); // Read from the cache
```

## Production usage

The following sections explain how to handle situations that may occur