Updated documentation and new example #189

Merged 1 commit on Dec 19, 2024

126 changes: 126 additions & 0 deletions docs/caches.rst
@@ -0,0 +1,126 @@
..
Copyright (c) 2022, 2024, Oracle and/or its affiliates.
Licensed under the Universal Permissive License v 1.0 as shown at
https://oss.oracle.com/licenses/upl.

Caches
======

Once a Session has been obtained, it is possible to begin working with
`NamedMap` and/or `NamedCache` instances. This is done by calling either
`Session.get_map(name: str, cache_options: Optional[CacheOptions] = None)` or
`Session.get_cache(name: str, cache_options: Optional[CacheOptions] = None)`.
The `name` argument is the logical name of the cache. The optional `cache_options`
argument accepts a `CacheOptions` instance that configures the default time-to-live
(ttl) for entries placed in the cache and also allows near caching to be
configured (discussed later in this section).

Here are some examples:

.. code-block:: python

# obtain NamedCache 'person'
cache: NamedCache[int, Person] = await session.get_cache("person")

.. code-block:: python

# obtain NamedCache 'person' with a default ttl of 2000 millis
# any entry inserted into this cache, unless overridden by a put call
# with a custom ttl, will have a default ttl of 2000
options: CacheOptions = CacheOptions(2000)
cache: NamedCache[int, Person] = await session.get_cache("person", options)
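
The cache-level default can also be overridden on a per-entry basis. The
following is a minimal sketch assuming `put` accepts a `ttl` keyword argument
expressed in millis:

.. code-block:: python

# assuming 'person' is a Person instance obtained elsewhere, insert it with
# a custom ttl of 5000 millis, overriding the cache-level default of 2000
await cache.put(1, person, ttl=5000)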

Near Caches
===========

A near cache is a local cache within the `NamedMap` or `NamedCache`
that stores entries as they are obtained from the remote cache. By doing
so, it is possible to reduce the number of remote calls made to the Coherence
cluster by returning the locally cached value. The local cache also properly
reflects updates made to, or removal of, an entry, so stale data isn't
mistakenly returned.

.. note::
Near caching will only work with Coherence CE `24.09` or later. Attempting
to use near caching features with older versions will have no effect.

A near cache is configured via `NearCacheOptions`, which provides several
options for controlling how entries will be cached locally:

- `ttl` - configures the time-to-live of locally cached entries (this has no
impact on entries stored within Coherence). If not specified, or the
`ttl` is `0`, entries in the near cache will not expire
- `high_units` - configures the max number of entries that may be locally
cached. Once the number of locally cached entries exceeds the configured
value, the cache will be pruned down (least recently used entries first)
to a target size based on the configured `prune_factor`
(defaults to `0.80` meaning the prune operation would retain 80% of
the entries)
- `high_units_memory` - configures the maximum memory size, in bytes, the
locally cached entries may use. If total memory exceeds the configured
value, the cache will be pruned down (least recently used entries first)
to a target size based on the configured `prune_factor` (defaults to
`0.80` meaning the prune operation would retain 80% of the cache memory)
- `prune_factor` - configures the target near cache size after exceeding
either `high_units` or `high_units_memory` high-water marks

.. note::
`high_units` and `high_units_memory` are mutually exclusive

Examples of configuring near caching:

.. code-block:: python

# obtain NamedCache 'person' and configure near caching with a local
# ttl of 20_000 millis
near_options: NearCacheOptions = NearCacheOptions(20_000)
cache_options: CacheOptions = CacheOptions(near_cache_options=near_options)
cache: NamedCache[int, Person] = await session.get_cache("person", cache_options)


.. code-block:: python

# obtain NamedCache 'person' and configure near caching with a max
# number of entries of 1_000 and when pruned, it will be reduced
# to 20%
near_options: NearCacheOptions = NearCacheOptions(high_units=1_000, prune_factor=0.20)
cache_options: CacheOptions = CacheOptions(near_cache_options=near_options)
cache: NamedCache[int, Person] = await session.get_cache("person", cache_options)
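
A memory-based limit can be configured in the same way. This sketch assumes
`high_units_memory` is accepted as a keyword argument and, per the description
above, is expressed in bytes:

.. code-block:: python

# obtain NamedCache 'person' and configure near caching with a local memory
# limit of roughly 10 MB; once exceeded, the cache is pruned back using the
# default prune_factor of 0.80
near_options: NearCacheOptions = NearCacheOptions(high_units_memory=10 * 1024 * 1024)
cache_options: CacheOptions = CacheOptions(near_cache_options=near_options)
cache: NamedCache[int, Person] = await session.get_cache("person", cache_options)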

To verify the effectiveness of a near cache, several statistics are monitored
and may be obtained from the `CacheStats` instance returned by the
`near_cache_stats` property of the `NamedMap` or `NamedCache`.

The following statistics are available (the statistic name given is the same
property name on the `CacheStats` instance):

- hits - the number of times an entry was found in the near cache
- misses - the number of times an entry was not found in the near cache
- misses_duration - the accumulated time, in millis, spent for a cache miss
(i.e., having to make a remote call and update the local cache)
- hit_rate - the ratio of hits to total gets (i.e., hits divided by the sum of
hits and misses)
- puts - the total number of puts that have been made against the near cache
- gets - the total number of gets that have been made against the near cache
- prunes - the number of times the cache was pruned due to exceeding the
configured `high_units` or `high_units_memory` high-water marks
- expires - the number of times the near cache's expiry logic expired entries
- num_pruned - the total number of entries that were removed due to exceeding the
configured `high_units` or `high_units_memory` high-water marks
- num_expired - the total number of entries that were removed due to
expiration
- prunes_duration - the accumulated time, in millis, spent pruning
the near cache
- expires_duration - the accumulated time, in millis, removing
expired entries from the near cache
- size - the total number of entries currently held by the near cache
- bytes - the total bytes the near cache entries consume

.. note::
The `near_cache_stats` property will return `None` if near caching isn't
configured or available.
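
As a quick illustration, the statistics may be inspected directly from the
cache; this sketch assumes near caching was configured as in the examples
above:

.. code-block:: python

# near_cache_stats returns None when near caching isn't configured
stats: Optional[CacheStats] = cache.near_cache_stats
if stats is not None:
    print(f"hits={stats.hits}, misses={stats.misses}, hit_rate={stats.hit_rate}")
    print(f"size={stats.size}, bytes={stats.bytes}")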

The following example demonstrates the value that near caching can provide:

.. literalinclude:: ../examples/near_caching.py
:language: python
:linenos:
1 change: 1 addition & 0 deletions docs/index.rst
@@ -9,6 +9,7 @@

installation
sessions
caches
basics
querying
aggregation
6 changes: 3 additions & 3 deletions docs/sessions.rst
@@ -1,5 +1,5 @@
..
Copyright (c) 2022, 2023, Oracle and/or its affiliates.
Copyright (c) 2022, 2024, Oracle and/or its affiliates.
Licensed under the Universal Permissive License v 1.0 as shown at
https://oss.oracle.com/licenses/upl.

@@ -37,7 +37,7 @@ The currently supported arguments for `Options` are:
import asyncio

# create a new Session to the Coherence server
session: Session = Session(None)
session: Session = await Session.create()

This is the simplest invocation which assumes the following defaults:
- `address` is `localhost:1408`
@@ -55,7 +55,7 @@ and pass it to the constructor of the `Session`:
# create a new Session to the Coherence server
addr: str = 'example.com:4444'
opt: Options = Options(addr, default_scope, default_request_timeout, default_format)
session: Session = Session(opt)
session: Session = await Session.create(opt)

It's also possible to control the default address the session will bind to by providing
an address via the `COHERENCE_SERVER_ADDRESS` environment variable. The format of the value would
94 changes: 94 additions & 0 deletions examples/near_caching.py
@@ -0,0 +1,94 @@
# Copyright (c) 2024, Oracle and/or its affiliates.
# Licensed under the Universal Permissive License v 1.0 as shown at
# https://oss.oracle.com/licenses/upl.

import asyncio
import time
from functools import reduce

from coherence import CacheOptions, CacheStats, NamedCache, NearCacheOptions, Session


async def do_run() -> None:
session: Session = await Session.create()

# obtain the basic cache and configure near caching with
# all defaults; meaning no expiry or pruning
cache_remote: NamedCache[str, str] = await session.get_cache("remote")
cache_near: NamedCache[str, str] = await session.get_cache(
"near", CacheOptions(near_cache_options=NearCacheOptions(ttl=0))
)

await cache_remote.clear()
await cache_near.clear()
stats: CacheStats = cache_near.near_cache_stats

# these knobs control:
# - how many concurrent tasks to run
# - how many entries will be inserted and queried
# - how many times the calls will be invoked
task_count: int = 25
num_entries: int = 1_000
iterations: int = 4

# seed data to populate the cache
cache_seed: dict[str, str] = {str(x): str(x) for x in range(num_entries)}
cache_seed_keys: set[str] = set(cache_seed.keys())
print()

# task calling get_all() for 1_000 keys
async def get_all_task(task_cache: NamedCache[str, str]) -> int:
begin = time.time_ns()

for _ in range(iterations):
async for _ in await task_cache.get_all(cache_seed_keys):
continue

return (time.time_ns() - begin) // 1_000_000

await cache_remote.put_all(cache_seed)
await cache_near.put_all(cache_seed)

print("Run without near caching ...")
begin_outer: int = time.time_ns()
results: list[int] = await asyncio.gather(*[get_all_task(cache_remote) for _ in range(task_count)])
end_outer: int = time.time_ns()
total_time = end_outer - begin_outer
task_time = reduce(lambda first, second: first + second, results)

# Example output
# Run without near caching ...
# [remote] 25 tasks completed!
# [remote] Total time: 4246ms
# [remote] Tasks completion average: 3755.6

print(f"[remote] {task_count} tasks completed!")
print(f"[remote] Total time: {total_time // 1_000_000}ms")
print(f"[remote] Tasks completion average: {task_time / task_count}")

print()
print("Run with near caching ...")
begin_outer = time.time_ns()
results = await asyncio.gather(*[get_all_task(cache_near) for _ in range(task_count)])
end_outer = time.time_ns()
total_time = end_outer - begin_outer
task_time = reduce(lambda first, second: first + second, results)

# Run with near caching ...
# [near] 25 tasks completed!
# [near] Total time: 122ms
# [near] Tasks completion average: 113.96
# [near] Near cache statistics: CacheStats(puts=1000, gets=100000, hits=99000,
# misses=1000, misses-duration=73ms, hit-rate=0.99, prunes=0,
# num-pruned=0, prunes-duration=0ms, size=1000, expires=0,
# num-expired=0, expires-duration=0ms, memory-bytes=681464)

print(f"[near] {task_count} tasks completed!")
print(f"[near] Total time: {total_time // 1_000_000}ms")
print(f"[near] Tasks completion average: {task_time / task_count}")
print(f"[near] Near cache statistics: {stats}")

await session.close()


asyncio.run(do_run())