Merged
225 changes: 156 additions & 69 deletions README.md
@@ -3,16 +3,74 @@
[![Build](https://github.com/cacheMon/libCacheSim-python/actions/workflows/build.yml/badge.svg)](https://github.com/cacheMon/libCacheSim-python/actions/workflows/build.yml)
[![Documentation](https://github.com/cacheMon/libCacheSim-python/actions/workflows/docs.yml/badge.svg)](https://docs.libcachesim.com/python)

Python bindings for [libCacheSim](https://github.com/1a1a11a/libCacheSim), a high-performance cache simulator and analysis library.

libCacheSim is fast, with features inherited from the [underlying libCacheSim library](https://github.com/1a1a11a/libCacheSim):

- **High performance** - over 20M requests/sec for a realistic trace replay
- **High memory efficiency** - predictable and small memory footprint
- **Parallelism out-of-the-box** - uses multiple CPU cores to speed up trace analysis and cache simulations

libCacheSim is flexible and easy to use with:

- **Seamless integration** with the [open-source cache dataset](https://github.com/cacheMon/cache_dataset), which consists of thousands of traces hosted on S3
- **High-throughput simulation** with the [underlying libCacheSim lib](https://github.com/1a1a11a/libCacheSim)
- **Detailed cache requests** and other internal data control
- **Customized plugin cache development** without any compilation

## Prerequisites

- OS: Linux / macOS
- Python: 3.9 -- 3.13
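
A quick way to check a machine against these prerequisites is sketched below. It uses only the standard library; the helper name `is_supported` and the constants are illustrative assumptions, not part of libCacheSim.

```python
import platform
import sys

# Assumed encoding of the prerequisites listed above.
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}  # platform.system() reports macOS as "Darwin"
MIN_PYTHON = (3, 9)
MAX_PYTHON = (3, 13)


def is_supported(system, version):
    """Return True if the OS name and (major, minor) version match the prerequisites."""
    major_minor = tuple(version[:2])
    return system in SUPPORTED_SYSTEMS and MIN_PYTHON <= major_minor <= MAX_PYTHON


# Check the interpreter that is running this script.
print(is_supported(platform.system(), sys.version_info))
```

The check compares only the major and minor version, so any patch release of a listed Python version passes.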

## Installation

### Quick Install

Binary installers for the latest released version are available at the [Python Package Index (PyPI)](https://pypi.org/project/libcachesim).

```bash
pip install libcachesim
```

### Recommended Installation with uv

It's recommended to use [uv](https://docs.astral.sh/uv/), a very fast Python environment manager, to create and manage Python environments:

```bash
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install libcachesim
```

### Advanced Features Installation

For users who want to run LRB, ThreeLCache, and GLCache eviction algorithms:

!!! important
If `uv` cannot find prebuilt wheels for your machine, the build system will skip these algorithms by default.

To enable them, you need to install all third-party dependencies first:

```bash
git clone https://github.com/cacheMon/libCacheSim-python.git
cd libCacheSim-python
bash scripts/install_deps.sh

# If you cannot install software directly (e.g., no sudo access)
bash scripts/install_deps_user.sh
```

Then, you can reinstall libcachesim using the following commands (you may need to add `--no-cache-dir` to force a build from scratch):

```bash
# Enable LRB
CMAKE_ARGS="-DENABLE_LRB=ON" uv pip install libcachesim
# Enable ThreeLCache
CMAKE_ARGS="-DENABLE_3L_CACHE=ON" uv pip install libcachesim
# Enable GLCache
CMAKE_ARGS="-DENABLE_GLCACHE=ON" uv pip install libcachesim
```

### Installation from sources

If there are no wheels suitable for your environment, consider building from source.
@@ -29,6 +87,42 @@ python -m pytest tests/

## Quick Start

### Cache Simulation

With libcachesim installed, you can run a cache simulation with an eviction algorithm of your choice on a cache trace:

```python
import libcachesim as lcs

# Step 1: Get one trace from S3 bucket
URI = "cache_dataset_oracleGeneral/2007_msr/msr_hm_0.oracleGeneral.zst"
dl = lcs.DataLoader()
dl.load(URI)

# Step 2: Open trace and process efficiently
reader = lcs.TraceReader(
    trace = dl.get_cache_path(URI),
    trace_type = lcs.TraceType.ORACLE_GENERAL_TRACE,
    reader_init_params = lcs.ReaderInitParam(ignore_obj_size=False)
)

# Step 3: Initialize cache
cache = lcs.S3FIFO(cache_size=1024*1024)

# Step 4: Process entire trace efficiently (C++ backend)
obj_miss_ratio, byte_miss_ratio = cache.process_trace(reader)
print(f"Object miss ratio: {obj_miss_ratio:.4f}, Byte miss ratio: {byte_miss_ratio:.4f}")

# Step 4.1: Process with limited number of requests
cache = lcs.S3FIFO(cache_size=1024*1024)
obj_miss_ratio, byte_miss_ratio = cache.process_trace(
    reader,
    start_req=0,
    max_req=1000
)
Comment on lines +117 to +122
Contributor


medium

The reader is consumed by the first cache.process_trace(reader) call. Without resetting it, the second call will start from the end of the trace and process zero requests. This is likely not the intended behavior for this example. You should add reader.reset() before re-initializing the cache for the second run to ensure the trace is processed from the beginning again.

Here is the corrected snippet:

# Step 4.1: Process with limited number of requests
reader.reset()
cache = lcs.S3FIFO(cache_size=1024*1024)
obj_miss_ratio, byte_miss_ratio = cache.process_trace(
    reader,
    start_req=0,
    max_req=1000
)
print(f"Object miss ratio: {obj_miss_ratio:.4f}, Byte miss ratio: {byte_miss_ratio:.4f}")

print(f"Object miss ratio: {obj_miss_ratio:.4f}, Byte miss ratio: {byte_miss_ratio:.4f}")
```
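
The pitfall raised in the review comment above is typical of any single-pass reader. The sketch below illustrates it with a hypothetical `ToyReader` class; it is a stand-in written for this example, not libCacheSim's `TraceReader`, and only the `reset()` method name is borrowed from the comment.

```python
class ToyReader:
    """Hypothetical single-pass reader; a stand-in, not libCacheSim's API."""

    def __init__(self, requests):
        self._requests = list(requests)
        self._pos = 0  # index of the next request to hand out

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._requests):
            raise StopIteration
        req = self._requests[self._pos]
        self._pos += 1
        return req

    def reset(self):
        """Rewind to the first request."""
        self._pos = 0


def count_requests(reader):
    """Consume the reader and return how many requests it yielded."""
    return sum(1 for _ in reader)


reader = ToyReader(["a", "b", "a", "c"])
first_pass = count_requests(reader)   # consumes all four requests
second_pass = count_requests(reader)  # reader is exhausted, so nothing is left
reader.reset()
third_pass = count_requests(reader)   # full pass again after the rewind
print(first_pass, second_pass, third_pass)
```

A second pass over an exhausted reader silently processes nothing, which is why a rewind between runs matters when comparing results on the same trace.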

### Basic Usage

```python
@@ -46,7 +140,9 @@ print(cache.get(req)) # False (first access)
print(cache.get(req)) # True (second access)
```

### Trace Processing
### Trace Analysis

Here is an example demonstrating how to use `TraceAnalyzer`:

```python
import libcachesim as lcs
@@ -56,25 +152,40 @@ URI = "cache_dataset_oracleGeneral/2007_msr/msr_hm_0.oracleGeneral.zst"
dl = lcs.DataLoader()
dl.load(URI)

# Step 2: Open trace and process efficiently
reader = lcs.TraceReader(dl.get_cache_path(URI))
reader = lcs.TraceReader(
    trace = dl.get_cache_path(URI),
    trace_type = lcs.TraceType.ORACLE_GENERAL_TRACE,
    reader_init_params = lcs.ReaderInitParam(ignore_obj_size=False)
)

# Step 3: Initialize cache
cache = lcs.S3FIFO(cache_size=1024*1024)
analysis_option = lcs.AnalysisOption(
    req_rate=True, # Keep basic request rate analysis
    access_pattern=False, # Disable access pattern analysis
    size=True, # Keep size analysis
    reuse=False, # Disable reuse analysis for small datasets
    popularity=False, # Disable popularity analysis for small datasets (< 200 objects)
    ttl=False, # Disable TTL analysis
    popularity_decay=False, # Disable popularity decay analysis
    lifetime=False, # Disable lifetime analysis
    create_future_reuse_ccdf=False, # Disable experimental features
    prob_at_age=False, # Disable experimental features
    size_change=False, # Disable size change analysis
)

analysis_param = lcs.AnalysisParam()

analyzer = lcs.TraceAnalyzer(
    reader, "example_analysis", analysis_option=analysis_option, analysis_param=analysis_param
)

# Step 4: Process entire trace efficiently (C++ backend)
obj_miss_ratio, byte_miss_ratio = cache.process_trace(reader)
print(f"Object miss ratio: {obj_miss_ratio:.4f}, Byte miss ratio: {byte_miss_ratio:.4f}")
analyzer.run()
```

> [!NOTE]
> We DO NOT ignore the object size by default; you can add `reader_init_params = lcs.ReaderInitParam(ignore_obj_size=False)` to the initialization of `TraceReader` if needed.

## Custom Cache Policies
## Plugin System

Implement custom cache replacement algorithms using pure Python functions - **no C/C++ compilation required**.
libCacheSim lets you develop your own cache eviction algorithms and test them through the plugin system, with no C/C++ compilation required.

### Python Hook Cache Overview
### Plugin Cache Overview

The `PluginCache` allows you to define custom caching behavior through Python callback functions. You need to implement these callback functions:

@@ -87,74 +198,51 @@ The `PluginCache` allows you to define custom caching behavior through Python ca
| `remove_hook` | `(data: Any, obj_id: int) -> None` | Clean up when object removed |
| `free_hook` | `(data: Any) -> None` | [Optional] Final cleanup |

<details>
<summary>An example for LRU</summary>
### Example: Implementing LRU via Plugin System

```python
from collections import OrderedDict
from libcachesim import PluginCache, CommonCacheParams, Request, SyntheticReader, LRU


class StandaloneLRU:
    def __init__(self):
        self.cache_data = OrderedDict()

    def cache_hit(self, obj_id):
        if obj_id in self.cache_data:
            obj_size = self.cache_data.pop(obj_id)
            self.cache_data[obj_id] = obj_size

    def cache_miss(self, obj_id, obj_size):
        self.cache_data[obj_id] = obj_size

    def cache_eviction(self):
        evicted_id, _ = self.cache_data.popitem(last=False)
        return evicted_id

    def cache_remove(self, obj_id):
        if obj_id in self.cache_data:
            del self.cache_data[obj_id]

from typing import Any

def cache_init_hook(common_cache_params: CommonCacheParams):
    return StandaloneLRU()
from libcachesim import PluginCache, LRU, CommonCacheParams, Request

def init_hook(_: CommonCacheParams) -> Any:
    return OrderedDict()

def cache_hit_hook(cache, request: Request):
    cache.cache_hit(request.obj_id)
def hit_hook(data: Any, req: Request) -> None:
    data.move_to_end(req.obj_id, last=True)

def miss_hook(data: Any, req: Request) -> None:
    data.__setitem__(req.obj_id, req.obj_size)

def cache_miss_hook(cache, request: Request):
    cache.cache_miss(request.obj_id, request.obj_size)
def eviction_hook(data: Any, _: Request) -> int:
    return data.popitem(last=False)[0]

def remove_hook(data: Any, obj_id: int) -> None:
    data.pop(obj_id, None)

def cache_eviction_hook(cache, request: Request):
    return cache.cache_eviction()


def cache_remove_hook(cache, obj_id):
    cache.cache_remove(obj_id)


def cache_free_hook(cache):
    cache.cache_data.clear()

def free_hook(data: Any) -> None:
    data.clear()

plugin_lru_cache = PluginCache(
    cache_size=1024,
    cache_init_hook=cache_init_hook,
    cache_hit_hook=cache_hit_hook,
    cache_miss_hook=cache_miss_hook,
    cache_eviction_hook=cache_eviction_hook,
    cache_remove_hook=cache_remove_hook,
    cache_free_hook=cache_free_hook,
    cache_name="CustomizedLRU",
    cache_size=128,
    cache_init_hook=init_hook,
    cache_hit_hook=hit_hook,
    cache_miss_hook=miss_hook,
    cache_eviction_hook=eviction_hook,
    cache_remove_hook=remove_hook,
    cache_free_hook=free_hook,
    cache_name="Plugin_LRU",
)
```
</details>

reader = lcs.SyntheticReader(num_objects=1000, num_of_req=10000, obj_size=1)
req_miss_ratio, byte_miss_ratio = plugin_lru_cache.process_trace(reader)
ref_req_miss_ratio, ref_byte_miss_ratio = LRU(128).process_trace(reader)
print(f"plugin req miss ratio {req_miss_ratio}, ref req miss ratio {ref_req_miss_ratio}")
print(f"plugin byte miss ratio {byte_miss_ratio}, ref byte miss ratio {ref_byte_miss_ratio}")
```
Comment on lines 203 to +243
Contributor


high

This code snippet has a few issues that prevent it from running correctly:

  1. SyntheticReader is used but not imported.
  2. The lcs alias is used to call SyntheticReader, but lcs is not defined. SyntheticReader should be called directly after being imported.
  3. The reader object is consumed by the first process_trace call. It needs to be reset with reader.reset() before it can be used again in the second process_trace call.

Here is a corrected version of the snippet that addresses these points:

from collections import OrderedDict
from typing import Any

from libcachesim import PluginCache, LRU, CommonCacheParams, Request, SyntheticReader

def init_hook(_: CommonCacheParams) -> Any:
    return OrderedDict()

def hit_hook(data: Any, req: Request) -> None:
    data.move_to_end(req.obj_id, last=True)

def miss_hook(data: Any, req: Request) -> None:
    data.__setitem__(req.obj_id, req.obj_size)

def eviction_hook(data: Any, _: Request) -> int:
    return data.popitem(last=False)[0]

def remove_hook(data: Any, obj_id: int) -> None:
    data.pop(obj_id, None)

def free_hook(data: Any) -> None:
    data.clear()

plugin_lru_cache = PluginCache(
    cache_size=128,
    cache_init_hook=init_hook,
    cache_hit_hook=hit_hook,
    cache_miss_hook=miss_hook,
    cache_eviction_hook=eviction_hook,
    cache_remove_hook=remove_hook,
    cache_free_hook=free_hook,
    cache_name="Plugin_LRU",
)

reader = SyntheticReader(num_objects=1000, num_of_req=10000, obj_size=1)
req_miss_ratio, byte_miss_ratio = plugin_lru_cache.process_trace(reader)

reader.reset()  # Reset reader before re-using it
ref_req_miss_ratio, ref_byte_miss_ratio = LRU(128).process_trace(reader)

print(f"plugin req miss ratio {req_miss_ratio}, ref req miss ratio {ref_req_miss_ratio}")
print(f"plugin byte miss ratio {byte_miss_ratio}, ref byte miss ratio {ref_byte_miss_ratio}")


Another simple hook-based implementation, this one for S3FIFO, is given in [examples](examples/plugin_cache/s3fifo.py).
By defining custom hook functions for cache initialization, hit, miss, eviction, removal, and cleanup, users can easily prototype and test their own cache eviction algorithms.
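
To make the hook contract concrete, here is a minimal, self-contained sketch of how a simulator might drive such hooks. Everything in it is an assumption for illustration: `Req` is a toy stand-in for `Request`, `simulate` is a hypothetical driver, and the order in which the hooks fire is a guess rather than libCacheSim's actual behavior. It replays a short request list against LRU hooks and reports the request and byte miss ratios.

```python
from collections import OrderedDict, namedtuple

# Toy stand-in for libcachesim.Request (assumption for this sketch).
Req = namedtuple("Req", ["obj_id", "obj_size"])


def init_hook(_params):
    # LRU state: obj_id -> obj_size, least recently used entry first.
    return OrderedDict()


def hit_hook(data, req):
    data.move_to_end(req.obj_id, last=True)


def miss_hook(data, req):
    data[req.obj_id] = req.obj_size


def eviction_hook(data, _req):
    # Pop the least recently used entry and return its id.
    return data.popitem(last=False)[0]


def simulate(requests, cache_size):
    """Hypothetical driver: replay requests, return (request, byte) miss ratios."""
    data = init_hook(None)
    sizes = {}  # driver-side bookkeeping: obj_id -> size of each cached object
    used = 0    # bytes currently occupied
    miss_reqs = miss_bytes = total_bytes = 0
    for req in requests:
        total_bytes += req.obj_size
        if req.obj_id in sizes:
            hit_hook(data, req)
            continue
        miss_reqs += 1
        miss_bytes += req.obj_size
        if req.obj_size > cache_size:
            continue  # object can never fit; count the miss, skip insertion
        while used + req.obj_size > cache_size:
            victim = eviction_hook(data, req)
            used -= sizes.pop(victim)
        sizes[req.obj_id] = req.obj_size
        used += req.obj_size
        miss_hook(data, req)
    return miss_reqs / len(requests), miss_bytes / total_bytes


trace = [Req(1, 4), Req(2, 1), Req(1, 4), Req(3, 1), Req(2, 1)]
req_miss_ratio, byte_miss_ratio = simulate(trace, cache_size=5)
print(f"request miss ratio {req_miss_ratio:.4f}, byte miss ratio {byte_miss_ratio:.4f}")
```

Because the objects differ in size, the two ratios diverge: the one hit lands on the largest object, so the byte miss ratio comes out lower than the request miss ratio.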

### Getting Help

@@ -208,7 +296,6 @@ If you used libCacheSim in your research, please cite the above papers.

---


## License
See [LICENSE](LICENSE) for details.

2 changes: 1 addition & 1 deletion docs/src/en/getting_started/quickstart.md
@@ -37,7 +37,7 @@ To enable them, you need to install all third-party dependencies first.
bash scripts/install_deps_user.sh
```

Then, you can reinstall libcachesim using the following commands:
Then, you can reinstall libcachesim using the following commands (you may need to add `--no-cache-dir` to force a build from scratch):

```bash
# Enable LRB