
Conversation


@haochengxia haochengxia commented Aug 4, 2025

  • Remove all old docs and add a new doc structure
  • Use CMAKE_ARGS to decide whether to enable LRB GL and 3L


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @haochengxia, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant overhaul to the project's build system and documentation. The primary goal is to provide more granular control over optional features during compilation using CMAKE_ARGS, while simultaneously restructuring and enhancing the user documentation for improved clarity and ease of use. This includes making optional features default to off for source builds but ensuring they are enabled in pre-built wheels, alongside a major reorganization of the documentation content and navigation.

Highlights

  • Build System Refinement: Optional cache algorithms (GLCACHE, LRB, 3L_CACHE) are now disabled by default in CMakeLists.txt and can be explicitly enabled using CMAKE_ARGS during compilation, providing more granular control for source builds.
  • Pre-built Wheel Configuration: The pyproject.toml configuration for cibuildwheel has been updated to ensure that pre-built Python wheels include all optional features enabled by default, simplifying installation for most users who rely on pre-compiled binaries.
  • Documentation Overhaul: The project's documentation structure has been completely reorganized, introducing new top-level sections like 'User Guide' and 'Developer Guide,' and breaking down 'Getting Started' and 'Examples' into more detailed sub-sections for improved clarity and navigation.
  • Enhanced Quickstart Guide: A comprehensive new quickstart guide has been added, covering installation (including uv and CMAKE_ARGS usage), cache simulation, trace analysis, and the plugin system, with practical code examples to help users get started quickly.
  • New Test Coverage: Dedicated tests for the optional GLCache, LRB, and 3LCache algorithms have been added, ensuring their functionality when enabled in the build.
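Based on the highlights above, the source-build workflow might look like the following (a sketch using the option names mentioned in this PR and the review comments; treat the exact flag spellings as assumptions):

```shell
# Default source build: optional algorithms (GLCache, LRB, 3LCache) disabled.
pip install libcachesim

# Source build with the optional algorithms explicitly enabled via CMAKE_ARGS.
# --no-cache-dir avoids reusing a previously built wheel without the features.
CMAKE_ARGS="-DENABLE_GLCACHE=ON -DENABLE_LRB=ON -DENABLE_3L_CACHE=ON" \
  pip install --force-reinstall --no-cache-dir libcachesim
```

Pre-built wheels published via cibuildwheel ship with these features already enabled, so most users never need `CMAKE_ARGS`.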
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the build system to make optional features (GLCache, LRB, 3LCache) disabled by default and controllable via CMAKE_ARGS. This is a solid improvement for users building from source, as it reduces the default set of dependencies. The documentation is significantly restructured and improved to guide users through the new installation process for optional features. The error messages for missing optional features are also much more helpful.

My main concern is the removal of the API reference documentation, which is a critical resource for users. I've also provided a suggestion to improve the clarity of the error messages for when optional features are not installed.

Comment on lines 1 to +3
# API Reference

This page provides detailed API documentation for the libCacheSim Python bindings.

## Core Classes

### Cache Classes

All cache classes inherit from the base cache interface and provide the following methods:

```python
class Cache:
    """Base cache interface."""

    def get(self, obj_id: int, obj_size: int = 1) -> bool:
        """Request an object from the cache.

        Args:
            obj_id: Object identifier
            obj_size: Object size in bytes

        Returns:
            True if cache hit, False if cache miss
        """

    def get_hit_ratio(self) -> float:
        """Get the current cache hit ratio."""

    def get_miss_ratio(self) -> float:
        """Get the current cache miss ratio."""

    def get_num_hits(self) -> int:
        """Get the total number of cache hits."""

    def get_num_misses(self) -> int:
        """Get the total number of cache misses."""
```
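As a usage illustration of the interface above, here is a stdlib-only LRU sketch satisfying the same method contract. This is illustrative code only; libcachesim's actual caches are implemented in C++ and created via factory functions, not this class:

```python
from collections import OrderedDict

class SketchLRUCache:
    """Toy LRU cache mirroring the documented Cache interface (illustrative)."""

    def __init__(self, cache_size: int):
        self.cache_size = cache_size
        self._store = OrderedDict()  # obj_id -> obj_size, oldest first
        self._hits = 0
        self._misses = 0

    def get(self, obj_id: int, obj_size: int = 1) -> bool:
        if obj_id in self._store:
            self._store.move_to_end(obj_id)  # refresh recency on hit
            self._hits += 1
            return True
        self._misses += 1
        self._store[obj_id] = obj_size       # admit on miss
        while sum(self._store.values()) > self.cache_size:
            self._store.popitem(last=False)  # evict least recently used
        return False

    def get_num_hits(self) -> int:
        return self._hits

    def get_num_misses(self) -> int:
        return self._misses

    def get_hit_ratio(self) -> float:
        total = self._hits + self._misses
        return self._hits / total if total else 0.0

    def get_miss_ratio(self) -> float:
        return 1.0 - self.get_hit_ratio()

cache = SketchLRUCache(cache_size=2)
assert cache.get(1) is False   # cold miss
assert cache.get(1) is True    # hit
cache.get(2)
cache.get(3)                   # admitting 3 evicts object 1
assert cache.get(1) is False   # 1 was evicted
```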

### Available Cache Algorithms

```python
# Basic algorithms
def LRU(cache_size: int) -> Cache: ...
def LFU(cache_size: int) -> Cache: ...
def FIFO(cache_size: int) -> Cache: ...
def Clock(cache_size: int) -> Cache: ...
def Random(cache_size: int) -> Cache: ...

# Advanced algorithms
def ARC(cache_size: int) -> Cache: ...
def S3FIFO(cache_size: int) -> Cache: ...
def Sieve(cache_size: int) -> Cache: ...
def TinyLFU(cache_size: int) -> Cache: ...
def TwoQ(cache_size: int) -> Cache: ...
```


::: libcachesim.cache

### TraceReader

```python
class TraceReader:
    """Read trace files in various formats."""

    def __init__(self, trace_path: str, trace_type: TraceType,
                 reader_params: ReaderInitParam = None):
        """Initialize trace reader.

        Args:
            trace_path: Path to trace file
            trace_type: Type of trace format
            reader_params: Optional reader configuration
        """

    def __iter__(self):
        """Iterate over requests in the trace."""

    def reset(self):
        """Reset reader to beginning of trace."""

    def skip(self, n: int):
        """Skip n requests."""

    def clone(self):
        """Create a copy of the reader."""
```

### SyntheticReader

```python
class SyntheticReader:
    """Generate synthetic workloads."""

    def __init__(self, num_objects: int, num_requests: int,
                 distribution: str = "zipf", alpha: float = 1.0,
                 obj_size: int = 1, seed: int = None):
        """Initialize synthetic reader.

        Args:
            num_objects: Number of unique objects
            num_requests: Total requests to generate
            distribution: Distribution type ("zipf", "uniform")
            alpha: Zipf skewness parameter
            obj_size: Object size in bytes
            seed: Random seed for reproducibility
        """
```

### TraceAnalyzer

```python
class TraceAnalyzer:
    """Analyze trace characteristics."""

    def __init__(self, trace_path: str, trace_type: TraceType,
                 reader_params: ReaderInitParam = None):
        """Initialize trace analyzer."""

    def get_num_requests(self) -> int:
        """Get total number of requests."""

    def get_num_objects(self) -> int:
        """Get number of unique objects."""

    def get_working_set_size(self) -> int:
        """Get working set size."""

## Enumerations and Constants

### TraceType

```python
class TraceType:
    """Supported trace file formats."""

    CSV_TRACE = "csv"
    BINARY_TRACE = "binary"
    ORACLE_GENERAL_TRACE = "oracle"
    PLAIN_TXT_TRACE = "txt"
```

### SamplerType

```python
class SamplerType:
    """Sampling strategies."""

    SPATIAL_SAMPLER = "spatial"
    TEMPORAL_SAMPLER = "temporal"
```

### ReqOp

```python
class ReqOp:
    """Request operation types."""

    READ = "read"
    WRITE = "write"
    DELETE = "delete"
```

## Data Structures

### Request

```python
class Request:
    """Represents a cache request."""

    def __init__(self):
        self.obj_id: int = 0
        self.obj_size: int = 1
        self.timestamp: int = 0
        self.op: str = "read"
```

### ReaderInitParam

```python
class ReaderInitParam:
    """Configuration parameters for trace readers."""

    def __init__(self):
        self.has_header: bool = False
        self.delimiter: str = ","
        self.obj_id_is_num: bool = True
        self.ignore_obj_size: bool = False
        self.ignore_size_zero_req: bool = True
        self.cap_at_n_req: int = -1
        self.block_size: int = 4096
        self.trace_start_offset: int = 0

        # Field mappings (1-indexed)
        self.time_field: int = 1
        self.obj_id_field: int = 2
        self.obj_size_field: int = 3
        self.op_field: int = 4

        self.sampler: Sampler = None
```

### Sampler

```python
class Sampler:
    """Configuration for request sampling."""

    def __init__(self, sample_ratio: float = 1.0,
                 type: str = "spatial"):
        """Initialize sampler.

        Args:
            sample_ratio: Fraction of requests to sample (0.0-1.0)
            type: Sampling type ("spatial" or "temporal")
        """
        self.sample_ratio = sample_ratio
        self.type = type
```

## Utility Functions

### Synthetic Trace Generation

```python
def create_zipf_requests(num_objects, num_requests, alpha, obj_size, seed=None):
    """Create Zipf-distributed synthetic requests.

    Args:
        num_objects (int): Number of unique objects
        num_requests (int): Total number of requests to generate
        alpha (float): Zipf skewness parameter (higher = more skewed)
        obj_size (int): Size of each object in bytes
        seed (int, optional): Random seed for reproducibility

    Returns:
        List[Request]: List of generated requests
    """

def create_uniform_requests(num_objects, num_requests, obj_size, seed=None):
    """Create uniformly-distributed synthetic requests.

    Args:
        num_objects (int): Number of unique objects
        num_requests (int): Total number of requests to generate
        obj_size (int): Size of each object in bytes
        seed (int, optional): Random seed for reproducibility

    Returns:
        List[Request]: List of generated requests
    """
```
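For readers curious how Zipf generation works, here is a hypothetical stdlib-only sketch of the helper above (the real libcachesim function returns `Request` objects and may differ in implementation; `sketch_zipf_obj_ids` is an invented name):

```python
import random

def sketch_zipf_obj_ids(num_objects, num_requests, alpha, seed=None):
    """Draw object IDs with Zipf popularity: weight of rank k is 1 / k**alpha."""
    rng = random.Random(seed)  # seeded for reproducibility
    weights = [1.0 / (k ** alpha) for k in range(1, num_objects + 1)]
    return rng.choices(range(num_objects), weights=weights, k=num_requests)

ids = sketch_zipf_obj_ids(num_objects=100, num_requests=10_000, alpha=1.0, seed=42)
assert len(ids) == 10_000 and all(0 <= i < 100 for i in ids)
# With alpha > 0, low-ranked (popular) objects dominate the request stream.
assert ids.count(0) > ids.count(99)
```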

### Cache Algorithms

Available cache algorithms with their factory functions:

```python
# Basic algorithms
LRU(cache_size: int) -> Cache
LFU(cache_size: int) -> Cache
FIFO(cache_size: int) -> Cache
Clock(cache_size: int) -> Cache
Random(cache_size: int) -> Cache

# Advanced algorithms
ARC(cache_size: int) -> Cache
S3FIFO(cache_size: int) -> Cache
Sieve(cache_size: int) -> Cache
TinyLFU(cache_size: int) -> Cache
TwoQ(cache_size: int) -> Cache
LRB(cache_size: int) -> Cache

# Experimental algorithms
cache_3L(cache_size: int) -> Cache
```

### Performance Metrics

```python
class CacheStats:
    """Cache performance statistics."""

    def __init__(self):
        self.hits = 0
        self.misses = 0
        self.evictions = 0
        self.bytes_written = 0
        self.bytes_read = 0

    @property
    def hit_ratio(self) -> float:
        """Calculate hit ratio."""
        total = self.hits + self.misses
        return self.hits / total if total > 0 else 0.0

    @property
    def miss_ratio(self) -> float:
        """Calculate miss ratio."""
        return 1.0 - self.hit_ratio
```

## Error Handling

The library uses standard Python exceptions:

- `ValueError`: Invalid parameters or configuration
- `FileNotFoundError`: Trace file not found
- `RuntimeError`: Runtime errors from underlying C++ library
- `MemoryError`: Out of memory conditions

Example error handling:

```python
try:
    reader = lcs.TraceReader("nonexistent.csv", lcs.TraceType.CSV_TRACE)
except FileNotFoundError:
    print("Trace file not found")
except ValueError as e:
    print(f"Invalid configuration: {e}")
```

## Configuration Options

### Reader Configuration

```python
reader_params = lcs.ReaderInitParam(
    has_header=True,            # CSV has header row
    delimiter=",",              # Field delimiter
    obj_id_is_num=True,         # Object IDs are numeric
    ignore_obj_size=False,      # Don't ignore object sizes
    ignore_size_zero_req=True,  # Ignore zero-size requests
    cap_at_n_req=1000000,       # Limit number of requests
    block_size=4096,            # Block size for block-based traces
    trace_start_offset=0,       # Skip initial requests
)

# Field mappings (1-indexed)
reader_params.time_field = 1
reader_params.obj_id_field = 2
reader_params.obj_size_field = 3
reader_params.op_field = 4
```

### Sampling Configuration

```python
sampler = lcs.Sampler(
    sample_ratio=0.1,                       # Sample 10% of requests
    type=lcs.SamplerType.SPATIAL_SAMPLER,   # Spatial sampling
)
reader_params.sampler = sampler
```

## Thread Safety

The library provides thread-safe operations for most use cases:

- Cache operations are thread-safe within a single cache instance
- Multiple readers can be used concurrently
- Analysis operations can utilize multiple threads

For high-concurrency scenarios, consider using separate cache instances per thread.
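The per-thread-instance advice can be sketched as follows, using a trivial stand-in cache (the `CounterCache` class is invented for illustration; in practice each worker would construct its own real cache, e.g. via `lcs.LRU`):

```python
import threading

class CounterCache:
    """Trivial stand-in for a cache instance; only counts requests."""
    def __init__(self):
        self.requests = 0
    def get(self, obj_id):
        self.requests += 1

results = {}

def worker(name, n):
    cache = CounterCache()          # one instance per thread: no shared state
    for i in range(n):
        cache.get(i)
    results[name] = cache.requests  # distinct keys, so no write conflicts here

threads = [threading.Thread(target=worker, args=(f"t{i}", 1000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert sum(results.values()) == 4000  # every thread processed its own workload
```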

## Memory Management

The library automatically manages memory for most operations:

- Cache objects handle their own memory allocation
- Trace readers manage buffering automatically
- Request objects are lightweight and reusable

For large-scale simulations, monitor memory usage and consider:

- Using sampling to reduce trace size
- Processing traces in chunks
- Limiting cache sizes appropriately
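The chunked-processing suggestion can be sketched with a generic helper (illustrative only; the `range` object stands in for a trace reader's request iterator):

```python
from itertools import islice

def iter_chunks(iterable, chunk_size):
    """Yield lists of up to chunk_size items without materializing the whole stream."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

requests = range(10_500)  # stand-in for a trace iterator
sizes = [len(c) for c in iter_chunks(requests, 1000)]
assert sizes == [1000] * 10 + [500]  # ten full batches plus a remainder
```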

## Best Practices

1. **Use appropriate cache sizes**: Size caches based on your simulation goals
2. **Set random seeds**: For reproducible results in synthetic traces
  3. **Handle errors**: Always wrap file operations in try/except blocks
4. **Monitor memory**: For large traces, consider sampling or chunking
5. **Use threading**: Leverage multi-threading for analysis tasks
6. **Validate traces**: Check trace format and content before simulation
[TBD]


Severity: high

The API reference documentation has been removed and replaced with "[TBD]". This is a significant documentation regression. While the rest of the documentation has been greatly improved, removing the API reference will make it harder for users to understand and use the library's features. If the plan is to auto-generate this documentation, it would be best to include that in this PR or a subsequent one soon. If not, please restore the previous content and update it to reflect the changes in this PR.

Comment on lines 559 to +563
```diff
     from .libcachesim_python import ThreeLCache_init
 except ImportError:
-    raise ImportError("ThreeLCache is not installed. Please install it with `pip install libcachesim[all]`")
+    raise ImportError(
+        'ThreeLCache is not installed. Please install it with `CMAKE_ARGS="-DENABLE_3L_CACHE=ON" pip install libcachesim --force-reinstall`'
+    )
```


Severity: medium

The new error message is a great improvement as it provides clear, actionable instructions for the user. To make it even more robust, consider making the message more explanatory about why the import fails and less tied to a specific package manager (pip). This would also be a good opportunity to suggest using --no-cache-dir to avoid issues with cached wheels.

A similar improvement can be applied to the error messages for GLCache and LRB.

Here's a suggested improvement:

Suggested change
```diff
     from .libcachesim_python import ThreeLCache_init
 except ImportError:
-    raise ImportError("ThreeLCache is not installed. Please install it with `pip install libcachesim[all]`")
-    raise ImportError(
-        'ThreeLCache is not installed. Please install it with `CMAKE_ARGS="-DENABLE_3L_CACHE=ON" pip install libcachesim --force-reinstall`'
-    )
+    raise ImportError(
+        'ThreeLCache is not available. This optional feature must be enabled at build time. '
+        'Please reinstall with the `ENABLE_3L_CACHE` CMake option set to ON. '
+        'For example: `CMAKE_ARGS="-DENABLE_3L_CACHE=ON" pip install --force-reinstall --no-cache-dir libcachesim`'
+    )
```
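The optional-import pattern under discussion can be written generically. This sketch mirrors the suggested message; the module and flag names are taken from the PR's own example, and `load_three_l_cache` is an invented wrapper for illustration:

```python
def load_three_l_cache():
    """Import the optional 3LCache binding, failing with an actionable message."""
    try:
        from libcachesim_python import ThreeLCache_init  # compiled extension
    except ImportError as exc:
        # Chain the original error so the user sees why the import failed.
        raise ImportError(
            "ThreeLCache is not available. This optional feature must be "
            "enabled at build time with the ENABLE_3L_CACHE CMake option, e.g. "
            'CMAKE_ARGS="-DENABLE_3L_CACHE=ON" pip install '
            "--force-reinstall --no-cache-dir libcachesim"
        ) from exc
    return ThreeLCache_init
```

The same wrapper shape applies to GLCache and LRB, swapping in their respective init symbols and CMake options.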

@haochengxia haochengxia merged commit 544a3d4 into main Aug 4, 2025
21 checks passed
@haochengxia haochengxia deleted the hxia/patch branch August 4, 2025 06:00