
Conversation


@haochengxia haochengxia commented Aug 4, 2025

  • Remove all old docs and add a new doc structure
  • Use CMAKE_ARGS to decide whether to enable LRB GL and 3L


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @haochengxia, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant overhaul to the project's build system and documentation. The primary goal is to provide more granular control over optional features during compilation using CMAKE_ARGS, while simultaneously restructuring and enhancing the user documentation for improved clarity and ease of use. This includes making optional features default to off for source builds but ensuring they are enabled in pre-built wheels, alongside a major reorganization of the documentation content and navigation.

Highlights

  • Build System Refinement: Optional cache algorithms (GLCACHE, LRB, 3L_CACHE) are now disabled by default in CMakeLists.txt and can be explicitly enabled using CMAKE_ARGS during compilation, providing more granular control for source builds.
  • Pre-built Wheel Configuration: The pyproject.toml configuration for cibuildwheel has been updated to ensure that pre-built Python wheels include all optional features enabled by default, simplifying installation for most users who rely on pre-compiled binaries.
  • Documentation Overhaul: The project's documentation structure has been completely reorganized, introducing new top-level sections like 'User Guide' and 'Developer Guide,' and breaking down 'Getting Started' and 'Examples' into more detailed sub-sections for improved clarity and navigation.
  • Enhanced Quickstart Guide: A comprehensive new quickstart guide has been added, covering installation (including uv and CMAKE_ARGS usage), cache simulation, trace analysis, and the plugin system, with practical code examples to help users get started quickly.
  • New Test Coverage: Dedicated tests for the optional GLCache, LRB, and 3LCache algorithms have been added, ensuring their functionality when enabled in the build.
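Based on the highlights above, the source-build workflow might look like the following (a sketch using the option names mentioned in this PR and the review comments; treat the exact flag spellings as assumptions):

```shell
# Default source build: optional algorithms (GLCache, LRB, 3LCache) disabled.
pip install libcachesim

# Source build with the optional algorithms explicitly enabled via CMAKE_ARGS.
# --no-cache-dir avoids reusing a previously built wheel without the features.
CMAKE_ARGS="-DENABLE_GLCACHE=ON -DENABLE_LRB=ON -DENABLE_3L_CACHE=ON" \
  pip install --force-reinstall --no-cache-dir libcachesim
```

Pre-built wheels published via cibuildwheel ship with these features already enabled, so most users never need `CMAKE_ARGS`.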
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the build system to make optional features (GLCache, LRB, 3LCache) disabled by default and controllable via CMAKE_ARGS. This is a solid improvement for users building from source, as it reduces the default set of dependencies. The documentation is significantly restructured and improved to guide users through the new installation process for optional features. The error messages for missing optional features are also much more helpful.

My main concern is the removal of the API reference documentation, which is a critical resource for users. I've also provided a suggestion to improve the clarity of the error messages for when optional features are not installed.

Comment on lines 1 to +3
# API Reference

This page provides detailed API documentation for the libCacheSim Python bindings.

## Core Classes

### Cache Classes

All cache classes inherit from the base cache interface and provide the following methods:

```python
class Cache:
    """Base cache interface."""

    def get(self, obj_id: int, obj_size: int = 1) -> bool:
        """Request an object from the cache.

        Args:
            obj_id: Object identifier
            obj_size: Object size in bytes

        Returns:
            True if cache hit, False if cache miss
        """

    def get_hit_ratio(self) -> float:
        """Get the current cache hit ratio."""

    def get_miss_ratio(self) -> float:
        """Get the current cache miss ratio."""

    def get_num_hits(self) -> int:
        """Get the total number of cache hits."""

    def get_num_misses(self) -> int:
        """Get the total number of cache misses."""
```
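As a usage illustration of the interface above, here is a stdlib-only LRU sketch satisfying the same method contract. This is illustrative code only; libcachesim's actual caches are implemented in C++ and created via factory functions, not this class:

```python
from collections import OrderedDict

class SketchLRUCache:
    """Toy LRU cache mirroring the documented Cache interface (illustrative)."""

    def __init__(self, cache_size: int):
        self.cache_size = cache_size
        self._store = OrderedDict()  # obj_id -> obj_size, oldest first
        self._hits = 0
        self._misses = 0

    def get(self, obj_id: int, obj_size: int = 1) -> bool:
        if obj_id in self._store:
            self._store.move_to_end(obj_id)  # refresh recency on hit
            self._hits += 1
            return True
        self._misses += 1
        self._store[obj_id] = obj_size       # admit on miss
        while sum(self._store.values()) > self.cache_size:
            self._store.popitem(last=False)  # evict least recently used
        return False

    def get_num_hits(self) -> int:
        return self._hits

    def get_num_misses(self) -> int:
        return self._misses

    def get_hit_ratio(self) -> float:
        total = self._hits + self._misses
        return self._hits / total if total else 0.0

    def get_miss_ratio(self) -> float:
        return 1.0 - self.get_hit_ratio()

cache = SketchLRUCache(cache_size=2)
assert cache.get(1) is False   # cold miss
assert cache.get(1) is True    # hit
cache.get(2)
cache.get(3)                   # admitting 3 evicts object 1
assert cache.get(1) is False   # 1 was evicted
```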

### Available Cache Algorithms

```python
# Basic algorithms
def LRU(cache_size: int) -> Cache: ...
def LFU(cache_size: int) -> Cache: ...
def FIFO(cache_size: int) -> Cache: ...
def Clock(cache_size: int) -> Cache: ...
def Random(cache_size: int) -> Cache: ...

# Advanced algorithms
def ARC(cache_size: int) -> Cache: ...
def S3FIFO(cache_size: int) -> Cache: ...
def Sieve(cache_size: int) -> Cache: ...
def TinyLFU(cache_size: int) -> Cache: ...
def TwoQ(cache_size: int) -> Cache: ...
```


::: libcachesim.cache

### TraceReader

```python
class TraceReader:
    """Read trace files in various formats."""

    def __init__(self, trace_path: str, trace_type: TraceType,
                 reader_params: ReaderInitParam = None):
        """Initialize trace reader.

        Args:
            trace_path: Path to trace file
            trace_type: Type of trace format
            reader_params: Optional reader configuration
        """

    def __iter__(self):
        """Iterate over requests in the trace."""

    def reset(self):
        """Reset reader to beginning of trace."""

    def skip(self, n: int):
        """Skip n requests."""

    def clone(self):
        """Create a copy of the reader."""
```

### SyntheticReader

```python
class SyntheticReader:
    """Generate synthetic workloads."""

    def __init__(self, num_objects: int, num_requests: int,
                 distribution: str = "zipf", alpha: float = 1.0,
                 obj_size: int = 1, seed: int = None):
        """Initialize synthetic reader.

        Args:
            num_objects: Number of unique objects
            num_requests: Total requests to generate
            distribution: Distribution type ("zipf", "uniform")
            alpha: Zipf skewness parameter
            obj_size: Object size in bytes
            seed: Random seed for reproducibility
        """
```

### TraceAnalyzer

```python
class TraceAnalyzer:
    """Analyze trace characteristics."""

    def __init__(self, trace_path: str, trace_type: TraceType,
                 reader_params: ReaderInitParam = None):
        """Initialize trace analyzer."""

    def get_num_requests(self) -> int:
        """Get total number of requests."""

    def get_num_objects(self) -> int:
        """Get number of unique objects."""

    def get_working_set_size(self) -> int:
        """Get working set size."""

## Enumerations and Constants

### TraceType

```python
class TraceType:
    """Supported trace file formats."""

    CSV_TRACE = "csv"
    BINARY_TRACE = "binary"
    ORACLE_GENERAL_TRACE = "oracle"
    PLAIN_TXT_TRACE = "txt"
```

### SamplerType

```python
class SamplerType:
    """Sampling strategies."""

    SPATIAL_SAMPLER = "spatial"
    TEMPORAL_SAMPLER = "temporal"
```

### ReqOp

```python
class ReqOp:
    """Request operation types."""

    READ = "read"
    WRITE = "write"
    DELETE = "delete"
```

## Data Structures

### Request

```python
class Request:
    """Represents a cache request."""

    def __init__(self):
        self.obj_id: int = 0
        self.obj_size: int = 1
        self.timestamp: int = 0
        self.op: str = "read"
```

### ReaderInitParam

```python
class ReaderInitParam:
    """Configuration parameters for trace readers."""

    def __init__(self):
        self.has_header: bool = False
        self.delimiter: str = ","
        self.obj_id_is_num: bool = True
        self.ignore_obj_size: bool = False
        self.ignore_size_zero_req: bool = True
        self.cap_at_n_req: int = -1
        self.block_size: int = 4096
        self.trace_start_offset: int = 0

        # Field mappings (1-indexed)
        self.time_field: int = 1
        self.obj_id_field: int = 2
        self.obj_size_field: int = 3
        self.op_field: int = 4

        self.sampler: Sampler = None
```

### Sampler

```python
class Sampler:
    """Configuration for request sampling."""

    def __init__(self, sample_ratio: float = 1.0,
                 type: str = "spatial"):
        """Initialize sampler.

        Args:
            sample_ratio: Fraction of requests to sample (0.0-1.0)
            type: Sampling type ("spatial" or "temporal")
        """
        self.sample_ratio = sample_ratio
        self.type = type
```

## Utility Functions

### Synthetic Trace Generation

```python
def create_zipf_requests(num_objects, num_requests, alpha, obj_size, seed=None):
    """Create Zipf-distributed synthetic requests.

    Args:
        num_objects (int): Number of unique objects
        num_requests (int): Total number of requests to generate
        alpha (float): Zipf skewness parameter (higher = more skewed)
        obj_size (int): Size of each object in bytes
        seed (int, optional): Random seed for reproducibility

    Returns:
        List[Request]: List of generated requests
    """

def create_uniform_requests(num_objects, num_requests, obj_size, seed=None):
    """Create uniformly-distributed synthetic requests.

    Args:
        num_objects (int): Number of unique objects
        num_requests (int): Total number of requests to generate
        obj_size (int): Size of each object in bytes
        seed (int, optional): Random seed for reproducibility

    Returns:
        List[Request]: List of generated requests
    """
```
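For readers curious how Zipf generation works, here is a hypothetical stdlib-only sketch of the helper above (the real libcachesim function returns `Request` objects and may differ in implementation; `sketch_zipf_obj_ids` is an invented name):

```python
import random

def sketch_zipf_obj_ids(num_objects, num_requests, alpha, seed=None):
    """Draw object IDs with Zipf popularity: weight of rank k is 1 / k**alpha."""
    rng = random.Random(seed)  # seeded for reproducibility
    weights = [1.0 / (k ** alpha) for k in range(1, num_objects + 1)]
    return rng.choices(range(num_objects), weights=weights, k=num_requests)

ids = sketch_zipf_obj_ids(num_objects=100, num_requests=10_000, alpha=1.0, seed=42)
assert len(ids) == 10_000 and all(0 <= i < 100 for i in ids)
# With alpha > 0, low-ranked (popular) objects dominate the request stream.
assert ids.count(0) > ids.count(99)
```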

### Cache Algorithms

Available cache algorithms with their factory functions:

```python
# Basic algorithms
LRU(cache_size: int) -> Cache
LFU(cache_size: int) -> Cache
FIFO(cache_size: int) -> Cache
Clock(cache_size: int) -> Cache
Random(cache_size: int) -> Cache

# Advanced algorithms
ARC(cache_size: int) -> Cache
S3FIFO(cache_size: int) -> Cache
Sieve(cache_size: int) -> Cache
TinyLFU(cache_size: int) -> Cache
TwoQ(cache_size: int) -> Cache
LRB(cache_size: int) -> Cache

# Experimental algorithms
cache_3L(cache_size: int) -> Cache
```

### Performance Metrics

```python
class CacheStats:
    """Cache performance statistics."""

    def __init__(self):
        self.hits = 0
        self.misses = 0
        self.evictions = 0
        self.bytes_written = 0
        self.bytes_read = 0

    @property
    def hit_ratio(self) -> float:
        """Calculate hit ratio."""
        total = self.hits + self.misses
        return self.hits / total if total > 0 else 0.0

    @property
    def miss_ratio(self) -> float:
        """Calculate miss ratio."""
        return 1.0 - self.hit_ratio
```

## Error Handling

The library uses standard Python exceptions:

- `ValueError`: Invalid parameters or configuration
- `FileNotFoundError`: Trace file not found
- `RuntimeError`: Runtime errors from underlying C++ library
- `MemoryError`: Out of memory conditions

Example error handling:

```python
try:
    reader = lcs.TraceReader("nonexistent.csv", lcs.TraceType.CSV_TRACE)
except FileNotFoundError:
    print("Trace file not found")
except ValueError as e:
    print(f"Invalid configuration: {e}")
```

## Configuration Options

### Reader Configuration

```python
reader_params = lcs.ReaderInitParam(
    has_header=True,            # CSV has header row
    delimiter=",",              # Field delimiter
    obj_id_is_num=True,         # Object IDs are numeric
    ignore_obj_size=False,      # Don't ignore object sizes
    ignore_size_zero_req=True,  # Ignore zero-size requests
    cap_at_n_req=1000000,       # Limit number of requests
    block_size=4096,            # Block size for block-based traces
    trace_start_offset=0,       # Skip initial requests
)

# Field mappings (1-indexed)
reader_params.time_field = 1
reader_params.obj_id_field = 2
reader_params.obj_size_field = 3
reader_params.op_field = 4
```

### Sampling Configuration

```python
sampler = lcs.Sampler(
    sample_ratio=0.1,                       # Sample 10% of requests
    type=lcs.SamplerType.SPATIAL_SAMPLER,   # Spatial sampling
)
reader_params.sampler = sampler
```

## Thread Safety

The library provides thread-safe operations for most use cases:

- Cache operations are thread-safe within a single cache instance
- Multiple readers can be used concurrently
- Analysis operations can utilize multiple threads

For high-concurrency scenarios, consider using separate cache instances per thread.
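The per-thread-instance advice can be sketched as follows, using a trivial stand-in cache (the `CounterCache` class is invented for illustration; in practice each worker would construct its own real cache, e.g. via `lcs.LRU`):

```python
import threading

class CounterCache:
    """Trivial stand-in for a cache instance; only counts requests."""
    def __init__(self):
        self.requests = 0
    def get(self, obj_id):
        self.requests += 1

results = {}

def worker(name, n):
    cache = CounterCache()          # one instance per thread: no shared state
    for i in range(n):
        cache.get(i)
    results[name] = cache.requests  # distinct keys, so no write conflicts here

threads = [threading.Thread(target=worker, args=(f"t{i}", 1000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert sum(results.values()) == 4000  # every thread processed its own workload
```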

## Memory Management

The library automatically manages memory for most operations:

- Cache objects handle their own memory allocation
- Trace readers manage buffering automatically
- Request objects are lightweight and reusable

For large-scale simulations, monitor memory usage and consider:

- Using sampling to reduce trace size
- Processing traces in chunks
- Limiting cache sizes appropriately
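The chunked-processing suggestion can be sketched with a generic helper (illustrative only; the `range` object stands in for a trace reader's request iterator):

```python
from itertools import islice

def iter_chunks(iterable, chunk_size):
    """Yield lists of up to chunk_size items without materializing the whole stream."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

requests = range(10_500)  # stand-in for a trace iterator
sizes = [len(c) for c in iter_chunks(requests, 1000)]
assert sizes == [1000] * 10 + [500]  # ten full batches plus a remainder
```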

## Best Practices

1. **Use appropriate cache sizes**: Size caches based on your simulation goals
2. **Set random seeds**: For reproducible results in synthetic traces
  3. **Handle errors**: Always wrap file operations in try/except blocks
4. **Monitor memory**: For large traces, consider sampling or chunking
5. **Use threading**: Leverage multi-threading for analysis tasks
6. **Validate traces**: Check trace format and content before simulation
[TBD]


Severity: high

The API reference documentation has been removed and replaced with "[TBD]". This is a significant documentation regression. While the rest of the documentation has been greatly improved, removing the API reference will make it harder for users to understand and use the library's features. If the plan is to auto-generate this documentation, it would be best to include that in this PR or a subsequent one soon. If not, please restore the previous content and update it to reflect the changes in this PR.

Comment on lines 559 to +563
```diff
     from .libcachesim_python import ThreeLCache_init
 except ImportError:
-    raise ImportError("ThreeLCache is not installed. Please install it with `pip install libcachesim[all]`")
+    raise ImportError(
+        'ThreeLCache is not installed. Please install it with `CMAKE_ARGS="-DENABLE_3L_CACHE=ON" pip install libcachesim --force-reinstall`'
+    )
```


Severity: medium

The new error message is a great improvement as it provides clear, actionable instructions for the user. To make it even more robust, consider making the message more explanatory about why the import fails and less tied to a specific package manager (pip). This would also be a good opportunity to suggest using --no-cache-dir to avoid issues with cached wheels.

A similar improvement can be applied to the error messages for GLCache and LRB.

Here's a suggested improvement:

Suggested change
```diff
     from .libcachesim_python import ThreeLCache_init
 except ImportError:
-    raise ImportError("ThreeLCache is not installed. Please install it with `pip install libcachesim[all]`")
-    raise ImportError(
-        'ThreeLCache is not installed. Please install it with `CMAKE_ARGS="-DENABLE_3L_CACHE=ON" pip install libcachesim --force-reinstall`'
-    )
+    raise ImportError(
+        'ThreeLCache is not available. This optional feature must be enabled at build time. '
+        'Please reinstall with the `ENABLE_3L_CACHE` CMake option set to ON. '
+        'For example: `CMAKE_ARGS="-DENABLE_3L_CACHE=ON" pip install --force-reinstall --no-cache-dir libcachesim`'
+    )
```
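The optional-import pattern under discussion can be written generically. This sketch mirrors the suggested message; the module and flag names are taken from the PR's own example, and `load_three_l_cache` is an invented wrapper for illustration:

```python
def load_three_l_cache():
    """Import the optional 3LCache binding, failing with an actionable message."""
    try:
        from libcachesim_python import ThreeLCache_init  # compiled extension
    except ImportError as exc:
        # Chain the original error so the user sees why the import failed.
        raise ImportError(
            "ThreeLCache is not available. This optional feature must be "
            "enabled at build time with the ENABLE_3L_CACHE CMake option, e.g. "
            'CMAKE_ARGS="-DENABLE_3L_CACHE=ON" pip install '
            "--force-reinstall --no-cache-dir libcachesim"
        ) from exc
    return ThreeLCache_init
```

The same wrapper shape applies to GLCache and LRB, swapping in their respective init symbols and CMake options.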

@haochengxia haochengxia merged commit 544a3d4 into main Aug 4, 2025
21 checks passed
@haochengxia haochengxia deleted the hxia/patch branch August 4, 2025 06:00