Skip to content

willibrandon/pg_walrus

Repository files navigation

pg_walrus

WAL + Rust = Walrus

A PostgreSQL extension that automatically monitors and adjusts max_wal_size to prevent performance-degrading forced checkpoints. pg_walrus is a Rust rewrite of pg_walsizer using the pgrx framework.

The Problem

PostgreSQL's max_wal_size determines how much WAL data can accumulate between checkpoints. When this limit is exceeded before checkpoint_timeout, PostgreSQL forces a checkpoint—which can dramatically reduce OLTP performance (sometimes by an order of magnitude).

The default max_wal_size is only 1GB, and most active systems need more. But how much more? This extension removes the guesswork.

How It Works

pg_walrus runs a background worker that:

  1. Wakes every checkpoint_timeout interval
  2. Checks how many forced checkpoints occurred
  3. If forced checkpoints exceed the threshold, calculates a new max_wal_size
  4. Applies the change via ALTER SYSTEM and signals PostgreSQL to reload
LOG:  detected 4 forced checkpoints over 60 seconds
LOG:  WAL request threshold (2) met, resizing max_wal_size
LOG:  current max_wal_size is 512, should be 2560
LOG:  received SIGHUP, reloading configuration files
LOG:  parameter "max_wal_size" changed to "2560"

Features

Current

  • Background worker monitoring checkpoint activity
  • Automatic max_wal_size increases when forced checkpoints exceed threshold
  • Configurable maximum cap to prevent runaway growth
  • Live configuration updates via ALTER SYSTEM + SIGHUP
  • Auto-Shrink: Automatically reduce max_wal_size after sustained periods of low checkpoint activity
  • History Table: Full audit trail of all sizing adjustments in walrus.history
  • SQL Functions: Query status, history, and recommendations; trigger immediate analysis
  • Dry-Run Mode: Test behavior without making changes (walrus.dry_run = true)
  • Rate Limiting: Prevent thrashing with cooldown periods and hourly adjustment limits

Planned

  • NOTIFY Events: Real-time notifications on adjustments
  • Prometheus Metrics: Standard monitoring integration
  • Smart Algorithms: Multiple sizing strategies (adaptive, percentile)

Installation

Prerequisites

  • Rust toolchain (rustc, cargo, rustfmt) from https://rustup.rs
  • libclang 11+ (for bindgen)
  • PostgreSQL build dependencies

See the pgrx system requirements for platform-specific details.

Building from Source (Rust)

# Install cargo-pgrx
cargo install --locked cargo-pgrx

# Initialize pgrx (downloads and compiles Postgres versions to ~/.pgrx/)
# For a single version:
cargo pgrx init --pg18 download

# Or for all supported versions:
cargo pgrx init

# Clone and build pg_walrus
git clone https://github.com/willibrandon/pg_walrus.git
cd pg_walrus

# Run interactively with psql
cargo pgrx run pg18

# Run integration tests
cargo pgrx test pg18

# Run SQL regression tests (requires --postgresql-conf for background worker)
cargo pgrx regress pg18 --postgresql-conf "shared_preload_libraries='pg_walrus'"

# Create installation package
cargo pgrx package

Installing to System Postgres

# Install to Postgres found via pg_config on $PATH
cargo pgrx install --release

# Or with sudo if needed
cargo pgrx install --release --sudo

Enable the Extension

Add to postgresql.conf (or use ALTER SYSTEM):

-- Add to existing shared_preload_libraries
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements, pg_walrus';

Or in postgresql.conf:

shared_preload_libraries = 'pg_walrus'  # add to existing list if needed

Restart PostgreSQL:

pg_ctl restart -D $PGDATA

Configuration

Core Parameters

Parameter Default Description
walrus.enable true Enable/disable automatic resizing
walrus.max 4GB Maximum allowed max_wal_size
walrus.threshold 2 Forced checkpoints before resize

Auto-Shrink Parameters

Parameter Default Description
walrus.shrink_enable true Enable/disable automatic shrinking
walrus.shrink_factor 0.75 Multiplier for shrink calculation (0.01-0.99)
walrus.shrink_intervals 5 Quiet intervals before shrinking (1-1000)
walrus.min_size 1GB Minimum floor for max_wal_size

History Parameters

Parameter Default Description
walrus.history_retention_days 7 Days to retain history records (0-3650)

Dry-Run Parameters

Parameter Default Description
walrus.dry_run false Log decisions without executing ALTER SYSTEM

Rate Limiting Parameters

Parameter Default Description
walrus.cooldown_sec 300 Minimum seconds between adjustments (0-86400)
walrus.max_changes_per_hour 4 Maximum adjustments per rolling hour (0-1000)

All parameters require SIGHUP to take effect (no restart needed).

Database Connection

Parameter Default Description
walrus.database postgres Database where history table is stored (requires restart)

Note: walrus.database has postmaster context and requires a PostgreSQL restart to change.

History Table

The walrus.history table records all sizing decisions made by pg_walrus. The table is created in the walrus schema when you run CREATE EXTENSION pg_walrus.

Schema

walrus.history (
    id BIGSERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
    action TEXT NOT NULL,           -- 'increase', 'decrease', 'capped', 'dry_run', or 'skipped'
    old_size_mb INTEGER NOT NULL,
    new_size_mb INTEGER NOT NULL,
    forced_checkpoints BIGINT NOT NULL,
    checkpoint_timeout_sec INTEGER NOT NULL,
    reason TEXT,
    metadata JSONB                  -- Action-specific details
)

Querying History

-- Recent sizing decisions
SELECT timestamp, action, old_size_mb, new_size_mb, reason
FROM walrus.history
ORDER BY timestamp DESC
LIMIT 10;

-- Summary by action type
SELECT action, count(*), avg(new_size_mb - old_size_mb)::int AS avg_change
FROM walrus.history
GROUP BY action;

-- Export for compliance audit
COPY (SELECT * FROM walrus.history WHERE timestamp >= '2025-01-01' ORDER BY timestamp)
TO '/tmp/walrus_audit.csv' WITH CSV HEADER;

Automatic Cleanup

Old history records are automatically deleted based on walrus.history_retention_days. You can also manually trigger cleanup:

-- Delete records older than retention period
SELECT walrus.cleanup_history();
-- Returns: number of deleted records

Dry-Run Mode

Dry-run mode allows you to test pg_walrus behavior without making actual configuration changes. When enabled, the extension logs what decisions WOULD be made and records them to the history table with action = 'dry_run'.

Enabling Dry-Run Mode

-- Enable dry-run mode
ALTER SYSTEM SET walrus.dry_run = true;
SELECT pg_reload_conf();

-- Verify it's enabled
SHOW walrus.dry_run;

Log Output

When dry-run is enabled, sizing decisions appear in the PostgreSQL log with a [DRY-RUN] prefix:

LOG:  pg_walrus [DRY-RUN]: would change max_wal_size from 1024 MB to 2048 MB (threshold exceeded)
LOG:  pg_walrus [DRY-RUN]: would change max_wal_size from 4096 MB to 3072 MB (sustained low activity)

History Records

Dry-run decisions are recorded in walrus.history with action = 'dry_run' and metadata indicating what action would have been taken:

SELECT timestamp, action, old_size_mb, new_size_mb,
       metadata->>'would_apply' AS would_apply
FROM walrus.history
WHERE action = 'dry_run'
ORDER BY timestamp DESC;

Use Cases

  1. Pre-production validation: Test the extension in a staging environment before enabling in production
  2. Parameter tuning: Experiment with walrus.threshold and walrus.shrink_factor values
  3. Compliance auditing: Generate a complete audit trail of all sizing decisions without system impact

Rate Limiting

Rate limiting prevents thrashing during unstable workloads by enforcing a cooldown period between adjustments and limiting the total number of adjustments per hour.

Configuration

-- Set cooldown to 10 minutes between adjustments
ALTER SYSTEM SET walrus.cooldown_sec = 600;

-- Allow maximum 2 adjustments per hour
ALTER SYSTEM SET walrus.max_changes_per_hour = 2;

-- Reload configuration
SELECT pg_reload_conf();

Monitoring Rate Limit State

-- Check current rate limiting status
SELECT
    status->>'cooldown_active' AS cooldown_active,
    status->>'cooldown_remaining_sec' AS cooldown_remaining,
    status->>'changes_this_hour' AS changes_this_hour,
    status->>'hourly_limit_reached' AS hourly_limit_reached
FROM walrus.status() AS status;

Log Output

When an adjustment is blocked by rate limiting:

LOG:  pg_walrus: adjustment blocked - cooldown active (150 seconds remaining)
LOG:  pg_walrus: adjustment blocked - hourly limit reached (4 of 4)

Skipped Adjustments in History

Blocked adjustments are recorded in walrus.history with action = 'skipped':

SELECT timestamp, reason, metadata->>'blocked_by' AS blocked_by
FROM walrus.history
WHERE action = 'skipped'
ORDER BY timestamp DESC;

Edge Cases

  • walrus.cooldown_sec = 0: Disables cooldown (adjustments can happen every interval)
  • walrus.max_changes_per_hour = 0: Blocks all automatic adjustments (manual only via walrus.analyze(apply := true))
  • Manual adjustments via walrus.analyze(apply := true) bypass rate limiting

SQL Functions

All functions are in the walrus schema.

walrus.status()

Returns JSONB with current extension state including configuration, worker status, and counters.

SELECT walrus.status();
-- Returns: {"enabled": true, "worker_running": true, "current_max_wal_size_mb": 1024, ...}

-- Pretty-print for readability
SELECT jsonb_pretty(walrus.status());

walrus.history()

Returns adjustment history as a set of records (alternative to querying the table directly).

SELECT * FROM walrus.history();
-- Returns: timestamp, action, old_size_mb, new_size_mb, forced_checkpoints, reason

walrus.recommendation()

Returns the current sizing recommendation without applying any changes.

SELECT walrus.recommendation();
-- Returns: {"action": "increase", "current_size_mb": 512, "recommended_size_mb": 1024, "confidence": 85, "reason": "..."}

walrus.analyze(apply)

Triggers immediate analysis. With apply := true, executes the recommendation (superuser only).

-- Just analyze, don't apply
SELECT walrus.analyze();

-- Analyze and apply the recommendation immediately
SELECT walrus.analyze(apply := true);
-- Returns: {"analyzed": true, "applied": true, "recommendation": {...}}

walrus.reset()

Clears all history and resets shared memory counters. Superuser only.

SELECT walrus.reset();
-- Returns: true

walrus.cleanup_history()

Deletes history records older than walrus.history_retention_days.

SELECT walrus.cleanup_history();
-- Returns: number of deleted records

PostgreSQL Version Support

  • PostgreSQL 15
  • PostgreSQL 16
  • PostgreSQL 17
  • PostgreSQL 18

Requires PostgreSQL 15+ due to pgstat_fetch_stat_checkpointer() API.

Development

Running Tests

pg_walrus uses pgrx-managed PostgreSQL instances for development and testing. These are separate from any system PostgreSQL installations.

# Integration tests (automatically configures shared_preload_libraries via pg_test module)
cargo pgrx test pg18

# SQL regression tests (requires explicit --postgresql-conf)
cargo pgrx regress pg18 --postgresql-conf "shared_preload_libraries='pg_walrus'"

# Test all supported versions
for v in pg15 pg16 pg17 pg18; do
    cargo pgrx test $v || exit 1
    cargo pgrx regress $v --postgresql-conf "shared_preload_libraries='pg_walrus'" || exit 1
done

Note: cargo pgrx test reads shared_preload_libraries from the pg_test::postgresql_conf_options() function in src/lib.rs. cargo pgrx regress does not—you must pass --postgresql-conf explicitly for background worker extensions.

About

PostgreSQL extension that automatically adjusts max_wal_size to prevent forced checkpoints

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published