Skip to content

lukefleed/two-pass-lanczos

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

97 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

A Cache Efficient, Low-Memory Lanczos Algorithm

This repository contains a Rust implementation of a memory-efficient, two-pass variant of the Lanczos algorithm for computing the action of a matrix function on a vector, $f(\mathbf{A}) \mathbf{b}$. The two-pass Lanczos algorithm significantly reduces memory consumption compared to the standard Lanczos method by avoiding the storage of the entire Krylov basis during the iterations, at the cost of performing double the number of matrix-vector products.

The implementation is based on the library faer.

Best way to have a look at the API

You can explore the API documentation generated by Rust's documentation tool, rustdoc. To build and view the documentation locally, run the following command in the project root directory:

cargo doc --open

The project is heavily documented with doc comments, so this will give you a full overview of the available functions, types, modules and technical choices made in the implementation.

The --open flag might not work as expected, in that case you can manually open the generated documentation in .html with your browser. The output of the command will tell you where the documentation is located.

The library is structured into two main API layers to serve different use cases:

High-Level API: The solvers Module

The solvers module provides the primary, user-facing API. It contains two main functions:

  • lanczos: The standard one-pass Lanczos algorithm.
  • lanczos_two_pass: The memory-efficient two-pass variant.

These solvers abstract away the internal mechanics of the Lanczos iteration and are the recommended entry point for most applications.

Low-Level API: The algorithms Module

The algorithms module exposes the core building blocks of the Lanczos iterations. This API is intended for advanced use cases and testing purposes where fine-grained control is necessary. It allows for:

  • Direct access to intermediate data, such as the Lanczos basis matrix $\mathbf{V}_k$.
  • Implementation of custom logic between the two passes of the two-pass algorithm.
  • Benchmarking specific parts of the algorithm.

While this module is public, for general use, we strongly recommend using the high-level solvers, as they provide a safer and more convenient interface.

Compilation and Optimization

Building the Project

Compile the project with full optimizations:

cargo build --release

The release profile includes aggressive optimization settings:

  • Full Link-Time Optimization (LTO): lto = "fat" enables cross-crate optimization for maximum performance
  • Single Compilation Unit: codegen-units = 1 forces the optimizer to consider the entire crate context
  • Runtime Optimizations: opt-level = 3 enables all compiler optimizations
  • Debug Information: Retained in release builds for meaningful stack traces during long computations
  • Disabled Overflow Checks: Standard for high-performance numerical code

Compile time may be longer due to LTO and single codegen unit settings, but the resulting binaries will be significantly faster. If you need faster compile times, just comment out the relevant lines in Cargo.toml or build without --release. Note that debug builds will be significantly slower for numerical workloads

Build Script (build.rs)

The build.rs script automatically generates test suites by scanning the data/ directory for test instances. It performs the following operations:

  1. Instance Discovery: Scans directories data/1000, data/2000, and data/3000 for .dmx and .qfc file pairs
  2. Test Generation: Creates parameterized test functions for each discovered instance
  3. Property Validation: Generates tests for four mathematical properties:
    • Decomposition Consistency: Validates the Lanczos tridiagonal decomposition
    • Lanczos Relation: Verifies the fundamental recurrence relation
    • Orthonormality: Checks orthogonality of the Krylov basis vectors
    • Reconstruction Stability: Tests numerical stability of the two-pass reconstruction

The generated tests are included at compile time via the include! macro in tests/correctness.rs.

The build.rs is automatically executed during the build process and does not require manual invocation.

Data Generation

The datagen binary orchestrates a three-stage pipeline to generate KKT test instances using external tools.

Usage

cargo run --release --bin datagen -- \
    --arcs 5000 \
    --rho 3 \
    --instance-id 1 \
    --fixed-cost a \
    --quadratic-cost a \
    --scaling ns \
    --output-dir data/custom

Parameters

  • arcs: Number of arcs in the generated network
  • rho: Structural parameter (1-3) controlling network topology
  • instance-id: Unique identifier used as random seed
  • fixed-cost: Cost level for fixed costs (a = high, b = low)
  • quadratic-cost: Cost level for quadratic costs (a = high, b = low)
  • scaling: Arc capacity scaling (s = scaled, ns = not scaled)

Generation Pipeline

  1. pargen: Generates parameter file with problem specifications
  2. netgen: Creates network topology in DIMACS format (.dmx)
  3. qfcgen: Generates quadratic cost coefficients (.qfc)

Example: Generate Multiple Instances

# Generate instances with different structural parameters
for rho in 1 2 3; do
    for instance in 1 2 3; do
        cargo run --release --bin datagen -- \
            --arcs 10000 \
            --rho $rho \
            --instance-id $instance \
            --fixed-cost a \
            --quadratic-cost a \
            --scaling ns \
            --output-dir data/custom_10k
    done
done

Correctness Testing

The test suite validates properties of the Lanczos implementations. For this test we use the datasets downloaded from the CommaLab MCF repository.

Particularly, we test all the instances in the directories:

  • data/1000/ - Single-commodity problems with 1000 nodes, downloaded from here
  • data/2000/ - Single-commodity problems with 2000 nodes, downloaded from here
  • data/3000/ - Single-commodity problems with 3000 nodes, downloaded from here

You can download and extract these datasets using the following commands:

mkdir -p data/1000 data/2000 data/3000
wget https://commalab.di.unipi.it/files/Data/MCF/1000.tgz -O - | tar -xz -C data/1000 --strip-components=1
wget https://commalab.di.unipi.it/files/Data/MCF/2000.tgz -O - | tar -xz -C data/2000 --strip-components=1
wget https://commalab.di.unipi.it/files/Data/MCF/3000.tgz -O - | tar -xz -C data/3000 --strip-components=1

Running All Tests

cargo test --release

Test Categories

The tests are automatically generated by build.rs and cover four fundamental properties:

  1. Decomposition Consistency: Validates that the tridiagonal matrix T_k correctly represents the projection of A onto the Krylov subspace
  2. Lanczos Relation: Verifies the fundamental recurrence $A V_k = V_{k+1} \bar{T}_k$
  3. Orthonormality: Checks that the Krylov basis vectors maintain orthogonality
  4. Reconstruction Stability: Tests that the two-pass method reproduces the same basis as the standard method

Test Failures

Some tests may fail due to invalid indices in downloaded datasets. These failures are not algorithmic errors but data quality issues in the source datasets. The problematic datasets are primarily in directories:

  • data/1000/ - Single-commodity problems with 1000 nodes
  • data/2000/ - Single-commodity problems with 2000 nodes
  • data/3000/ - Single-commodity problems with 3000 nodes

Valid instances can be generated using the datagen utility as described above.

Running Specific Test Categories

# Run only orthonormality tests
cargo test orthonormality --release

# Run tests for a specific instance size
cargo test netgen_1000 --release

Reproducing Experimental Results

The project includes three experimental analyses corresponding to the results in tex/report.pdf. Each experiment generates CSV data that is then visualized using Python scripts.

1. Memory and Computation Trade-off

This experiment validates the theoretical memory-computation trade-off by analyzing how memory usage and execution time vary with iteration count across different problem scales. The main analysis is performed on three KKT system instances of different sizes:

  • Large-scale: 500,000 arcs
  • Medium-scale: 50,000 arcs
  • Small-scale: 5,000 arcs

All instances use rho=3 as the structural parameter for network topology generation.

Generate Data:

cargo run --release --bin datagen -- \
    --arcs 5000 \
    --rho 3 \
    --instance-id 1 \
    --fixed-cost a \
    --quadratic-cost a \
    --scaling ns \
    --output-dir data/arcs5k_rho3

cargo run --release --bin datagen -- \
    --arcs 50000 \
    --rho 3 \
    --instance-id 1 \
    --fixed-cost a \
    --quadratic-cost a \
    --scaling ns \
    --output-dir data/arcs50k_rho3

cargo run --release --bin datagen -- \
    --arcs 500000 \
    --rho 3 \
    --instance-id 1 \
    --fixed-cost a \
    --quadratic-cost a \
    --scaling ns \
    --output-dir data/arcs500k_rho3

Run Experiments

# Large-scale problem (500K arcs)
cargo run --release --bin tradeoff -- \
    --instance-dir data/arcs500k_rho3 \
    --output results/tradeoff_arcs500k_rho3.csv \
    --k-start 50 \
    --k-end 1000 \
    --k-step 50

# Medium-scale problem (50K arcs)
cargo run --release --bin tradeoff -- \
    --instance-dir data/arcs50k_rho3 \
    --output results/tradeoff_arcs50k_rho3.csv \
    --k-start 50 \
    --k-end 1000 \
    --k-step 50

# Small-scale problem (5K arcs)
cargo run --release --bin tradeoff -- \
    --instance-dir data/arcs5k_rho3 \
    --output results/tradeoff_arcs5k_rho3.csv \
    --k-start 50 \
    --k-end 1000 \
    --k-step 50

Generate Plots:

python3 python/plot_tradeoff.py \
    --input results/tradeoff_arcs500k_rho3.csv \
    --output-prefix results/images/tradeoff_arcs500k_rho3

python3 python/plot_tradeoff.py \
    --input results/tradeoff_arcs50k_rho3.csv \
    --output-prefix results/images/tradeoff_arcs50k_rho3

python3 python/plot_tradeoff.py \
    --input results/tradeoff_arcs5k_rho3.csv \
    --output-prefix results/images/tradeoff_arcs5k_rho3

This generates six plots total: memory and time plots for each of the three problem scales.

Dense Matrix Validation

We also run the same experiment on a dense random matrix, where the $O(n^2)$ cost of the matrix-vector product dominates. This helps isolate the effect of memory access patterns and cache behavior.

Generate Results:

cargo run --release --bin dense_tradeoff -- \
    --n 10000 \
    --k-start 100 \
    --k-end 1000 \
    --k-step 100 \
    --output results/dense_tradeoff.csv

Generate Plot:

python3 python/plot_dense_tradeoff.py \
    --input results/dense_tradeoff.csv \
    --output results/images/dense_tradeoff.pdf

This produces a single plot showing execution time versus iteration count for both Lanczos variants on a dense matrix.

2. Scalability with Problem Dimension

This experiment measures how memory usage and execution time scale with problem dimension while keeping the number of iterations fixed.

Generate Data and Results:

cargo run --release --bin scalability -- \
    --arcs-start 5000 \
    --arcs-end 500000 \
    --arcs-step 5000 \
    --rho 3 \
    --k-fixed 500 \
    --output results/scalability_k500_rho3.csv

Generate Plots:

python3 python/plot_scalability.py \
    --input results/scalability_k500_rho3.csv \
    --output-prefix results/images/scalability_k500_rho3

This generates two plots showing memory usage and execution time vs problem dimension, with k=500 iterations fixed across all problem sizes.

3. Dense Matrix Performance Validation

This experiment validates the theoretical performance analysis by measuring execution time on dense matrices, where the O(n²) matrix-vector product cost dominates computation. This confirms that sparse matrix performance characteristics are attributable to memory access patterns rather than algorithmic differences.

Generate Data:

cargo run --release --bin dense_tradeoff -- \
    --n 5000 \
    --k-start 50 \
    --k-end 1000 \
    --k-step 50 \
    --output results/dense_tradeoff.csv

Generate Plots:

python3 python/plot_dense_tradeoff.py \
    --input results/dense_tradeoff.csv \
    --output results/images/dense_tradeoff.pdf

This generates a single plot showing execution time vs iteration count for both Lanczos variants on a dense matrix.

4. Numerical Accuracy and Stability Analysis

This experiment provides accuracy measurements and orthogonality studies across multiple problem scenarios. It uses two executables (stability and orthogonality) to generate four distinct experimental configurations.

Problem Scenarios: The analysis tests two functions on two spectral conditions:

  • Functions:
    • inv: inverse $f(z) = z^{-1}$ (linear system solving)
    • exp: exponential $f(z) = \exp(z)$
  • Spectral Conditions:
    • well-conditioned: Eigenvalues well-separated from function singularities
    • ill-conditioned: Wide spectrum or eigenvalues near singularities

This creates four combinations

Generate Accuracy Results:

# Matrix inverse - well-conditioned spectrum
cargo run --release --bin stability -- \
    --function inv \
    --scenario well-conditioned \
    --n 10000 \
    --k-min 10 \
    --k-max 200 \
    --k-step 10 \
    --output results/accuracy_inv_well-conditioned.csv

# Matrix inverse - ill-conditioned spectrum
cargo run --release --bin stability -- \
    --function inv \
    --scenario ill-conditioned \
    --n 10000 \
    --k-min 10 \
    --k-max 200 \
    --k-step 10 \
    --output results/accuracy_inv_ill-conditioned.csv

# Matrix exponential - well-conditioned spectrum
cargo run --release --bin stability -- \
    --function exp \
    --scenario well-conditioned \
    --n 10000 \
    --k-min 10 \
    --k-max 200 \
    --k-step 10 \
    --output results/accuracy_exp_well-conditioned.csv

# Matrix exponential - ill-conditioned spectrum
cargo run --release --bin stability -- \
    --function exp \
    --scenario ill-conditioned \
    --n 10000 \
    --k-min 10 \
    --k-max 200 \
    --k-step 10 \
    --output results/accuracy_exp_ill-conditioned.csv

Generate Orthogonality Results:

# Matrix inverse - well-conditioned spectrum
cargo run --release --bin orthogonality -- \
    --function inv \
    --scenario well-conditioned \
    --n 10000 \
    --k-min 20 \
    --k-max 500 \
    --k-step 20 \
    --output results/orthogonality_inv_well-conditioned.csv

# Matrix inverse - ill-conditioned spectrum
cargo run --release --bin orthogonality -- \
    --function inv \
    --scenario ill-conditioned \
    --n 10000 \
    --k-min 20 \
    --k-max 500 \
    --k-step 20 \
    --output results/orthogonality_inv_ill-conditioned.csv

# Matrix exponential - well-conditioned spectrum
cargo run --release --bin orthogonality -- \
    --function exp \
    --scenario well-conditioned \
    --n 10000 \
    --k-min 20 \
    --k-max 500 \
    --k-step 20 \
    --output results/orthogonality_exp_well-conditioned.csv

# Matrix exponential - ill-conditioned spectrum
cargo run --release --bin orthogonality -- \
    --function exp \
    --scenario ill-conditioned \
    --n 10000 \
    --k-min 20 \
    --k-max 500 \
    --k-step 20 \
    --output results/orthogonality_exp_ill-conditioned.csv

Generate All Plots:

# Accuracy plots
python3 python/plot_stability.py \
    --input results/accuracy_inv_well-conditioned.csv \
    --output results/images/accuracy_inv_well-conditioned.pdf

python3 python/plot_stability.py \
    --input results/accuracy_inv_ill-conditioned.csv \
    --output results/images/accuracy_inv_ill-conditioned.pdf

python3 python/plot_stability.py \
    --input results/accuracy_exp_well-conditioned.csv \
    --output results/images/accuracy_exp_well-conditioned.pdf

python3 python/plot_stability.py \
    --input results/accuracy_exp_ill-conditioned.csv \
    --output results/images/accuracy_exp_ill-conditioned.pdf

# Orthogonality plots
python3 python/plot_orthogonality.py \
    --input results/orthogonality_inv_well-conditioned.csv \
    --output results/images/orthogonality_inv_well-conditioned.pdf

python3 python/plot_orthogonality.py \
    --input results/orthogonality_inv_ill-conditioned.csv \
    --output results/images/orthogonality_inv_ill-conditioned.pdf

python3 python/plot_orthogonality.py \
    --input results/orthogonality_exp_well-conditioned.csv \
    --output results/images/orthogonality_exp_well-conditioned.pdf

python3 python/plot_orthogonality.py \
    --input results/orthogonality_exp_ill-conditioned.csv \
    --output results/images/orthogonality_exp_ill-conditioned.pdf

About

A Cache Efficient, Low Memory, Lanczos Algorithm written in Rust

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published