GPU-accelerated AI-driven optimization of diagonal parameters in linear ODE systems using adjoint Krylov methods.
This package provides an efficient framework for optimizing diagonal parameters of large-scale linear ordinary differential equations of the form

$$
\dot{u}(t) = \big(A_0 + \mathrm{diag}(d(\theta))\big)\,u(t) + f,
$$

where:

- $A_0 \in \mathbb{R}^{n \times n}$ is a fixed base operator,
- $\theta \in \mathbb{R}^n$ are trainable diagonal parameters,
- $d(\theta)$ is a monotone parametrization enforcing stability,
- $f \in \mathbb{R}^{n \times s}$ are constant forcing vectors (many right-hand sides).
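For orientation, here is a minimal CPU sketch (not part of the package API) of how such a parametrized operator can be assembled. The softplus map and the sign of the diagonal are illustrative assumptions, not the package's actual `SoftplusParam`.

```julia
using LinearAlgebra

# Illustrative only: assemble A(θ) = A0 + diag(d(θ)) for a tiny dense problem.
n, s = 8, 3
A0 = 0.01 .* randn(n, n)
A0[diagind(A0)] .= 0.0          # fixed base operator with zero diagonal
θ  = randn(n)                   # trainable diagonal parameters
softplus(x) = log1p(exp(x))     # example of a monotone map
d  = -softplus.(θ)              # negative diagonal as one stabilizing choice (assumption)
A  = A0 + Diagonal(d)           # parametrized system operator
f  = randn(n, s)                # s constant forcing vectors (right-hand sides)
```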
The implementation combines:
- GPU-resident batched Krylov / Arnoldi projections,
- CPU small-matrix Schur decompositions for stable matrix functions,
- Adjoint sensitivity analysis for gradient computation,
- First-order optimizers (Adam) for training diagonal parameters.
The code is designed for research, numerical experimentation, and HPC-oriented workflows.
- Fully GPU-based batched Arnoldi / Krylov projections.
- Stable evaluation of $\exp(tH)$ and $t\,\varphi_1(tH)$ via real Schur decomposition.
- Adjoint-based gradients with respect to diagonal parameters.
- Modular diagonal parametrizations (exp, softplus, or custom).
- Support for many trajectories / right-hand sides in parallel.
- Pluggable optimization layer (Adam provided, others easy to add).
- Automatic masking of numerically inactive trajectories.
This package depends on:
Batched_Krylov_GPU
https://github.com/Micik24/Batched_Krylov_GPU
Batched_Krylov_GPU provides the GPU implementation of the batched Arnoldi method used to construct Krylov bases and Hessenberg matrices.
This package does not duplicate that functionality and requires it as a dependency.
- Julia ≥ 1.10
- CUDA-capable GPU with CUDA.jl properly configured
- Git
git clone https://github.com/<your-username>/DiagonalODEOpt.jl.git
cd DiagonalODEOpt.jl
julia
]
activate .
add https://github.com/Micik24/Batched_Krylov_GPU
instantiate
precompile
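For scripted or CI setups, the same steps can be done through the standard Pkg API (ordinary Julia tooling, not a package-specific command):

```julia
using Pkg
Pkg.activate(".")                                               # activate the cloned project
Pkg.add(url = "https://github.com/Micik24/Batched_Krylov_GPU")  # GPU Krylov dependency
Pkg.instantiate()
Pkg.precompile()
```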
using DiagonalODEOpt
using CUDA

We consider a linear ODE system

$$
\dot{u}(t) = A\,u(t) + f, \qquad A = A_0 + \mathrm{diag}(d(\theta)),
$$

with constant forcing $f$. The exact solution at time $t$ is

$$
u(t) = e^{tA}\,u(0) + t\,\varphi_1(tA)\,f,
$$

where $\varphi_1(z) = (e^z - 1)/z$.
The action of these operators is approximated using Krylov subspaces constructed via batched Arnoldi iterations on GPU.
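For small problems the Krylov result can be cross-checked against a dense reference. The sketch below (plain CPU Julia, not part of the package) evaluates $e^{tA}u(0) + t\,\varphi_1(tA)f$ via the standard augmented-matrix identity:

```julia
using LinearAlgebra

# Dense CPU reference for u(t) = exp(tA)*u0 + t*φ1(tA)*f  (illustrative only).
# Uses the block identity: exp(t*[A f; 0 0]) = [exp(tA)  t*φ1(tA)*f; 0  I].
function reference_solution(A::AbstractMatrix, u0::AbstractVector,
                            f::AbstractMatrix, t::Real)
    n, s = size(A, 1), size(f, 2)
    M = [A f; zeros(s, n + s)]          # augmented (n+s) x (n+s) matrix
    E = exp(t .* M)                     # dense matrix exponential
    expA_u0 = E[1:n, 1:n] * u0          # exp(tA) * u0
    phi1_f  = E[1:n, n+1:n+s]           # t * φ1(tA) * f, one column per forcing vector
    return expA_u0 .+ phi1_f            # n x s matrix of trajectories at time t
end
```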
A generic loss $\mathcal{L}$ is defined on the computed solution.

The gradient with respect to the diagonal parameters is computed via an adjoint method:

$$
\nabla_{d}\,\mathcal{L} = \int_0^T \Lambda(t) \odot X(t)\,\mathrm{d}t,
\qquad
\nabla_{\theta}\,\mathcal{L} = \frac{\partial d}{\partial \theta} \odot \nabla_{d}\,\mathcal{L},
$$

where:

- $\Lambda(t)$ is the adjoint solution,
- $X(t)$ is the forward forcing contribution,
- $\odot$ denotes elementwise multiplication.
Time integration is performed using user-provided quadrature weights.
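The sketch below shows this accumulation schematically on the CPU, assuming $\Lambda$ and $X$ have already been evaluated at the quadrature nodes; the array names are illustrative, not the package's internals.

```julia
# Schematic quadrature accumulation of the adjoint gradient (illustrative names only).
# Lambda[:, j, q] ≈ Λ(t_q) and X[:, j, q] ≈ X(t_q) for trajectory j;
# w[q] are the user-provided quadrature weights on [0, T].
function diag_gradient(Lambda::Array{Float64,3}, X::Array{Float64,3}, w::Vector{Float64})
    n, s, Q = size(Lambda)
    g = zeros(n)
    for q in 1:Q, j in 1:s
        @views g .+= w[q] .* Lambda[:, j, q] .* X[:, j, q]   # Λ ⊙ X, weighted and summed
    end
    return g              # ∇_d L; multiply elementwise by ∂d/∂θ to obtain ∇_θ L
end
```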
The mapping from parameters $\theta$ to the diagonal $d(\theta)$ is defined by a parametrization object. Two parametrizations are provided: an exponential map and a softplus map (`SoftplusParam` is used in the example below).
Custom parametrizations can be added by implementing (see the sketch below):

- `diag_from_theta(param, θ)`
- `dtheta_from_ddiag(param, θ)`

The package includes a GPU-safe implementation of Adam, used to optimize $\theta$.
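A custom parametrization might look as follows. This is a sketch based on one reading of the two-function interface, namely that `diag_from_theta` returns $d(\theta)$ and `dtheta_from_ddiag` returns the elementwise derivative $\partial d/\partial\theta$ consumed by the chain rule; check `Parametrization.jl` for the actual contract and whether the methods must be extended on `DiagonalODEOpt`.

```julia
using DiagonalODEOpt

# Sketch of a custom bounded parametrization (interface interpretation — verify
# against Parametrization.jl; extending methods on DiagonalODEOpt is an assumption).
struct ScaledTanhParam
    scale::Float64            # bound on |d(θ)|
end

# d(θ): monotone and bounded in (-scale, scale)
DiagonalODEOpt.diag_from_theta(p::ScaledTanhParam, θ) = p.scale .* tanh.(θ)

# elementwise derivative ∂d/∂θ, assumed to be what the chain rule consumes
DiagonalODEOpt.dtheta_from_ddiag(p::ScaledTanhParam, θ) = p.scale .* (1 .- tanh.(θ) .^ 2)
```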
The optimization loop is intentionally decoupled from gradient computation:
- `step_grad_theta!` computes the loss and gradient,
- optimizers update parameters externally.
This design allows easy experimentation with alternative optimizers.
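Because of this decoupling, a custom optimizer can drive training directly. The loop below is schematic: the exact signature and return values of `step_grad_theta!` are assumed for illustration only; consult `Training.jl` before copying it.

```julia
# Schematic outer loop with plain gradient descent instead of Adam.
# NOTE: the step_grad_theta! call below is an assumed signature for illustration;
# see Training.jl for the real arguments and return values.
lr = 1e-3
for epoch in 1:20
    loss, g = step_grad_theta!(A0, θ, F, labels, param; T = 2.0, Q = 32, k = 30)
    θ .-= lr .* g                      # external parameter update
    println("epoch $epoch | loss = $loss")
end
```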
using DiagonalODEOpt
using CUDA
using LinearAlgebra   # diagind used below
CUDA.allowscalar(false)
n = 256 # state dimension
S = 64 # number of forcing vectors
A0 = CUDA.randn(Float64, n, n) .* 0.01
@views A0[diagind(A0)] .= 0.0
F = CUDA.randn(Float64, n, S)
labels = rand(1:n, S)
θ = CUDA.zeros(Float64, n)
param = SoftplusParam()
θ, history = optimize_diagonal_adam!(
A0, θ, F, labels, param;
T = 2.0,
Q = 32,
epochs = 20,
lr = 1e-3,
k = 30,
β_soft = 10.0,
batchsize = 32,
clamp_theta = (-50.0, 50.0),
callback = (ep, loss, θ) ->
println("epoch $ep | loss = $(round(loss, sigdigits=6))")
)

- Krylov projections are performed entirely on GPU.
- Small Hessenberg matrices are processed on CPU using real Schur decomposition.
- $\varphi_1$ is evaluated via stable block back-substitution, not explicit inversion.
- Numerically inactive trajectories are automatically masked.
- All computations are performed in Float64 for stability.
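As background on the $\varphi_1$ evaluation, the sketch below shows one stable Schur-based route for a small matrix: it solves a quasi-triangular system rather than forming an inverse. It is a simplification of what `SchurCPU.jl` does and assumes no eigenvalues at or very near zero; it is provided for illustration only.

```julia
using LinearAlgebra

# Schematic CPU evaluation of φ₁(H) for a small matrix H via real Schur form,
# using the identity H*φ₁(H) = exp(H) - I and a solve instead of an explicit inverse.
# Simplified relative to the package's blocked back-substitution in SchurCPU.jl.
function phi1_schur(H::AbstractMatrix{<:Real})
    F = schur(H)                  # H = Z * R * Z' with R quasi upper triangular
    R = F.T
    Φ = R \ (exp(R) - I)          # solve R*Φ = exp(R) - I (no inv)
    return F.Z * Φ * F.Z'
end
```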
src/
├── DiagonalODEOpt.jl # package entry point
├── Parametrization.jl # θ → diagonal mapping
├── AdjointStep.jl # adjoint Krylov gradient
├── SchurCPU.jl # small-matrix CPU numerics
├── ReconstructionGPU.jl # batched GPU reconstruction
├── Training.jl # optimization driver
└── Optimizers/
└── Adam.jl # Adam optimizer
examples/
└── synthetic_demo.jl
- Only diagonal parametrizations are supported.
- Only Float64 precision is currently supported.
- The package is not registered in the Julia General registry.
- Intended primarily for research and advanced numerical experimentation.
MIT
Maciej Wajda
maciejwajda01@gmail.com