Note
This package is a Julia port of the Python structure-tensor package by Niels Jeppesen (Skielex). The mathematics and algorithmic structure are kept identical to the original.
A Julia package for fast 3D structure tensor analysis of volumetric image data, with CPU multithreading, GPU (CUDA) acceleration, chunked processing for large volumes, and a file I/O wrapper for neuroimaging formats.
- About
- Installation
- Quick Start
- API Reference
- User Guide
- Correspondence to Python
- Mathematical Background
- Examples
- Contributing
- Citation
- License
The structure tensor is a fundamental tool in image analysis that encodes the local orientation and anisotropy of intensity gradients. It is widely used in:
- Fibre orientation analysis in composite materials and biological tissue
- Tractography and white matter mapping in neuroimaging
- Texture analysis and feature extraction in computer vision
- Flow estimation in fluid dynamics imaging
This package provides a faithful Julia port of the established Python structure-tensor library, with identical mathematics and several Julia-native enhancements:
| Feature | Python (structure-tensor) |
Julia (StructureTensor.jl) |
|---|---|---|
| 3D structure tensor | ✅ NumPy | ✅ Base Arrays |
| GPU acceleration | ✅ CuPy | ✅ CUDA.jl |
| Parallel/chunked | ✅ multiprocessing | ✅ Threads.@threads |
| File I/O wrapper | ❌ | ✅ TIFF, NIfTI, MGZ, Zarr, OME-Zarr, NPY |
| Eigendecomposition | ✅ Cardano's formula | ✅ Cardano's formula (identical) |
using Pkg
Pkg.add(url="https://github.com/Andrew-Keenlyside/StructureTensor.jl")Install these only for the features you need:
# GPU support
Pkg.add("CUDA")
# File I/O (install whichever formats you need)
Pkg.add("NIfTI") # .nii, .nii.gz
Pkg.add("TiffImages") # .tif, .tiff
Pkg.add("FreeSurfer") # .mgz, .mgh
Pkg.add("Zarr") # .zarr, .ome.zarr
Pkg.add("NPZ") # .npy- Julia ≥ 1.9 (for package extensions)
- NVIDIA GPU + CUDA toolkit (only for GPU support)
using StructureTensor
σ = 1.5 # noise scale (inner Gaussian)
ρ = 5.5 # integration scale (outer Gaussian)
# Load or generate a 3D volume
volume = randn(128, 128, 128)
# Compute structure tensor
S = structure_tensor_3d(volume, σ, ρ)
# Eigendecomposition (eigenvalues sorted ascending)
val, vec = eig_special_3d(S)using StructureTensor
using CUDA
volume = randn(128, 128, 128)
# Transfer to GPU and compute
S = structure_tensor_3d(CuArray(volume), 1.5, 5.5)
val, vec = eig_special_3d(S)
# Move results back to CPU
val_cpu = Array(val)
vec_cpu = Array(vec)# Start Julia with threads: julia -t 8
S, val, vec = parallel_structure_tensor_analysis(
volume, 1.5, 5.5;
chunk_size = 200,
verbose = true
)using StructureTensor
using NIfTI # for .nii.gz support
results = process_volume(
"brain.nii.gz", 1.5, 5.5;
output_dir = "results/",
output_format = :nifti_gz,
chunk_size = 200,
full = true,
verbose = true
)Compute the 3D structure tensor of a volume.
Arguments:
volume::AbstractArray{<:Real, 3}— Input 3D volume. Converted toFloat64internally.σ::Real— Noise scale. Standard deviation of the Gaussian for gradient estimation.ρ::Real— Integration scale. Standard deviation of the Gaussian for averaging the outer product.
Keyword Arguments:
truncate::Real = 4.0— Gaussian kernel truncation in units of σ/ρ.
Returns:
S::Array{Float64, 4}— Structure tensor with shape(6, nx, ny, nz). Components:(Sxx, Syy, Szz, Sxy, Sxz, Syz).
Eigendecomposition of 3×3 symmetric structure tensors using Cardano's formula.
Arguments:
S::AbstractArray{<:Real, 4}— Structure tensor(6, nx, ny, nz).
Keyword Arguments:
full::Bool = false— Iftrue, return all three eigenvectors.eigenvalue_order::Symbol = :asc—:asc(smallest first) or:desc(largest first).
Returns:
val::Array{Float64, 4}— Eigenvalues(3, nx, ny, nz).vec— Eigenvectors. Shape(3, nx, ny, nz)iffull=false, or(3, 3, nx, ny, nz)iffull=true.
Compute the structure tensor in overlapping blocks. Reduces peak memory and enables multithreading.
Keyword Arguments:
chunk_size::Int = 128— Block size per dimension (voxels). Values 100–400 work well.verbose::Bool = false— Print progress.
Combined structure tensor + eigendecomposition pipeline with chunked processing.
Keyword Arguments:
chunk_size::Int = 128— Block size.full::Bool = false— All eigenvectors.eigenvalue_order::Symbol = :asc— Eigenvalue ordering.include_S::Bool = true— Iffalse, returnsnothingfor S (saves memory).verbose::Bool = false— Print progress.
When CUDA.jl is loaded, structure_tensor_3d and eig_special_3d dispatch automatically on CuArray inputs:
using StructureTensor, CUDA
S = structure_tensor_3d(cu(volume), σ, ρ) # GPU
val, vec = eig_special_3d(S) # GPU
val_cpu, vec_cpu = Array(val), Array(vec) # → CPUThe GPU implementation uses custom CUDA kernels with one thread per voxel for the eigendecomposition and separable 1D convolution for Gaussian filtering.
End-to-end pipeline: load → compute → save.
Keyword Arguments:
output_dir— Directory for output files (nothingfor in-memory only).output_format::Symbol— One of:nifti,:nifti_gz,:mgz,:zarr,:ome_zarr,:npy.chunk_size— Block size for chunked processing (nothingfor full volume).full::Bool— All eigenvectors.compute_S::Bool— Compute structure tensor.compute_eigen::Bool— Compute eigendecomposition.voxel_size— Tuple(dx, dy, dz)for output headers.prefix::AbstractString— Filename prefix (default"st").
Returns: (volume=..., S=..., val=..., vec=...)
Supported Formats:
| Format | Input | Output | Required Package |
|---|---|---|---|
| TIFF file (.tif/.tiff) | ✅ | — | TiffImages.jl |
| TIFF directory | ✅ | — | TiffImages.jl |
| NIfTI (.nii) | ✅ | ✅ | NIfTI.jl |
| NIfTI (.nii.gz) | ✅ | ✅ | NIfTI.jl |
| FreeSurfer MGZ (.mgz) | ✅ | ✅ | FreeSurfer.jl |
| Zarr (.zarr) | ✅ | ✅ | Zarr.jl |
| OME-Zarr (.ome.zarr) | ✅ | ✅ | Zarr.jl |
| NumPy (.npy) | ✅ | ✅ | NPZ.jl |
Load a 3D volume from file. Format auto-detected from extension.
Save an array to file. Format auto-detected from extension.
The structure tensor at each voxel of a 3D volume V is defined as:
S = Gρ ⊛ (∇σV · ∇σVᵀ)
where:
∇σVis the image gradient after Gaussian smoothing with standard deviationσ⊗denotes the outer productGρis a Gaussian kernel with standard deviationρfor spatial averaging
The resulting 3×3 symmetric tensor at each voxel has eigenvalues λ₁ ≤ λ₂ ≤ λ₃ that encode local structure:
| Pattern | Meaning |
|---|---|
λ₁ ≈ λ₂ ≈ λ₃ ≈ 0 |
Homogeneous region (no gradient) |
λ₁ ≈ λ₂ ≈ 0, λ₃ ≫ 0 |
Planar structure (edge) |
λ₁ ≈ 0, λ₂ ≈ λ₃ ≫ 0 |
Linear structure (fibre, tube) |
λ₁ ≈ λ₂ ≈ λ₃ ≫ 0 |
Isotropic structure (blob, noise) |
The eigenvector corresponding to λ₁ (smallest) points along the dominant linear structure (e.g., fibre direction).
- σ (noise scale): Controls the scale of gradient estimation. Small
σcaptures fine structures; largeσcaptures coarse features. Typical range:0.5–3.0voxels. - ρ (integration scale): Controls the neighbourhood size for averaging. Should be larger than
σ. Typical range:2.0–10.0voxels. - Rule of thumb:
ρ ≈ 2–4 × σworks well for most applications.
For volumes that exceed available memory:
# Chunked processing uses overlapping blocks
S = structure_tensor_3d_chunked(volume, σ, ρ; chunk_size=200)
# Combined pipeline with memory-efficient option
_, val, vec = parallel_structure_tensor_analysis(
volume, σ, ρ;
chunk_size = 200,
include_S = false, # don't keep S in memory
verbose = true
)Block size guidelines:
100–200: Conservative, lower memory usage200–400: Good balance of speed and memory- Reduce if you encounter out-of-memory errors
GPU processing is beneficial for:
- Volumes larger than ~64³
- Batch processing of many volumes
- When GPU memory is sufficient (the full volume + intermediates must fit)
using CUDA
# Check available GPU memory
println(CUDA.available_memory() / 1e9, " GB available")
# For volumes that don't fit on GPU, use chunked CPU processing insteadCPU parallel processing uses Julia's built-in threading:
# Start Julia with 8 threads
julia -t 8
# Or set the environment variable
export JULIA_NUM_THREADS=8# Check thread count
println("Using $(Threads.nthreads()) threads")
# Chunked processing automatically uses all available threads
S = structure_tensor_3d_chunked(volume, σ, ρ; chunk_size=128)The eigendecomposition in eig_special_3d is also multithreaded via Threads.@threads.
Install only the I/O packages you need:
using Pkg
# NIfTI (most common neuroimaging format)
Pkg.add("NIfTI")
# TIFF (microscopy data)
Pkg.add("TiffImages")
# FreeSurfer MGZ
Pkg.add("FreeSurfer")
# Zarr / OME-Zarr (cloud-optimised arrays)
Pkg.add("Zarr")
# NumPy arrays
Pkg.add("NPZ")This package is a faithful port of structure-tensor. The mapping is:
| Python | Julia |
|---|---|
from structure_tensor import structure_tensor_3d |
using StructureTensor |
structure_tensor_3d(volume, sigma, rho) |
structure_tensor_3d(volume, σ, ρ) |
eig_special_3d(S) |
eig_special_3d(S) |
eig_special_3d(S, full=True) |
eig_special_3d(S; full=true) |
from structure_tensor.cp import ... |
using CUDA; structure_tensor_3d(cu(volume), ...) |
parallel_structure_tensor_analysis(data, σ, ρ, devices=['cpu'], block_size=200) |
parallel_structure_tensor_analysis(volume, σ, ρ; chunk_size=200) |
Key differences:
- Array ordering: Julia uses column-major (Fortran) order; NumPy uses row-major (C) order. The structure tensor components are stored in the same order
(Sxx, Syy, Szz, Sxy, Sxz, Syz)with dimensions corresponding to Julia's native indexing. - GPU backend: Uses CUDA.jl (native Julia) instead of CuPy.
- Parallelism: Uses Julia threads instead of Python multiprocessing. No
devicesparameter — threads handle CPU;CuArraydispatch handles GPU. - I/O: Additional
process_volumewrapper not present in the Python package.
For a 3D scalar field V(x,y,z), the structure tensor is computed as:
-
Gradient estimation: Compute partial derivatives using Gaussian derivative filters:
∂V/∂xᵢ = (∂Gσ/∂xᵢ) ⊛ Vwhere
Gσis a Gaussian with standard deviationσ. -
Outer product: Form the 3×3 symmetric matrix at each voxel:
T = ∇V · ∇Vᵀ -
Integration: Smooth each component with a Gaussian
Gρ:S = Gρ ⊛ T
For a real 3×3 symmetric matrix A with eigenvalues λ₁ ≤ λ₂ ≤ λ₃:
- Compute
m = tr(A)/3and shift:K = A - mI - Compute
p = ‖K‖²_F / 6andq = det(K) / 2 - Compute angle
φ = acos(clamp(q/p^{3/2}, -1, 1)) / 3 - Eigenvalues:
λᵢ = m + 2√p · cos(φ - 2πi/3)fori = 0, 1, 2
Eigenvectors are computed via cross products of rows of (A - λᵢI), selecting the pair with the largest cross-product magnitude for numerical stability.
See the examples/ directory:
basic_usage.jl— Core API demonstrationsgpu_usage.jl— GPU acceleration with CUDA.jlio_pipeline.jl— File format I/O pipeline
Contributions are welcome! Please open an issue or pull request.
If you use this package in academic work, please cite both the original Python implementation and this Julia port:
Jeppesen, N., et al. "Quantifying effects of manufacturing methods on fiber orientation in unidirectional composites using structure tensor analysis." Composites Part A: Applied Science and Manufacturing 149 (2021): 106541.
@article{JEPPESEN2021106541,
title = {Quantifying effects of manufacturing methods on fiber orientation
in unidirectional composites using structure tensor analysis},
journal = {Composites Part A: Applied Science and Manufacturing},
volume = {149},
pages = {106541},
year = {2021},
doi = {https://doi.org/10.1016/j.compositesa.2021.106541},
author = {Jeppesen, N. and Dahl, V.A. and Christensen, A.N. and
Dahl, A.B. and Mikkelsen, L.P.}
}Keenlyside, A. "StructureTensor.jl: Structure tensor analysis for 3D volumetric data in Julia." (2025). https://github.com/Andrew-Keenlyside/StructureTensor.jl
MIT License — see the Python package's LICENSE for the original.
