13. Precision Policy Configuration

Introduction

The precision policy system in the Oxide-Lab repository provides a centralized mechanism for managing data type selection during model loading and inference. This system ensures consistent memory usage and performance across different hardware platforms by allowing users to select from predefined precision policies that optimize for compatibility, memory efficiency, or maximum precision. The implementation is designed to work seamlessly with both CPU and GPU devices, automatically selecting appropriate data types based on the target hardware and user preferences.

Section sources

src-tauri/src/core/precision.rs

Precision Policy Overview

The precision policy system is implemented as a unified configuration framework that determines the appropriate data types (dtype) for model weights based on the target device and user preferences. This system is centralized in the precision.rs module and integrates with the model loading pipeline to ensure consistent behavior across different models and hardware configurations.

The core components of the precision policy system include:

- `PrecisionPolicy` enum: Defines the available policy options that users can select
- `PrecisionConfig` struct: Stores the configuration for CPU and GPU data types
- dtype selection functions: Determine the appropriate data type based on device and policy

The system follows a hierarchical approach where policies map to specific configurations, which in turn determine the data types used during model loading. This abstraction allows the application to maintain a simple user interface while providing sophisticated control over precision settings.

```mermaid
flowchart TD
A[Precision Policy Selection] --> B{Policy Type}
B --> C[Default]
B --> D[MemoryEfficient]
B --> E[MaximumPrecision]
C --> F[CPU: F32, GPU: BF16]
D --> G[CPU: F32, GPU: F16]
E --> H[CPU: F32, GPU: F32]
F --> I[Model Loading]
G --> I
H --> I
I --> J[Inference Execution]
```


**Diagram sources**
- [src-tauri/src/core/precision.rs](file://src-tauri/src/core/precision.rs#L15-L35)
- [src-tauri/src/core/weights.rs](file://src-tauri/src/core/weights.rs#L156-L186)

**Section sources**
- [src-tauri/src/core/precision.rs](file://src-tauri/src/core/precision.rs#L1-L50)

## Available Precision Policies
The system provides three distinct precision policies to accommodate different use cases and hardware constraints:

### Default Policy
The **Default** policy is designed to balance performance and compatibility across different hardware platforms:
- **CPU devices**: Uses F32 (32-bit floating point) for maximum compatibility
- **GPU devices**: Uses BF16 (Brain Floating Point 16-bit) for better performance while maintaining reasonable precision

This policy is recommended for most users as it provides good performance on GPU devices while ensuring compatibility with CPU-only systems.

### MemoryEfficient Policy
The **MemoryEfficient** policy prioritizes reduced memory usage, making it ideal for systems with limited VRAM:
- **CPU devices**: Uses F32 (no change from default)
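- **GPU devices**: Uses F16 (16-bit floating point), which needs half the memory of F32. F16 and BF16 both occupy 2 bytes per element, so the weight footprint matches the Default policy; the practical difference is that F16 is typically available on older GPUs that lack native BF16 support

This policy is particularly useful when running large models on GPUs with limited memory capacity. F16 has a narrower exponent range than BF16, so very large or very small values can overflow or underflow.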

### MaximumPrecision Policy
The **MaximumPrecision** policy prioritizes numerical accuracy over performance and memory efficiency:
- **CPU devices**: Uses F32
- **GPU devices**: Uses F32 (32-bit floating point)

This policy ensures the highest possible precision for both training and inference operations, making it suitable for applications where numerical accuracy is critical, though it comes at the cost of increased memory usage and potentially slower performance.
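
The memory trade-off between the three policies can be estimated from the element size of each dtype. The sketch below is only illustrative: the `DType` enum is a local stand-in rather than the type used in `precision.rs`, `weight_bytes` is a hypothetical helper, and the 7-billion-parameter count is an assumed example. It counts weight storage only and ignores activations and caches.

```rust
/// Local stand-in for the project's dtype enum (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DType {
    F16,
    BF16,
    F32,
}

impl DType {
    /// Size of a single element in bytes.
    fn size_in_bytes(self) -> u64 {
        match self {
            DType::F16 | DType::BF16 => 2,
            DType::F32 => 4,
        }
    }
}

/// Approximate memory needed for the weights alone.
fn weight_bytes(num_params: u64, dtype: DType) -> u64 {
    num_params * dtype.size_in_bytes()
}

fn main() {
    // Assumed example size: a model with 7 billion parameters.
    let params: u64 = 7_000_000_000;
    for dtype in [DType::BF16, DType::F16, DType::F32] {
        let gib = weight_bytes(params, dtype) as f64 / (1u64 << 30) as f64;
        println!("{:?}: {:.1} GiB", dtype, gib);
    }
}
```

Under these assumptions, both 16-bit dtypes need roughly 13 GiB for the weights and F32 needs roughly 26 GiB, so the memory saving comes from leaving F32, not from choosing F16 over BF16.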

```mermaid
classDiagram
class PrecisionPolicy {
+Default
+MemoryEfficient
+MaximumPrecision
}
class PrecisionConfig {
+cpu_dtype : DType
+gpu_dtype : DType
+allow_override : bool
+default() PrecisionConfig
+memory_efficient() PrecisionConfig
+maximum_precision() PrecisionConfig
}
PrecisionPolicy --> PrecisionConfig : "maps to"
```

Diagram sources

src-tauri/src/core/precision.rs

Section sources

src-tauri/src/core/precision.rs

Policy Configuration and Implementation

The precision policy system is implemented through a combination of Rust enums, structs, and utility functions that provide a clean API for policy selection and dtype determination.

Core Data Structures

The system defines two primary data structures:

PrecisionPolicy enum: Represents the user-selectable policies

pub enum PrecisionPolicy {
    Default,
    MemoryEfficient,
    MaximumPrecision,
}

PrecisionConfig struct: Contains the actual dtype configuration

pub struct PrecisionConfig {
    pub cpu_dtype: DType,
    pub gpu_dtype: DType,
    pub allow_override: bool,
}

Configuration Methods

The PrecisionConfig struct provides factory methods for creating configurations corresponding to each policy:

- `default()`: Creates the default configuration (CPU=F32, GPU=BF16)
- `memory_efficient()`: Creates a memory-efficient configuration (CPU=F32, GPU=F16)
- `maximum_precision()`: Creates a maximum precision configuration (CPU=F32, GPU=F32)
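
A minimal sketch of how these factory methods and the policy-to-configuration mapping might look is shown here. It is not the code from `precision.rs`: `DType` is a local stand-in, `default()` is written as a `Default` trait implementation (the source may define it as an inherent method), the `allow_override` field is left out because its values are not documented here, and the body of `policy_to_config` is an assumption based on the dtype pairs listed above.

```rust
/// Local stand-in for the project's dtype enum (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DType {
    F16,
    BF16,
    F32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PrecisionPolicy {
    Default,
    MemoryEfficient,
    MaximumPrecision,
}

/// Simplified configuration: `allow_override` is omitted in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PrecisionConfig {
    cpu_dtype: DType,
    gpu_dtype: DType,
}

impl Default for PrecisionConfig {
    /// CPU=F32, GPU=BF16
    fn default() -> Self {
        Self { cpu_dtype: DType::F32, gpu_dtype: DType::BF16 }
    }
}

impl PrecisionConfig {
    /// CPU=F32, GPU=F16
    fn memory_efficient() -> Self {
        Self { cpu_dtype: DType::F32, gpu_dtype: DType::F16 }
    }

    /// CPU=F32, GPU=F32
    fn maximum_precision() -> Self {
        Self { cpu_dtype: DType::F32, gpu_dtype: DType::F32 }
    }
}

/// Assumed mapping from a policy to its configuration.
fn policy_to_config(policy: &PrecisionPolicy) -> PrecisionConfig {
    match policy {
        PrecisionPolicy::Default => PrecisionConfig::default(),
        PrecisionPolicy::MemoryEfficient => PrecisionConfig::memory_efficient(),
        PrecisionPolicy::MaximumPrecision => PrecisionConfig::maximum_precision(),
    }
}

fn main() {
    for policy in [
        PrecisionPolicy::Default,
        PrecisionPolicy::MemoryEfficient,
        PrecisionPolicy::MaximumPrecision,
    ] {
        println!("{:?} -> {:?}", policy, policy_to_config(&policy));
    }
}
```

Keeping one constructor per policy means a new policy only needs a new enum variant, a new constructor, and one extra match arm.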

Dtype Selection Logic

The system uses two primary functions for dtype selection:

select_dtype: Selects dtype based on device and configuration

pub fn select_dtype(device: &Device, config: &PrecisionConfig) -> DType {
    match device {
        Device::Cpu => config.cpu_dtype,
        Device::Cuda(_) | Device::Metal(_) => config.gpu_dtype,
    }
}

select_dtype_by_policy: Selects dtype based on device and policy

pub fn select_dtype_by_policy(device: &Device, policy: &PrecisionPolicy) -> DType {
    let config = policy_to_config(policy);
    select_dtype(device, &config)
}

The selection logic follows a clear hierarchy: device type determines whether CPU or GPU settings are used, while the policy determines the specific dtype values within those categories.
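
To make the two-level hierarchy concrete, the following self-contained sketch enumerates every device and policy combination. The `Device` and `DType` enums are simplified stand-ins (the real `Device` variants carry backend handles, replaced here by a plain index), and the body of `policy_to_config` is an assumption derived from the policy descriptions in this document.

```rust
/// Local stand-ins for the project's types (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DType {
    F16,
    BF16,
    F32,
}

#[derive(Debug, Clone, Copy)]
enum Device {
    Cpu,
    Cuda(usize),
    Metal(usize),
}

#[derive(Debug, Clone, Copy)]
enum PrecisionPolicy {
    Default,
    MemoryEfficient,
    MaximumPrecision,
}

struct PrecisionConfig {
    cpu_dtype: DType,
    gpu_dtype: DType,
}

/// Assumed mapping: only the GPU dtype differs between policies.
fn policy_to_config(policy: &PrecisionPolicy) -> PrecisionConfig {
    let gpu_dtype = match policy {
        PrecisionPolicy::Default => DType::BF16,
        PrecisionPolicy::MemoryEfficient => DType::F16,
        PrecisionPolicy::MaximumPrecision => DType::F32,
    };
    PrecisionConfig { cpu_dtype: DType::F32, gpu_dtype }
}

/// Level 1: the device decides whether the CPU or the GPU setting applies.
fn select_dtype(device: &Device, config: &PrecisionConfig) -> DType {
    match device {
        Device::Cpu => config.cpu_dtype,
        Device::Cuda(_) | Device::Metal(_) => config.gpu_dtype,
    }
}

/// Level 2: the policy decides which dtype each setting holds.
fn select_dtype_by_policy(device: &Device, policy: &PrecisionPolicy) -> DType {
    select_dtype(device, &policy_to_config(policy))
}

fn main() {
    let devices = [Device::Cpu, Device::Cuda(0), Device::Metal(0)];
    let policies = [
        PrecisionPolicy::Default,
        PrecisionPolicy::MemoryEfficient,
        PrecisionPolicy::MaximumPrecision,
    ];
    for device in &devices {
        for policy in &policies {
            let dtype = select_dtype_by_policy(device, policy);
            println!("{:?} + {:?} -> {:?}", device, policy, dtype);
        }
    }
}
```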

```mermaid
sequenceDiagram
participant User as "User Interface"
participant State as "ModelState"
participant Policy as "Precision Policy"
participant Config as "PrecisionConfig"
participant Dtype as "dtype Selection"
participant Loader as "Model Loader"
User->>State : Select Policy
State->>Policy : Store PrecisionPolicy
Policy->>Config : Convert to PrecisionConfig
Config->>Dtype : Provide cpu_dtype/gpu_dtype
Dtype->>Loader : Return appropriate dtype
Loader->>Loader : Use dtype for model loading
```


**Diagram sources**
- [src-tauri/src/core/precision.rs](file://src-tauri/src/core/precision.rs#L77-L90)
- [src-tauri/src/core/precision.rs](file://src-tauri/src/core/precision.rs#L145-L155)

**Section sources**
- [src-tauri/src/core/precision.rs](file://src-tauri/src/core/precision.rs#L45-L155)

## Integration with Model Loading
The precision policy system is tightly integrated with the model loading pipeline, ensuring that the selected policy is applied consistently when loading model weights.

### State Management
The `ModelState` struct in `state.rs` includes a `precision_policy` field that stores the current policy selection:

```rust
pub struct ModelState {
    // ... other fields
    pub(crate) precision_policy: PrecisionPolicy,
}
```


The default state initializes with the `Default` policy:

```rust
impl ModelState {
    pub fn new(device: Device) -> Self {
        Self {
            // ... other fields
            precision_policy: PrecisionPolicy::Default,
        }
    }
}
```


### Model Loading Process
When loading models from safetensors files, the system uses the precision policy to determine the appropriate dtype:

1. The `load_local_safetensors_model` and `load_hub_safetensors_model` functions access the precision policy from the model state
2. They pass this policy to `build_varbuilder_with_precision` 
3. The weights module uses `select_dtype_by_policy` to determine the correct dtype
4. The dtype is applied when creating the VarBuilder for model initialization
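
The steps above can be sketched without the real loader. Everything in this example is a simplified assumption: the state is held in a `std::sync::Mutex`, `ModelState` is reduced to a `device` and a `precision_policy` field, `dtype_for_load` is a hypothetical helper standing in for the part of the loader that resolves the dtype, and the dtype mapping is collapsed into a single match.

```rust
use std::sync::Mutex;

/// Local stand-ins for the project's types (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DType {
    F16,
    BF16,
    F32,
}

#[derive(Debug, Clone, Copy)]
enum Device {
    Cpu,
    Cuda(usize),
}

#[derive(Debug, Clone, Copy)]
enum PrecisionPolicy {
    Default,
    MemoryEfficient,
    MaximumPrecision,
}

/// Reduced model state: only the fields this sketch needs.
struct ModelState {
    device: Device,
    precision_policy: PrecisionPolicy,
}

impl ModelState {
    fn new(device: Device) -> Self {
        Self { device, precision_policy: PrecisionPolicy::Default }
    }
}

/// Collapsed version of the policy and device lookup described in this document.
fn select_dtype_by_policy(device: &Device, policy: &PrecisionPolicy) -> DType {
    match (device, policy) {
        (Device::Cpu, _) => DType::F32,
        (Device::Cuda(_), PrecisionPolicy::Default) => DType::BF16,
        (Device::Cuda(_), PrecisionPolicy::MemoryEfficient) => DType::F16,
        (Device::Cuda(_), PrecisionPolicy::MaximumPrecision) => DType::F32,
    }
}

/// Hypothetical helper: read the policy from shared state and resolve the dtype.
fn dtype_for_load(state: &Mutex<ModelState>) -> Result<DType, String> {
    let guard = state
        .lock()
        .map_err(|e| format!("Failed to lock model state: {}", e))?;
    Ok(select_dtype_by_policy(&guard.device, &guard.precision_policy))
}

fn main() {
    let state = Mutex::new(ModelState::new(Device::Cuda(0)));
    println!("initial load dtype: {:?}", dtype_for_load(&state));

    // A policy change made through the state affects the next load.
    state.lock().unwrap().precision_policy = PrecisionPolicy::MaximumPrecision;
    println!("after policy change: {:?}", dtype_for_load(&state));
}
```

Reading the policy at load time, rather than caching the dtype, is what lets a policy change take effect on the next model load.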

Key integration points:

```rust
// In model loading functions
let dtype = build_varbuilder_with_precision(
    &filenames,
    &dev,
    Some(&guard.precision_policy),
)
.map_err(|e| format!("Failed to determine dtype: {}", e))?
.dtype();
```


This integration ensures that the user's precision policy selection directly affects how model weights are loaded into memory.

```mermaid
flowchart TD
A[User selects policy] --> B[Store in ModelState]
B --> C[Model loading begins]
C --> D{Is config available?}
D --> |Yes| E[Detect architecture]
E --> F[Get dtype from policy]
F --> G[Create VarBuilder with dtype]
G --> H[Initialize model]
D --> |No| I[Use alternative loading]
I --> H
H --> J[Model ready for inference]
```

Diagram sources

src-tauri/src/core/state.rs
src-tauri/src/api/model_loading/safetensors.rs

Section sources

src-tauri/src/core/state.rs
src-tauri/src/api/model_loading/safetensors.rs

Practical Usage Examples

The precision policy system can be applied in various scenarios depending on hardware capabilities and application requirements.

Example 1: Default Configuration for General Use

For most users with modern GPUs, the default policy provides the best balance of performance and compatibility:

// This is automatically applied when no specific policy is selected
let policy = PrecisionPolicy::Default;
// Results in: CPU=F32, GPU=BF16

Use case: General inference tasks on systems with dedicated GPUs where you want optimal performance without sacrificing too much precision.

Example 2: Memory-Constrained GPU Systems

On systems with limited VRAM (e.g., 8 GB or less), the memory-efficient policy keeps GPU weights at 16 bits per parameter, half the footprint of F32, which allows loading larger models:

let policy = PrecisionPolicy::MemoryEfficient;
// Results in: CPU=F32, GPU=F16

Use case: Running large language models on consumer GPUs where memory is the limiting factor rather than computational precision.

Example 3: Scientific Computing and Research

For applications requiring maximum numerical accuracy:

let policy = PrecisionPolicy::MaximumPrecision;
// Results in: CPU=F32, GPU=F32

Use case: Scientific simulations, financial modeling, or research applications where numerical precision is critical and performance is secondary.

Example 4: CPU-Only Systems

On systems without dedicated GPUs, the policy system still ensures consistent behavior:

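// With the three built-in policies, the CPU dtype is always F32
let policy = PrecisionPolicy::MemoryEfficient; // any policy gives the same result on CPU
let device = Device::Cpu;
let dtype = select_dtype_by_policy(&device, &policy);
// dtype is DType::F32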

Use case: Running models on CPU-only servers or older hardware where GPU acceleration is not available.

Integration with Frontend

The TypeScript definition shows how the policy is exposed to the frontend:

export type PrecisionPolicy = 
  | { Default: null }
  | { MemoryEfficient: null }
  | { MaximumPrecision: null };

This allows the user interface to present these options to users in a type-safe manner.

Section sources

src-tauri/src/core/precision.rs
src/lib/types.ts

Troubleshooting Precision Issues

When encountering issues related to precision policies, consider the following common problems and solutions:

Issue 1: Model Fails to Load on GPU

Symptoms: Out of memory errors when loading large models on GPU.

Solution: Switch to the MemoryEfficient policy to reduce memory usage:

guard.precision_policy = PrecisionPolicy::MemoryEfficient;

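This changes the GPU dtype from BF16 to F16. Both formats use 2 bytes per element, so this does not reduce weight memory compared with the Default policy; the saving (roughly half of the weight memory) applies only when switching away from MaximumPrecision (F32). If the model still does not fit, a smaller model or CPU loading is needed.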

Issue 2: Numerical Instability in Results

Symptoms: Inconsistent or unstable outputs, particularly in mathematical operations.

Solution: Switch to the MaximumPrecision policy to ensure higher numerical accuracy:

guard.precision_policy = PrecisionPolicy::MaximumPrecision;

This ensures all operations use 32-bit floating point precision on both CPU and GPU.
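
The precision gap behind this kind of instability can be shown with plain `f32` arithmetic. BF16 keeps the sign bit, the 8 exponent bits, and only the top 7 mantissa bits of an `f32`, so clearing the low 16 bits emulates a conversion to BF16. The sketch below uses truncation (round toward zero) as a simplifying assumption; real hardware and libraries typically round to nearest. `truncate_to_bf16` is a hypothetical helper, not part of the repository.

```rust
/// Emulate an f32 -> BF16 conversion by truncation (round toward zero).
/// Clearing the low 16 bits drops the 16 least significant mantissa bits.
fn truncate_to_bf16(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0xFFFF_0000)
}

fn main() {
    // 2^-8 is below the BF16 spacing near 1.0 (which is 2^-7), so it is lost.
    let small_step = 1.0_f32 + 2.0_f32.powi(-8);
    // 2^-7 is exactly one BF16 step above 1.0, so it survives.
    let large_step = 1.0_f32 + 2.0_f32.powi(-7);

    println!("{} -> {}", small_step, truncate_to_bf16(small_step));
    println!("{} -> {}", large_step, truncate_to_bf16(large_step));
}
```

Increments smaller than about 0.8% of a value's magnitude can vanish in BF16, which is why accumulations that look stable in F32 may drift under the Default policy on GPU.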

Issue 3: Performance Issues on GPU

Symptoms: Slower than expected inference times on GPU.

Solution: Ensure you're not using MaximumPrecision policy unless necessary. The default policy (BF16 on GPU) typically provides the best performance for inference tasks.

Issue 4: Type Mismatch Errors

Symptoms: Errors indicating dtype mismatches between model components.

Solution: Ensure consistency by using the centralized precision policy system rather than manually specifying dtypes. The policy system ensures all components use compatible types.

Diagnostic Tips

- Check the current policy setting in the model state
- Verify the detected device type (CPU vs GPU)
- Monitor memory usage during model loading
- Check the actual dtype being used by examining the VarBuilder configuration

The system's modular design makes it easy to modify precision settings without changing the core model loading logic.

Section sources

src-tauri/src/core/precision.rs
src-tauri/src/core/weights.rs

Conclusion

The precision policy system in Oxide-Lab provides a flexible and user-friendly approach to managing data type selection for machine learning models. By offering three distinct policies—Default, MemoryEfficient, and MaximumPrecision—the system accommodates a wide range of hardware configurations and use cases. The centralized implementation ensures consistent behavior across different models and loading scenarios, while the integration with the model state allows for dynamic policy selection.

Key advantages of this approach include:

- Simplicity: Users can select from intuitive policy options rather than dealing with low-level dtype settings
- Consistency: The same policy applies across all model loading operations
- Flexibility: Easy to extend with additional policies if needed
- Performance optimization: Automatic selection of optimal dtypes for different hardware

The system demonstrates a well-designed balance between user accessibility and technical control, making it suitable for both novice users who want sensible defaults and advanced users who need precise control over precision settings.

Referenced Files in This Document

src-tauri/src/core/precision.rs
src-tauri/src/core/weights.rs
src-tauri/src/api/model_loading/safetensors.rs
src-tauri/src/core/state.rs
src/lib/types.ts
docs/WEIGHTS_LOADER.md

