13. Precision Policy Configuration
- Introduction
- Precision Policy Overview
- Available Precision Policies
- Policy Configuration and Implementation
- Integration with Model Loading
- Practical Usage Examples
- Troubleshooting Precision Issues
- Conclusion
## Introduction
The precision policy system in the Oxide-Lab repository centralizes data type (dtype) selection for model loading and inference. Users choose one of three predefined policies, which trade off compatibility, memory use, and numerical precision. The system then picks the dtype for the target device (CPU or GPU) from the chosen policy, so memory usage and performance stay predictable across hardware platforms.
Section sources
- src-tauri/src/core/precision.rs
## Precision Policy Overview
The precision policy system is a unified configuration framework that determines the data type (dtype) of model weights from the target device and the user's selected policy. It is centralized in the `precision.rs` module and integrates with the model loading pipeline so that behavior stays consistent across models and hardware configurations.
The core components of the precision policy system include:
- `PrecisionPolicy` enum: Defines the policy options that users can select
- `PrecisionConfig` struct: Stores the CPU and GPU data types for a policy
- dtype selection functions: Determine the data type to use from the device and the policy
The system follows a hierarchical approach where policies map to specific configurations, which in turn determine the data types used during model loading. This abstraction allows the application to maintain a simple user interface while providing sophisticated control over precision settings.
```mermaid
flowchart TD
A[Precision Policy Selection] --> B{Policy Type}
B --> C[Default]
B --> D[MemoryEfficient]
B --> E[MaximumPrecision]
C --> F[CPU: F32, GPU: BF16]
D --> G[CPU: F32, GPU: F16]
E --> H[CPU: F32, GPU: F32]
F --> I[Model Loading]
G --> I
H --> I
I --> J[Inference Execution]
```
**Diagram sources**
- [src-tauri/src/core/precision.rs](file://src-tauri/src/core/precision.rs#L15-L35)
- [src-tauri/src/core/weights.rs](file://src-tauri/src/core/weights.rs#L156-L186)
**Section sources**
- [src-tauri/src/core/precision.rs](file://src-tauri/src/core/precision.rs#L1-L50)
## Available Precision Policies
The system provides three distinct precision policies to accommodate different use cases and hardware constraints:
### Default Policy
The **Default** policy is designed to balance performance and compatibility across different hardware platforms:
- **CPU devices**: Uses F32 (32-bit floating point) for maximum compatibility
- **GPU devices**: Uses BF16 (Brain Floating Point 16-bit) for better performance while maintaining reasonable precision
This policy is recommended for most users as it provides good performance on GPU devices while ensuring compatibility with CPU-only systems.
### MemoryEfficient Policy
The **MemoryEfficient** policy targets systems with limited VRAM by keeping GPU weights in a 16-bit format:
- **CPU devices**: Uses F32 (no change from default)
- **GPU devices**: Uses F16 (IEEE 16-bit floating point), which takes 2 bytes per value, the same as BF16 and half as much as F32
Because F16 and BF16 are the same size, this policy saves memory relative to MaximumPrecision rather than relative to Default. Compared with BF16, F16 has more mantissa bits but a much narrower exponent range, so very large or very small values are more likely to overflow or underflow.
### MaximumPrecision Policy
The **MaximumPrecision** policy prioritizes numerical accuracy over performance and memory efficiency:
- **CPU devices**: Uses F32
- **GPU devices**: Uses F32 (32-bit floating point)
This policy ensures the highest possible precision for both training and inference operations, making it suitable for applications where numerical accuracy is critical, though it comes at the cost of increased memory usage and potentially slower performance.
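To make the memory trade-off between the three policies concrete, the following sketch estimates the GPU memory needed for weights alone. It is illustrative only: `DType` here is a stand-in enum rather than the type used in the repository, the 7-billion-parameter model size is hypothetical, and activations and caches are ignored.

```rust
// Stand-in for the real DType; defined here only so the sketch compiles alone.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    F32,
    F16,
    BF16,
}

impl DType {
    /// Bytes needed to store one value of this dtype.
    fn size_in_bytes(self) -> u64 {
        match self {
            DType::F32 => 4,
            DType::F16 | DType::BF16 => 2,
        }
    }
}

/// Approximate bytes needed for the weights alone.
fn weight_bytes(num_params: u64, dtype: DType) -> u64 {
    num_params * dtype.size_in_bytes()
}

fn main() {
    let params: u64 = 7_000_000_000; // hypothetical model size
    for (policy, gpu_dtype) in [
        ("Default", DType::BF16),
        ("MemoryEfficient", DType::F16),
        ("MaximumPrecision", DType::F32),
    ] {
        let gib = weight_bytes(params, gpu_dtype) as f64 / (1u64 << 30) as f64;
        println!("{policy}: {gpu_dtype:?} -> {gib:.1} GiB");
    }
}
```

Under these assumptions the two 16-bit policies need the same amount of weight memory (about 13 GiB), and MaximumPrecision needs twice that.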
```mermaid
classDiagram
class PrecisionPolicy {
+Default
+MemoryEfficient
+MaximumPrecision
}
class PrecisionConfig {
+cpu_dtype : DType
+gpu_dtype : DType
+allow_override : bool
+default() PrecisionConfig
+memory_efficient() PrecisionConfig
+maximum_precision() PrecisionConfig
}
PrecisionPolicy --> PrecisionConfig : "maps to"
```
Diagram sources
- src-tauri/src/core/precision.rs
Section sources
- src-tauri/src/core/precision.rs
## Policy Configuration and Implementation
The precision policy system is implemented through a combination of Rust enums, structs, and utility functions that provide a clean API for policy selection and dtype determination.
The system defines two primary data structures:
`PrecisionPolicy` enum: Represents the user-selectable policies
```rust
pub enum PrecisionPolicy {
    Default,
    MemoryEfficient,
    MaximumPrecision,
}
```
`PrecisionConfig` struct: Contains the actual dtype configuration
```rust
pub struct PrecisionConfig {
    pub cpu_dtype: DType,
    pub gpu_dtype: DType,
    pub allow_override: bool,
}
```
The `PrecisionConfig` struct provides factory methods for creating configurations corresponding to each policy:
- `default()`: Creates the default configuration (CPU=F32, GPU=BF16)
- `memory_efficient()`: Creates a memory-efficient configuration (CPU=F32, GPU=F16)
- `maximum_precision()`: Creates a maximum precision configuration (CPU=F32, GPU=F32)
The system uses two primary functions for dtype selection:
`select_dtype`: Selects dtype based on device and configuration
```rust
pub fn select_dtype(device: &Device, config: &PrecisionConfig) -> DType {
    match device {
        // CPU always takes the CPU dtype from the configuration
        Device::Cpu => config.cpu_dtype,
        // CUDA and Metal devices share the GPU dtype
        Device::Cuda(_) | Device::Metal(_) => config.gpu_dtype,
    }
}
```
`select_dtype_by_policy`: Selects dtype based on device and policy
```rust
pub fn select_dtype_by_policy(device: &Device, policy: &PrecisionPolicy) -> DType {
    // Resolve the policy to a concrete configuration, then pick by device
    let config = policy_to_config(policy);
    select_dtype(device, &config)
}
```
The selection logic follows a clear hierarchy: device type determines whether CPU or GPU settings are used, while the policy determines the specific dtype values within those categories.
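`select_dtype_by_policy` calls `policy_to_config`, whose body is not shown in this document. The sketch below shows one plausible shape for it, assuming it simply maps each policy to the dtype pairs listed above. To stay self-contained it uses stand-in `DType` and `Device` enums (with a single hypothetical `Gpu` variant instead of `Cuda`/`Metal`) and omits `allow_override`; the actual implementation in `precision.rs` may differ.

```rust
// Stand-ins for the real DType and Device types, so the sketch compiles alone.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    F32,
    F16,
    BF16,
}

#[derive(Debug, Clone, Copy)]
enum Device {
    Cpu,
    Gpu, // hypothetical: stands in for the Cuda and Metal variants
}

#[derive(Debug, Clone, Copy)]
enum PrecisionPolicy {
    Default,
    MemoryEfficient,
    MaximumPrecision,
}

#[derive(Debug, Clone, Copy)]
struct PrecisionConfig {
    cpu_dtype: DType,
    gpu_dtype: DType,
}

// Assumed shape of policy_to_config: one configuration per policy variant.
fn policy_to_config(policy: &PrecisionPolicy) -> PrecisionConfig {
    let gpu_dtype = match policy {
        PrecisionPolicy::Default => DType::BF16,
        PrecisionPolicy::MemoryEfficient => DType::F16,
        PrecisionPolicy::MaximumPrecision => DType::F32,
    };
    // Every documented policy keeps the CPU at F32.
    PrecisionConfig { cpu_dtype: DType::F32, gpu_dtype }
}

fn select_dtype(device: &Device, config: &PrecisionConfig) -> DType {
    match device {
        Device::Cpu => config.cpu_dtype,
        Device::Gpu => config.gpu_dtype,
    }
}

fn select_dtype_by_policy(device: &Device, policy: &PrecisionPolicy) -> DType {
    select_dtype(device, &policy_to_config(policy))
}

fn main() {
    for policy in [
        PrecisionPolicy::Default,
        PrecisionPolicy::MemoryEfficient,
        PrecisionPolicy::MaximumPrecision,
    ] {
        let cpu = select_dtype_by_policy(&Device::Cpu, &policy);
        let gpu = select_dtype_by_policy(&Device::Gpu, &policy);
        println!("{policy:?}: cpu={cpu:?}, gpu={gpu:?}");
    }
}
```

Keeping the policy-to-config mapping in a single `match` means the compiler flags any newly added policy variant that lacks a configuration.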
```mermaid
sequenceDiagram
participant User as "User Interface"
participant State as "ModelState"
participant Policy as "Precision Policy"
participant Config as "PrecisionConfig"
participant Dtype as "dtype Selection"
participant Loader as "Model Loader"
User->>State : Select Policy
State->>Policy : Store PrecisionPolicy
Policy->>Config : Convert to PrecisionConfig
Config->>Dtype : Provide cpu_dtype/gpu_dtype
Dtype->>Loader : Return appropriate dtype
Loader->>Loader : Use dtype for model loading
```
**Diagram sources**
- [src-tauri/src/core/precision.rs](file://src-tauri/src/core/precision.rs#L77-L90)
- [src-tauri/src/core/precision.rs](file://src-tauri/src/core/precision.rs#L145-L155)
**Section sources**
- [src-tauri/src/core/precision.rs](file://src-tauri/src/core/precision.rs#L45-L155)
## Integration with Model Loading
The precision policy system is tightly integrated with the model loading pipeline, ensuring that the selected policy is applied consistently when loading model weights.
### State Management
The `ModelState` struct in `state.rs` includes a `precision_policy` field that stores the current policy selection:
```rust
pub struct ModelState {
    // ... other fields
    pub(crate) precision_policy: PrecisionPolicy,
}
```
The default state initializes with the `Default` policy:
```rust
impl ModelState {
    pub fn new(device: Device) -> Self {
        Self {
            // ... other fields
            precision_policy: PrecisionPolicy::Default,
        }
    }
}
```
### Model Loading Process
When loading models from safetensors files, the system uses the precision policy to determine the appropriate dtype:
1. The `load_local_safetensors_model` and `load_hub_safetensors_model` functions access the precision policy from the model state
2. They pass this policy to `build_varbuilder_with_precision`
3. The weights module uses `select_dtype_by_policy` to determine the correct dtype
4. The dtype is applied when creating the VarBuilder for model initialization
Key integration points:
```rust
// In model loading functions
let dtype = build_varbuilder_with_precision(
    &filenames,
    &dev,
    Some(&guard.precision_policy),
)
.map_err(|e| format!("Failed to determine dtype: {}", e))?
.dtype();
```
This integration ensures that the user's precision policy selection directly affects how model weights are loaded into memory.
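The `guard.precision_policy` access in the snippet above suggests that the model state sits behind a lock. The following sketch illustrates that pattern in isolation, assuming a `std::sync::Mutex`. The reduced `ModelState`, `set_policy`, and `policy_for_load` are hypothetical names introduced for illustration, not functions from the repository.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq)]
enum PrecisionPolicy {
    Default,
    MemoryEfficient,
    MaximumPrecision,
}

// Reduced stand-in for ModelState; the real struct has more fields.
struct ModelState {
    precision_policy: PrecisionPolicy,
}

impl ModelState {
    fn new() -> Self {
        Self { precision_policy: PrecisionPolicy::Default }
    }
}

// Hypothetical setter that a UI command handler might call.
fn set_policy(state: &Mutex<ModelState>, policy: PrecisionPolicy) -> Result<(), String> {
    let mut guard = state.lock().map_err(|e| e.to_string())?;
    guard.precision_policy = policy;
    Ok(())
}

// Hypothetical read at load time: copy the policy out so the lock is
// released before any slow weight loading starts.
fn policy_for_load(state: &Mutex<ModelState>) -> Result<PrecisionPolicy, String> {
    let guard = state.lock().map_err(|e| e.to_string())?;
    Ok(guard.precision_policy)
}

fn main() -> Result<(), String> {
    let state = Arc::new(Mutex::new(ModelState::new()));
    println!("initial: {:?}", policy_for_load(&state)?);

    set_policy(&state, PrecisionPolicy::MemoryEfficient)?;
    println!("after UI change: {:?}", policy_for_load(&state)?);

    set_policy(&state, PrecisionPolicy::MaximumPrecision)?;
    println!("after second change: {:?}", policy_for_load(&state)?);
    Ok(())
}
```

Because the policy is read when loading starts, a change made in the UI only affects models loaded afterwards; in this sketch a model already in memory keeps the dtype it was loaded with.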
```mermaid
flowchart TD
A[User selects policy] --> B[Store in ModelState]
B --> C[Model loading begins]
C --> D{Is config available?}
D --> |Yes| E[Detect architecture]
E --> F[Get dtype from policy]
F --> G[Create VarBuilder with dtype]
G --> H[Initialize model]
D --> |No| I[Use alternative loading]
I --> H
H --> J[Model ready for inference]
```
Diagram sources
- src-tauri/src/core/state.rs
- src-tauri/src/api/model_loading/safetensors.rs
Section sources
- src-tauri/src/core/state.rs
- src-tauri/src/api/model_loading/safetensors.rs
## Practical Usage Examples
The precision policy system can be applied in various scenarios depending on hardware capabilities and application requirements.
For most users with modern GPUs, the default policy provides the best balance of performance and compatibility:
```rust
// This is automatically applied when no specific policy is selected
let policy = PrecisionPolicy::Default;
// Results in: CPU=F32, GPU=BF16
```
Use case: General inference tasks on systems with dedicated GPUs where you want optimal performance without sacrificing too much precision.
On systems with limited VRAM (e.g., 8GB or less), the memory-efficient policy keeps GPU weights at 16 bits, which allows loading larger models than F32 would:
```rust
let policy = PrecisionPolicy::MemoryEfficient;
// Results in: CPU=F32, GPU=F16
```
Use case: Running large language models on consumer GPUs where memory is the limiting factor rather than computational precision.
For applications requiring maximum numerical accuracy:
```rust
let policy = PrecisionPolicy::MaximumPrecision;
// Results in: CPU=F32, GPU=F32
```
Use case: Scientific simulations, financial modeling, or research applications where numerical precision is critical and performance is secondary.
On systems without dedicated GPUs, the policy system still ensures consistent behavior:
```rust
// Regardless of policy, CPU always uses F32
let device = Device::Cpu;
let dtype = select_dtype_by_policy(&device, &policy);
// Always returns DType::F32
```
Use case: Running models on CPU-only servers or older hardware where GPU acceleration is not available.
The TypeScript definition shows how the policy is exposed to the frontend:
```typescript
export type PrecisionPolicy =
  | { Default: null }
  | { MemoryEfficient: null }
  | { MaximumPrecision: null };
```
This allows the user interface to present these options to users in a type-safe manner.
Section sources
- src-tauri/src/core/precision.rs
- src/lib/types.ts
## Troubleshooting Precision Issues
When encountering issues related to precision policies, consider the following common problems and solutions:
Symptoms: Out of memory errors when loading large models on GPU.
Solution: If the MaximumPrecision policy is active, switch to a 16-bit policy such as MemoryEfficient:
```rust
guard.precision_policy = PrecisionPolicy::MemoryEfficient;
```
This sets the GPU dtype to F16. Moving from F32 to a 16-bit dtype roughly halves the memory needed for weights. Switching from Default (BF16) to MemoryEfficient (F16) does not reduce it further, because both formats use 2 bytes per value.
Symptoms: Inconsistent or unstable outputs, particularly in mathematical operations.
Solution: Switch to the MaximumPrecision policy to ensure higher numerical accuracy:
```rust
guard.precision_policy = PrecisionPolicy::MaximumPrecision;
```
This ensures that weights are loaded as 32-bit floating point on both CPU and GPU.
Symptoms: Slower than expected inference times on GPU.
Solution: Ensure you're not using the MaximumPrecision policy unless necessary. The default policy (BF16 on GPU) typically provides the best performance for inference tasks.
Symptoms: Errors indicating dtype mismatches between model components.
Solution: Use the centralized precision policy system rather than manually specifying dtypes, so that all components are loaded with the same dtype.
When diagnosing any of the issues above:
- Check the current policy setting in the model state
- Verify the detected device type (CPU vs GPU)
- Monitor memory usage during model loading
- Check the actual dtype being used by examining the VarBuilder configuration
The system's modular design makes it easy to modify precision settings without changing the core model loading logic.
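For the unstable-output symptom, one possible cause under the MemoryEfficient policy is overflow: the largest finite F16 value is 65504, whereas BF16 keeps the same exponent range as F32. The sketch below is not part of the repository; `count_f16_overflow` is a hypothetical helper that counts how many `f32` values in a sample fall outside the finite F16 range, which can hint at whether F16 is a poor fit for a given model.

```rust
/// Largest finite value representable in IEEE 754 half precision (F16).
const F16_MAX: f32 = 65504.0;

/// Counts values whose magnitude exceeds the largest finite F16 value,
/// plus any values that are already NaN or infinite.
fn count_f16_overflow(values: &[f32]) -> usize {
    values
        .iter()
        .filter(|v| !v.is_finite() || v.abs() > F16_MAX)
        .count()
}

fn main() {
    // Hypothetical sample of intermediate values collected during inference.
    let sample = [0.5_f32, 1200.0, 70000.0, -1.0e6, f32::NAN];
    let risky = count_f16_overflow(&sample);
    println!("{risky} of {} values fall outside the F16 range", sample.len());
}
```

If such values show up, switching to Default (BF16) or MaximumPrecision (F32) avoids the range limit, at the cost of mantissa precision or memory respectively.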
Section sources
- src-tauri/src/core/precision.rs
- src-tauri/src/core/weights.rs
## Conclusion
The precision policy system in Oxide-Lab provides a flexible and user-friendly approach to managing data type selection for machine learning models. By offering three distinct policies (Default, MemoryEfficient, and MaximumPrecision), the system accommodates a wide range of hardware configurations and use cases. The centralized implementation ensures consistent behavior across different models and loading scenarios, while the integration with the model state allows for dynamic policy selection.
Key advantages of this approach include:
- Simplicity: Users can select from intuitive policy options rather than dealing with low-level dtype settings
- Consistency: The same policy applies across all model loading operations
- Flexibility: Easy to extend with additional policies if needed
- Performance optimization: Automatic selection of optimal dtypes for different hardware
The system demonstrates a well-designed balance between user accessibility and technical control, making it suitable for both novice users who want sensible defaults and advanced users who need precise control over precision settings.
Referenced Files in This Document
- src-tauri/src/core/precision.rs
- src-tauri/src/core/weights.rs
- src-tauri/src/api/model_loading/safetensors.rs
- src-tauri/src/core/state.rs
- src/lib/types.ts
- docs/WEIGHTS_LOADER.md