Optimize the Default Precision of Model Weights for HuggingFace Runtime #2046
Proposal: Add Default torch_dtype="auto" to model_kwargs in HuggingFace Runtime for Optimized Model Weight Precision
Description
Currently, the model_kwargs field in the HuggingFaceSettings class is defined as Optional[dict] with a default value of None. Without an explicit torch_dtype, transformers loads model weights in float32 by default, even when the checkpoint was saved in a lower precision such as float16 or bfloat16. This can double the memory needed for the weights and leads to suboptimal configurations for loading and inference.
To address this issue, I suggest modifying the HuggingFaceSettings class to include a default value for torch_dtype in model_kwargs. Specifically, I suggest setting torch_dtype="auto" as the default behavior. With this setting, the HuggingFace runtime loads model weights in the precision the model was saved in: transformers uses the torch_dtype entry of the model's config.json if one exists, and otherwise the dtype of the first floating-point weight in the checkpoint.
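A minimal sketch of how the loaded dtype is resolved, assuming the behavior documented for transformers' from_pretrained: with no torch_dtype the weights are loaded as float32, an explicit dtype is used as-is, and "auto" takes the torch_dtype entry from config.json, falling back to the dtype of the first floating-point weight in the checkpoint. The function effective_dtype and its string-based inputs are hypothetical stand-ins; this is not the transformers implementation.

```python
from typing import Mapping, Optional, Sequence

# Floating-point dtype names, in the string form used by config.json files.
FLOAT_DTYPES = ("float16", "bfloat16", "float32", "float64")


def effective_dtype(
    torch_dtype: Optional[str],
    config: Mapping[str, object],
    checkpoint_dtypes: Sequence[str],
) -> Optional[str]:
    """Hypothetical helper: which dtype the weights end up loaded in."""
    # No torch_dtype given (today's default): weights are loaded as float32.
    if torch_dtype is None:
        return "float32"
    # An explicit dtype always wins over the config and the checkpoint.
    if torch_dtype != "auto":
        return torch_dtype
    # "auto", step 1: use the torch_dtype recorded in config.json.
    config_dtype = config.get("torch_dtype")
    if isinstance(config_dtype, str):
        return config_dtype
    # "auto", step 2: use the dtype of the first floating-point weight.
    for dtype in checkpoint_dtypes:
        if dtype in FLOAT_DTYPES:
            return dtype
    # Nothing found: the library's fallback for this case is not modelled here.
    return None


config = {"torch_dtype": "bfloat16"}
weights = ["int64", "bfloat16"]
print(effective_dtype(None, config, weights))    # float32 (current default)
print(effective_dtype("auto", config, weights))  # bfloat16 (proposed default)
```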
Proposed Changes
I suggest adding a model_validator method to the HuggingFaceSettings class to ensure that model_kwargs is initialized with a default value of {"torch_dtype": "auto"} if not explicitly provided.
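A minimal sketch of the proposed validator, assuming Pydantic v2. HuggingFaceSettingsSketch is a hypothetical, simplified stand-in that models only model_kwargs; the real HuggingFaceSettings class has more fields and its own configuration.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, model_validator


class HuggingFaceSettingsSketch(BaseModel):
    """Hypothetical stand-in for HuggingFaceSettings (only model_kwargs is modelled)."""

    # Allow field names starting with "model_", which some Pydantic v2 releases warn about.
    model_config = ConfigDict(protected_namespaces=())

    model_kwargs: Optional[dict] = None

    @model_validator(mode="after")
    def default_torch_dtype(self) -> "HuggingFaceSettingsSketch":
        # Fill in the default only when the user did not provide model_kwargs at all.
        if self.model_kwargs is None:
            self.model_kwargs = {"torch_dtype": "auto"}
        return self


print(HuggingFaceSettingsSketch().model_kwargs)
# {'torch_dtype': 'auto'}
print(HuggingFaceSettingsSketch(model_kwargs={"torch_dtype": "float16"}).model_kwargs)
# {'torch_dtype': 'float16'}
```

Under this reading of "if not explicitly provided", a user who passes model_kwargs without a torch_dtype key gets no default. Merging the default into user-supplied kwargs (for example with dict.setdefault) would be an alternative design.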
Benefits
- Optimized Weight Precision: By setting torch_dtype="auto" as the default, the HuggingFace runtime loads model weights in their native (saved) precision instead of upcasting them to float32. For half-precision checkpoints, this halves the memory used by the weights and can speed up loading and inference.
- Simplified Configuration: Users no longer need to set torch_dtype in model_kwargs explicitly to load weights in their native precision. This reduces configuration complexity and minimizes the risk of errors.
- Consistency: Providing a default value ensures that all instances of HuggingFaceSettings have a consistent and predictable configuration, making the code more robust and easier to maintain.
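To make the memory benefit concrete, a quick back-of-the-envelope calculation: float32 stores 4 bytes per parameter, while float16 and bfloat16 store 2. The helper below is illustrative only and counts the weights alone, ignoring activations, caches, and framework overhead; the 7-billion-parameter model is a hypothetical example.

```python
# Bytes per parameter for common floating-point dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Approximate memory (GiB) for the model weights alone."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Hypothetical 7-billion-parameter model.
num_params = 7_000_000_000
for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {weight_memory_gib(num_params, dtype):.1f} GiB")
# float32: 26.1 GiB
# bfloat16: 13.0 GiB
```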
Potential Drawbacks
- Compatibility Issues: Some models or use cases may not work well with torch_dtype="auto" (for example, a float16 checkpoint served on CPU, where some PyTorch operators may lack half-precision kernels). In such cases, users can override the default by explicitly setting torch_dtype in model_kwargs.
- Performance Overhead: Resolving the data type automatically may introduce a small overhead during model initialization. However, this overhead is expected to be negligible compared to the benefits of loading weights in their native precision.