Optimize the Default Precision of Model Weights for HuggingFace Runtime #2046

# Proposal: Add a Default `torch_dtype="auto"` to `model_kwargs` in the HuggingFace Runtime for Optimized Model Weight Precision

# Description
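Currently, the `model_kwargs` field in the `HuggingFaceSettings` class is declared as `Optional[dict]` with a default of `None`, so no `torch_dtype` is passed when the model is loaded. In that case, `transformers` has historically loaded weights in PyTorch's default dtype (`torch.float32`), even when the checkpoint was saved in `float16` or `bfloat16`. Upcasting half-precision weights this way wastes memory and can slow down inference without adding any information to the weights.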

To address this, I suggest modifying the `HuggingFaceSettings` class so that `model_kwargs` sets `torch_dtype="auto"` by default. With this setting, `transformers` uses the dtype recorded in the model's `config.json` (its `torch_dtype` entry) and, if that entry is missing, the dtype of the first floating-point weight in the checkpoint. Weights are therefore loaded in the precision they were saved in instead of being upcast to `float32`. Note that `"auto"` follows the checkpoint, not the available hardware (CPU, GPU, or TPU).
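To make the lookup order concrete, here is a simplified sketch of how `"auto"` is resolved according to the `transformers` documentation. It is not the library's code: the function `resolve_auto_dtype` and its inputs (a parsed `config.json` dict and a hypothetical list of `(parameter_name, dtype_name)` pairs standing in for the real tensors) are illustrative assumptions.

```python
from typing import Optional

# Floating-point dtype names a checkpoint may contain (illustrative subset).
FLOAT_DTYPES = {"float16", "bfloat16", "float32", "float64"}


def resolve_auto_dtype(config: dict, weight_dtypes: list[tuple[str, str]]) -> str:
    """Hypothetical, simplified sketch of the torch_dtype="auto" lookup order.

    config: the parsed config.json of the model.
    weight_dtypes: (parameter_name, dtype_name) pairs in checkpoint order;
        a stand-in for inspecting the real tensors.
    """
    # 1. Prefer the dtype recorded in config.json, if present.
    config_dtype: Optional[str] = config.get("torch_dtype")
    if config_dtype is not None:
        return config_dtype
    # 2. Otherwise, use the dtype of the first floating-point weight.
    for _name, dtype in weight_dtypes:
        if dtype in FLOAT_DTYPES:
            return dtype
    # 3. Simplification for this sketch: give up instead of guessing.
    raise ValueError("no dtype in config.json and no floating-point weights")
```

For example, a checkpoint whose `config.json` contains `"torch_dtype": "bfloat16"` resolves to `bfloat16`. Nothing in this lookup inspects the device the model will run on.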

# Proposed Changes
I suggest adding a pydantic `model_validator` to the `HuggingFaceSettings` class that initializes `model_kwargs` to `{"torch_dtype": "auto"}` when it is not explicitly provided.
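A minimal sketch of such a validator, assuming pydantic v2. The class below is a hypothetical, trimmed-down stand-in for `HuggingFaceSettings` that keeps only `model_kwargs`; the real class has more fields, and the method name `default_torch_dtype` is illustrative.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, model_validator


class HuggingFaceSettings(BaseModel):
    # Hypothetical stand-in: only the field this proposal touches is shown.

    # Allow a field named "model_kwargs"; some pydantic v2 releases warn that
    # the "model_" prefix is a protected namespace.
    model_config = ConfigDict(protected_namespaces=())

    model_kwargs: Optional[dict] = None

    @model_validator(mode="after")
    def default_torch_dtype(self) -> "HuggingFaceSettings":
        # Copy the user's kwargs (or start empty) so the caller's dict
        # is never mutated.
        kwargs = dict(self.model_kwargs or {})
        # Fill in torch_dtype only when the user has not chosen one,
        # so an explicit value such as "float32" always wins.
        kwargs.setdefault("torch_dtype", "auto")
        self.model_kwargs = kwargs
        return self
```

With this sketch, `HuggingFaceSettings().model_kwargs` is `{"torch_dtype": "auto"}`, while `HuggingFaceSettings(model_kwargs={"torch_dtype": "float32"})` keeps `float32`. Using `setdefault` (rather than only handling `None`) is a design choice beyond the literal proposal: it also adds `"auto"` when a user passes other kwargs such as `{"trust_remote_code": True}` without a dtype.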

# Benefits
1. Better-Matched Weight Precision: With `torch_dtype="auto"` as the default, the HuggingFace runtime loads weights in the dtype the checkpoint was saved in (for example, `bfloat16`) rather than upcasting them to `float32`. For half-precision checkpoints this halves the memory needed for the weights and can speed up inference.
2. Simplified Configuration: Users no longer need to set `torch_dtype` in `model_kwargs` to get this behavior, which reduces configuration effort and the risk of errors.
3. Consistency: A default value gives every `HuggingFaceSettings` instance a consistent, predictable configuration, making the code more robust and easier to maintain.
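The memory benefit can be checked with simple arithmetic: weight memory is the parameter count times the bytes per parameter. The helper below is a hypothetical illustration that counts only the weights (no activations, caches, or framework overhead).

```python
# Bytes per parameter for common floating-point weight dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30
```

For a hypothetical 7-billion-parameter checkpoint saved in `bfloat16`, loading it as `float32` needs roughly 26 GiB for the weights, versus roughly 13 GiB when it is loaded in its native dtype.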

# Potential Drawbacks
1. Compatibility Issues: Some models or environments may not work well with the checkpoint's native dtype. For example, a `float16` checkpoint loaded on CPU can hit operators that lack half-precision kernels in some PyTorch versions. In such cases, users can override the default by setting `torch_dtype` explicitly in `model_kwargs` (e.g., `"float32"`).
2. Performance Overhead: Resolving `"auto"` requires reading the dtype from the model's config (or, failing that, from the checkpoint weights) during initialization. This overhead is expected to be negligible compared to the memory and speed benefits of loading weights in their native precision.
