
Optimize the Default Precision of Model Weights for HuggingFace Runtime #2046

Open
@fyuan1316


Proposal: Add Default torch_dtype="auto" to model_kwargs in HuggingFace Runtime for Optimized Model Weight Precision

Description

Currently, the model_kwargs field in the HuggingFaceSettings class is defined as Optional[dict] with a default value of None. When it is left unset, transformers loads the model weights in PyTorch's default dtype (normally float32), regardless of the precision the checkpoint was saved in. For checkpoints published in float16 or bfloat16, this doubles the memory needed for the weights during loading and inference.

To address this, I suggest giving torch_dtype a default value in model_kwargs: specifically, torch_dtype="auto". With "auto", transformers loads the weights in the dtype recorded in the model's config.json (its torch_dtype entry) or, if that entry is missing, in the dtype of the first floating-point weight in the checkpoint, instead of upcasting everything to float32.
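To make the "auto" rule concrete, here is a pure-Python approximation of the lookup order described in the transformers `from_pretrained` documentation. `resolve_auto_dtype` and its inputs (a parsed config.json dict and a list of checkpoint weight dtypes given as strings) are hypothetical; the real transformers code works on torch dtypes and the loaded checkpoint, so treat this only as a sketch of the decision order.

```python
from typing import Iterable, Optional

# Floating-point dtype names this sketch recognises (assumption: string names).
FLOAT_DTYPES = {"float16", "bfloat16", "float32", "float64"}


def resolve_auto_dtype(config: dict, checkpoint_dtypes: Iterable[str]) -> Optional[str]:
    """Hypothetical approximation of the torch_dtype="auto" lookup order."""
    # 1. Prefer the torch_dtype recorded in the model's config.json, if present.
    recorded = config.get("torch_dtype")
    if recorded:
        return recorded
    # 2. Otherwise use the dtype of the first floating-point weight in the checkpoint.
    for dtype in checkpoint_dtypes:
        if dtype in FLOAT_DTYPES:
            return dtype
    # 3. Nothing to infer from: this sketch returns None and leaves the fallback to the caller.
    return None
```

For example, a checkpoint whose config.json records `"torch_dtype": "bfloat16"` resolves to bfloat16 even if the default dtype would be float32.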

Proposed Changes

I suggest adding a model_validator method to the HuggingFaceSettings class to ensure that model_kwargs is initialized with a default value of {"torch_dtype": "auto"} if not explicitly provided.
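A minimal sketch of that validator, assuming pydantic v2. The class below is a hypothetical, trimmed-down stand-in built on `BaseModel` with only the `model_kwargs` field; the real HuggingFaceSettings has more fields and its own base class and configuration.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, model_validator


class HuggingFaceSettings(BaseModel):
    """Hypothetical stand-in for the runtime's settings class (model_kwargs only)."""

    # Some pydantic v2 releases reserve the "model_" prefix and warn about a
    # field called model_kwargs; an empty protected namespace avoids that.
    model_config = ConfigDict(protected_namespaces=())

    model_kwargs: Optional[dict] = None

    @model_validator(mode="after")
    def default_torch_dtype(self) -> "HuggingFaceSettings":
        # Start from torch_dtype="auto" and let any user-supplied keys,
        # including an explicit torch_dtype, override it. Building a new dict
        # avoids mutating the caller's original mapping.
        self.model_kwargs = {"torch_dtype": "auto", **(self.model_kwargs or {})}
        return self


print(HuggingFaceSettings().model_kwargs)  # {'torch_dtype': 'auto'}
print(HuggingFaceSettings(model_kwargs={"torch_dtype": "float16"}).model_kwargs)  # {'torch_dtype': 'float16'}
```

This sketch merges the default into any provided model_kwargs rather than only handling the None case, so a user who passes unrelated keys (for example `trust_remote_code`) still gets `torch_dtype="auto"`. Whether to merge or to apply the default only when model_kwargs is None is a design choice for the actual change.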

Benefits

  1. Better-Matched Weight Precision: With torch_dtype="auto" as the default, the HuggingFace runtime loads weights in the precision the checkpoint was saved in instead of silently upcasting them to float32. For half-precision checkpoints this roughly halves weight memory and can speed up inference, while checkpoints saved in float32 still load in float32.
  2. Simplified Configuration: Users no longer need to set torch_dtype in model_kwargs explicitly to load a model in its native precision. This reduces configuration complexity and minimizes the risk of errors.
  3. Consistency: Providing a default value ensures that all instances of HuggingFaceSettings have a consistent and predictable configuration, making the code more robust and easier to maintain.
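The memory side of the first benefit is simple arithmetic: weight memory is parameter count times bytes per parameter. The sketch below uses a hypothetical 7-billion-parameter model and counts the weights only (no activations, KV cache, or framework overhead); `weight_memory_gb` is an illustrative helper, not part of any runtime.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "bfloat16"):
    print(dtype, weight_memory_gb(7_000_000_000, dtype))
# float32 28.0
# bfloat16 14.0
```

So a bfloat16 checkpoint of this size needs about 14 GB for its weights when loaded with "auto", versus about 28 GB when upcast to float32 by the current default.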

Potential Drawbacks

  1. Compatibility Issues: This changes existing behaviour, because models that currently load in float32 would instead load in their checkpoint's dtype (e.g. float16 or bfloat16), which some models, devices, or use cases may not handle well. In such cases, users can override the default by setting torch_dtype explicitly in model_kwargs.
  2. Performance Overhead: Resolving the dtype automatically may add a small overhead during model initialization, since the loader has to read it from the model config or inspect the checkpoint weights. This overhead is expected to be negligible compared to the memory and speed benefits of loading weights in their native precision.
