
Feature Request: Allow "Best-Effort" Optimization for Custom Models via ipex.llm.optimize on XPUs #807

Describe the issue

Motivation:
The ipex.llm.optimize API provides powerful accelerated inference for the supported LLM families on XPUs. However, its current design is tightly coupled to these specific, verified architectures, and the generic ipex.optimize path often yields limited gains for decoder-style models.

Problem Description:
Developers working with custom decoder models face challenges when trying to leverage ipex.llm.optimize.
These models include:

  • Smaller, domain-specific decoders tailored for particular tasks.
  • Decoder components within larger Vision Language Models (VLMs).
  • Novel architectures developed during research.

Currently, applying ipex.llm.optimize to such models often requires non-trivial workarounds, such as modifying the model's config.json or using monkey-patching techniques to make the model appear as one of the supported types. This process is indirect, adds development overhead, and isn't guaranteed to apply optimizations correctly.
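
For illustration, a minimal sketch of the masquerading workaround described above, assuming a custom checkpoint loaded through transformers; the repo id is a placeholder and the whole approach is fragile by design:

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

# Placeholder checkpoint whose architecture is NOT on the verified list.
model = AutoModelForCausalLM.from_pretrained(
    "my-org/custom-decoder", torch_dtype=torch.bfloat16
)

# Workaround: masquerade as a supported family (here, llama) so the
# model-type check inside ipex.llm.optimize passes. This is brittle:
# the custom attention/KV-cache layout may not match what the llama
# optimization path expects.
model.config.model_type = "llama"
model.config.architectures = ["LlamaForCausalLM"]

model = ipex.llm.optimize(model, dtype=torch.bfloat16, device="xpu")
```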

Proposed Solution:
Introduce a pathway for ipex.llm.optimize to apply optimizations on a "best-effort" basis to models not explicitly listed as supported. This could involve:

  1. An Opt-in Mechanism: A boolean flag like attempt_optimization_on_unsupported=True could allow users to explicitly request optimization, acknowledging it might not be fully tuned or guaranteed (see the usage sketch after this list).
  2. Heuristic-Based Optimization: The optimizer could inspect the provided torch.nn.Module and apply optimizations known to be generally applicable to transformer decoder blocks (e.g., optimizing linear layers, specific activation functions, KV caching if patterns are detected) without relying on exact model-family identification (a fallback sketch follows below).
  3. User Hints (Optional): Potentially allow users to provide basic hints about the model structure if needed (though a fully automatic approach is preferred).
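
For item 1, a hypothetical usage sketch; the attempt_optimization_on_unsupported flag does not exist in IPEX today, and build_custom_decoder stands in for any user-defined constructor:

```python
import torch
import intel_extension_for_pytorch as ipex

# Any custom torch.nn.Module built from transformer decoder blocks;
# build_custom_decoder is a hypothetical user-defined constructor.
model = build_custom_decoder().eval().to("xpu")

# Proposed opt-in: the user explicitly accepts best-effort, possibly
# partially tuned optimizations for an unrecognized architecture.
model = ipex.llm.optimize(
    model,
    dtype=torch.bfloat16,
    device="xpu",
    attempt_optimization_on_unsupported=True,  # proposed flag, not in IPEX yet
)
```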

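For item 2, a rough sketch of what a best-effort fallback could look like; best_effort_optimize is illustrative, not an IPEX API. Today a user can approximate it by degrading to the generic ipex.optimize when the LLM-specific path rejects the model:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

def best_effort_optimize(model: nn.Module,
                         dtype: torch.dtype = torch.bfloat16) -> nn.Module:
    """Illustrative fallback: try the family-specific LLM path first,
    then degrade to generally applicable optimizations."""
    model = model.eval().to("xpu")
    try:
        # Verified path: family-specific fusions, KV-cache rewrites, etc.
        return ipex.llm.optimize(model, dtype=dtype, device="xpu")
    except Exception:
        # Generic path: dtype conversion plus broadly applicable
        # kernel-level optimizations for any eval-mode nn.Module.
        return ipex.optimize(model, dtype=dtype)
```
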
Benefits:

  • Reduced Friction: Lowers the barrier for developers to experiment with IPEX optimizations on custom models.
  • Faster Iteration: Enables quicker testing and deployment of optimized custom architectures.
  • Broader Applicability: Extends the reach and utility of IPEX optimizations beyond the core supported model list.
  • Flexibility: Allows optimizing components (like VLM decoders) independently.

Conclusion:
Providing a mechanism, even if experimental or "best-effort," to apply ipex.llm.optimize to a wider range of decoder-like models would be a valuable addition for the community of developers building and deploying custom AI solutions on Intel hardware.
