Add support for exporting CLIP submodels in OpenVINO #1563

msmiatac · 2025-12-17T13:51:27Z

What does this PR do?

Purpose: Add --submodel {vision,text,full} to the export CLI to allow exporting only the CLIP vision or text encoder. This change stems from the need to simplify CLIP model conversion for the DL Streamer project.
Motivation: DL Streamer benefits from separate CLIP submodules (vision/text) as standalone IRs. The flag removes downstream splitting complexity and leverages the existing TasksManager registry.
Implementation: Routes --submodel from the CLI to main_export. For Transformers CLIP models exported under the feature-extraction task, main_export replaces the loaded model with model.vision_model or model.text_model, so that TasksManager resolves the matching clip_vision_model or clip_text_model registry entry.
Documentation: Updates export.mdx with the new flag, usage, and examples for vision-only and text-only exports.
Tests: Adds CLIPSubmodelExportTest validating that the exported IRs expose the expected inputs (pixel_values for vision, input_ids for text).
Backward Compatibility: Default behavior remains unchanged (full export if the flag is omitted). No breaking changes.
Scope: Supports Transformers CLIP for the feature-extraction task only. OpenCLIP already has registry coverage, and its submodules could be wired up the same way in a follow-up; that is not included in this PR.
Impact: Cleaner exports, reducing integration overhead for DL Streamer and other consumers requiring separate CLIP encoders.
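
The routing described under Implementation can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: build_parser and select_submodel are hypothetical names, the parser models only the new flag, defaulting the flag to full is an assumption made to mirror the "full export if omitted" behavior, and a SimpleNamespace stands in for a loaded Transformers CLIP model.

```python
import argparse
from types import SimpleNamespace

# Values accepted by the flag, as listed in the PR description.
SUBMODEL_CHOICES = ("vision", "text", "full")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the export CLI; only --submodel is modeled.
    parser = argparse.ArgumentParser(prog="export-sketch")
    # Assumption: omitting the flag behaves like "full".
    parser.add_argument("--submodel", choices=SUBMODEL_CHOICES, default="full")
    return parser


def select_submodel(model, submodel: str = "full"):
    """Return the module that should be handed to the exporter.

    Hypothetical helper: it mirrors the described swap to
    model.vision_model / model.text_model, nothing more.
    """
    if submodel == "full":
        return model
    if submodel == "vision":
        return model.vision_model
    if submodel == "text":
        return model.text_model
    raise ValueError(f"Unknown submodel: {submodel!r}")


# Stand-in for a loaded CLIP model; strings replace the real encoder modules.
fake_clip = SimpleNamespace(vision_model="vision-encoder", text_model="text-encoder")

args = build_parser().parse_args(["--submodel", "vision"])
selected = select_submodel(fake_clip, args.submodel)
```

Keeping the selection in one small function means the default path returns the model untouched, which matches the backward-compatibility note above.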


IlyasMoutawwakil · 2026-01-08T09:06:06Z

Add support for exporting CLIP submodels in OpenVINO

dc0ca1b

nikita-savelyevv requested review from echarlaix and rkazants December 19, 2025 05:35