
Qwen3VLForConditionalGeneration.from_pretrained weights_only=True error #43782

@oscars17

Description


System Info

platform: ubuntu 24.04.03
python 3.10
transformers 5.0
torch 2.10
accelerate 1.12
docker image: FROM nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu22.04
https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct

Who can help?

@yonigozlan @molbap

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the weights_only argument in torch.load from False to True. Re-running torch.load with weights_only set to False will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. Please file an issue with the following so that we can make weights_only=True compatible with your use case: WeightsUnpickler error:
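(For context on what this check does, here is a stdlib-only sketch of the mechanism behind weights_only=True: the unpickler rejects any pickled global that is not on an explicit allowlist, and torch.serialization.add_safe_globals extends that allowlist. Everything below is illustrative, not torch's actual implementation.)

```python
import io
import pickle

# Allowlist of (module, name) pairs — plays the role that
# torch.serialization.add_safe_globals([...]) plays for torch.load.
# The entry below is only an example of what an allowlist could contain.
ALLOWED = {("numpy", "dtype")}

class AllowlistUnpickler(pickle.Unpickler):
    """Refuses to resolve any global that is not explicitly allowlisted."""
    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            f"Weights only load failed: blocked global {module}.{name}"
        )

# Plain containers of numbers need no globals, so they load fine.
safe_payload = pickle.dumps({"weights": [0.1, 0.2]})
print(AllowlistUnpickler(io.BytesIO(safe_payload)).load())

# A pickled callable (how arbitrary code execution sneaks into a checkpoint)
# is stored as a global reference, so the allowlist check rejects it.
risky_payload = pickle.dumps(print)
try:
    AllowlistUnpickler(io.BytesIO(risky_payload)).load()
except pickle.UnpicklingError as e:
    print("blocked:", e)
```

So the error message is torch telling you it hit a global (here, a numpy scalar type) that is not on its built-in allowlist.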

I get the error when I try to run the code below:

import logging
import os
import torch
from typing import Optional, Dict
from PIL import Image
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from app.core.config import settings

logger = logging.getLogger(__name__)

class LocalQwenVisionService:
    _instance = None

    def __new__(cls, model_path: Optional[str] = None):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._initialized = False
        return cls._instance

    def __init__(self, model_path: Optional[str] = None):
        if self._initialized:
            return

        self.model_path = model_path or settings.QWEN_MODEL_PATH
        if not self.model_path or not os.path.exists(self.model_path):
            raise ValueError(f"Model path invalid: {self.model_path}")

        logger.info(f"🚀 Loading Qwen3-VL from: {self.model_path}")

        self.model = Qwen3VLForConditionalGeneration.from_pretrained(
            self.model_path,
            dtype=torch.float16,
            attn_implementation="sdpa",
            local_files_only=True,
            weights_only=False
        )

        self.processor = AutoProcessor.from_pretrained(
            self.model_path,
            local_files_only=True
        )

        logger.info(f"✅ Model loaded on: {next(self.model.parameters()).device}")
        self._initialized = True

    def analyze_image(
            self,
            image_path: str,
            prompt: Optional[str] = None
    ) -> Dict:
        """✅ Analyze image and return response"""

        if not os.path.exists(image_path):
            return {"status": "error", "error": f"Image not found: {image_path}"}

        if not prompt:
            prompt = (
                "Analyze this image and provide:\n"
                "1. Description of content\n"
                "2. Main objects and locations\n"
                "3. Atmosphere and mood\n"
                "4. Tags/categories\n"
                "5. One-sentence summary"
            )

        try:
            image = Image.open(image_path).convert("RGB")

            messages = [
                {
                    "role": "user",
                    "content": [
                        {"type": "image", "image": image},
                        {"type": "text", "text": prompt}
                    ]
                }
            ]

            inputs = self.processor.apply_chat_template(
                messages,
                tokenize=True,
                add_generation_prompt=True,
                return_dict=True,
                return_tensors='pt'
            )

            generated_ids = self.model.generate(
                **inputs,
                max_new_tokens=1024
            )

            # Strip the prompt tokens so only the newly generated part is decoded.
            generated_ids_trimmed = [
                out_ids[len(in_ids):]
                for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
            ]

            response_text = self.processor.batch_decode(
                generated_ids_trimmed,
                skip_special_tokens=True,
                clean_up_tokenization_spaces=False
            )[0]

            return {
                "status": "success",
                "message": response_text
            }

        except Exception as e:
            logger.error(f"❌ Error: {e}")
            return {"status": "error", "error": str(e)}

    def close(self):
        """Clean up resources"""
        logger.info("🛑 Closing model")
        if hasattr(self, "model"):
            del self.model
        torch.cuda.empty_cache()

What I tried

  • Adding ENV TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1 to the Dockerfile - didn't help
  • Passing weights_only=False to Qwen3VLForConditionalGeneration.from_pretrained - didn't help

import numpy
import torch.serialization

torch.serialization.add_safe_globals([
    (numpy._core.multiarray.scalar, 'numpy.core.multiarray.scalar'),
    numpy.dtype,
    numpy.dtypes.Float64DType
])

  • didn't help either
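In case it helps triage: as far as I understand, the weights_only code path is only hit when transformers falls back to torch.load for pickle-based *.bin shards, while *.safetensors files never go through torch.load at all (and the Qwen/Qwen3-VL-8B-Instruct repo on the Hub ships safetensors). A quick stdlib check of what the local snapshot actually contains (the path below is hypothetical; point it at settings.QWEN_MODEL_PATH):

```python
from pathlib import Path

def checkpoint_formats(model_dir: str) -> dict:
    """Group weight files by format; *.safetensors avoids torch.load entirely."""
    p = Path(model_dir)
    return {
        "safetensors": sorted(f.name for f in p.glob("*.safetensors")),
        "pickle_bin": sorted(f.name for f in p.glob("*.bin")),
    }

# Hypothetical local path — replace with the directory passed to from_pretrained.
print(checkpoint_formats("/models/Qwen3-VL-8B-Instruct"))
```

If pickle_bin is non-empty and safetensors is empty, that would explain why every load goes through torch.load and trips the weights_only check.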

Expected behavior

The model should load without the weights_only error. Right now I have no clue where I should dig to fix the problem.
