
Conversation

@zhangfeiran

What does this PR do?

Fixes # (issue)

Adds # (feature)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines.
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type / MindSpore version) and performance in the doc? (Ideally record this for data loading, model inference, or training tasks.)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

@zhangfeiran requested a review from vigo999 as a code owner on November 25, 2025 at 10:05
@gemini-code-assist
Contributor

Summary of Changes

Hello @zhangfeiran, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the mindone.transformers library by introducing the DINOv3 family of vision models, specifically the ConvNext and Vision Transformer variants. This addition provides users with state-of-the-art self-supervised learning models for computer vision tasks, complete with their respective image processing capabilities, all adapted for the MindSpore framework. The integration into the auto-configuration system ensures seamless usability.

Highlights

  • New Model Integration: Added DINOv3ConvNext and DINOv3ViT models to the mindone.transformers library.
  • Image Processor: Introduced DINOv3ViTImageProcessorFast for efficient image preprocessing specific to the DINOv3 ViT model.
  • Auto-Configuration Support: Integrated the new models and their configurations into the mindone.transformers auto-configuration system, enabling easy loading and usage (see the usage sketch after this list).
  • MindSpore Adaptation: The new models and image processor are adapted from Hugging Face Transformers to run on the MindSpore framework.
  • Comprehensive Testing: Included dedicated unit tests for both DINOv3ConvNextModel and DINOv3ViTModel to ensure correctness and compatibility.
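
For reviewers who want a quick feel for the intended API, here is a minimal usage sketch based on the highlights above. It assumes the auto classes in mindone.transformers mirror the Hugging Face Transformers API; the checkpoint id is purely illustrative and not taken from this PR.

import numpy as np
import mindspore as ms
from mindone.transformers import AutoImageProcessor, AutoModel

# Illustrative checkpoint id (hypothetical here); substitute a real DINOv3 checkpoint.
ckpt = "facebook/dinov3-vits16-pretrain-lvd1689m"

processor = AutoImageProcessor.from_pretrained(ckpt)  # should resolve to DINOv3ViTImageProcessorFast
model = AutoModel.from_pretrained(ckpt)               # should resolve to DINOv3ViTModel

image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)  # stand-in for a real RGB image
inputs = processor(images=image, return_tensors="np")  # return-tensor handling may differ in the final API
outputs = model(pixel_values=ms.Tensor(inputs["pixel_values"]))
print(outputs[0].shape)  # roughly (batch, 1 cls token + register tokens + patches, hidden_size)
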
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  [1] Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@gemini-code-assist (bot) left a comment:

Code Review

This PR adds dinov3_vit and dinov3_convnext models. The changes include model implementations, configurations, and tests. The overall structure is good, but there are several critical issues in the implementation that need to be addressed. These include incorrect weight initialization syntax (using PyTorch's .data API which is not supported in MindSpore), a bug in the image preprocessing logic, and usage of PyTorch-specific .contiguous() calls. I've also pointed out some minor issues like wildcard imports. Please address these points to ensure the models work correctly.

Comment on lines +196 to +209
def _init_weights(self, module):
    """Initialize the weights"""
    if isinstance(module, (mint.nn.Linear, mint.nn.Conv2d)):
        # Slightly different from the TF version which uses truncated_normal for initialization
        # cf https://github.com/pytorch/pytorch/pull/5617
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, (mint.nn.LayerNorm, DINOv3ConvNextLayerNorm)):
        module.bias.data.zero_()
        module.weight.data.fill_(1.0)
    elif isinstance(module, DINOv3ConvNextLayer):
        if module.gamma is not None:
            module.gamma.data.fill_(self.config.layer_scale_init_value)

critical

The weight initialization method _init_weights uses PyTorch-style in-place modification on .data, which is not supported for mindspore.Parameter. You should use helper functions like normal_, zeros_, and constant_ from mindone.models.utils to initialize the parameters correctly. Please also add from mindone.models.utils import constant_, normal_, zeros_ to the imports at the top of the file.

Suggested change

Original:

def _init_weights(self, module):
    """Initialize the weights"""
    if isinstance(module, (mint.nn.Linear, mint.nn.Conv2d)):
        # Slightly different from the TF version which uses truncated_normal for initialization
        # cf https://github.com/pytorch/pytorch/pull/5617
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, (mint.nn.LayerNorm, DINOv3ConvNextLayerNorm)):
        module.bias.data.zero_()
        module.weight.data.fill_(1.0)
    elif isinstance(module, DINOv3ConvNextLayer):
        if module.gamma is not None:
            module.gamma.data.fill_(self.config.layer_scale_init_value)

Suggested:

def _init_weights(self, module):
    """Initialize the weights"""
    if isinstance(module, (mint.nn.Linear, mint.nn.Conv2d)):
        # Slightly different from the TF version which uses truncated_normal for initialization
        # cf https://github.com/pytorch/pytorch/pull/5617
        normal_(module.weight, mean=0.0, std=self.config.initializer_range)
        if module.bias is not None:
            zeros_(module.bias)
    elif isinstance(module, (mint.nn.LayerNorm, DINOv3ConvNextLayerNorm)):
        zeros_(module.bias)
        constant_(module.weight, 1.0)
    elif isinstance(module, DINOv3ConvNextLayer):
        if module.gamma is not None:
            constant_(module.gamma, self.config.layer_scale_init_value)
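
For context on why such helpers are needed: below is a minimal sketch of how in-place initialization functions are typically written for mindspore.Parameter (the actual normal_/zeros_/constant_ in mindone.models.utils may differ; this only illustrates that parameters are updated via Parameter.set_data rather than by mutating a .data attribute).

import mindspore as ms
from mindspore.common.initializer import Constant, Normal, Zero, initializer

def normal_(param: ms.Parameter, mean: float = 0.0, std: float = 1.0) -> None:
    # Normal(sigma, mean) builds an init spec; set_data swaps in a tensor of the same shape/dtype.
    param.set_data(initializer(Normal(std, mean), param.shape, param.dtype))

def zeros_(param: ms.Parameter) -> None:
    param.set_data(initializer(Zero(), param.shape, param.dtype))

def constant_(param: ms.Parameter, value: float) -> None:
    param.set_data(initializer(Constant(value), param.shape, param.dtype))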

Comment on lines +65 to +79
for i in range(len(stacked_images)):
    image = stacked_images[i]
    # TODO mindspore.dataset.vision.Resize could only support (H, W, 3) format,
    # batch_size stacked image should be computed in one iteration
    # batch_size, channels = stacked_images.shape[0], stacked_images.shape[1]
    # stacked_images_updated = mint.zeros((batch_size, channels, resized_height, resized_width), dtype=stacked_images.dtype)
    # TODO: current implementation of resize require input to be unscaled image, so the order is changed to:
    # resize -> rescale -> normalize, causing ~e-3 precision difference
    if do_resize:
        image = self.resize(
            image=image, size=size, interpolation=interpolation, antialias=True
        )
    if do_rescale:
        image = self.rescale(image, rescale_factor)
stacked_images_updated.append(image)

critical

There is a bug in the _preprocess method. The stacked_images_updated.append(image) call is outside the for loop, which means only the last image of each group is processed and kept. This should be inside the loop to process all images.

Suggested change

Original:

for i in range(len(stacked_images)):
    image = stacked_images[i]
    # TODO mindspore.dataset.vision.Resize could only support (H, W, 3) format,
    # batch_size stacked image should be computed in one iteration
    # batch_size, channels = stacked_images.shape[0], stacked_images.shape[1]
    # stacked_images_updated = mint.zeros((batch_size, channels, resized_height, resized_width), dtype=stacked_images.dtype)
    # TODO: current implementation of resize require input to be unscaled image, so the order is changed to:
    # resize -> rescale -> normalize, causing ~e-3 precision difference
    if do_resize:
        image = self.resize(
            image=image, size=size, interpolation=interpolation, antialias=True
        )
    if do_rescale:
        image = self.rescale(image, rescale_factor)
stacked_images_updated.append(image)

Suggested (append moved inside the loop):

for i in range(len(stacked_images)):
    image = stacked_images[i]
    # TODO mindspore.dataset.vision.Resize could only support (H, W, 3) format,
    # batch_size stacked image should be computed in one iteration
    # batch_size, channels = stacked_images.shape[0], stacked_images.shape[1]
    # stacked_images_updated = mint.zeros((batch_size, channels, resized_height, resized_width), dtype=stacked_images.dtype)
    # TODO: current implementation of resize require input to be unscaled image, so the order is changed to:
    # resize -> rescale -> normalize, causing ~e-3 precision difference
    if do_resize:
        image = self.resize(
            image=image, size=size, interpolation=interpolation, antialias=True
        )
    if do_rescale:
        image = self.rescale(image, rescale_factor)
    stacked_images_updated.append(image)

attn_weights = attn_weights * attention_mask

attn_output = mint.matmul(attn_weights, value)
attn_output = attn_output.transpose(1, 2).contiguous()

critical

The .contiguous() method is a PyTorch-specific call and is not available for MindSpore tensors. It should be removed. The transpose operation in MindSpore returns a contiguous tensor by default in most cases.

Suggested change

Original:
attn_output = attn_output.transpose(1, 2).contiguous()

Suggested:
attn_output = attn_output.transpose(1, 2)

**kwargs,
)

attn_output = attn_output.reshape(batch_size, patches, -1).contiguous()

critical

The .contiguous() method is a PyTorch-specific call and is not available for MindSpore tensors. It should be removed. The reshape operation in MindSpore returns a contiguous tensor.

Suggested change

Original:
attn_output = attn_output.reshape(batch_size, patches, -1).contiguous()

Suggested:
attn_output = attn_output.reshape(batch_size, patches, -1)

Comment on lines +450 to +465
def _init_weights(self, module) -> None:
    """Initialize the weights"""
    if isinstance(module, (mint.nn.Linear, mint.nn.Conv2d)):
        trunc_normal_(module.weight,mean=0.0, std=self.config.initializer_range)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, mint.nn.LayerNorm):
        module.bias.data.zero_()
        module.weight.data.fill_(1.0)
    elif isinstance(module, DINOv3ViTEmbeddings):
        trunc_normal_(module.cls_token.data,mean=0.0, std=self.config.initializer_range)
        if module.config.num_register_tokens > 0:
            trunc_normal_(module.register_tokens,mean=0.0, std=self.config.initializer_range)
        module.mask_token.data.zero_()
    elif isinstance(module, DINOv3ViTLayerScale):
        module.lambda1.data.fill_(self.config.layerscale_value)

critical

The weight initialization method _init_weights uses PyTorch-style in-place modification on .data, which is not supported for mindspore.Parameter. You should use helper functions like zeros_ and constant_ from mindone.models.utils to initialize the parameters correctly. Also, trunc_normal_ should be called on the Parameter object directly, not on its .data attribute. Please also add from mindone.models.utils import zeros_, constant_ to the imports at the top of the file.

Suggested change

Original:

def _init_weights(self, module) -> None:
    """Initialize the weights"""
    if isinstance(module, (mint.nn.Linear, mint.nn.Conv2d)):
        trunc_normal_(module.weight,mean=0.0, std=self.config.initializer_range)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, mint.nn.LayerNorm):
        module.bias.data.zero_()
        module.weight.data.fill_(1.0)
    elif isinstance(module, DINOv3ViTEmbeddings):
        trunc_normal_(module.cls_token.data,mean=0.0, std=self.config.initializer_range)
        if module.config.num_register_tokens > 0:
            trunc_normal_(module.register_tokens,mean=0.0, std=self.config.initializer_range)
        module.mask_token.data.zero_()
    elif isinstance(module, DINOv3ViTLayerScale):
        module.lambda1.data.fill_(self.config.layerscale_value)

Suggested:

def _init_weights(self, module) -> None:
    """Initialize the weights"""
    if isinstance(module, (mint.nn.Linear, mint.nn.Conv2d)):
        trunc_normal_(module.weight, mean=0.0, std=self.config.initializer_range)
        if module.bias is not None:
            zeros_(module.bias)
    elif isinstance(module, mint.nn.LayerNorm):
        zeros_(module.bias)
        constant_(module.weight, 1.0)
    elif isinstance(module, DINOv3ViTEmbeddings):
        trunc_normal_(module.cls_token, mean=0.0, std=self.config.initializer_range)
        if module.config.num_register_tokens > 0:
            trunc_normal_(module.register_tokens, mean=0.0, std=self.config.initializer_range)
        zeros_(module.mask_token)
    elif isinstance(module, DINOv3ViTLayerScale):
        constant_(module.lambda1, self.config.layerscale_value)

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .modeling_dinov3_convnext import *

medium

Wildcard imports (from ... import *) are discouraged by PEP 8 as they make it unclear which names are present in the namespace. It's better to explicitly import the required names. Based on __all__ in modeling_dinov3_convnext.py, you should import DINOv3ConvNextModel and DINOv3ConvNextPreTrainedModel.

Suggested change

Original:
from .modeling_dinov3_convnext import *

Suggested:
from .modeling_dinov3_convnext import DINOv3ConvNextModel, DINOv3ConvNextPreTrainedModel

# See the License for the specific language governing permissions and
# limitations under the License.
from .image_processing_dinov3_vit_fast import DINOv3ViTImageProcessorFast
from .modeling_dinov3_vit import *

medium

Wildcard imports (from ... import *) are discouraged by PEP 8 as they make it unclear which names are present in the namespace. It's better to explicitly import the required names. Based on __all__ in modeling_dinov3_vit.py, you should import DINOv3ViTModel and DINOv3ViTPreTrainedModel.

Suggested change

Original:
from .modeling_dinov3_vit import *

Suggested:
from .modeling_dinov3_vit import DINOv3ViTModel, DINOv3ViTPreTrainedModel
