Add Flax Dinov2 #31960
Conversation
Hi @MHRDYN7, thanks for working on this conversion! The interpolate logic is tricky. If loading the weights directly at the moment means this model passes, then that's a good guide that the conversion is OK. We might have to do something where we remove this before merge and skip the equivalence tests. In the meantime, the first thing to do is get the other tests passing. Some of the failing tests are unrelated and have fixes upstream. Could you rebase on main to include these? For the quality checks, running `make fixup` and pushing the changes should resolve them.
Hi @amyeroberts, I did try `make fixup` and I'm not really sure why the two tests are still failing. Moreover, what should be done for skipping the equivalence test? Should I just remove the directly loaded tensors after interpolation and change the "expected_slice" tensor in the integration tests accordingly to make them pass?
@MHRDYN7 For the quality checks, you'll need to run the quality scripts locally and push the resulting changes.

For skipping the equivalence tests, in general we would add a skip decorator with a reason to the relevant tests in the model's test file.

For your proposal re weights, this might be a good option as we'd still be checking the rest of the model. When you say "remove", I'm guessing you mean from the respective state dicts?
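For reference, skipping an equivalence test usually looks roughly like the following (a hedged sketch; the test class, test name, and reason below are placeholders, not taken from this PR):

```python
# Hedged illustration only: how a cross-framework equivalence test is typically
# skipped in a test class. Names and reason are hypothetical.
import unittest


class FlaxDinov2ModelTest(unittest.TestCase):  # hypothetical test class
    @unittest.skip(reason="Bicubic interpolation differs slightly between PyTorch and JAX")
    def test_equivalence_pt_to_flax(self):
        pass
```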
@amyeroberts thank you. I have tried to solve all the issues. To summarize, I have finally decided to keep the jax.image.scale_and_translate based interpolation rather than the directly loaded interpolated weights.
@MHRDYN7 Great!

> Is there a plan for this to happen in the future?

This shouldn't be necessary: all of the model frameworks (TF, PyTorch, Flax) should be able to load the safetensors file.
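As a quick usage sketch of that point (assuming the FlaxDinov2ForImageClassification class added by this PR and the checkpoint name from the doc example further down):

```python
# Flax should be able to load the shared safetensors checkpoint directly,
# without a separate .msgpack file being pushed to the Hub.
from transformers import FlaxDinov2ForImageClassification

model = FlaxDinov2ForImageClassification.from_pretrained("facebook/dinov2-base-imagenet1k-1-layer")
```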
The issue from the jax repo mentioned in my first comment suggests that they did come up with a fix, but no steps were taken and there are no PRs related to the issue. I might just open a PR there if I can; it shouldn't be hard to solve.

It's good to hear that. Indeed, the models don't necessarily need the .msgpack weights. I had seen flax weights on the hub for many models and thought it was a convention to add them.
Thanks for adding this!
Mostly just some nits. Overall LGTM. @sanchit-gandhi could you give a quick once-over to confirm the flax is OK?
new_height_ratio = jnp.float32(height / math.sqrt(num_positions))  # ? 16/37
new_width_ratio = jnp.float32(width / math.sqrt(num_positions))  # ? 16/37

# patch_pos_embed = jax.image.resize(patch_pos_embed, shape=(hidden_states.shape[0], dim, height, width), method='bicubic', antialias=False)
Why commented out?
It seems that I mistakenly wrote in my last comment that I used jax.image.resize, whilst I actually used jax.image.scale_and_translate. Both functions ultimately call the same helper function internally, and therefore either of them could be used for interpolating the tensor. The reason why scale_and_translate() is the better fit is that it allows us to set the scale argument explicitly (which is key according to the original Dinov2 repo), while resize() determines the scale on its own. I'll remove the commented-out line of code.
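For context, a minimal sketch of how the two JAX options compare (not the PR's exact code; the grid sizes, shapes, and variable names below are illustrative):

```python
import jax
import jax.numpy as jnp

# Hypothetical patch position embeddings: (batch, num_positions, dim).
patch_pos_embed = jnp.ones((1, 37 * 37, 768))
dim = patch_pos_embed.shape[-1]
num_positions = patch_pos_embed.shape[1]
src = int(num_positions ** 0.5)       # source grid size (37 here)
height = width = 16                   # target grid size, illustrative

# Reshape to (batch, src, src, dim) so the two spatial axes can be resized.
x = patch_pos_embed.reshape(1, src, src, dim)

# Option 1: jax.image.resize infers the scale from the input/output shapes.
resized = jax.image.resize(
    x, shape=(1, height, width, dim), method="bicubic", antialias=False
)

# Option 2: jax.image.scale_and_translate lets us pass the scale explicitly,
# which is what makes it the closer match to the original Dinov2 logic.
scale = jnp.array([height / src, width / src], dtype=jnp.float32)
scaled = jax.image.scale_and_translate(
    x,
    shape=(1, height, width, dim),
    spatial_dims=(1, 2),
    scale=scale,
    translation=jnp.zeros(2, dtype=jnp.float32),
    method="bicubic",
    antialias=False,
)
```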
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@amyeroberts thanks a lot for the review. All the tests are passing again.
The PR generally looks in good shape! Well done on handling all of the weight initialisations carefully @MHRDYN7 and porting the new functions over to Flax.
The main request from my review is using `# Copied from` statements as much as possible. There are many modules / methods that are copied 1-for-1 from existing models in the library. Here, prepending them with a `# Copied from` helps:
- Keep code sync'd across models
- The reviewer pinpoint which parts of the code to focus on!

Regarding your PR description: I didn't fully understand what the issue was with the position embedding weights - you've defined them as a standard `self.param`, and the keys look to match those from PyTorch? Let me know if I'm missing something here!
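For reference, defining position embeddings as a standard parameter in flax.linen looks roughly like this (a hedged sketch with an assumed module name and shapes, not the PR's actual module):

```python
import flax.linen as nn


class Dinov2PositionEmbeddingsSketch(nn.Module):  # hypothetical module name
    num_patches: int
    hidden_size: int

    def setup(self):
        # The parameter key ("position_embeddings") is what has to line up with
        # the corresponding PyTorch state dict entry when converting weights.
        self.position_embeddings = self.param(
            "position_embeddings",
            nn.initializers.zeros,
            (1, self.num_patches + 1, self.hidden_size),  # +1 for the CLS token
        )

    def __call__(self, embeddings):
        # In the real model the raw parameter is interpolated to the input's
        # patch grid before being added; that step is omitted here.
        return embeddings + self.position_embeddings
```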
)


class FlaxDinov2PreTrainedModel(FlaxPreTrainedModel):
I would copy this from Beit
# Copied from transformers.models.beit.modeling_flax_beit.FlaxBeitPreTrainedModel with Beit->Dinov2, beit->dinov2
class FlaxDinov2PreTrainedModel(FlaxPreTrainedModel):
# init input tensors
pixel_values = jnp.zeros(input_shape, dtype=self.dtype)

params_rng, dropout_rng = jax.random.split(rng)
We're missing the rng for the droppath - copying from Beit is going to fix this
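Roughly, the Beit-style pattern splits off an extra rng for the drop-path (stochastic depth) collection. A hedged sketch of that shape of init (names follow the snippet above; the exact Beit code may differ slightly):

```python
import jax
import jax.numpy as jnp


def init_weights_sketch(module, rng, input_shape, dtype=jnp.float32):
    # init input tensors
    pixel_values = jnp.zeros(input_shape, dtype=dtype)

    # Split a dedicated rng for stochastic depth ("droppath") in addition to
    # the params and dropout rngs.
    params_rng, dropout_rng = jax.random.split(rng)
    dropout_rng, droppath_rng = jax.random.split(dropout_rng)
    rngs = {"params": params_rng, "dropout": dropout_rng, "droppath": droppath_rng}

    # `module` stands in for the underlying flax.linen module of the model.
    return module.init(rngs, pixel_values)["params"]
```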
@@ -0,0 +1,259 @@
# coding=utf-8
It'd be super helpful to add `# Copied from` statements in the tests as well!
>>> image_processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base-imagenet1k-1-layer")
>>> model = FlaxDinov2ForImageClassification.from_pretrained("facebook/dinov2-base-imagenet1k-1-layer")

>>> inputs = image_processor(images=image, return_tensors="np")
Fine to do "np", since we convert "np" arrays to "jnp" arrays before calling the Flax module! (In fact, doing "np" is preferable here, since "jnp" arrays are automatically created on the accelerator device, whereas "np" is always on cpu -> creating your input on cpu and only moving it to accelerator when required is better for async dispatch)
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
@sanchit-gandhi thanks a lot for the careful review. A summary of the updates:

The position embedding weights can be loaded perfectly. However, these weights are later modified according to the number of patches using F.interpolate (with bicubic mode) in torch. We can replicate this behavior with jax.image.scale_and_translate (or with jax.image.resize), but this function differs slightly from the torch interpolate in the bicubic mode only, resulting in slightly different output hidden_states. Please note:
Thanks for the detailed explanation and iterating with us @MHRDYN7! @sanchit-gandhi is off at the moment, but I can see you've addressed his comments, so I think we're OK to merge without his second review. The final step is running the slow tests for the model before merge. Could you push an empty commit with the message that triggers the slow test run?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@amyeroberts, I've pushed the required commit. Now I guess it requires your approval to run the slow tests.
@amyeroberts slow tests passed! Ready to be merged.
Great piece of work - thanks for adding!
This PR adds the Flax implementation of Dinov2, which seems to have been due since #25579.
All the components of the PyTorch Dinov2 model can be converted to Flax except "interpolate_pos_encoding", which uses torch.nn.functional.interpolate. The closest jax function for replicating this is jax.image.scale_and_translate; however, there seems to be a slight difference between these functions in the bicubic mode (https://github.com/google/jax/issues/15768).
In Dinov2, the pretrained position encoding weights are for an image size of 512, but we load images of size 224 into the model. The interpolate function resizes the position encoding to match the patch grid of the input images. The ViT model does have this interpolate function, but it's not present in the FlaxViT implementation, since there the config and input image sizes are the same.
For now, I have directly loaded the pos_encoding weights from the pt model into flax right after interpolation (saved in a safetensors file). This passes all the tests (including the two new integration tests added on top of the FlaxViT tests). Clearly, this brute-force approach of loading the original interpolated pos_encodings will not work in the end, but otherwise the slight deviations from jax scale_and_translate will fail the tests. @amyeroberts @sanchit-gandhi
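To make the mismatch concrete, a small hypothetical comparison (not part of the PR; the array sizes are illustrative) of the two bicubic resizes on the same data:

```python
import numpy as np
import torch
import torch.nn.functional as F
import jax
import jax.numpy as jnp

# A fake 37x37 grid of position embeddings with 8 channels, resized to 16x16.
x = np.random.default_rng(0).standard_normal((1, 8, 37, 37)).astype(np.float32)

# PyTorch path, as used by interpolate_pos_encoding.
pt_out = F.interpolate(
    torch.from_numpy(x), size=(16, 16), mode="bicubic", align_corners=False
).numpy()

# JAX path with the same target shape and method.
jax_out = np.asarray(
    jax.image.resize(jnp.asarray(x), shape=(1, 8, 16, 16), method="bicubic", antialias=False)
)

# The outputs are expected to agree closely but not exactly in bicubic mode.
print(np.abs(pt_out - jax_out).max())
```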
Other Remaining Tasks:
- Add the flax weights in .msgpack files to the hub
- Test the SwiGLUFFN dense layer for ViT giant