Add test for Phi-3-vision-128k-instruct #1850

Open · wants to merge 17 commits into main

Conversation

@kshitij12345 (Collaborator) commented Mar 7, 2025:

Adds a test for Phi-3-vision-128k-instruct

The test takes around 4GB of device memory while running.

The relaxed tolerances held up over 50 repeats of the test: pytest thunder/tests/test_networks.py -k phi3_vi --count 50

@kshitij12345 kshitij12345 marked this pull request as ready for review March 7, 2025 13:07
@IvanYashchuk (Collaborator) left a comment:

If possible, the test should be skipped when there isn't enough memory on the GPU to run it. Another alternative is to modify the config to improve test duration and memory consumption.

from thunder.dynamo import thunderfx

cfg = Phi3Config(**phi3_vision_cfg)
cfg.num_hidden_layers = 2
Collaborator:

What is the memory requirement for 1 layer?

cfg.num_hidden_layers = 2

with torch.device("cuda"):
model = AutoModelForCausalLM.from_config(cfg, trust_remote_code=False, torch_dtype=torch.bfloat16)
Collaborator:

Changing vocab_size from 32064 to a smaller number should decrease the memory requirements of this test.
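For a rough sense of scale, here is a sketch (not part of the PR) of how much the vocabulary costs and what a smaller config might look like. In bf16 the token embedding and the untied lm_head each take vocab_size * hidden_size * 2 bytes, i.e. 32064 * 3072 * 2 ≈ 188 MiB apiece. The reduced values below are illustrative only; phi3_vision_cfg is the dict defined in the test file.

import torch
from transformers import AutoModelForCausalLM
from transformers.models.phi3 import Phi3Config

cfg = Phi3Config(**phi3_vision_cfg)  # the config dict shown further down in this diff
cfg.num_hidden_layers = 2
cfg.vocab_size = 2048  # illustrative value; input_ids used by the test must stay below this

with torch.device("cuda"):
    model = AutoModelForCausalLM.from_config(cfg, trust_remote_code=False, torch_dtype=torch.bfloat16)

# Parameter memory after the reduction (activations and gradients come on top of this).
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"parameter memory: {param_bytes / 2**30:.2f} GiB")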


@requiresCUDA
def test_hf_phi3_vision():
# This test takes around 4045406208 bytes (~4GB) of memory.
Collaborator:

Is there a decorator to skip the test based on the memory requirements? There are NVIDIA internal CI jobs on hardware with a limited amount of memory that could potentially fail.

Collaborator:

That would be nice +1 here
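For reference, a minimal sketch of what such a skip decorator could look like, assuming torch.cuda.mem_get_info for the free-memory query (the requiresDeviceMemory helper that eventually appears in this diff may be implemented differently):

import functools

import pytest
import torch


def requires_device_memory(required_memory_bytes: int):
    # Hypothetical decorator: skip the test when the current CUDA device has
    # less free memory than the test is known to need.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            free_bytes, _total_bytes = torch.cuda.mem_get_info()
            if free_bytes < required_memory_bytes:
                pytest.skip(
                    f"test needs {required_memory_bytes} bytes of free device memory, "
                    f"only {free_bytes} are available"
                )
            return fn(*args, **kwargs)

        return wrapper

    return decorator

It would be stacked on top of the CUDA requirement, e.g. @requires_device_memory(required_memory_bytes=int(3.6 * 1024**3)) above the test function.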

@@ -558,6 +558,189 @@ def test_hf_llama():
assert len(get_fusion_symbols(thunder.last_traces(jm)[-1])) == 6


# We need to copy config here as the AutoModel doesn't work with `trust_remote_code=True`
Collaborator:

What is the error?

Collaborator (Author):

The error is: ValueError: Loading microsoft/Phi-3-vision-128k-instruct requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code=True to remove this error.

Script

import torch
from transformers import AutoModelForCausalLM, AutoConfig
from transformers.models.phi3 import Phi3Config

model_id = "microsoft/Phi-3-vision-128k-instruct"

# Initialize the pre-trained model
cfg = AutoConfig.from_pretrained(model_id, trust_remote_code=False)
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto")
cfg.num_hidden_layers = 2

from thunder.dynamo import thunderfx
from thunder.dynamo.report import get_thunder_fxgraph_reports, fx_report, ThunderCompileSpecification

with torch.device("cuda"):
    model = AutoModelForCausalLM.from_config(cfg, trust_remote_code=False, torch_dtype=torch.bfloat16)
    print(model)

"original_max_position_embeddings": 4096,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"long_factor": [
Collaborator:

The length of this array of values is given by hidden_size / (2 * num_attention_heads), so reducing the size of the model will also reduce the line count here. Also, I think these are just numbers that you can set programmatically, since we don't care about correctness in this test.
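A sketch of generating these factors programmatically (the values are arbitrary placeholders, since the test doesn't check numerical correctness; only the list length has to match the head dimension):

# Each factor list needs one entry per rotary dimension pair:
# (hidden_size / num_attention_heads) / 2 = (3072 / 32) / 2 = 48 entries.
hidden_size = 3072
num_attention_heads = 32
num_rope_factors = hidden_size // num_attention_heads // 2  # 48

rope_scaling = {
    "long_factor": [1.0] * num_rope_factors,   # arbitrary placeholder values
    "short_factor": [1.0] * num_rope_factors,
    "type": "su",
}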

# for `Phi-3-vision-128k-instruct`.
phi3_vision_cfg = {
"_name_or_path": "Phi-3-vision-128k-instruct",
"architectures": ["Phi3VForCausalLM"],
Collaborator:

Interesting that this is not a model included in the transformers library, so one would think that with trust_remote_code=False this model wouldn't be loaded at all, but that doesn't seem to be the case 🤔

64.81001281738281,
64.81001281738281,
],
"short_factor": [
Collaborator:

Same comment as for long_factor.

"transformers_version": "4.38.1",
"use_cache": True,
"vocab_size": 32064,
"_attn_implementation": "sdpa",
@riccardofelluga (Collaborator) commented Mar 7, 2025:

What happens if _attn_implementation is not set?


loss_grad = torch.randn_like(expected.loss)
actual_grads = torch.autograd.grad(actual.loss, model.parameters(), grad_outputs=loss_grad)
expected_grads = torch.autograd.grad(expected.loss, model.parameters(), grad_outputs=loss_grad)
torch.testing.assert_close(actual_grads, expected_grads, rtol=1e-2, atol=1e-2)
Collaborator:

Asking for information now that I see the custom tolerances: what order of magnitude is the mismatch you are getting?
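Not part of the PR, but a quick way to answer this when picking tolerances is a small helper like the one below (hypothetical; actual_grads and expected_grads are the tuples computed in the test above):

def max_mismatch(actual_grads, expected_grads):
    # Worst absolute and relative differences across all gradients.
    worst_abs = worst_rel = 0.0
    for actual, expected in zip(actual_grads, expected_grads):
        diff = (actual - expected).abs()
        worst_abs = max(worst_abs, diff.max().item())
        worst_rel = max(worst_rel, (diff / expected.abs().clamp_min(1e-8)).max().item())
    return worst_abs, worst_rel

print(max_mismatch(actual_grads, expected_grads))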

@kshitij12345 (Collaborator, Author):

> Interesting that this is not a model included in the transformers library, so one would think that with trust_remote_code=False this model wouldn't be loaded at all, but that doesn't seem to be the case 🤔

Good catch. On checking the type of the model, I see it is Phi3ForCausalLM even though the config specifies the modeling_phi3_v.Phi3VForCausalLM architecture.

With trust_remote_code enabled, it seems to hit an error with thunderfx; I will investigate and file a relevant issue. Turning the PR back to draft until it is ready again.

Thanks @riccardofelluga @IvanYashchuk

Script to check model

import torch
from transformers import AutoModelForCausalLM
from transformers.models.phi3 import Phi3Config

model_id = "microsoft/Phi-3-vision-128k-instruct"

phi3_vision_cfg = {
  "_name_or_path": "Phi-3-vision-128k-instruct",
  "architectures": [
    "Phi3VForCausalLM"
  ],
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_phi3_v.Phi3VConfig",
    "AutoModelForCausalLM": "modeling_phi3_v.Phi3VForCausalLM"
  },
  "bos_token_id": 1,
  "embd_layer": {
    "embedding_cls": "image",
    "hd_transform_order": "sub_glb",
    "projection_cls": "mlp",
    "use_hd_transform": True,
    "with_learnable_separator": True
  },
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 3072,
  "img_processor": {
    "image_dim_out": 1024,
    "model_name": "openai/clip-vit-large-patch14-336",
    "name": "clip_vision_model",
    "num_img_tokens": 144
  },
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "model_type": "phi3_v",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "original_max_position_embeddings": 4096,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "long_factor": [
      1.0299999713897705,
      1.0499999523162842,
      1.0499999523162842,
      1.0799999237060547,
      1.2299998998641968,
      1.2299998998641968,
      1.2999999523162842,
      1.4499999284744263,
      1.5999999046325684,
      1.6499998569488525,
      1.8999998569488525,
      2.859999895095825,
      3.68999981880188,
      5.419999599456787,
      5.489999771118164,
      5.489999771118164,
      9.09000015258789,
      11.579999923706055,
      15.65999984741211,
      15.769999504089355,
      15.789999961853027,
      18.360000610351562,
      21.989999771118164,
      23.079999923706055,
      30.009998321533203,
      32.35000228881836,
      32.590003967285156,
      35.56000518798828,
      39.95000457763672,
      53.840003967285156,
      56.20000457763672,
      57.95000457763672,
      59.29000473022461,
      59.77000427246094,
      59.920005798339844,
      61.190006256103516,
      61.96000671386719,
      62.50000762939453,
      63.3700065612793,
      63.48000717163086,
      63.48000717163086,
      63.66000747680664,
      63.850006103515625,
      64.08000946044922,
      64.760009765625,
      64.80001068115234,
      64.81001281738281,
      64.81001281738281
    ],
    "short_factor": [
      1.05,
      1.05,
      1.05,
      1.1,
      1.1,
      1.1,
      1.2500000000000002,
      1.2500000000000002,
      1.4000000000000004,
      1.4500000000000004,
      1.5500000000000005,
      1.8500000000000008,
      1.9000000000000008,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.000000000000001,
      2.1000000000000005,
      2.1000000000000005,
      2.2,
      2.3499999999999996,
      2.3499999999999996,
      2.3499999999999996,
      2.3499999999999996,
      2.3999999999999995,
      2.3999999999999995,
      2.6499999999999986,
      2.6999999999999984,
      2.8999999999999977,
      2.9499999999999975,
      3.049999999999997,
      3.049999999999997,
      3.049999999999997
    ],
    "type": "su"
  },
  "rope_theta": 10000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": False,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.1",
  "use_cache": True,
  "vocab_size": 32064,
  "_attn_implementation": "sdpa"
}

# Initialize the pre-trained model
cfg = Phi3Config(**phi3_vision_cfg)

# cfg = AutoConfig.from_pretrained(model_id, trust_remote_code=False)
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto")
cfg.num_hidden_layers = 2

from thunder.dynamo import thunderfx
from thunder.dynamo.report import get_thunder_fxgraph_reports, fx_report, ThunderCompileSpecification

with torch.device("cuda"):
    model = AutoModelForCausalLM.from_config(cfg, trust_remote_code=False, torch_dtype=torch.bfloat16)
    print(model)

# eager - 3596534784
@requiresCUDA
@requiresDeviceMemory(required_memory_bytes=int(3.6 * 1024 * 1024 * 1024))
@pytest.mark.parametrize("attn_implementation", [None, "eager"])
Collaborator (Author):

We don't add sdpa here because:

ValueError: Phi3VForCausalLM does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please request the support for this architecture: https://github.com/huggingface/transformers/issues/28005. If you believe this error is a bug, please open an issue in Transformers GitHub repository and load your model with the argument `attn_implementation="eager"` meanwhile. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="eager")`
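For context, one way the parametrized value might be threaded into model construction (a sketch only; it assumes from_config forwards attn_implementation, and cfg / attn_implementation are the names from the test):

# Hypothetical wiring of the parametrized attn_implementation.
kwargs = {"trust_remote_code": False, "torch_dtype": torch.bfloat16}
if attn_implementation is not None:
    kwargs["attn_implementation"] = attn_implementation  # "eager" works; "sdpa" raises the ValueError above

with torch.device("cuda"):
    model = AutoModelForCausalLM.from_config(cfg, **kwargs)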

@kshitij12345 kshitij12345 marked this pull request as ready for review June 2, 2025 10:06
@t-vi (Collaborator) commented Jun 2, 2025:

> The test takes around 4GB of device memory while running.

Would there be a chance to cut the model down even more, similar to what we do with the other models?

I'm quite wary of this, and we have been seeing OOMs lately.
We used to have all tests take well below 1GB.

@kshitij12345 (Collaborator, Author):

> Would there be a chance to cut the model down even more, similar to what we do with the other models?
>
> I'm quite wary of this, and we have been seeing OOMs lately.
> We used to have all tests take well below 1GB.

Updated.
