Add test for Phi-3-vision-128k-instruct #1850
Conversation
If possible, the test should be skipped when there's not enough memory on the GPU to run it. Another alternative is to modify the config to reduce test duration and memory consumption.
thunder/tests/test_networks.py (outdated)
```python
from thunder.dynamo import thunderfx

cfg = Phi3Config(**phi3_vision_cfg)
cfg.num_hidden_layers = 2
```
What is the memory requirement for 1 layer?
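One way to measure this empirically (a hypothetical sketch, not part of the PR; it assumes `Phi3Config`, `phi3_vision_cfg`, and `AutoModelForCausalLM` from the test are in scope):

```python
import torch

# Hypothetical helper: instantiate the model with n layers and report the
# peak allocation; the difference between two calls approximates one layer.
def peak_bytes(num_layers):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    cfg = Phi3Config(**phi3_vision_cfg)
    cfg.num_hidden_layers = num_layers
    with torch.device("cuda"):
        model = AutoModelForCausalLM.from_config(
            cfg, trust_remote_code=False, torch_dtype=torch.bfloat16
        )
    peak = torch.cuda.max_memory_allocated()
    del model
    return peak

print(peak_bytes(2) - peak_bytes(1))  # rough per-layer footprint in bytes
```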
thunder/tests/test_networks.py (outdated)
```python
cfg.num_hidden_layers = 2

with torch.device("cuda"):
    model = AutoModelForCausalLM.from_config(cfg, trust_remote_code=False, torch_dtype=torch.bfloat16)
```
Changing `vocab_size` from 32064 to a smaller number should decrease the memory requirements of this test.
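For instance, a minimal sketch of such a tweak (the value 1024 is an arbitrary assumption; any generated inputs would then need to stay below the new vocabulary size):

```python
cfg = Phi3Config(**phi3_vision_cfg)
cfg.num_hidden_layers = 2
cfg.vocab_size = 1024  # down from 32064; shrinks the embedding table and LM head

# Token ids must be bounded by the smaller vocabulary:
input_ids = torch.randint(0, cfg.vocab_size, (1, 128), device="cuda")
```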
thunder/tests/test_networks.py (outdated)
```python
@requiresCUDA
def test_hf_phi3_vision():
    # This test takes around 4045406208 bytes (~4GB) of memory.
```
Is there a decorator to skip the test based on the memory requirements? There are NVIDIA internal CI jobs on hardware with a limited amount of memory that could potentially fail.
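A minimal sketch of what such a decorator could look like (the PR later adds a `requiresDeviceMemory` decorator; this exact implementation is an assumption, not the PR's code):

```python
import pytest
import torch

def requiresDeviceMemory(required_memory_bytes):
    # Skip when no CUDA device is visible or the device has less total
    # memory than the test needs.
    def has_enough_memory():
        if not torch.cuda.is_available():
            return False
        return torch.cuda.get_device_properties(0).total_memory >= required_memory_bytes

    return pytest.mark.skipif(
        not has_enough_memory(),
        reason=f"requires at least {required_memory_bytes} bytes of device memory",
    )
```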
That would be nice +1 here
thunder/tests/test_networks.py (outdated)
```
@@ -558,6 +558,189 @@ def test_hf_llama():
    assert len(get_fusion_symbols(thunder.last_traces(jm)[-1])) == 6


# We need to copy the config here as the AutoModel doesn't work without `trust_remote_code=True`
```
What is the error?
The error is:

```
ValueError: Loading microsoft/Phi-3-vision-128k-instruct requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
```
Script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoConfig
from transformers.models.phi3 import Phi3Config

model_id = "microsoft/Phi-3-vision-128k-instruct"

# Initialize the pre-trained model
cfg = AutoConfig.from_pretrained(model_id, trust_remote_code=False)
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto")
cfg.num_hidden_layers = 2

from thunder.dynamo import thunderfx
from thunder.dynamo.report import get_thunder_fxgraph_reports, fx_report, ThunderCompileSpecification

with torch.device("cuda"):
    model = AutoModelForCausalLM.from_config(cfg, trust_remote_code=False, torch_dtype=torch.bfloat16)
print(model)
```
thunder/tests/test_networks.py (outdated)
"original_max_position_embeddings": 4096, | ||
"rms_norm_eps": 1e-05, | ||
"rope_scaling": { | ||
"long_factor": [ |
The length of this array is `hidden_size / (2 * num_attention_heads)`, i.e. half the head dimension, so reducing the size of the model will shrink this line count. Also, I think these are just numbers that you can set programmatically, since we don't care about correctness in this test.
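For example, a sketch of generating them programmatically (the values are arbitrary placeholders, on the assumption that the model only needs lists of the right length, not meaningful scaling factors):

```python
hidden_size, num_attention_heads = 3072, 32
n_factors = hidden_size // (2 * num_attention_heads)  # half the head dim: 48 here

rope_scaling = {
    "long_factor": [1.0 + 0.1 * i for i in range(n_factors)],
    "short_factor": [1.0] * n_factors,
    "type": "su",
}
```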
thunder/tests/test_networks.py (outdated)
```python
# for `Phi-3-vision-128k-instruct`.
phi3_vision_cfg = {
    "_name_or_path": "Phi-3-vision-128k-instruct",
    "architectures": ["Phi3VForCausalLM"],
```
Interesting that this is not a model included in the transformers library, and therefore one would think that with `trust_remote_code=False` this model wouldn't be loaded at all, but that doesn't seem to be the case 🤔
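One quick way to check which implementation actually gets instantiated (a hypothetical snippet, reusing the `model` built from the copied config):

```python
# With trust_remote_code=False, AutoModelForCausalLM resolves the model class
# from the in-library config class, not from the remote Phi3V code.
print(type(model).__name__)    # e.g. Phi3ForCausalLM
print(type(model).__module__)  # e.g. transformers.models.phi3.modeling_phi3
```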
thunder/tests/test_networks.py (outdated)
```python
        64.81001281738281,
        64.81001281738281,
    ],
    "short_factor": [
```
Same as for `long_factor`.
thunder/tests/test_networks.py (outdated)
"transformers_version": "4.38.1", | ||
"use_cache": True, | ||
"vocab_size": 32064, | ||
"_attn_implementation": "sdpa", |
What happens if `_attn_implementation` is not set?
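A hedged way to find out (hypothetical snippet; my understanding is that transformers selects a default implementation itself when the key is absent, but the chosen value should be verified):

```python
cfg_no_attn = {k: v for k, v in phi3_vision_cfg.items() if k != "_attn_implementation"}
cfg = Phi3Config(**cfg_no_attn)
# Inspect what transformers selected as the default attention implementation.
print(getattr(cfg, "_attn_implementation", None))
```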
thunder/tests/test_networks.py (outdated)
```python
loss_grad = torch.randn_like(expected.loss)
actual_grads = torch.autograd.grad(actual.loss, model.parameters(), grad_outputs=loss_grad)
expected_grads = torch.autograd.grad(expected.loss, model.parameters(), grad_outputs=loss_grad)
torch.testing.assert_close(actual_grads, expected_grads, rtol=1e-2, atol=1e-2)
```
Asking for info now that I see the custom tolerances: what order of magnitude is the mismatch that you are getting?
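One way to quantify that (a hypothetical helper, not part of the PR):

```python
def report_mismatch(actual_grads, expected_grads):
    # Print the largest absolute and relative error per gradient tensor.
    for i, (a, e) in enumerate(zip(actual_grads, expected_grads)):
        abs_err = (a - e).abs().max().item()
        rel_err = abs_err / e.abs().max().clamp_min(1e-12).item()
        print(f"grad {i}: max abs err {abs_err:.3e}, max rel err {rel_err:.3e}")
```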
Good catch, on checking the type of the model, I see it is the in-library `Phi3ForCausalLM` rather than the remote `Phi3VForCausalLM`. Thanks @riccardofelluga @IvanYashchuk

Script to check the model:

```python
import torch
from transformers import AutoModelForCausalLM
from transformers.models.phi3 import Phi3Config

model_id = "microsoft/Phi-3-vision-128k-instruct"

phi3_vision_cfg = {
    "_name_or_path": "Phi-3-vision-128k-instruct",
    "architectures": ["Phi3VForCausalLM"],
    "attention_dropout": 0.0,
    "auto_map": {
        "AutoConfig": "configuration_phi3_v.Phi3VConfig",
        "AutoModelForCausalLM": "modeling_phi3_v.Phi3VForCausalLM",
    },
    "bos_token_id": 1,
    "embd_layer": {
        "embedding_cls": "image",
        "hd_transform_order": "sub_glb",
        "projection_cls": "mlp",
        "use_hd_transform": True,
        "with_learnable_separator": True,
    },
    "eos_token_id": 2,
    "hidden_act": "silu",
    "hidden_size": 3072,
    "img_processor": {
        "image_dim_out": 1024,
        "model_name": "openai/clip-vit-large-patch14-336",
        "name": "clip_vision_model",
        "num_img_tokens": 144,
    },
    "initializer_range": 0.02,
    "intermediate_size": 8192,
    "max_position_embeddings": 131072,
    "model_type": "phi3_v",
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 32,
    "original_max_position_embeddings": 4096,
    "rms_norm_eps": 1e-05,
    "rope_scaling": {
        "long_factor": [
            1.0299999713897705,
            1.0499999523162842,
            1.0499999523162842,
            1.0799999237060547,
            1.2299998998641968,
            1.2299998998641968,
            1.2999999523162842,
            1.4499999284744263,
            1.5999999046325684,
            1.6499998569488525,
            1.8999998569488525,
            2.859999895095825,
            3.68999981880188,
            5.419999599456787,
            5.489999771118164,
            5.489999771118164,
            9.09000015258789,
            11.579999923706055,
            15.65999984741211,
            15.769999504089355,
            15.789999961853027,
            18.360000610351562,
            21.989999771118164,
            23.079999923706055,
            30.009998321533203,
            32.35000228881836,
            32.590003967285156,
            35.56000518798828,
            39.95000457763672,
            53.840003967285156,
            56.20000457763672,
            57.95000457763672,
            59.29000473022461,
            59.77000427246094,
            59.920005798339844,
            61.190006256103516,
            61.96000671386719,
            62.50000762939453,
            63.3700065612793,
            63.48000717163086,
            63.48000717163086,
            63.66000747680664,
            63.850006103515625,
            64.08000946044922,
            64.760009765625,
            64.80001068115234,
            64.81001281738281,
            64.81001281738281,
        ],
        "short_factor": [
            1.05,
            1.05,
            1.05,
            1.1,
            1.1,
            1.1,
            1.2500000000000002,
            1.2500000000000002,
            1.4000000000000004,
            1.4500000000000004,
            1.5500000000000005,
            1.8500000000000008,
            1.9000000000000008,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.000000000000001,
            2.1000000000000005,
            2.1000000000000005,
            2.2,
            2.3499999999999996,
            2.3499999999999996,
            2.3499999999999996,
            2.3499999999999996,
            2.3999999999999995,
            2.3999999999999995,
            2.6499999999999986,
            2.6999999999999984,
            2.8999999999999977,
            2.9499999999999975,
            3.049999999999997,
            3.049999999999997,
            3.049999999999997,
        ],
        "type": "su",
    },
    "rope_theta": 10000.0,
    "sliding_window": 131072,
    "tie_word_embeddings": False,
    "torch_dtype": "bfloat16",
    "transformers_version": "4.38.1",
    "use_cache": True,
    "vocab_size": 32064,
    "_attn_implementation": "sdpa",
}

# Initialize the pre-trained model
cfg = Phi3Config(**phi3_vision_cfg)
# cfg = AutoConfig.from_pretrained(model_id, trust_remote_code=False)
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto")
cfg.num_hidden_layers = 2

from thunder.dynamo import thunderfx
from thunder.dynamo.report import get_thunder_fxgraph_reports, fx_report, ThunderCompileSpecification

with torch.device("cuda"):
    model = AutoModelForCausalLM.from_config(cfg, trust_remote_code=False, torch_dtype=torch.bfloat16)
print(model)
```
```python
# eager - 3596534784
@requiresCUDA
@requiresDeviceMemory(required_memory_bytes=int(3.6 * 1024 * 1024 * 1024))
@pytest.mark.parametrize("attn_implementation", [None, "eager"])
```
We don't add `sdpa` here as:

```
ValueError: Phi3VForCausalLM does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please request the support for this architecture: https://github.com/huggingface/transformers/issues/28005. If you believe this error is a bug, please open an issue in Transformers GitHub repository and load your model with the argument `attn_implementation="eager"` meanwhile. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="eager")`
```
Would there be a chance to cut the model down even more, similar to what we do with the other models? I'm quite wary of this, as we have been seeing OOMs lately.
Updated.
Adds a test for Phi-3-vision-128k-instruct.

The test takes around 4 GB of device memory while running.

The relaxed tolerances worked fine with 50 repeats of the test:

```
pytest thunder/tests/test_networks.py -k phi3_vi --count 50
```