
Fix eval for .pte #1053


Merged: 1 commit merged into main on Aug 27, 2024
Conversation

@vmpuri (Contributor) commented Aug 22, 2024

Issue
Inputs aren't set up correctly for .pte files: the input tensors are static and cannot be reshaped at runtime. Currently, running eval results in this error:

python3 torchchat.py eval llama3 --pte-path llama3.pte --limit 3    

...

Running loglikelihood_rolling requests
  0%|          | 0/3 [00:00<?, ?it/s][tensor_impl.cpp:93] Attempted to resize a static tensor to a new shape at dimension 1 old_size: 1 new_size: 1263
[method.cpp:829] Error setting input 0: 0x10
  0%|          | 0/3 [00:00<?, ?it/s]
Time to run eval: 4.57s.
Traceback (most recent call last):
  File "/Users/puri/torchchat/torchchat.py", line 92, in <module>
    eval_main(args)
  File "/Users/puri/torchchat/eval.py", line 252, in main
    result = eval(
             ^^^^^
  File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/eval.py", line 198, in eval
    eval_results = evaluate(
                   ^^^^^^^^^
  File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/lm_eval/evaluator.py", line 373, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/lm_eval/models/huggingface.py", line 840, in loglikelihood_rolling
    string_nll = self._loglikelihood_tokens(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/lm_eval/models/huggingface.py", line 1033, in _loglikelihood_tokens
    self._model_call(batched_inps, **call_kwargs), dim=-1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/eval.py", line 146, in _model_call
    logits = self._model_forward(x, input_pos)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/eval.py", line 240, in <lambda>
    model_forward = lambda x, input_pos: model(x, input_pos)  # noqa
                                         ^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/build/model_et.py", line 23, in forward
    logits = self.model_.forward(forward_inputs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: method->set_inputs() for method 'forward' failed with error 0x12

This issue originates from setting input shapes incorrectly during prefill.
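Because a statically exported .pte fixes its input shapes, the prefill path has to hand the model a buffer of exactly the exported size instead of a tensor shaped to the prompt length. A minimal sketch of that idea in plain Python (the names `max_seq_length` and `pad_id` are illustrative assumptions, not torchchat's actual API):

```python
# Sketch: fit a variable-length prompt into the fixed-size input buffer
# that a statically-shaped .pte method expects. Buffer length and pad
# token id are hypothetical parameters for illustration.

def pad_to_static_shape(tokens, max_seq_length, pad_id=0):
    """Return (padded_tokens, input_pos) with exactly max_seq_length entries.

    A static model cannot resize its input tensor (cf. the "Attempted to
    resize a static tensor" error above), so callers must always supply
    a buffer of the exported size and track the real token positions.
    """
    if len(tokens) > max_seq_length:
        raise ValueError(
            f"prompt of {len(tokens)} tokens exceeds static size "
            f"{max_seq_length}; it must be chunked before prefill"
        )
    padded = list(tokens) + [pad_id] * (max_seq_length - len(tokens))
    # Positions cover only the real tokens; logits past len(tokens)
    # correspond to padding and should be ignored by the caller.
    input_pos = list(range(len(tokens)))
    return padded, input_pos
```

The same constraint explains the traceback above: `_model_call` forwarded a `[1, 1263]` tensor to a method whose input was exported with sequence dimension 1, so `set_inputs()` failed rather than resizing.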

Testing
Run eval on exported llama3.pte

python3 torchchat.py eval llama3 --pte-path llama3.pte --limit 3    
Warning: compilation is not available with device MPS, ignoring option to engage compilation
NumExpr defaulting to 16 threads.
PyTorch version 2.5.0.dev20240716 available.
Warning: checkpoint path ignored because an exported DSO or PTE path specified
Using device=mps
Loading model...
Cannot load specified PTE to mps. Attempting to load model to CPU instead
Time to load model: 0.05 seconds
Loading custom ops library: /Users/puri/torchchat/.venv/lib/python3.11/site-packages/executorch/examples/models/llama2/custom_ops/libcustom_ops_aot_lib.dylib
[program.cpp:134] InternalConsistency verification requested but not available
-----------------------------------------------------------
Using device 'cpu'
[Task: wikitext] metric word_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
[Task: wikitext] metric word_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
[Task: wikitext] metric byte_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
[Task: wikitext] metric byte_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
[Task: wikitext] metric bits_per_byte is defined, but aggregation is not. using default aggregation=bits_per_byte
[Task: wikitext] metric bits_per_byte is defined, but higher_is_better is not. using default higher_is_better=False
Repo card metadata block was not found. Setting CardData to empty.
Repo card metadata block was not found. Setting CardData to empty.
Building contexts for wikitext on rank 0...
100%|██████████| 3/3 [00:00<00:00, 1368.01it/s]
Running loglikelihood_rolling requests
  0%|          | 0/3 [00:00<?, ?it/s]torch.Size([1, 1263, 128256])
 33%|███▎      | 1/3 [01:45<03:31, 105.91s/it]torch.Size([1, 2048, 128256])
torch.Size([1, 2048, 128256])
torch.Size([1, 2048, 128256])
 67%|██████▋   | 2/3 [10:49<06:03, 363.60s/it]torch.Size([1, 2048, 128256])
torch.Size([1, 2048, 128256])
100%|██████████| 3/3 [16:53<00:00, 337.73s/it]
Time to run eval: 1017.84s.
Time in model.forward: 1012.67s, over 6 model evaluations
forward run time stats - Median: 180.97s Min: 105.85s Max: 181.98s
For model llama3.pte
wikitext:
 word_perplexity,none: 14.2482
 byte_perplexity,none: 1.6776
 bits_per_byte,none: 0.7464
 alias: wikitext


pytorch-bot (bot) commented Aug 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1053


✅ No Failures

As of commit 67b23c1 with merge base d5bb3c6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label on Aug 22, 2024
@vmpuri force-pushed the vmpuri-eval_pte_fix branch 4 times, most recently from 4f26636 to 4e7f0a6 (August 24, 2024 00:12)
@vmpuri marked this pull request as ready for review on August 24, 2024 00:13
@Jack-Khuu (Contributor) left a comment

Thanks for digging into this

@vmpuri force-pushed the vmpuri-eval_pte_fix branch from 4e7f0a6 to 67b23c1 (August 26, 2024 22:54)
@vmpuri merged commit 0922e65 into main on Aug 27, 2024
51 checks passed