export_llama failing with runtime errors #2907

Closed
@chauhang

Description

The export_llama script is failing with runtime errors for both the llama and stories models.

Error for llama model: Could not import fairseq2 modules....RuntimeError: Trying to create tensor with negative dimension -1: [-1, 4096]
Error for stories model: Could not import fairseq2 modules....RuntimeError: mmap can only be used with files saved with torch.save(./stories/stories110M.pt, _use_new_zipfile_serialization=True), please torch.save your checkpoint with this option in order to use mmap.

Steps to run for Llama model

1. Follow the steps from the LLM manual.
2. Download the Meta versions of the llama weights.
3. Run the export_llama script:

python -m examples.models.llama2.export_llama --checkpoint $MODEL_PATH/consolidated.00.pth --params $MODEL_PATH/params.json -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32

Error details for llama2 model export

Could not import fairseq2 modules.
INFO:root:Loading model with checkpoint=/Users/gchauhan/dev/llama-fast/checkpoints/meta-llama/Llama-2-7b/consolidated.00.pth, params=/Users/gchauhan/dev/llama-fast/checkpoints/meta-llama/Llama-2-7b/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
    ^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama_lib.py", line 408, in export_llama
    return _export_llama(modelname, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama_lib.py", line 529, in _export_llama
    builder_exported_to_edge = _prepare_for_llama_export(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama_lib.py", line 486, in _prepare_for_llama_export
    load_llama_model(
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/builder.py", line 83, in load_llama_model
    model, example_inputs, _ = EagerModelFactory.create_model(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/model_factory.py", line 44, in create_model
    model = model_class(**kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/model.py", line 139, in __init__
    self.model_ = Transformer(model_args)
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/executorch/examples/models/llama2/llama_transformer.py", line 418, in __init__
    self.tok_embeddings = nn.Embedding(params.vocab_size, params.dim)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 143, in __init__
    self.weight = Parameter(torch.empty((num_embeddings, embedding_dim), **factory_kwargs),
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Trying to create tensor with negative dimension -1: [-1, 4096]
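
The negative dimension traces back to params.json: the Meta distribution of Llama 2 ships params.json with vocab_size set to -1 (the reference code infers the real size from the tokenizer), and that value flows straight into nn.Embedding. A minimal sketch that reproduces the error and patches the file before export; the params_path and the 32000 vocab size are assumptions to verify against your own checkpoint and tokenizer.model:

import json
import torch.nn as nn

# Reproduce the failure in isolation: a negative vocab_size reaches
# nn.Embedding, which refuses to allocate a [-1, 4096] weight tensor.
try:
    nn.Embedding(-1, 4096)
except RuntimeError as e:
    print(e)  # Trying to create tensor with negative dimension -1: [-1, 4096]

# Possible workaround (an assumption, not a confirmed fix): patch vocab_size
# before running export_llama. 32000 is the Llama 2 tokenizer size; verify it
# against your tokenizer.model.
params_path = "params.json"  # hypothetical path; use $MODEL_PATH/params.json
with open(params_path) as f:
    params = json.load(f)
if params.get("vocab_size", -1) < 0:
    params["vocab_size"] = 32000
with open(params_path, "w") as f:
    json.dump(params, f, indent=2)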

Steps for Stories model

1. Download the model from the links specified.
2. Run:
python -m examples.models.llama2.export_llama -c ./stories/stories110M.pt -p ./stories/params.json
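
For context on the failure below: torch.load(..., mmap=True) requires the checkpoint to be in the new zipfile serialization format, which is a plain zip archive, so a quick pre-flight check is possible (a sketch, assuming the path above):

import zipfile

# True for checkpoints saved with _use_new_zipfile_serialization=True;
# False for legacy-format files like this stories110M.pt, which mmap rejects.
print(zipfile.is_zipfile("./stories/stories110M.pt"))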

Error details for Stories model export

Could not import fairseq2 modules.
INFO:root:Loading model with checkpoint=./stories/stories110M.pt, params=./stories/params.json, use_kv_cache=False, weight_type=WeightType.LLAMA
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
    ^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama_lib.py", line 408, in export_llama
    return _export_llama(modelname, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama_lib.py", line 529, in _export_llama
    builder_exported_to_edge = _prepare_for_llama_export(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama_lib.py", line 486, in _prepare_for_llama_export
    load_llama_model(
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/builder.py", line 83, in load_llama_model
    model, example_inputs, _ = EagerModelFactory.create_model(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/model_factory.py", line 44, in create_model
    model = model_class(**kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/model.py", line 75, in __init__
    checkpoint = torch.load(checkpoint_path, map_location=device, mmap=True)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/serialization.py", line 1032, in load
    raise RuntimeError("mmap can only be used with files saved with "
RuntimeError: mmap can only be used with files saved with `torch.save(./stories/stories110M.pt, _use_new_zipfile_serialization=True), please torch.save your checkpoint with this option in order to use mmap.
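
One way out, as the error message itself suggests: load the checkpoint once without mmap and re-save it in the new zipfile format (a sketch; this overwrites the file in place, so keep a backup):

import torch

ckpt_path = "./stories/stories110M.pt"
checkpoint = torch.load(ckpt_path, map_location="cpu")  # legacy load, no mmap
torch.save(checkpoint, ckpt_path, _use_new_zipfile_serialization=True)
# torch.load(ckpt_path, mmap=True) should now succeed.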

Environment

python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 2.4.0.dev20240324
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.4.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: version 3.29.0
Libc version: N/A

Python version: 3.11.8 (main, Feb 26 2024, 15:36:12) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-14.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] executorch==0.1.0
[pip3] numpy==1.26.4
[pip3] torch==2.4.0.dev20240324
[pip3] torchao==0.1
[pip3] torchaudio==2.2.0.dev20240324
[pip3] torchsr==1.0.4
[pip3] torchvision==0.19.0.dev20240324
[conda] executorch                0.1.0                    pypi_0    pypi
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] torch                     2.4.0.dev20240324          pypi_0    pypi
[conda] torchao                   0.1                      pypi_0    pypi
[conda] torchaudio                2.2.0.dev20240324          pypi_0    pypi
[conda] torchsr                   1.0.4                    pypi_0    pypi
[conda] torchvision               0.19.0.dev20240324          pypi_0    pypi
