chore: Fixes required for LLM models #3002

peri044 · 2024-07-11T16:41:24Z

Description

Converter additions for LLM models
Fix GPU memory allocations: models can now be exported on CPU, and the GPU is used only for TRT compilation.

   Inputs: List[Tensor: (1, (min=1, max=64))@int64]
    ...
    TRT Engine #1 - Submodule name: _run_on_acc_0
     Engine Inputs: List[Tensor: (1, (min=1, max=64))@int64]
     Number of Operators in Engine: 143
     Engine Outputs: List[Tensor: (1, (min=1, max=64), 32000)@float32]
    ...
   Outputs: List[Tensor: (1, (min=1, max=64), 32000)@float32]

Modifications to the dry-run tracker to handle dynamic shapes.
LLM examples
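
As a rough illustration of the dynamic-shape notation in the dry-run output above, the sketch below renders a shape whose dimensions are either plain integers (static) or `(min, max)` pairs (dynamic). The function name `format_tensor_spec` and the use of a plain string for the dtype are assumptions made to keep the example free of dependencies; this is not the tracker's actual code.

```python
from typing import Sequence, Tuple, Union

# A dimension is either a static size or a (min, max) range for a dynamic axis.
Dim = Union[int, Tuple[int, int]]


def format_tensor_spec(shape: Sequence[Dim], dtype_name: str) -> str:
    """Render a shape and dtype as a readable string (hypothetical helper)."""
    parts = []
    for dim in shape:
        if isinstance(dim, tuple):
            lo, hi = dim
            parts.append(f"(min={lo}, max={hi})")
        else:
            parts.append(str(dim))
    # Drop a leading "torch." so that "torch.int64" is shown as "int64".
    prefix = "torch."
    short_dtype = dtype_name[len(prefix):] if dtype_name.startswith(prefix) else dtype_name
    return f"Tensor: ({', '.join(parts)})@{short_dtype}"


print(format_tensor_spec((1, (1, 64)), "torch.int64"))
print(format_tensor_spec((1, (1, 64), 32000), "torch.float32"))
```

Joining the parts with `", "` avoids having to trim a trailing separator after the loop.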

Type of change

Please delete options that are not relevant and/or add your own.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

My code follows the style guidelines of this project (You can use the linters)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes
I have added the relevant labels to my PR so that relevant reviewers are notified

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/_DryRunTracker.py	2024-08-19 21:00:09.967336+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/_DryRunTracker.py	2024-08-19 21:00:32.451960+00:00
@@ -224,20 +224,22 @@
    """Format shapes and dtypes of input Tensors into a readable string"""

    def input_formatter_helper(shapes: Any, dtypes: Any) -> str:
        """Helper for input formatter"""
        # Base case 1 - single static/dynamic shape, single dtype
-        if isinstance(shapes, tuple) and all(isinstance(elt, (int, tuple)) for elt in shapes):
+        if isinstance(shapes, tuple) and all(
+            isinstance(elt, (int, tuple)) for elt in shapes
+        ):
            input_shape_string = "Tensor: ("
            for elt in shapes:
                if isinstance(elt, tuple):
-                    input_shape_string+= f"(min={elt[0]}, max={elt[1]}), "
+                    input_shape_string += f"(min={elt[0]}, max={elt[1]}), "
                else:
-                    input_shape_string+= f"{elt}, "
+                    input_shape_string += f"{elt}, "
            input_shape_string = input_shape_string[:-2] + ")" + f"@{str(dtypes)[6:]}, "
            return input_shape_string
-        
+
        # Base case 2 - dynamic shape, single dtype
        elif (
            isinstance(shapes, dict)
            and len(shapes) == 3
            and all(
--- /home/runner/work/TensorRT/TensorRT/tools/perf/utils.py	2024-08-19 21:00:10.003336+00:00
+++ /home/runner/work/TensorRT/TensorRT/tools/perf/utils.py	2024-08-19 21:00:37.999905+00:00
@@ -28,19 +28,16 @@
}


def load_hf_model(model_name_hf):
    print("Loading user-specified HF model: ", model_name_hf)
-    model_hf = (
-        AutoModelForCausalLM.from_pretrained(
-            model_name_hf,
-            trust_remote_code=True,
-            use_cache=False,
-            attn_implementation="eager",
-        )
-        .eval()
-    )
+    model_hf = AutoModelForCausalLM.from_pretrained(
+        model_name_hf,
+        trust_remote_code=True,
+        use_cache=False,
+        attn_implementation="eager",
+    ).eval()

    return {"model": model_hf}


class ModelStorage:

py/torch_tensorrt/dynamo/conversion/_conversion.py

narendasan

LGTM

py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py

peri044 added 24 commits June 12, 2024 17:24

chore: add gpt2 example

2ea181a

chore: add llama2 example

37b65a5

Merge branch 'main' into llm_examples_main

bd12b12

Merge branch 'main' into llm_examples_main

4a9f73e

Merge branch 'main' into llm_examples_main

0387d0b

chore: updates

6193939

Merge branch 'main' into llm_examples_main

9d3296e

Merge branch 'main' into llm_examples_main

84fc49c

chore: rebase

ff17d91

Merge branch 'llm_examples_main' of github.com:pytorch/TensorRT into …

8e6ba26

…llm_examples_main

Merge branch 'main' into llm_examples_main

67ec408

chore: remove aten.full decomposition

9af8e39

chore: fix expand DS support

50d4096

chore: minor fix

59febf5

chore: updates

c3e4382

chore: add testcase

0673db4

Merge branch 'main' into full

0b62f8f

Merge branch 'full' into fix_expand_ds

54f6410

Merge branch 'fix_expand_ds' into llm_examples_main

ae3d6b2

chore: updates

4464fd5

chore: updates

63b13cf

Merge branch 'main' into llm_examples_main

3d10b92

chore: updates

e97a94f

chore: updates

4f503a8

facebook-github-bot added the cla signed label Jul 11, 2024

github-actions bot requested changes Aug 19, 2024

peri044 requested a review from narendasan August 20, 2024 22:08

github-actions bot requested changes Aug 20, 2024

chore: updates

7be8604

github-actions bot removed the component: tests label Aug 21, 2024

github-actions bot requested changes Aug 21, 2024

peri044 requested a review from zewenli98 August 21, 2024 00:41

github-actions bot requested changes Aug 21, 2024

Merge branch 'main' into llm_examples_main

4d75a2e

This comment was marked as resolved.

narendasan reviewed Aug 21, 2024

py/torch_tensorrt/dynamo/conversion/_conversion.py

narendasan approved these changes Aug 21, 2024

Merge branch 'main' into llm_examples_main

6a215f8

This comment was marked as resolved.

chore: updates

510c5ae

This was referenced Aug 23, 2024

🐛 [Bug] torch._export.verifier.SpecViolationError when using Torch-TensorRT #2655

Open

🐛 [torch.export][llama2] Accuracy issues with llama model #2964

Closed

HolyWu reviewed Aug 24, 2024

py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py (outdated)

Merge branch 'main' into llm_examples_main

2fcdaad

github-actions bot added the component: torch_compile label Aug 28, 2024

This comment was marked as resolved.

peri044 added 2 commits August 28, 2024 09:49

chore: updates

0a429c3

chore: updates

58b5bfd

github-actions bot added the component: tests label Aug 28, 2024

chore: CI test fixes

4e404df

github-actions bot removed the component: torch_compile label Aug 28, 2024

peri044 merged commit fa812a9 into main Aug 29, 2024
67 checks passed

keehyuna mentioned this pull request Sep 4, 2024

🐛 [Bug] Compiler error while running sd_unet model #3144

Open

HolyWu mentioned this pull request Sep 8, 2024

feat: log_softmax decomposition #3137

Merged

7 tasks

chohk88 mentioned this pull request Dec 18, 2024

[Coverage] ValueError: The meta val for input node max_pool2d_default is of type : <class 'tuple'> #3185

Closed
