feat: reduce cg memory footprint #225

pommedeterresautee · 2022-12-21T16:53:28Z

pommedeterresautee · 2022-12-22T14:28:40Z

test pass

================================================================================================== 2849 passed in 8941.45s (2:29:01) ===================================================================================================

gaetansnl · 2023-01-04T14:02:18Z

test/test_torchdynamo.py

        transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True, normalize=True)[0]
        assert (
-            transcription == "mister quilter is the apostle of the middle classes and we are glad to welcome his gospel"


about this, is it a good idea to use "in" ? it happened that the beggining was correct but not the end

it doesn't fail with beam > 1 or larger model, but in test we need to make it fast, so I guess this is the only way to make it ok. Do you see another way which doesn't require more computation?

I mean using "in" instead of "=="

gaetansnl · 2023-01-04T14:04:10Z

src/kernl/optimizer/attention.py

@@ -31,8 +42,11 @@ def attention_wrapper(q, k, v, output, sm_scale, is_causal, attention_mask):

    # When there is a large difference between those dimensions, our kernel become inefficient
    # (almost no parallelization), so we use pytorch instead
-    if q.size(-2) == 1 and k.size(-2) > 50:
-        attention_reference(q, k, v, output, sm_scale, is_causal=is_causal, attention_mask=attention_mask)
+    if q.size(-2) == 1 and k.size(-2) > 50 and (attention_mask is None) and not is_causal:


don't understand "(attention_mask is None) and not is_causal" the condition is only for attention_vec_mat_forward, for attention_reference it's ok

you are right, I ll modify it.

pommedeterresautee · 2023-01-05T08:45:57Z

test pass after all the commits above


================================================================================================== 2855 passed in 9623.61s (2:40:23) ===================================================================================================

gaetansnl

minor changes only

gaetansnl · 2023-01-05T12:58:12Z

src/kernl/optimizer/attention.py

-        k = k.view(k.size(0), 1, k.size(-2), k.size(-1))
-        v = v.view(v.size(0), 1, v.size(-2), v.size(-1))
-        output = output.view(output.size(0), 1, output.size(-2), output.size(-1))
+        q.unsqueeze_(dim=1)


you are mutating the input ?

Yes on PyTorch the underscore at the end of a method always mean the op is done in place, aka the original object is mutated

gaetansnl · 2023-01-05T12:58:59Z

src/kernl/optimizer/attention.py

    else:
        attention_forward(q, k, v, output, sm_scale, is_causal=is_causal, attention_mask=attention_mask)

    if extend_head:
-        return output.view(q.size(0), q.size(-2), q.size(-1))
+        output.squeeze_(dim=1)


idem, you are mutating the input ?

Same as above

gaetansnl · 2023-01-05T13:05:20Z

requirements.txt

@@ -1,5 +1,5 @@
 triton==2.0.0.dev20221202
-torch== 2.0.0.dev20221214+cu117
+torch==2.0.0.dev20230104+cu117


dockerfile update missing

gaetansnl · 2023-01-05T13:27:56Z

src/kernl/implementations/attention_vec_mat.py

-        v_col_major = v.permute(0, 1, 3, 2).contiguous().permute(0, 1, 3, 2)
-        # mutate v, so its storage is col major
-        v.set_(source=v_col_major)
+    # print("q", q.size(), q.stride(), len(q.untyped_storage()), q.dtype)


gaetansnl · 2023-01-05T13:42:26Z

src/kernl/model_optimization.py

@@ -29,7 +29,7 @@ def _compiler(gm: torch.fx.GraphModule, example_inputs: List[torch.Tensor]):
    return cuda_graphs_wrapper(gm, example_inputs)


-def optimize_model(original_model: PreTrainedModel) -> None:
+def optimize_model(model: PreTrainedModel) -> None:


this is a breaking change, but I think we will do breaking changes anyway, could be nice to document it

gaetansnl · 2023-01-05T14:21:57Z

test/test_torchdynamo.py

        transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True, normalize=True)[0]
        assert (
-            transcription == "mister quilter is the apostle of the middle classes and we are glad to welcome his gospel"


I mean using "in" instead of "=="

gaetansnl · 2023-01-05T14:25:02Z

src/kernl/implementations/cuda_graph.py

@@ -12,31 +12,77 @@
 #  See the License for the specific language governing permissions and


I don't know why this file is in this folder, it should be in optimizer, maybe mistake in previous commits

pommedeterresautee added 2 commits December 21, 2022 14:29

feat: reduce memory footprint of whisper

76f3313

fix: small model for tests

567ce2f

pommedeterresautee self-assigned this Dec 21, 2022

github-actions bot added the feature label Dec 21, 2022

pommedeterresautee added performance make things faster, always and removed feature labels Dec 21, 2022

fix: no memory leak

f2c63ba

pommedeterresautee added 7 commits December 28, 2022 21:15

feat: enable new kernel

96d189b

fix: refactoring

90135d0

feat: script whisper

5083e65

feat: switch to managed memory

4355210

fix: unit tests

0acc061

fix: unit tests

9e72390

fix: clean

828c108

pommedeterresautee requested a review from gaetansnl January 4, 2023 07:03

gaetansnl reviewed Jan 4, 2023

View reviewed changes

pommedeterresautee added 6 commits January 4, 2023 15:41

fix: PR remark

15fcaae

fix: update torch

4648734

fix: use attention small in more cases

f0b7b9c

fix: refactoring

d440837

feat: add tiny beam 5 test

a210df3

fix: refactoring

0e18e00

pommedeterresautee requested a review from gaetansnl January 4, 2023 23:41

fix: memory leak

325666a

feat: add test

a6a6699

gaetansnl requested changes Jan 5, 2023

View reviewed changes

feat: PR review

df05074

pommedeterresautee added 2 commits January 5, 2023 16:46

fix: remove print

87ca373

fix: update Dockerfile torch version

944432e

pommedeterresautee requested a review from gaetansnl January 5, 2023 15:47

fix: pr review

63aba49

gaetansnl approved these changes Jan 5, 2023

View reviewed changes

pommedeterresautee merged commit 8b0ec72 into main Jan 6, 2023

pommedeterresautee deleted the feat/recycle_tensor_cg branch January 6, 2023 10:22

		@@ -12,31 +12,77 @@
		# See the License for the specific language governing permissions and

feat: reduce cg memory footprint #225

feat: reduce cg memory footprint #225

Uh oh!

Conversation

pommedeterresautee commented Dec 21, 2022

Uh oh!

pommedeterresautee commented Dec 22, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pommedeterresautee commented Jan 5, 2023

Uh oh!

gaetansnl left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pommedeterresautee Jan 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gaetansnl left a comment •

edited

Loading

pommedeterresautee Jan 5, 2023 •

edited

Loading