This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Remove repeat operation and add acceleration support for other architectures #395

Merged: 13 commits, Aug 13, 2023


LLukas22
Contributor

@LLukas22 LLukas22 commented Aug 3, 2023

Removes the op_repeat ggml operation and replaces it with broadcasting.

This should enable GPU acceleration for GPT-2, GPT-J, GPT-NeoX and Falcon.

Also fixes: #391
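
To illustrate the difference between repeating and broadcasting, here is a minimal sketch in plain Rust. It does not use the ggml API; `Matrix`, `repeat_rows`, `add_same_shape` and `add_broadcast` are hypothetical names made up for this example. The repeat path first materializes a full-size copy of the smaller operand, while the broadcast path indexes the smaller operand directly and needs no temporary tensor.

```rust
/// Row-major matrix stored as a flat buffer (illustrative type, not a ggml tensor).
#[derive(Debug, Clone, PartialEq)]
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

/// "Repeat" approach: copy the bias once per row into a full rows x cols matrix.
fn repeat_rows(bias: &[f32], rows: usize) -> Matrix {
    let mut data = Vec::with_capacity(rows * bias.len());
    for _ in 0..rows {
        data.extend_from_slice(bias);
    }
    Matrix { rows, cols: bias.len(), data }
}

/// Element-wise add that requires both operands to have the same shape.
fn add_same_shape(a: &Matrix, b: &Matrix) -> Matrix {
    assert_eq!((a.rows, a.cols), (b.rows, b.cols), "shapes must match");
    let data = a.data.iter().zip(&b.data).map(|(x, y)| x + y).collect();
    Matrix { rows: a.rows, cols: a.cols, data }
}

/// Broadcasting approach: reuse the bias for every row by wrapping the column index.
fn add_broadcast(a: &Matrix, bias: &[f32]) -> Matrix {
    assert_eq!(a.cols, bias.len(), "bias length must equal column count");
    let data = a
        .data
        .iter()
        .enumerate()
        .map(|(i, x)| x + bias[i % a.cols])
        .collect();
    Matrix { rows: a.rows, cols: a.cols, data }
}

fn main() {
    let a = Matrix { rows: 2, cols: 3, data: vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0] };
    let bias = [10.0, 20.0, 30.0];

    let via_repeat = add_same_shape(&a, &repeat_rows(&bias, a.rows));
    let via_broadcast = add_broadcast(&a, &bias);

    // Both paths give the same result; only the broadcast path avoids the temporary.
    assert_eq!(via_repeat, via_broadcast);
    println!("{:?}", via_broadcast.data);
}
```

Both functions compute the same sum; the practical difference is that the broadcast version removes one operation (and one intermediate buffer) from the computation.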

@philpax
Collaborator

philpax commented Aug 6, 2023

This is very cool and looks reasonable from my glance over the diff. (Couple of docs tweaks, but I can do those.) What do you need from me to get this across the line?

@LLukas22
Contributor Author

LLukas22 commented Aug 6, 2023

I just need some more time to get Falcon/GPT-NeoX working. GPT-2 needs some additional CUDA copy kernels; I'll try to ask the CUDA god himself for advice there.

I'll probably merge this and create additional PRs if I get one of the other architectures working.

This should also enable Metal support for all of the architectures mentioned above, but I can't test those. Currently I can only confirm GPT-J as working with CUDA acceleration.

@philpax
Collaborator

philpax commented Aug 6, 2023

No problem, take your time. Let me know if you want me to test anything Metal.

@LLukas22
Contributor Author

LLukas22 commented Aug 8, 2023

OK, I played around a bit, and here are my results:

  • GPT-J: Works with CUDA support without any problems.
  • GPT-2: Needs an f16-f16 copy kernel. After I added it, it can generate some tokens before the output becomes NaN.
  • GPT-NeoX: Runs inference but produces only NaN output.
  • Falcon: 7B and 40B also need the f16-f16 copy kernel. 40B then fails on the RoPE operation because its tensor has an invalid size, which means there is something wrong with the inference code.

I would like to keep the offloading code and work on these problems in additional PRs. It would be great if you could test these architectures on Metal and report whether they work or what error they throw.

@LLukas22 LLukas22 marked this pull request as ready for review August 8, 2023 09:13
@philpax
Collaborator

philpax commented Aug 8, 2023

Will test soon-ish!

@philpax
Collaborator

philpax commented Aug 13, 2023

Tested with Metal.

Llama-2: Works great. No issues.
Bloom: Produces an invalid weight after mostly sensible generation:

This is a description of what a banana is:  You have to imagine that it has many, long and soft trunks. They’re covered with little round seeds.  What if you put them in the mouth? Then they would be chewed up!   */
    static void func_150The application panicked (crashed).
Message:  WeightedIndex error: InvalidWeight
Location: crates/llm-base/src/samplers.rs:157

GPT-2: Immediate not-implemented:

<fim_prefix>This is a description of what a banana is: 
GGML_ASSERT: llm/target/release/build/ggml-sys-152c0cd097a1ac69/out/ggml-metal.m:846: false && "not implemented"

GPT-J: Similar to Bloom:

"This is a description of what a banana is:  it's yellow, it has two halves that come apart, and inside there are long fibers." "It tastes good when you eat it but also makes for interesting problems in science class because everything about the physical properties of bananas suggests they should be heavy. But asThe application panicked (crashed).
Message:  WeightedIndex error: InvalidWeight
Location: crates/llm-base/src/samplers.rs:157

GPT-NeoX: Immediate invalid-weight:

<|padding|>This is a description of what a banana is: 
The application panicked (crashed).
Message:  WeightedIndex error: InvalidWeight
Location: crates/llm-base/src/samplers.rs:157

MPT: Similar to GPT-2:

This is a description of what a banana is: 
GGML_ASSERT: llm/target/release/build/ggml-sys-152c0cd097a1ac69/out/ggml-metal.m:905: false && "not implemented"

So, not a magic fix - looks like Metal is still undercooked - but it's an improvement over the current state of affairs, so I'm going to merge it.
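
One plausible reading of the `InvalidWeight` panics above, consistent with the NaN outputs reported for CUDA earlier in the thread, is that NaN logits reach the sampler. The sketch below is an assumption and not the code in `crates/llm-base/src/samplers.rs`; `softmax`, `validate_weights` and `WeightError` are hypothetical names. It shows how a single NaN logit turns every softmax probability into NaN, which a weighted sampler would then have to reject.

```rust
/// Softmax over raw logits, shifted by the maximum for numerical stability.
fn softmax(logits: &[f32]) -> Vec<f32> {
    // f32::max ignores NaN operands, so the maximum itself stays finite here.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    // A single NaN in `exps` makes the sum NaN, which poisons every probability.
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Hypothetical error type for this sketch (not the `rand` crate's error type).
#[derive(Debug, PartialEq)]
enum WeightError {
    Empty,
    /// Index of the first weight that is NaN, infinite, or negative.
    InvalidWeight(usize),
}

/// Check weights before handing them to a weighted sampler.
fn validate_weights(weights: &[f32]) -> Result<(), WeightError> {
    if weights.is_empty() {
        return Err(WeightError::Empty);
    }
    for (i, w) in weights.iter().enumerate() {
        if !w.is_finite() || *w < 0.0 {
            return Err(WeightError::InvalidWeight(i));
        }
    }
    Ok(())
}

fn main() {
    let good = softmax(&[1.0, 2.0, 0.5]);
    assert_eq!(validate_weights(&good), Ok(()));

    // One NaN logit is enough to invalidate the whole distribution.
    let bad = softmax(&[1.0, f32::NAN, 0.5]);
    assert!(bad.iter().all(|p| p.is_nan()));
    assert_eq!(validate_weights(&bad), Err(WeightError::InvalidWeight(0)));
}
```

A check like this would turn the panic into a recoverable error and report where the invalid value first appears, but it would not fix whatever backend operation produces the NaN in the first place.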

@philpax philpax merged commit c3eab08 into rustformers:main Aug 13, 2023
Development

Successfully merging this pull request may close these issues.

stack overflow after new merge