There is a working bert.cpp implementation. We should try to implement this in llama.cpp and update the embedding example to use it.
The implementation should mostly follow what we did to integrate Falcon.
Here are the main steps:
- Update `gguf.py` with BERT arch KV pairs and tensors
- Python convert script using `gguf.py` to generate F16 model
- add tokenizer implementation in `llama.cpp`
- add function to build BERT graph
- add any new ops in `ggml` if needed
- add CUDA offloading
- add tokenizer tests
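For the tokenizer step: BERT uses WordPiece rather than the BPE/SentencePiece tokenizers already in `llama.cpp`, so a new implementation is needed. Below is a minimal Python sketch of the greedy longest-match-first WordPiece algorithm (not code from this repo — the actual port would be written in C++ inside `llama.cpp`; the vocab here is a toy example):

```python
def wordpiece_tokenize(word: str, vocab: set[str], unk: str = "[UNK]",
                       max_chars: int = 100) -> list[str]:
    """Greedy longest-match-first WordPiece, as used by BERT tokenizers.

    Continuation pieces (not at the start of a word) carry a "##" prefix.
    If any remainder cannot be matched, the whole word maps to `unk`.
    """
    if len(word) > max_chars:
        return [unk]
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest substring first, shrinking until a vocab hit.
        end = len(word)
        cur_piece = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                cur_piece = piece
                break
            end -= 1
        if cur_piece is None:
            return [unk]  # unmatchable remainder -> unknown token
        tokens.append(cur_piece)
        start = end
    return tokens

# Toy vocabulary for illustration only.
vocab = {"un", "##aff", "##able"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
```

The tokenizer tests in the last step could then compare C++ output against a reference tokenization like this one on a fixed set of strings.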