embedding.cpp is a text embedding tool that runs BERT-based models on top of ggml.
embedding.cpp is a fork of bert.cpp. Thanks to the bert.cpp contributors. The original README is also available.
Note: embedding.cpp is still WIP and not ready for production.
- Plain C/C++ implementation without dependencies
- Inherit support for various architectures from ggml (x86 with AVX2, ARM, etc.)
- Choose your model size from 32/16/4 bits per model weight
- all-MiniLM-L6-v2 with 4-bit quantization is only 14MB; inference RAM usage depends on the length of the input
- Sample C++ server over a TCP socket and a Python test client
- Benchmarks to validate correctness and speed of inference
- Builds the tokenizer with tokenizers-cpp (see the sketch after this list).
- Correctly handles Asian scripts (CJK, etc.).
- Handles cased/uncased input according to the original configuration in tokenizer.json.
- Upgraded to the GGUF model file format, which makes it easy to extend while staying compatible. With this, embedding.cpp can run additional models such as m3e and e5.
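Since tokenizers-cpp binds the same Rust tokenizers library that backs the Python `tokenizers` package, you can preview from Python how a model's tokenizer.json will behave, including lowercasing for uncased models and per-character CJK splitting. A minimal sketch (illustrative only, not part of this repo):

```python
# Illustrative: the Python `tokenizers` package shares its Rust core with
# tokenizers-cpp, so this previews how embedding.cpp tokenizes input.
from tokenizers import Tokenizer

tok = Tokenizer.from_file("all-MiniLM-L6-v2/tokenizer.json")

# Uncased models lowercase via their normalizer; CJK characters are
# split per-character by the BERT pre-tokenizer.
enc = tok.encode("Hello World! 你好，世界")
print(enc.tokens)  # e.g. ['[CLS]', 'hello', 'world', '!', '你', '好', ...]
```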
- Only BERT-based models are supported for embedding; other architectures such as SGPT are not.
- Runs on CPU only.
- All outputs are mean pooled and normalized (see the sketch after this list).
- Batching support is WIP.
- The lack of real batching means this library is slower than it could be in use cases with multiple sentences.
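For reference, the mean pooling and L2 normalization applied to every output amount to the following NumPy sketch (a description of the math, not the actual code path in this repo):

```python
import numpy as np

def pool_and_normalize(token_embeddings: np.ndarray,
                       attention_mask: np.ndarray) -> np.ndarray:
    """Mean-pool token embeddings over non-padding positions, then L2-normalize.

    token_embeddings: (seq_len, hidden_dim); attention_mask: (seq_len,) of 0/1.
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=0)
    count = np.maximum(mask.sum(), 1e-9)   # avoid division by zero
    mean = summed / count
    return mean / np.linalg.norm(mean)     # unit-length sentence vector
```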
git submodule update --init --recursive
By default, it builds both
- the native binaries, like the example server, with static libraries;
- and the dynamic library for usage from e.g. Python.
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make
cd ..
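After the build, the dynamic library can be loaded from Python with ctypes. A minimal sketch; the library file name and the entry point shown are assumptions, so check the build output and the public header for the real names and signatures:

```python
import ctypes

# Assumed artifact name -- the actual file may differ by platform
# (.dylib on macOS, .dll on Windows) and by the target name in CMake.
lib = ctypes.CDLL("./build/libembedding.so")

# Hypothetical entry point: declare argtypes/restype to match the real
# header before calling anything.
# lib.bert_load_from_file.argtypes = [ctypes.c_char_p]
# lib.bert_load_from_file.restype = ctypes.c_void_p
# ctx = lib.bert_load_from_file(b"models/all-MiniLM-L6-v2/ggml-model-q4_0.gguf")
```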
Rust needs to be installed (it is required to build tokenizers-cpp); see rust or run
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Converting models works much like in llama.cpp. Use models/convert-to-gguf.py to convert Hugging Face models into either f32 or f16 GGUF models, then use ./build/bin/quantize to turn those into Q4_0 (4 bits per weight) models.
There is also models/run_conversions.sh, which creates all four versions (f32, f16, Q4_0, Q4_1) at once.
pip install -r requirements.txt
cd models
# Clone a model from Hugging Face
git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
# Run conversions to the 4 GGUF formats (f32, f16, Q4_0, Q4_1)
sh run_conversions.sh all-MiniLM-L6-v2
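To sanity-check a converted file, you can read its GGUF header: every GGUF file begins with the magic bytes `GGUF` followed by a little-endian uint32 version (the output file name below is an assumption about what run_conversions.sh produces):

```python
import struct

# Assumed output path -- adjust to whatever run_conversions.sh actually wrote.
path = "all-MiniLM-L6-v2/ggml-model-q4_0.gguf"

with open(path, "rb") as f:
    magic = f.read(4)                        # b"GGUF" for valid files
    (version,) = struct.unpack("<I", f.read(4))

assert magic == b"GGUF", f"not a GGUF file: {magic!r}"
print(f"GGUF version {version}")
```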