Dependency-free MiniLM embeddings in C. In ~2000 lines of code you can load model weights, tokenize text, and get embeddings.
This toy project re-implements a distilled BERT (MiniLM) with:
- A custom tensor library
- A `.tbf` tensor file format
- A WordPiece tokenizer
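The `.tbf` layout itself is defined by this repo's loader. Purely as an illustrative sketch of what a minimal tensor-file header could look like (these field names and this layout are hypothetical, not the actual format), one option is a fixed header followed by per-tensor records:

```c
#include <stdint.h>

/* Hypothetical sketch of a minimal tensor-file header -- the real .tbf
 * layout is whatever this repo's loader expects, not this struct. */
typedef struct {
    char     magic[4];   /* e.g. "TBF1", identifies the file type   */
    uint32_t n_tensors;  /* number of tensor records that follow    */
} tbf_header_t;

typedef struct {
    char     name[64];   /* weight name, e.g. "encoder.layer.0.q.w" */
    uint32_t ndim;       /* number of dimensions                    */
    uint32_t shape[4];   /* dimension sizes; unused entries are 0   */
    uint64_t offset;     /* byte offset of the float32 data payload */
} tbf_tensor_record_t;
```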
Quick background:
- BERT: Bidirectional Encoder Representations from Transformers
- Transformer: Neural net architecture built on attention
- Attention: Figures out which words matter most in context
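To make the attention bullet concrete, here is a minimal, self-contained sketch of scaled dot-product attention, the core operation BERT stacks in every layer. This is generic illustration code, not this repo's tensor API:

```c
#include <math.h>

/* Scaled dot-product attention: out = softmax(Q K^T / sqrt(d)) V,
 * for n tokens of dimension d; all matrices are row-major n x d. */
static void attention(const float *Q, const float *K, const float *V,
                      float *out, int n, int d) {
    for (int i = 0; i < n; i++) {
        float scores[n]; /* C99 VLA, fine for a small sketch */
        float maxs = -INFINITY, sum = 0.0f;
        /* raw scores: s_ij = (q_i . k_j) / sqrt(d) */
        for (int j = 0; j < n; j++) {
            float s = 0.0f;
            for (int k = 0; k < d; k++) s += Q[i*d + k] * K[j*d + k];
            scores[j] = s / sqrtf((float)d);
            if (scores[j] > maxs) maxs = scores[j];
        }
        /* numerically stable softmax over row i */
        for (int j = 0; j < n; j++) {
            scores[j] = expf(scores[j] - maxs);
            sum += scores[j];
        }
        /* output row i = attention-weighted sum of the value rows */
        for (int k = 0; k < d; k++) {
            float acc = 0.0f;
            for (int j = 0; j < n; j++) acc += (scores[j] / sum) * V[j*d + k];
            out[i*d + k] = acc;
        }
    }
}
```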
Build and run with:

```sh
make run
```
Expected files at runtime, included in the `assets` directory:

```
bert_weights.tbf   # model weights in custom TBF format
vocab.txt          # tokenizer vocabulary
```
#include "minilm.h"
#include "s8.h"
#include "nn.h"
int main(void) {
const char *question = "what's the capital of germany?";
minilm_t m;
minilm_create(&m, "bert_weights.tbf", "vocab.txt");
// candidates
da_s8 choices = {0};
da_s8_append(&choices, m_s8("paris"));
da_s8_append(&choices, m_s8("london"));
da_s8_append(&choices, m_s8("berlin"));
da_s8_append(&choices, m_s8("madrid"));
da_s8_append(&choices, m_s8("rome"));
// embed
da_tensor_t vecs = {0};
for (size_t i = 0; i < choices.len; i++) {
tensor_t v;
minilm_embed(m, (char*)choices.data[i].data, choices.data[i].len, &v);
da_tensor_t_append(&vecs, v);
}
tensor_t q; minilm_embed(m, (char*)question, strlen(question), &q);
// nearest neighbor (L2)
size_t best = nearest_index(vecs, q);
printf("query : %s\nanswer: %s\n", question, choices.data[best].data);
minilm_destroy(&m);
return 0;
}
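`nearest_index` comes from `nn.h`. As a hedged sketch of what an L2 nearest-neighbor search consistent with this call could look like (assuming, hypothetically, that `tensor_t` exposes a flat `float *data` buffer and an element count `len`, which may not match the repo's actual struct):

```c
#include <math.h>

/* Hypothetical sketch -- assumes tensor_t has `float *data` and `size_t len`. */
static size_t nearest_index_sketch(da_tensor_t vecs, tensor_t q) {
    size_t best = 0;
    float best_d2 = INFINITY;
    for (size_t i = 0; i < vecs.len; i++) {
        float d2 = 0.0f;
        for (size_t k = 0; k < q.len; k++) {
            float diff = vecs.data[i].data[k] - q.data[k];
            d2 += diff * diff; /* squared L2 distance; sqrt not needed for argmin */
        }
        if (d2 < best_d2) { best_d2 = d2; best = i; }
    }
    return best;
}
```

With correct weights, the candidate embedding nearest to the Germany question should be `berlin`.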
The public API:

```c
// lifecycle
int      minilm_create(minilm_t *m, const char *tbf_path, const char *vocab_txt_path);
void     minilm_destroy(minilm_t *m);

// inference (tokenize → encode → return embedding `tensor_t`)
t_status minilm_embed(minilm_t m, char *str, size_t str_len, tensor_t *out);
t_status minilm_tokenize(minilm_t m, s8 str, da_u32 *ids);
t_status minilm_encode(minilm_t m, da_u32 ids, tensor_t *out);
```
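For callers that want the intermediate token ids, the signatures suggest the same embedding can be produced in two steps. A minimal sketch, assuming `da_u32` zero-initializes like the dynamic arrays above (the `t_status` values are repo-specific, so error handling is elided here):

```c
/* Two-step variant of minilm_embed: tokenize first, then encode. */
da_u32 ids = {0};
tensor_t vec;
minilm_tokenize(m, m_s8("berlin is the capital of germany"), &ids);
minilm_encode(m, ids, &vec);
```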
Notes:
- `minilm_embed` = tokenize → encode → return an embedding `tensor_t`.
- Embedding size/architecture match MiniLM (hidden size 384, 6 layers) as reflected in the structs.
- Weights: expected in `.tbf` format, named `bert_weights.tbf`. See `scripts/dump_tbf1.py` for an example.
- Vocab: `vocab.txt` (one token per line, BERT-style).
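For background on the tokenizer side: WordPiece splits each whitespace-separated word greedily into the longest vocabulary matches, marking non-initial pieces with a `##` prefix (so `embeddings` becomes `em`, `##bed`, `##ding`, `##s` under a suitable vocabulary). A self-contained sketch of that greedy longest-match loop, using a hardcoded toy vocabulary rather than this repo's `vocab.txt` loader:

```c
#include <stdio.h>
#include <string.h>

/* Toy vocabulary; the real tokenizer reads vocab.txt instead. */
static const char *VOCAB[] = { "em", "##bed", "##ding", "##s" };
static const size_t NVOCAB = sizeof(VOCAB) / sizeof(VOCAB[0]);

static int in_vocab(const char *piece) {
    for (size_t i = 0; i < NVOCAB; i++)
        if (strcmp(VOCAB[i], piece) == 0) return 1;
    return 0;
}

/* Greedy longest-match-first WordPiece over a single word. */
static void wordpiece(const char *word) {
    size_t len = strlen(word), start = 0;
    char piece[64];
    while (start < len) {
        size_t end = len;
        int found = 0;
        while (end > start) {                 /* try longest substring first */
            size_t n = 0;
            if (start > 0) { piece[n++] = '#'; piece[n++] = '#'; }
            memcpy(piece + n, word + start, end - start);
            piece[n + end - start] = '\0';
            if (in_vocab(piece)) { found = 1; break; }
            end--;                            /* shrink and retry */
        }
        if (!found) { printf("[UNK]\n"); return; }
        printf("%s\n", piece);
        start = end;                          /* continue after the match */
    }
}

int main(void) {
    wordpiece("embeddings"); /* prints: em ##bed ##ding ##s */
    return 0;
}
```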