Inconsistent Bert Embedding output from embedding.cpp vs llama.cpp server #5801

tybalex · 2024-02-29T19:10:06Z

Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.
System: Mac M2 Max, OS version Sonoma 14.2.1
llama.cpp version: the latest main branch as of today -- Feb 29 2024
Steps to reproduce:

get the miniLM v6 embedding model from https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
convert it to a gguf: python convert-hf-to-gguf.py --outfile minilm.gguf --outtype f16 all-MiniLM-L6-v2
get embedding by running the embedding.cpp example:

./embedding -m minilm.gguf --log-disable -p "prince"  -ngl 99

output:

embedding 0: -0.036860 0.041229 0.041869 0.041696 -0.024387 0.021268 0.099094 0.011098 0.004876 -0.046105 -0.059572 -0.029206 -0.002308 -0.083870 0.013792 0.045119 0.036514 0.097176 -0.023481 0.003588 -0.043461 -0.074656 -0.040720 0.011558 -0.078210 0.046951 0.077435 0.004489 -0.060391 -0.080243 0.012820 -0.069973 0.019214 0.052969 -0.030262 -0.031453 -0.038022 0.017023 -0.020902 0.049196 0.017024 0.016393 0.008936 0.048631 0.069907 -0.020548 -0.046252 -0.026483 0.003169 0.013299 -0.079572 0.067772 0.040068 -0.022476 -0.014747 0.048846 0.033857 -0.032799 0.091509 0.051690 0.008189 -0.021081 0.001400 0.008993 0.021849 0.031127 -0.006536 0.050509 0.000690 0.028135 -0.014092 0.036171 -0.006388 -0.098974 -0.053224 0.001512 0.039952 -0.103960 -0.017700 0.057037 -0.057424 -0.036943 -0.117668 -0.062323 -0.055580 -0.013580 0.019112 -0.006367 -0.044186 0.028640 0.034396 0.007887 -0.003876 0.035907 -0.105881 0.043552 0.109067 -0.079995 -0.056017 0.249003 -0.012027 0.053932 0.008201 0.036213 -0.010334 -0.036057 0.002546 0.002683 0.025382 -0.012716 -0.020140 -0.078044 -0.039887 0.008628 0.067469 -0.022117 0.026651 0.109164 0.025518 0.054249 0.032891 0.058545 -0.060185 -0.029968 -0.064054 -0.055134 -0.032062 -0.000000 0.023324 -0.049649 0.117273 0.017088 0.019652 0.014610 -0.022676 0.011467 -0.011677 0.041203 0.031052 -0.027259 -0.110914 -0.038720 0.008130 0.052588 -0.028568 -0.003301 0.003100 0.006353 -0.040994 0.127952 0.021877 0.031906 -0.059635 -0.035798 0.031328 -0.068779 0.069636 0.045009 0.014252 0.011030 0.031737 0.002799 -0.033560 -0.047101 -0.037738 -0.087675 -0.027649 -0.025608 -0.060913 -0.039024 0.059457 0.010664 -0.121365 0.004126 0.012507 -0.000979 0.027688 -0.006558 -0.002946 -0.058397 -0.050696 0.001921 -0.053137 -0.115931 -0.066889 0.010602 0.062312 -0.006016 0.122319 0.041626 0.040150 0.072560 0.002546 -0.014622 0.038451 0.046488 0.053678 -0.031259 -0.035950 0.078569 0.064906 0.009808 0.041488 0.000068 -0.116803 0.023502 -0.065634 0.007458 -0.072649 0.051103 0.002858 0.019553 0.069425 -0.018859 -0.049330 -0.055800 0.067243 0.079522 -0.080934 -0.026456 0.001355 0.035326 -0.037046 0.000000 -0.017859 -0.090733 0.059320 0.116348 0.035735 -0.031313 -0.003781 0.047099 -0.047402 -0.008223 -0.070947 -0.049794 0.117659 -0.091668 0.087372 0.001555 0.072384 0.017297 -0.044615 0.050339 -0.011038 -0.085056 -0.026181 -0.050121 0.014113 0.036018 -0.056591 -0.023850 -0.032202 0.011580 0.039819 0.107372 -0.033940 0.037161 -0.045220 0.065391 -0.047808 -0.007598 0.053954 0.021879 -0.010463 -0.055802 0.013686 0.107812 0.052527 0.006023 0.043839 0.066618 0.125267 0.018725 -0.106315 0.037015 -0.106218 -0.013532 0.039010 -0.004414 -0.057585 -0.003809 0.003101 0.074716 0.009683 0.002036 0.006178 -0.003161 -0.073237 0.124757 0.086564 0.037509 0.049667 -0.044155 0.084439 0.070531 -0.017895 0.068558 -0.018488 0.014648 -0.024090 0.032648 -0.036987 -0.051995 -0.023524 -0.045418 -0.060536 0.016920 -0.006316 -0.036464 0.091840 0.011222 -0.040349 -0.045818 0.034356 -0.018599 -0.035700 -0.064079 -0.020323 -0.000000 -0.027605 -0.003925 0.010733 -0.067407 0.036873 0.058545 -0.006928 -0.027731 0.018732 0.016576 0.076662 -0.074990 0.019847 -0.015870 0.035534 -0.048891 0.000796 -0.033112 -0.000590 0.017791 -0.023484 -0.021673 0.042539 -0.044049 -0.049730 -0.017081 -0.047250 -0.016705 0.033407 -0.013550 0.081062 -0.000081 -0.044371 0.035021 0.037153 0.030417 -0.021360 -0.015494 0.048064 -0.026997 -0.028125 0.004956 -0.014228 0.006874 -0.015346 -0.013070 0.011872 -0.036261 -0.044620 0.003815 -0.025334 -0.007049 0.077197 0.041967 0.083626 0.010578 0.028892 0.044975 -0.127511 0.023095 0.127018 -0.046558 -0.012191 0.023901

run the model as server mode: ./server -ngl 99 -m minilm.gguf --port 8019 --host 0.0.0.0 --embedding
make a curl request for embedding the same word 'prince':

curl http://localhost:8019/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer no-key" \
-d '{
        "input": ["prince"],
        "model":"minilm",
        "encoding_format": "float"
}'

Output:

{"data":[{"embedding":[-0.9113406538963318,0.20497266948223114,-0.05761261284351349,-0.1636194884777069,0.44455260038375854,0.08682599663734436,0.744450569152832,-0.03699329495429993,-0.5454420447349548,-0.4367120862007141,-0.07723181694746017,-0.19148540496826172,0.13116762042045593,-0.11302244663238525,0.22115185856819153,-0.2790590822696686,-0.35611769556999207,-0.3604397475719452,-0.046874552965164185,0.2406463325023651,-0.3163539171218872,0.10239306837320328,0.19693052768707275,-0.0763457864522934,-0.09630294144153595,-0.05589187890291214,-0.09276457875967026,0.21844173967838287,-0.0804717019200325,-0.36773544549942017,-0.34730562567710876,0.15269294381141663,0.525522768497467,-0.07095099985599518,-0.23811766505241394,-0.006564701441675425,-0.4374021589756012,-0.46858465671539307,0.12920492887496948,0.33534786105155945,0.11649847030639648,-0.35319891571998596,-0.1711725890636444,0.4761241674423218,0.413016676902771,-0.5310176014900208,0.13408483564853668,0.10075130313634872,0.42842817306518555,-0.2796638011932373,-0.15863917768001556,-0.14105407893657684,0.008249428123235703,-0.20821517705917358,0.3040767014026642,-0.14677953720092773,0.02599211037158966,-0.43027985095977783,0.18312017619609833,-0.13386058807373047,0.04302752763032913,0.07234133034944534,-0.9491630792617798,0.5000106692314148,-0.2805364727973938,0.13198977708816528,0.2868933081626892,-0.023559710010886192,-0.10929538309574127,0.3662813603878021,0.4353457987308502,-0.2235320508480072,0.03139911964535713,-0.6918119192123413,0.2975662350654602,0.20270520448684692,0.36246049404144287,0.1681171953678131,0.382007896900177,0.2189856618642807,-0.03305833786725998,0.07860100269317627,-0.5726684331893921,-0.016888771206140518,0.05296405777335167,-0.2583688795566559,-0.10431863367557526,-0.17162971198558807,-0.20041799545288086,0.3255464434623718,-0.33213892579078674,-0.38952669501304626,0.5833108425140381,-0.3714039921760559,-0.5180072784423828,-0.3262985646724701,0.19199618697166443,0.2610461413860321,-0.5567361116409302,2.5617096424102783,0.30518174171447754,0.17281579971313477,0.6737371683120728,-0.06384599953889847,0.2474653720855713,-0.3317680358886719,-0.3517586588859558,-0.1164378821849823,0.042621731758117676,0.059900663793087006,-0.0794406533241272,-0.17526094615459442,-0.08849326521158218,0.26080191135406494,0.23311978578567505,0.5595173239707947,-0.2197084128856659,-0.03350114822387695,0.20333600044250488,0.0007340013980865479,-0.04374226927757263,-0.23907612264156342,0.050346773117780685,0.06973200291395187,0.521949827671051,-0.774983286857605,0.026607058942317963,-4.3562683350689554e-32,0.04155222326517105,0.09096819907426834,0.053881049156188965,0.06294772773981094,0.2517699599266052,0.2299978882074356,0.039326563477516174,-0.26331254839897156,-0.106712207198143,0.3398314118385315,0.18540745973587036,0.18217933177947998,-0.09577855467796326,0.2027318924665451,0.3415830135345459,-0.0406864732503891,0.029755856841802597,-0.07023705542087555,0.01803077943623066,0.050523675978183746,-0.01507059670984745,-0.10759429633617401,-0.21249070763587952,0.25786375999450684,-0.14086893200874329,-0.34865590929985046,0.20274148881435394,-0.5245954990386963,0.19924187660217285,0.05390845239162445,-0.0734206885099411,0.31001031398773193,-0.2408670336008072,0.07897859811782837,-0.28300997614860535,-0.10916518419981003,0.11163628846406937,-0.46925920248031616,-0.21956908702850342,0.24989114701747894,-0.2724396586418152,0.07712733000516891,-0.18821056187152863,0.03654399886727333,-0.11255963891744614,1.0593624114990234,-0.11500044167041779,-0.03266199678182602,-0.3653186559677124,0.3872261941432953,-0.04578177630901337,-0.12280713766813278,-0.7358079552650452,0.20281924307346344,-0.1127614974975586,-0.21654832363128662,0.12830042839050293,-0.16045168042182922,0.02366318553686142,-0.005510266870260239,0.07435282319784164,0.17047753930091858,-0.01223795861005783,-0.0014442875981330872,-0.5768333077430725,-0.33469757437705994,0.20493263006210327,-0.0976196825504303,0.15795372426509857,-0.1524704098701477,-0.3397308588027954,0.1795063316822052,0.2294308990240097,0.2730967402458191,-0.1142972856760025,-0.13708257675170898,-0.12865528464317322,0.15601131319999695,0.44888660311698914,0.015956934541463852,0.37817105650901794,-0.34698113799095154,-0.0024727322161197662,-0.09719957411289215,0.6553595662117004,0.31680095195770264,-0.04055440053343773,-0.1605125516653061,0.15430790185928345,-0.23208746314048767,-0.2697872519493103,0.42774200439453125,-0.10204043984413147,0.5559654831886292,0.11096914112567902,2.616153190820014e-32,-0.5045391917228699,0.11107780784368515,-0.18468350172042847,0.5339704155921936,-0.08558398485183716,0.03379639610648155,0.2631123960018158,0.7414908409118652,-0.5430196523666382,0.5138211250305176,0.12716256082057953,-0.31557697057724,0.19252167642116547,0.253345787525177,-0.02899402379989624,0.11646998673677444,0.39367929100990295,-0.03387301415205002,-0.28333616256713867,-0.21347321569919586,-0.11846308410167694,-0.20535527169704437,-0.016133740544319153,0.18760168552398682,-0.2259252518415451,0.1494227647781372,0.18264584243297577,0.3958255350589752,-0.14623713493347168,0.2270394116640091,0.1714852750301361,-0.07063024491071701,-0.2905765771865845,-0.36629199981689453,0.3616235852241516,0.5592570304870605,0.10643155872821808,0.024111144244670868,0.21289567649364471,-0.31754186749458313,0.15855254232883453,-0.008792846463620663,0.004980940371751785,0.6545747518539429,-0.19380448758602142,-0.03998925909399986,-0.20742420852184296,0.12010551989078522,0.28698107600212097,0.5303142666816711,-0.6662395000457764,-0.17525595426559448,-0.09595256298780441,0.3452081084251404,-0.152899831533432,0.10215282440185547,-0.20828397572040558,0.5832223296165466,-0.18213698267936707,0.0345468744635582,-0.10129953920841217,0.16032829880714417,-0.13427498936653137,0.047849901020526886,-0.0865342766046524,0.22245003283023834,-0.1609944850206375,-0.1925918459892273,-0.07725381851196289,-0.37833285331726074,0.05138351023197174,-0.012934232130646706,-0.8680322766304016,0.10170803964138031,-0.2544623613357544,-0.32966041564941406,-0.08313576132059097,0.008441970683634281,-0.03992752358317375,-0.2733607590198517,-0.2615377604961395,0.10742823779582977,-0.06549906730651855,-0.1308857947587967,-0.09799693524837494,-0.08787287026643753,0.06873519718647003,0.09506669640541077,-0.3629777133464813,0.15921449661254883,-0.06500373035669327,0.17909352481365204,0.8790280818939209,0.03772491589188576,0.08673910051584244,-8.565214670852583e-08,-0.33929646015167236,0.3118705451488495,-0.4805440902709961,-0.032953061163425446,0.34920498728752136,0.24195101857185364,-0.1646021157503128,0.2519652247428894,0.1799640655517578,-0.22356748580932617,0.5443459749221802,0.13451939821243286,-0.29193437099456787,0.3693276643753052,0.6018487811088562,-0.07137903571128845,-0.7007991671562195,-0.38815778493881226,-0.09892711043357849,-0.21787768602371216,-0.07027873396873474,-0.118606798350811,0.10142236948013306,-0.5028542280197144,0.061366863548755646,-0.010358983650803566,-0.3588971197605133,0.42797258496284485,0.46163269877433777,-0.1796817183494568,0.044185034930706024,0.10516093671321869,0.07580085843801498,0.1000712662935257,0.17967012524604797,-0.46841052174568176,-0.4112969636917114,0.10569800436496735,-0.0003655441105365753,0.034334391355514526,-0.37717923521995544,-0.11426356434822083,0.39737415313720703,0.1742405742406845,-0.43300509452819824,-0.09926651418209076,0.07001541554927826,-0.06793393939733505,-0.1917962282896042,-0.4800603985786438,0.06117616593837738,-0.08069508522748947,-0.025915905833244324,0.5345154404640198,0.6164687275886536,0.056552939116954803,0.11888577044010162,-0.17393869161605835,0.52826988697052,0.18463830649852753,1.0523483753204346,0.8015890717506409,0.259770929813385,-0.11053761094808578],"index":0,"object":"embedding"}],"model":"minilm","object":"list","usage":{"prompt_tokens":0,"total_tokens":0}}

Expected Behavior: the embedding from these 2 approaches should yield the same output

Actual Behavior: As you can see, the output embedding looks completely different from the one from step 3, not only the values, but the scales are different too.

=============================================================
And by the way, the embedding output I get from step 3 is almost the same with the one I got from using sentence_transformer python library, for example:

from sentence_transformers import SentenceTransformer
sentences = ["prince"]

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings)

This indicates that the model conversion works correctly.
I think there's something wrong with the Bert Embedding of server mode.

The text was updated successfully, but these errors were encountered:

ngxson · 2024-02-29T21:00:44Z

Maybe related to #5796

tybalex · 2024-02-29T23:06:25Z

Maybe related to #5796

I think so. hopefully will be fixed by that.

tybalex · 2024-03-04T23:13:12Z

#5796 did NOT fix this issue.

ngxson · 2024-03-04T23:37:09Z

Can you check the cosine distance of vector produced by embedding.cpp vs server.cpp?

Also maybe try without GPU offloading?

tybalex · 2024-03-05T20:43:14Z

Can you check the cosine distance of vector produced by embedding.cpp vs server.cpp?

Also maybe try without GPU offloading?

I tried without GPU offloading, got the same output.

As for the cosine distance, I calculated cosine distance between word prince and a list of words ["king", "queen", "apple", "orange"] and sorted them:

from embedding.cpp output:
[('king', 0.4116078336488638),
('queen', 0.4211467172721288),
('apple', 0.6682980126084468),
('orange', 0.6874219028515791)]

from server.cpp:
[('orange', 0.009215513122614483),
('king', 0.009233457008902879),
('queen', 0.01777521161063844),
('apple', 0.020477966154721194)]

ngxson · 2024-03-06T10:16:28Z

There's currently a refactoring on server code, maybe this will be fixed: #5882

iamlemec · 2024-03-06T22:14:50Z

It looks like this is actually a tokenization issue. I'm seeing the output of /tokenize as being pretty garbled. First, it doesn't appear to be adding an BOS token. We're currently not specifying the add_bos_token flag in the GGUF files for embeddings, so we might want to do that.

Second, it looks like something is up with special_tokens_cache. It seems to be adding in any token that is the concatenation of two other valid tokens. But that ends up being tons of regular words in addition to actual special tokens. The cache isn't used for regular embeddings, but the server seems to want to use it.

Edit: If you force it to add a BOS token and turn off special token processing, the tokenization comes out correct. And in that case the embedding numbers are correct too, though they're not normalized, so they won't look the same as the output from embedding.

ggerganov · 2024-03-06T22:29:12Z

Yes, the special flag is always on in server:

llama.cpp/examples/server/server.cpp

Lines 471 to 477 in e04e04f

    
           std::vector<llama_token> tokenize(const json & json_prompt, bool add_bos) const 
        
           { 
        
               // TODO: currently, we tokenize using special tokens by default 
        
               //       this is not always correct (see https://github.com/ggerganov/llama.cpp/pull/4160#issuecomment-1824826216) 
        
               //       but it's better compared to completely ignoring ChatML and other chat templates 
        
               const bool TMP_FORCE_SPECIAL = true;

And this seems to tokenize incorrectly. Not sure if this is somehow a problem with the vocab or if we simply need to turn off special flag when using embeddings models.

We should fix this and the normalization after we merge #5882

iamlemec · 2024-03-11T06:54:30Z

Trying to figure out what's up with special_tokens_cache and looking through #3538 for guidance. Most models I'm looking at seem to correctly label special tokens with token_type. Do we have any examples of models that fail to do this properly? Seems like the kind of thing that should be taken care of during GGUF conversion.

tybalex · 2024-03-20T00:19:00Z

As I posted above, the embedding I got from embedding.cpp is the same with what I got from the origin model, so guess it's not a GGUF conversion issue. My observation is, with the SAME input and SAME gguf model, embedding.cpp and server.cpp yield different output.

github-actions · 2024-05-04T01:06:35Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

tybalex added the bug-unconfirmed label Feb 29, 2024

ggerganov mentioned this issue Mar 7, 2024

server : refactor #5882

Merged

4 tasks

github-actions bot added the stale label Apr 19, 2024

github-actions bot closed this as completed May 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent Bert Embedding output from embedding.cpp vs llama.cpp server #5801

Inconsistent Bert Embedding output from embedding.cpp vs llama.cpp server #5801

tybalex commented Feb 29, 2024 •

edited

Loading

ngxson commented Feb 29, 2024

tybalex commented Feb 29, 2024

tybalex commented Mar 4, 2024 •

edited

Loading

ngxson commented Mar 4, 2024 •

edited

Loading

tybalex commented Mar 5, 2024 •

edited

Loading

ngxson commented Mar 6, 2024

iamlemec commented Mar 6, 2024 •

edited

Loading

ggerganov commented Mar 6, 2024

iamlemec commented Mar 11, 2024

tybalex commented Mar 20, 2024

github-actions bot commented May 4, 2024

Inconsistent Bert Embedding output from embedding.cpp vs llama.cpp server #5801

Inconsistent Bert Embedding output from embedding.cpp vs llama.cpp server #5801

Comments

tybalex commented Feb 29, 2024 • edited Loading

ngxson commented Feb 29, 2024

tybalex commented Feb 29, 2024

tybalex commented Mar 4, 2024 • edited Loading

ngxson commented Mar 4, 2024 • edited Loading

tybalex commented Mar 5, 2024 • edited Loading

ngxson commented Mar 6, 2024

iamlemec commented Mar 6, 2024 • edited Loading

ggerganov commented Mar 6, 2024

iamlemec commented Mar 11, 2024

tybalex commented Mar 20, 2024

github-actions bot commented May 4, 2024

tybalex commented Feb 29, 2024 •

edited

Loading

tybalex commented Mar 4, 2024 •

edited

Loading

ngxson commented Mar 4, 2024 •

edited

Loading

tybalex commented Mar 5, 2024 •

edited

Loading

iamlemec commented Mar 6, 2024 •

edited

Loading