
Inconsistent Bert Embedding output from embedding.cpp vs llama.cpp server #5801

Closed
@tybalex

Description


System: Mac M2 Max, OS version Sonoma 14.2.1
llama.cpp version: latest main branch as of Feb 29, 2024
Steps to reproduce:

  1. Get the MiniLM-L6-v2 embedding model from https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
  2. Convert it to GGUF: python convert-hf-to-gguf.py --outfile minilm.gguf --outtype f16 all-MiniLM-L6-v2
  3. Get an embedding by running the embedding.cpp example:
./embedding -m minilm.gguf --log-disable -p "prince"  -ngl 99

output:

embedding 0: -0.036860 0.041229 0.041869 0.041696 -0.024387 0.021268 0.099094 0.011098 0.004876 -0.046105 -0.059572 -0.029206 -0.002308 -0.083870 0.013792 0.045119 0.036514 0.097176 -0.023481 0.003588 -0.043461 -0.074656 -0.040720 0.011558 -0.078210 0.046951 0.077435 0.004489 -0.060391 -0.080243 0.012820 -0.069973 0.019214 0.052969 -0.030262 -0.031453 -0.038022 0.017023 -0.020902 0.049196 0.017024 0.016393 0.008936 0.048631 0.069907 -0.020548 -0.046252 -0.026483 0.003169 0.013299 -0.079572 0.067772 0.040068 -0.022476 -0.014747 0.048846 0.033857 -0.032799 0.091509 0.051690 0.008189 -0.021081 0.001400 0.008993 0.021849 0.031127 -0.006536 0.050509 0.000690 0.028135 -0.014092 0.036171 -0.006388 -0.098974 -0.053224 0.001512 0.039952 -0.103960 -0.017700 0.057037 -0.057424 -0.036943 -0.117668 -0.062323 -0.055580 -0.013580 0.019112 -0.006367 -0.044186 0.028640 0.034396 0.007887 -0.003876 0.035907 -0.105881 0.043552 0.109067 -0.079995 -0.056017 0.249003 -0.012027 0.053932 0.008201 0.036213 -0.010334 -0.036057 0.002546 0.002683 0.025382 -0.012716 -0.020140 -0.078044 -0.039887 0.008628 0.067469 -0.022117 0.026651 0.109164 0.025518 0.054249 0.032891 0.058545 -0.060185 -0.029968 -0.064054 -0.055134 -0.032062 -0.000000 0.023324 -0.049649 0.117273 0.017088 0.019652 0.014610 -0.022676 0.011467 -0.011677 0.041203 0.031052 -0.027259 -0.110914 -0.038720 0.008130 0.052588 -0.028568 -0.003301 0.003100 0.006353 -0.040994 0.127952 0.021877 0.031906 -0.059635 -0.035798 0.031328 -0.068779 0.069636 0.045009 0.014252 0.011030 0.031737 0.002799 -0.033560 -0.047101 -0.037738 -0.087675 -0.027649 -0.025608 -0.060913 -0.039024 0.059457 0.010664 -0.121365 0.004126 0.012507 -0.000979 0.027688 -0.006558 -0.002946 -0.058397 -0.050696 0.001921 -0.053137 -0.115931 -0.066889 0.010602 0.062312 -0.006016 0.122319 0.041626 0.040150 0.072560 0.002546 -0.014622 0.038451 0.046488 0.053678 -0.031259 -0.035950 0.078569 0.064906 0.009808 0.041488 0.000068 -0.116803 0.023502 -0.065634 0.007458 -0.072649 0.051103 
0.002858 0.019553 0.069425 -0.018859 -0.049330 -0.055800 0.067243 0.079522 -0.080934 -0.026456 0.001355 0.035326 -0.037046 0.000000 -0.017859 -0.090733 0.059320 0.116348 0.035735 -0.031313 -0.003781 0.047099 -0.047402 -0.008223 -0.070947 -0.049794 0.117659 -0.091668 0.087372 0.001555 0.072384 0.017297 -0.044615 0.050339 -0.011038 -0.085056 -0.026181 -0.050121 0.014113 0.036018 -0.056591 -0.023850 -0.032202 0.011580 0.039819 0.107372 -0.033940 0.037161 -0.045220 0.065391 -0.047808 -0.007598 0.053954 0.021879 -0.010463 -0.055802 0.013686 0.107812 0.052527 0.006023 0.043839 0.066618 0.125267 0.018725 -0.106315 0.037015 -0.106218 -0.013532 0.039010 -0.004414 -0.057585 -0.003809 0.003101 0.074716 0.009683 0.002036 0.006178 -0.003161 -0.073237 0.124757 0.086564 0.037509 0.049667 -0.044155 0.084439 0.070531 -0.017895 0.068558 -0.018488 0.014648 -0.024090 0.032648 -0.036987 -0.051995 -0.023524 -0.045418 -0.060536 0.016920 -0.006316 -0.036464 0.091840 0.011222 -0.040349 -0.045818 0.034356 -0.018599 -0.035700 -0.064079 -0.020323 -0.000000 -0.027605 -0.003925 0.010733 -0.067407 0.036873 0.058545 -0.006928 -0.027731 0.018732 0.016576 0.076662 -0.074990 0.019847 -0.015870 0.035534 -0.048891 0.000796 -0.033112 -0.000590 0.017791 -0.023484 -0.021673 0.042539 -0.044049 -0.049730 -0.017081 -0.047250 -0.016705 0.033407 -0.013550 0.081062 -0.000081 -0.044371 0.035021 0.037153 0.030417 -0.021360 -0.015494 0.048064 -0.026997 -0.028125 0.004956 -0.014228 0.006874 -0.015346 -0.013070 0.011872 -0.036261 -0.044620 0.003815 -0.025334 -0.007049 0.077197 0.041967 0.083626 0.010578 0.028892 0.044975 -0.127511 0.023095 0.127018 -0.046558 -0.012191 0.023901
  4. Run the model in server mode: ./server -ngl 99 -m minilm.gguf --port 8019 --host 0.0.0.0 --embedding
  5. Make a curl request to embed the same word "prince":
curl http://localhost:8019/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer no-key" \
-d '{
        "input": ["prince"],
        "model":"minilm",
        "encoding_format": "float"
}'

Output:

{"data":[{"embedding":[-0.9113406538963318,0.20497266948223114,-0.05761261284351349,-0.1636194884777069,0.44455260038375854,0.08682599663734436,0.744450569152832,-0.03699329495429993,-0.5454420447349548,-0.4367120862007141,-0.07723181694746017,-0.19148540496826172,0.13116762042045593,-0.11302244663238525,0.22115185856819153,-0.2790590822696686,-0.35611769556999207,-0.3604397475719452,-0.046874552965164185,0.2406463325023651,-0.3163539171218872,0.10239306837320328,0.19693052768707275,-0.0763457864522934,-0.09630294144153595,-0.05589187890291214,-0.09276457875967026,0.21844173967838287,-0.0804717019200325,-0.36773544549942017,-0.34730562567710876,0.15269294381141663,0.525522768497467,-0.07095099985599518,-0.23811766505241394,-0.006564701441675425,-0.4374021589756012,-0.46858465671539307,0.12920492887496948,0.33534786105155945,0.11649847030639648,-0.35319891571998596,-0.1711725890636444,0.4761241674423218,0.413016676902771,-0.5310176014900208,0.13408483564853668,0.10075130313634872,0.42842817306518555,-0.2796638011932373,-0.15863917768001556,-0.14105407893657684,0.008249428123235703,-0.20821517705917358,0.3040767014026642,-0.14677953720092773,0.02599211037158966,-0.43027985095977783,0.18312017619609833,-0.13386058807373047,0.04302752763032913,0.07234133034944534,-0.9491630792617798,0.5000106692314148,-0.2805364727973938,0.13198977708816528,0.2868933081626892,-0.023559710010886192,-0.10929538309574127,0.3662813603878021,0.4353457987308502,-0.2235320508480072,0.03139911964535713,-0.6918119192123413,0.2975662350654602,0.20270520448684692,0.36246049404144287,0.1681171953678131,0.382007896900177,0.2189856618642807,-0.03305833786725998,0.07860100269317627,-0.5726684331893921,-0.016888771206140518,0.05296405777335167,-0.2583688795566559,-0.10431863367557526,-0.17162971198558807,-0.20041799545288086,0.3255464434623718,-0.33213892579078674,-0.38952669501304626,0.5833108425140381,-0.3714039921760559,-0.5180072784423828,-0.3262985646724701,0.19199618697166443,0.2610461413860321,-
0.5567361116409302,2.5617096424102783,0.30518174171447754,0.17281579971313477,0.6737371683120728,-0.06384599953889847,0.2474653720855713,-0.3317680358886719,-0.3517586588859558,-0.1164378821849823,0.042621731758117676,0.059900663793087006,-0.0794406533241272,-0.17526094615459442,-0.08849326521158218,0.26080191135406494,0.23311978578567505,0.5595173239707947,-0.2197084128856659,-0.03350114822387695,0.20333600044250488,0.0007340013980865479,-0.04374226927757263,-0.23907612264156342,0.050346773117780685,0.06973200291395187,0.521949827671051,-0.774983286857605,0.026607058942317963,-4.3562683350689554e-32,0.04155222326517105,0.09096819907426834,0.053881049156188965,0.06294772773981094,0.2517699599266052,0.2299978882074356,0.039326563477516174,-0.26331254839897156,-0.106712207198143,0.3398314118385315,0.18540745973587036,0.18217933177947998,-0.09577855467796326,0.2027318924665451,0.3415830135345459,-0.0406864732503891,0.029755856841802597,-0.07023705542087555,0.01803077943623066,0.050523675978183746,-0.01507059670984745,-0.10759429633617401,-0.21249070763587952,0.25786375999450684,-0.14086893200874329,-0.34865590929985046,0.20274148881435394,-0.5245954990386963,0.19924187660217285,0.05390845239162445,-0.0734206885099411,0.31001031398773193,-0.2408670336008072,0.07897859811782837,-0.28300997614860535,-0.10916518419981003,0.11163628846406937,-0.46925920248031616,-0.21956908702850342,0.24989114701747894,-0.2724396586418152,0.07712733000516891,-0.18821056187152863,0.03654399886727333,-0.11255963891744614,1.0593624114990234,-0.11500044167041779,-0.03266199678182602,-0.3653186559677124,0.3872261941432953,-0.04578177630901337,-0.12280713766813278,-0.7358079552650452,0.20281924307346344,-0.1127614974975586,-0.21654832363128662,0.12830042839050293,-0.16045168042182922,0.02366318553686142,-0.005510266870260239,0.07435282319784164,0.17047753930091858,-0.01223795861005783,-0.0014442875981330872,-0.5768333077430725,-0.33469757437705994,0.20493263006210327,-0.0976196825504303,0.1579537
2426509857,-0.1524704098701477,-0.3397308588027954,0.1795063316822052,0.2294308990240097,0.2730967402458191,-0.1142972856760025,-0.13708257675170898,-0.12865528464317322,0.15601131319999695,0.44888660311698914,0.015956934541463852,0.37817105650901794,-0.34698113799095154,-0.0024727322161197662,-0.09719957411289215,0.6553595662117004,0.31680095195770264,-0.04055440053343773,-0.1605125516653061,0.15430790185928345,-0.23208746314048767,-0.2697872519493103,0.42774200439453125,-0.10204043984413147,0.5559654831886292,0.11096914112567902,2.616153190820014e-32,-0.5045391917228699,0.11107780784368515,-0.18468350172042847,0.5339704155921936,-0.08558398485183716,0.03379639610648155,0.2631123960018158,0.7414908409118652,-0.5430196523666382,0.5138211250305176,0.12716256082057953,-0.31557697057724,0.19252167642116547,0.253345787525177,-0.02899402379989624,0.11646998673677444,0.39367929100990295,-0.03387301415205002,-0.28333616256713867,-0.21347321569919586,-0.11846308410167694,-0.20535527169704437,-0.016133740544319153,0.18760168552398682,-0.2259252518415451,0.1494227647781372,0.18264584243297577,0.3958255350589752,-0.14623713493347168,0.2270394116640091,0.1714852750301361,-0.07063024491071701,-0.2905765771865845,-0.36629199981689453,0.3616235852241516,0.5592570304870605,0.10643155872821808,0.024111144244670868,0.21289567649364471,-0.31754186749458313,0.15855254232883453,-0.008792846463620663,0.004980940371751785,0.6545747518539429,-0.19380448758602142,-0.03998925909399986,-0.20742420852184296,0.12010551989078522,0.28698107600212097,0.5303142666816711,-0.6662395000457764,-0.17525595426559448,-0.09595256298780441,0.3452081084251404,-0.152899831533432,0.10215282440185547,-0.20828397572040558,0.5832223296165466,-0.18213698267936707,0.0345468744635582,-0.10129953920841217,0.16032829880714417,-0.13427498936653137,0.047849901020526886,-0.0865342766046524,0.22245003283023834,-0.1609944850206375,-0.1925918459892273,-0.07725381851196289,-0.37833285331726074,0.05138351023197174,-0.01293423
2130646706,-0.8680322766304016,0.10170803964138031,-0.2544623613357544,-0.32966041564941406,-0.08313576132059097,0.008441970683634281,-0.03992752358317375,-0.2733607590198517,-0.2615377604961395,0.10742823779582977,-0.06549906730651855,-0.1308857947587967,-0.09799693524837494,-0.08787287026643753,0.06873519718647003,0.09506669640541077,-0.3629777133464813,0.15921449661254883,-0.06500373035669327,0.17909352481365204,0.8790280818939209,0.03772491589188576,0.08673910051584244,-8.565214670852583e-08,-0.33929646015167236,0.3118705451488495,-0.4805440902709961,-0.032953061163425446,0.34920498728752136,0.24195101857185364,-0.1646021157503128,0.2519652247428894,0.1799640655517578,-0.22356748580932617,0.5443459749221802,0.13451939821243286,-0.29193437099456787,0.3693276643753052,0.6018487811088562,-0.07137903571128845,-0.7007991671562195,-0.38815778493881226,-0.09892711043357849,-0.21787768602371216,-0.07027873396873474,-0.118606798350811,0.10142236948013306,-0.5028542280197144,0.061366863548755646,-0.010358983650803566,-0.3588971197605133,0.42797258496284485,0.46163269877433777,-0.1796817183494568,0.044185034930706024,0.10516093671321869,0.07580085843801498,0.1000712662935257,0.17967012524604797,-0.46841052174568176,-0.4112969636917114,0.10569800436496735,-0.0003655441105365753,0.034334391355514526,-0.37717923521995544,-0.11426356434822083,0.39737415313720703,0.1742405742406845,-0.43300509452819824,-0.09926651418209076,0.07001541554927826,-0.06793393939733505,-0.1917962282896042,-0.4800603985786438,0.06117616593837738,-0.08069508522748947,-0.025915905833244324,0.5345154404640198,0.6164687275886536,0.056552939116954803,0.11888577044010162,-0.17393869161605835,0.52826988697052,0.18463830649852753,1.0523483753204346,0.8015890717506409,0.259770929813385,-0.11053761094808578],"index":0,"object":"embedding"}],"model":"minilm","object":"list","usage":{"prompt_tokens":0,"total_tokens":0}}

Expected behavior: the embeddings from these two approaches should be (nearly) identical.

Actual behavior: the embedding returned by the server is completely different from the one produced in step 3. Not only do the individual values differ, but the scales differ as well (the embedding.cpp output looks normalized, while the server output does not).
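To quantify "completely different" in a scale-invariant way, a cosine-similarity check can be used; this is a minimal sketch, and the two four-element vectors below are just the leading values copied from the dumps above for illustration (a real comparison would use the full 384-dimensional vectors):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity ignores magnitude, so it tolerates the
    # differing scales of the two outputs.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Leading values of the embedding.cpp output and the server output above.
emb_cli = [-0.036860, 0.041229, 0.041869, 0.041696]
emb_server = [-0.9113406538963318, 0.20497266948223114,
              -0.05761261284351349, -0.1636194884777069]

print(cosine_similarity(emb_cli, emb_server))
```

If the two backends agreed, the similarity over the full vectors would be close to 1.0.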

=============================================================
By the way, the embedding output I get from step 3 is almost identical to the one I get from the sentence_transformers Python library, for example:

from sentence_transformers import SentenceTransformer
sentences = ["prince"]

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings)

This indicates that the model conversion worked correctly.
I suspect something is wrong with the BERT embedding path in server mode.
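Since the scales of the two outputs differ, one thing worth ruling out is that the server is simply returning unnormalized embeddings. This is a sketch of that check, using only the Python standard library; the three-element vector is an illustrative prefix of the server response, and a real check would L2-normalize the full 384-dimensional vector and compare it element-wise against the embedding.cpp output:

```python
import math

def l2_normalize(v):
    # Scale the vector to unit length, as sentence-transformers does
    # by default for this model.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

server_prefix = [-0.9113406538963318, 0.20497266948223114,
                 -0.05761261284351349]
print(l2_normalize(server_prefix))
```

In this case even the normalized values do not appear to match the embedding.cpp output, so normalization alone does not seem to explain the discrepancy.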
