
Inconsistent Bert Embedding output from embedding.cpp vs llama.cpp server #5801

Closed
tybalex opened this issue Feb 29, 2024 · 11 comments

@tybalex

tybalex commented Feb 29, 2024

System: Mac M2 Max, OS version Sonoma 14.2.1
llama.cpp version: the latest main branch as of today -- Feb 29 2024
Steps to reproduce:

  1. get the all-MiniLM-L6-v2 embedding model from https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
  2. convert it to GGUF: python convert-hf-to-gguf.py --outfile minilm.gguf --outtype f16 all-MiniLM-L6-v2
  3. get an embedding by running the embedding.cpp example:
./embedding -m minilm.gguf --log-disable -p "prince"  -ngl 99

output:

embedding 0: -0.036860 0.041229 0.041869 0.041696 -0.024387 0.021268 0.099094 0.011098 0.004876 -0.046105 -0.059572 -0.029206 -0.002308 -0.083870 0.013792 0.045119 0.036514 0.097176 -0.023481 0.003588 -0.043461 -0.074656 -0.040720 0.011558 -0.078210 0.046951 0.077435 0.004489 -0.060391 -0.080243 0.012820 -0.069973 0.019214 0.052969 -0.030262 -0.031453 -0.038022 0.017023 -0.020902 0.049196 0.017024 0.016393 0.008936 0.048631 0.069907 -0.020548 -0.046252 -0.026483 0.003169 0.013299 -0.079572 0.067772 0.040068 -0.022476 -0.014747 0.048846 0.033857 -0.032799 0.091509 0.051690 0.008189 -0.021081 0.001400 0.008993 0.021849 0.031127 -0.006536 0.050509 0.000690 0.028135 -0.014092 0.036171 -0.006388 -0.098974 -0.053224 0.001512 0.039952 -0.103960 -0.017700 0.057037 -0.057424 -0.036943 -0.117668 -0.062323 -0.055580 -0.013580 0.019112 -0.006367 -0.044186 0.028640 0.034396 0.007887 -0.003876 0.035907 -0.105881 0.043552 0.109067 -0.079995 -0.056017 0.249003 -0.012027 0.053932 0.008201 0.036213 -0.010334 -0.036057 0.002546 0.002683 0.025382 -0.012716 -0.020140 -0.078044 -0.039887 0.008628 0.067469 -0.022117 0.026651 0.109164 0.025518 0.054249 0.032891 0.058545 -0.060185 -0.029968 -0.064054 -0.055134 -0.032062 -0.000000 0.023324 -0.049649 0.117273 0.017088 0.019652 0.014610 -0.022676 0.011467 -0.011677 0.041203 0.031052 -0.027259 -0.110914 -0.038720 0.008130 0.052588 -0.028568 -0.003301 0.003100 0.006353 -0.040994 0.127952 0.021877 0.031906 -0.059635 -0.035798 0.031328 -0.068779 0.069636 0.045009 0.014252 0.011030 0.031737 0.002799 -0.033560 -0.047101 -0.037738 -0.087675 -0.027649 -0.025608 -0.060913 -0.039024 0.059457 0.010664 -0.121365 0.004126 0.012507 -0.000979 0.027688 -0.006558 -0.002946 -0.058397 -0.050696 0.001921 -0.053137 -0.115931 -0.066889 0.010602 0.062312 -0.006016 0.122319 0.041626 0.040150 0.072560 0.002546 -0.014622 0.038451 0.046488 0.053678 -0.031259 -0.035950 0.078569 0.064906 0.009808 0.041488 0.000068 -0.116803 0.023502 -0.065634 0.007458 -0.072649 0.051103 0.002858 0.019553 0.069425 -0.018859 -0.049330 -0.055800 0.067243 0.079522 -0.080934 -0.026456 0.001355 0.035326 -0.037046 0.000000 -0.017859 -0.090733 0.059320 0.116348 0.035735 -0.031313 -0.003781 0.047099 -0.047402 -0.008223 -0.070947 -0.049794 0.117659 -0.091668 0.087372 0.001555 0.072384 0.017297 -0.044615 0.050339 -0.011038 -0.085056 -0.026181 -0.050121 0.014113 0.036018 -0.056591 -0.023850 -0.032202 0.011580 0.039819 0.107372 -0.033940 0.037161 -0.045220 0.065391 -0.047808 -0.007598 0.053954 0.021879 -0.010463 -0.055802 0.013686 0.107812 0.052527 0.006023 0.043839 0.066618 0.125267 0.018725 -0.106315 0.037015 -0.106218 -0.013532 0.039010 -0.004414 -0.057585 -0.003809 0.003101 0.074716 0.009683 0.002036 0.006178 -0.003161 -0.073237 0.124757 0.086564 0.037509 0.049667 -0.044155 0.084439 0.070531 -0.017895 0.068558 -0.018488 0.014648 -0.024090 0.032648 -0.036987 -0.051995 -0.023524 -0.045418 -0.060536 0.016920 -0.006316 -0.036464 0.091840 0.011222 -0.040349 -0.045818 0.034356 -0.018599 -0.035700 -0.064079 -0.020323 -0.000000 -0.027605 -0.003925 0.010733 -0.067407 0.036873 0.058545 -0.006928 -0.027731 0.018732 0.016576 0.076662 -0.074990 0.019847 -0.015870 0.035534 -0.048891 0.000796 -0.033112 -0.000590 0.017791 -0.023484 -0.021673 0.042539 -0.044049 -0.049730 -0.017081 -0.047250 -0.016705 0.033407 -0.013550 0.081062 -0.000081 -0.044371 0.035021 0.037153 0.030417 -0.021360 -0.015494 0.048064 -0.026997 -0.028125 0.004956 -0.014228 0.006874 -0.015346 -0.013070 0.011872 -0.036261 -0.044620 0.003815 -0.025334 -0.007049 0.077197 
0.041967 0.083626 0.010578 0.028892 0.044975 -0.127511 0.023095 0.127018 -0.046558 -0.012191 0.023901
  4. run the model in server mode: ./server -ngl 99 -m minilm.gguf --port 8019 --host 0.0.0.0 --embedding
  5. make a curl request to embed the same word 'prince':
curl http://localhost:8019/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer no-key" \
-d '{
        "input": ["prince"],
        "model":"minilm",
        "encoding_format": "float"
}'

Output:

{"data":[{"embedding":[-0.9113406538963318,0.20497266948223114,-0.05761261284351349,-0.1636194884777069,0.44455260038375854,0.08682599663734436,0.744450569152832,-0.03699329495429993,-0.5454420447349548,-0.4367120862007141,-0.07723181694746017,-0.19148540496826172,0.13116762042045593,-0.11302244663238525,0.22115185856819153,-0.2790590822696686,-0.35611769556999207,-0.3604397475719452,-0.046874552965164185,0.2406463325023651,-0.3163539171218872,0.10239306837320328,0.19693052768707275,-0.0763457864522934,-0.09630294144153595,-0.05589187890291214,-0.09276457875967026,0.21844173967838287,-0.0804717019200325,-0.36773544549942017,-0.34730562567710876,0.15269294381141663,0.525522768497467,-0.07095099985599518,-0.23811766505241394,-0.006564701441675425,-0.4374021589756012,-0.46858465671539307,0.12920492887496948,0.33534786105155945,0.11649847030639648,-0.35319891571998596,-0.1711725890636444,0.4761241674423218,0.413016676902771,-0.5310176014900208,0.13408483564853668,0.10075130313634872,0.42842817306518555,-0.2796638011932373,-0.15863917768001556,-0.14105407893657684,0.008249428123235703,-0.20821517705917358,0.3040767014026642,-0.14677953720092773,0.02599211037158966,-0.43027985095977783,0.18312017619609833,-0.13386058807373047,0.04302752763032913,0.07234133034944534,-0.9491630792617798,0.5000106692314148,-0.2805364727973938,0.13198977708816528,0.2868933081626892,-0.023559710010886192,-0.10929538309574127,0.3662813603878021,0.4353457987308502,-0.2235320508480072,0.03139911964535713,-0.6918119192123413,0.2975662350654602,0.20270520448684692,0.36246049404144287,0.1681171953678131,0.382007896900177,0.2189856618642807,-0.03305833786725998,0.07860100269317627,-0.5726684331893921,-0.016888771206140518,0.05296405777335167,-0.2583688795566559,-0.10431863367557526,-0.17162971198558807,-0.20041799545288086,0.3255464434623718,-0.33213892579078674,-0.38952669501304626,0.5833108425140381,-0.3714039921760559,-0.5180072784423828,-0.3262985646724701,0.19199618697166443,0.2610461413860321,-0.5567361116409302,2.5617096424102783,0.30518174171447754,0.17281579971313477,0.6737371683120728,-0.06384599953889847,0.2474653720855713,-0.3317680358886719,-0.3517586588859558,-0.1164378821849823,0.042621731758117676,0.059900663793087006,-0.0794406533241272,-0.17526094615459442,-0.08849326521158218,0.26080191135406494,0.23311978578567505,0.5595173239707947,-0.2197084128856659,-0.03350114822387695,0.20333600044250488,0.0007340013980865479,-0.04374226927757263,-0.23907612264156342,0.050346773117780685,0.06973200291395187,0.521949827671051,-0.774983286857605,0.026607058942317963,-4.3562683350689554e-32,0.04155222326517105,0.09096819907426834,0.053881049156188965,0.06294772773981094,0.2517699599266052,0.2299978882074356,0.039326563477516174,-0.26331254839897156,-0.106712207198143,0.3398314118385315,0.18540745973587036,0.18217933177947998,-0.09577855467796326,0.2027318924665451,0.3415830135345459,-0.0406864732503891,0.029755856841802597,-0.07023705542087555,0.01803077943623066,0.050523675978183746,-0.01507059670984745,-0.10759429633617401,-0.21249070763587952,0.25786375999450684,-0.14086893200874329,-0.34865590929985046,0.20274148881435394,-0.5245954990386963,0.19924187660217285,0.05390845239162445,-0.0734206885099411,0.31001031398773193,-0.2408670336008072,0.07897859811782837,-0.28300997614860535,-0.10916518419981003,0.11163628846406937,-0.46925920248031616,-0.21956908702850342,0.24989114701747894,-0.2724396586418152,0.07712733000516891,-0.18821056187152863,0.03654399886727333,-0.11255963891744614,1.0593624114990234,-0.1150004416704
1779,-0.03266199678182602,-0.3653186559677124,0.3872261941432953,-0.04578177630901337,-0.12280713766813278,-0.7358079552650452,0.20281924307346344,-0.1127614974975586,-0.21654832363128662,0.12830042839050293,-0.16045168042182922,0.02366318553686142,-0.005510266870260239,0.07435282319784164,0.17047753930091858,-0.01223795861005783,-0.0014442875981330872,-0.5768333077430725,-0.33469757437705994,0.20493263006210327,-0.0976196825504303,0.15795372426509857,-0.1524704098701477,-0.3397308588027954,0.1795063316822052,0.2294308990240097,0.2730967402458191,-0.1142972856760025,-0.13708257675170898,-0.12865528464317322,0.15601131319999695,0.44888660311698914,0.015956934541463852,0.37817105650901794,-0.34698113799095154,-0.0024727322161197662,-0.09719957411289215,0.6553595662117004,0.31680095195770264,-0.04055440053343773,-0.1605125516653061,0.15430790185928345,-0.23208746314048767,-0.2697872519493103,0.42774200439453125,-0.10204043984413147,0.5559654831886292,0.11096914112567902,2.616153190820014e-32,-0.5045391917228699,0.11107780784368515,-0.18468350172042847,0.5339704155921936,-0.08558398485183716,0.03379639610648155,0.2631123960018158,0.7414908409118652,-0.5430196523666382,0.5138211250305176,0.12716256082057953,-0.31557697057724,0.19252167642116547,0.253345787525177,-0.02899402379989624,0.11646998673677444,0.39367929100990295,-0.03387301415205002,-0.28333616256713867,-0.21347321569919586,-0.11846308410167694,-0.20535527169704437,-0.016133740544319153,0.18760168552398682,-0.2259252518415451,0.1494227647781372,0.18264584243297577,0.3958255350589752,-0.14623713493347168,0.2270394116640091,0.1714852750301361,-0.07063024491071701,-0.2905765771865845,-0.36629199981689453,0.3616235852241516,0.5592570304870605,0.10643155872821808,0.024111144244670868,0.21289567649364471,-0.31754186749458313,0.15855254232883453,-0.008792846463620663,0.004980940371751785,0.6545747518539429,-0.19380448758602142,-0.03998925909399986,-0.20742420852184296,0.12010551989078522,0.28698107600212097,0.5303142666816711,-0.6662395000457764,-0.17525595426559448,-0.09595256298780441,0.3452081084251404,-0.152899831533432,0.10215282440185547,-0.20828397572040558,0.5832223296165466,-0.18213698267936707,0.0345468744635582,-0.10129953920841217,0.16032829880714417,-0.13427498936653137,0.047849901020526886,-0.0865342766046524,0.22245003283023834,-0.1609944850206375,-0.1925918459892273,-0.07725381851196289,-0.37833285331726074,0.05138351023197174,-0.012934232130646706,-0.8680322766304016,0.10170803964138031,-0.2544623613357544,-0.32966041564941406,-0.08313576132059097,0.008441970683634281,-0.03992752358317375,-0.2733607590198517,-0.2615377604961395,0.10742823779582977,-0.06549906730651855,-0.1308857947587967,-0.09799693524837494,-0.08787287026643753,0.06873519718647003,0.09506669640541077,-0.3629777133464813,0.15921449661254883,-0.06500373035669327,0.17909352481365204,0.8790280818939209,0.03772491589188576,0.08673910051584244,-8.565214670852583e-08,-0.33929646015167236,0.3118705451488495,-0.4805440902709961,-0.032953061163425446,0.34920498728752136,0.24195101857185364,-0.1646021157503128,0.2519652247428894,0.1799640655517578,-0.22356748580932617,0.5443459749221802,0.13451939821243286,-0.29193437099456787,0.3693276643753052,0.6018487811088562,-0.07137903571128845,-0.7007991671562195,-0.38815778493881226,-0.09892711043357849,-0.21787768602371216,-0.07027873396873474,-0.118606798350811,0.10142236948013306,-0.5028542280197144,0.061366863548755646,-0.010358983650803566,-0.3588971197605133,0.42797258496284485,0.46163269877433777,-0.1796817183494568,0.0
44185034930706024,0.10516093671321869,0.07580085843801498,0.1000712662935257,0.17967012524604797,-0.46841052174568176,-0.4112969636917114,0.10569800436496735,-0.0003655441105365753,0.034334391355514526,-0.37717923521995544,-0.11426356434822083,0.39737415313720703,0.1742405742406845,-0.43300509452819824,-0.09926651418209076,0.07001541554927826,-0.06793393939733505,-0.1917962282896042,-0.4800603985786438,0.06117616593837738,-0.08069508522748947,-0.025915905833244324,0.5345154404640198,0.6164687275886536,0.056552939116954803,0.11888577044010162,-0.17393869161605835,0.52826988697052,0.18463830649852753,1.0523483753204346,0.8015890717506409,0.259770929813385,-0.11053761094808578],"index":0,"object":"embedding"}],"model":"minilm","object":"list","usage":{"prompt_tokens":0,"total_tokens":0}}

Expected Behavior: the two approaches should yield the same embedding for the same input.

Actual Behavior: as you can see, the server's output embedding is completely different from the one produced in step 3; not only the values but also the scales differ.
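
For reference, whether the difference is only a matter of scale can be checked by comparing the two vectors with cosine similarity, which ignores normalization. A minimal sketch in Python, where emb_cli and emb_server are hypothetical placeholders for the two outputs pasted above:

import numpy as np

# Hypothetical placeholders: paste the full vectors printed above here
# (only the first three components of each output are shown).
emb_cli = np.array([-0.036860, 0.041229, 0.041869])            # from ./embedding
emb_server = np.array([-0.91134065, 0.20497267, -0.05761261])  # from the server response

def cosine_similarity(a, b):
    # Cosine similarity is scale-invariant, so a value near 1.0 would mean
    # the two vectors differ only by normalization.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(emb_cli, emb_server))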

=============================================================
By the way, the embedding output I get from step 3 is almost identical to the one I get from the sentence_transformers Python library, for example:

from sentence_transformers import SentenceTransformer

sentences = ["prince"]

# Reference embedding from the original (unconverted) model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings)

This indicates that the model conversion works correctly.
I think something is wrong with the BERT embedding path in server mode.

@ngxson
Collaborator

ngxson commented Feb 29, 2024

Maybe related to #5796

@tybalex
Author

tybalex commented Feb 29, 2024

Maybe related to #5796

I think so; hopefully it will be fixed by that.

@tybalex
Author

tybalex commented Mar 4, 2024

#5796 did NOT fix this issue.

@ngxson
Collaborator

ngxson commented Mar 4, 2024

Can you check the cosine distance between the vectors produced by embedding.cpp and server.cpp?

Also maybe try without GPU offloading?

@tybalex
Author

tybalex commented Mar 5, 2024

Can you check the cosine distance between the vectors produced by embedding.cpp and server.cpp?

Also maybe try without GPU offloading?

I tried without GPU offloading, got the same output.

As for the cosine distance, I calculated the cosine distance between the word "prince" and each word in ["king", "queen", "apple", "orange"] and sorted the results:

from embedding.cpp output:
[('king', 0.4116078336488638),
('queen', 0.4211467172721288),
('apple', 0.6682980126084468),
('orange', 0.6874219028515791)]

from server.cpp:
[('orange', 0.009215513122614483),
('king', 0.009233457008902879),
('queen', 0.01777521161063844),
('apple', 0.020477966154721194)]
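
Cosine distance is typically defined as 1 minus cosine similarity (smaller means more similar). A sketch of how such a ranking can be computed with scipy, where query_vec and word_vecs are hypothetical placeholders for the collected embeddings:

from scipy.spatial.distance import cosine  # cosine distance = 1 - cosine similarity

def rank_by_distance(query_vec, word_vecs):
    # Sort candidate words by cosine distance to the query embedding.
    return sorted(((w, cosine(query_vec, v)) for w, v in word_vecs.items()),
                  key=lambda pair: pair[1])

# Usage (hypothetical): query_vec is the "prince" embedding, word_vecs maps
# "king"/"queen"/"apple"/"orange" to their embeddings from the same backend.
# print(rank_by_distance(query_vec, word_vecs))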

@ngxson
Collaborator

ngxson commented Mar 6, 2024

There's currently a refactoring on server code, maybe this will be fixed: #5882

@iamlemec
Collaborator

iamlemec commented Mar 6, 2024

It looks like this is actually a tokenization issue. I'm seeing the output of /tokenize as pretty garbled. First, it doesn't appear to be adding a BOS token. We're currently not specifying the add_bos_token flag in the GGUF files for embeddings, so we might want to do that.

Second, it looks like something is up with special_tokens_cache. It seems to be adding in any token that is the concatenation of two other valid tokens. But that ends up being tons of regular words in addition to actual special tokens. The cache isn't used for regular embeddings, but the server seems to want to use it.

Edit: If you force it to add a BOS token and turn off special token processing, the tokenization comes out correct. In that case the embedding numbers are correct too, though they're not normalized, so they won't look the same as the output from the embedding example.
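
As a reference point for what the tokenization should look like, the Hugging Face tokenizer for this model can be consulted; a minimal sketch, assuming the transformers library is installed:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

ids = tok.encode("prince")  # adds the BERT special tokens by default
print(ids)
print(tok.convert_ids_to_tokens(ids))  # expected to start with [CLS] and end with [SEP]

The server's /tokenize output can then be compared against this to see whether the BOS/EOS ([CLS]/[SEP]) tokens are present.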

@ggerganov
Owner

Yes, the special flag is always on in server:

std::vector<llama_token> tokenize(const json & json_prompt, bool add_bos) const
{
    // TODO: currently, we tokenize using special tokens by default
    // this is not always correct (see https://github.com/ggerganov/llama.cpp/pull/4160#issuecomment-1824826216)
    // but it's better compared to completely ignoring ChatML and other chat templates
    const bool TMP_FORCE_SPECIAL = true;

And this seems to tokenize incorrectly. Not sure whether this is somehow a problem with the vocab or whether we simply need to turn off the special flag when using embedding models.

We should fix this and the normalization after we merge #5882
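
In the meantime, since the server output is not normalized while the embedding example's output is, the raw server vector can be L2-normalized before comparing. A minimal sketch, where emb_server is a hypothetical placeholder for the "embedding" array from the server's JSON response:

import numpy as np

def l2_normalize(vec):
    # Scale the vector to unit length so it is on the same scale as the
    # normalized output of ./embedding.
    v = np.asarray(vec, dtype=np.float32)
    return v / np.linalg.norm(v)

# Usage (hypothetical):
# print(l2_normalize(emb_server)[:5])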

@ggerganov ggerganov mentioned this issue Mar 7, 2024
@iamlemec
Collaborator

Trying to figure out what's up with special_tokens_cache and looking through #3538 for guidance. Most models I'm looking at seem to correctly label special tokens with token_type. Do we have any examples of models that fail to do this properly? Seems like the kind of thing that should be taken care of during GGUF conversion.

@tybalex
Author

tybalex commented Mar 20, 2024

As I posted above, the embedding I got from embedding.cpp is the same as what I got from the original model, so I guess it's not a GGUF conversion issue. My observation is that, with the SAME input and the SAME GGUF model, embedding.cpp and server.cpp yield different outputs.

@github-actions github-actions bot added the stale label Apr 19, 2024
Contributor

github-actions bot commented May 4, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed May 4, 2024