make rms_norm_eps a parameter #2374

Merged
slaren merged 4 commits into master from rms-norm-eps-param on Jul 24, 2023
Conversation

@slaren (Member) commented Jul 24, 2023

Fixes #2373

Use `-eps 1e-5` with llama 2; the default remains 1e-6 (unchanged from current behavior, matching llama v1).
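
For context, the epsilon here is the small constant added inside RMS normalization to keep the denominator away from zero; a minimal sketch of the operation (illustrative only, not the ggml kernel):

```cpp
#include <cmath>
#include <cstddef>

// RMS norm of one row of n values: y[i] = w[i] * x[i] / sqrt(mean(x^2) + eps).
// llama 2 was trained with eps = 1e-5, llama v1 with 1e-6, hence the new flag.
void rms_norm(const float * x, const float * w, float * y, size_t n, float eps) {
    float sum_sq = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum_sq += x[i] * x[i];
    }
    const float inv_rms = 1.0f / std::sqrt(sum_sq / (float) n + eps);
    for (size_t i = 0; i < n; ++i) {
        y[i] = w[i] * x[i] * inv_rms;
    }
}
```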

slaren marked this pull request as draft July 24, 2023 14:20
slaren marked this pull request as ready for review July 24, 2023 14:44
@slaren (Member, Author) commented Jul 24, 2023

I think this is good enough until we update the model files.

slaren requested a review from ggerganov July 24, 2023 14:49
@slaren (Member, Author) commented Jul 24, 2023

Perplexity with llama-2-7b.ggmlv3.q5_K_M.bin on wikitext-2 (lower is better):
default (1e-6): 5.8986
-eps 1e-5: 5.8283
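
As a reminder of how to read these figures: perplexity is exp of the mean negative log-likelihood per token, so lower is better, and the bracketed per-chunk values in the full logs below are running estimates that settle toward the final score. A toy illustration of the formula (not the code of the `perplexity` tool):

```cpp
#include <cmath>
#include <vector>

// Perplexity = exp(mean negative log-likelihood over all evaluated tokens).
double perplexity(const std::vector<double> & token_logprobs) {
    double nll = 0.0;
    for (const double lp : token_logprobs) {
        nll -= lp; // each entry is log p(token | context)
    }
    return std::exp(nll / (double) token_logprobs.size());
}
```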

$ ./perplexity -m models/Llama-2-7B-GGML/llama-2-7b.ggmlv3.q5_K_M.bin -f wikitext-2-raw/wiki.test.raw -t 1 -ngl 99
main: build = 901 (3855ea3)
main: seed = 1690211247
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6
llama.cpp: loading model from models/Llama-2-7B-GGML/llama-2-7b.ggmlv3.q5_K_M.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_head_kv = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 1.0e-06
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 17 (mostly Q5_K - Medium)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 388.03 MB (+ 256.00 MB per state)
llama_model_load_internal: allocating batch_size x (512 kB + n_ctx x 128 B) = 288 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 32 repeating layers to GPU
llama_model_load_internal: offloading non-repeating layers to GPU
llama_model_load_internal: offloading v cache to GPU
llama_model_load_internal: offloading k cache to GPU
llama_model_load_internal: offloaded 35/35 layers to GPU
llama_model_load_internal: total VRAM used: 5019 MB
llama_new_context_with_model: kv self size = 256.00 MB

system_info: n_threads = 1 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 0.53 seconds per pass - ETA 5 minutes
[1]4.3674,[2]4.8772,[3]5.4703,[4]6.0528,[5]6.1549,[6]6.0904,[7]6.2464,[8]6.3207,[9]6.6626,[10]6.8545,[11]7.0741,[12]7.1172,[13]7.0489,[14]7.1165,[15]7.3277,[16]6.9831,[17]6.8609,[18]6.8479,[19]6.5196,[20]6.5113,[21]6.4226,[22]6.2566,[23]6.2229,[24]6.1291,[25]6.1114,[26]5.9566,[27]5.7708,[28]5.6698,[29]5.5840,[30]5.4296,[31]5.3886,[32]5.4073,[33]5.3664,[34]5.3970,[35]5.4107,[36]5.4389,[37]5.4388,[38]5.4415,[39]5.4660,[40]5.5178,[41]5.5402,[42]5.5780,[43]5.5397,[44]5.5931,[45]5.6025,[46]5.5830,[47]5.6068,[48]5.5896,[49]5.5945,[50]5.5608,[51]5.5603,[52]5.5501,[53]5.5962,[54]5.5818,[55]5.5677,[56]5.5967,[57]5.6145,[58]5.6426,[59]5.6633,[60]5.7148,[61]5.7150,[62]5.7776,[63]5.8112,[64]5.8189,[65]5.8613,[66]5.8720,[67]5.8893,[68]5.9084,[69]5.9440,[70]5.9832,[71]6.0108,[72]6.0451,[73]6.0979,[74]6.1070,[75]6.1167,[76]6.1337,[77]6.1486,[78]6.1371,[79]6.1631,[80]6.1623,[81]6.1867,[82]6.1917,[83]6.1417,[84]6.1311,[85]6.1267,[86]6.1096,[87]6.0551,[88]6.0302,[89]6.0149,[90]6.0062,[91]6.0255,[92]6.0246,[93]6.0271,[94]6.0264,[95]6.0564,[96]6.0551,[97]6.0485,[98]6.0419,[99]6.0308,[100]6.0327,[101]6.0552,[102]6.0504,[103]6.0672,[104]6.0766,[105]6.0849,[106]6.1015,[107]6.1028,[108]6.1170,[109]6.1150,[110]6.1108,[111]6.1307,[112]6.1508,[113]6.1529,[114]6.1529,[115]6.1618,[116]6.1525,[117]6.1600,[118]6.1858,[119]6.2058,[120]6.2387,[121]6.2568,[122]6.2781,[123]6.3199,[124]6.3385,[125]6.3313,[126]6.3662,[127]6.4006,[128]6.4252,[129]6.4095,[130]6.4182,[131]6.4126,[132]6.4029,[133]6.3891,[134]6.3962,[135]6.3940,[136]6.3823,[137]6.3765,[138]6.3600,[139]6.3530,[140]6.3494,[141]6.3256,[142]6.3206,[143]6.2923,[144]6.2728,[145]6.2629,[146]6.2517,[147]6.2591,[148]6.2600,[149]6.2526,[150]6.2524,[151]6.2586,[152]6.2508,[153]6.2371,[154]6.2297,[155]6.2376,[156]6.2375,[157]6.2528,[158]6.2558,[159]6.2586,[160]6.2607,[161]6.2746,[162]6.2460,[163]6.2342,[164]6.2084,[165]6.1771,[166]6.1490,[167]6.1113,[168]6.0796,[169]6.0649,[170]6.0535,[171]6.0281,[172]6.0136,[173]5.9974,[174]5.9689,[175]5.9487,[176]5.9356,[177]5.9148,[178]5.8926,[179]5.8763,[180]5.8668,[181]5.8476,[182]5.8290,[183]5.8143,[184]5.8123,[185]5.8051,[186]5.8082,[187]5.8130,[188]5.8123,[189]5.8273,[190]5.8272,[191]5.8451,[192]5.8630,[193]5.8814,[194]5.8948,[195]5.9164,[196]5.9303,[197]5.9489,[198]5.9641,[199]5.9664,[200]5.9698,[201]5.9616,[202]5.9773,[203]5.9862,[204]5.9840,[205]5.9948,[206]5.9981,[207]5.9971,[208]6.0058,[209]6.0088,[210]6.0148,[211]6.0248,[212]6.0305,[213]6.0395,[214]6.0411,[215]6.0455,[216]6.0595,[217]6.0768,[218]6.0906,[219]6.0907,[220]6.0879,[221]6.0823,[222]6.0798,[223]6.0721,[224]6.0639,[225]6.0586,[226]6.0787,[227]6.0829,[228]6.0881,[229]6.0926,[230]6.0880,[231]6.1028,[232]6.0911,[233]6.0760,[234]6.0625,[235]6.0388,[236]6.0322,[237]6.0197,[238]6.0219,[239]6.0078,[240]5.9965,[241]5.9993,[242]6.0011,[243]5.9973,[244]5.9865,[245]5.9826,[246]5.9711,[247]5.9599,[248]5.9516,[249]5.9474,[250]5.9503,[251]5.9420,[252]5.9373,[253]5.9274,[254]5.9213,[255]5.9105,[256]5.8933,[257]5.8817,[258]5.8746,[259]5.8747,[260]5.8655,[261]5.8604,[262]5.8547,[263]5.8493,[264]5.8260,[265]5.8258,[266]5.8232,[267]5.8167,[268]5.8243,[269]5.8238,[270]5.8248,[271]5.8311,[272]5.8339,[273]5.8346,[274]5.8360,[275]5.8421,[276]5.8483,[277]5.8637,[278]5.8728,[279]5.8829,[280]5.8854,[281]5.8959,[282]5.9006,[283]5.9159,[284]5.9254,[285]5.9346,[286]5.9476,[287]5.9476,[288]5.9550,[289]5.9474,[290]5.9316,[291]5.9165,[292]5.9003,[293]5.8867,[294]5.8866,[295]5.8858,[296]5.8899,[297]5.8876,[298]5.8890,[299]5.8847,[300]5.8732,[301]5.8718,[302]5.8638,[303]5.8559,[304]5.8476,[305]5.8436,[30
6]5.8310,[307]5.8334,[308]5.8357,[309]5.8195,[310]5.8153,[311]5.8092,[312]5.8094,[313]5.8034,[314]5.7999,[315]5.7840,[316]5.7790,[317]5.7652,[318]5.7463,[319]5.7609,[320]5.7746,[321]5.7799,[322]5.7771,[323]5.7718,[324]5.7714,[325]5.7818,[326]5.7823,[327]5.7846,[328]5.7881,[329]5.7941,[330]5.7961,[331]5.8080,[332]5.8037,[333]5.8111,[334]5.8053,[335]5.7987,[336]5.8002,[337]5.7985,[338]5.7991,[339]5.7942,[340]5.7899,[341]5.7966,[342]5.7987,[343]5.8031,[344]5.8031,[345]5.8032,[346]5.8000,[347]5.8020,[348]5.8053,[349]5.8081,[350]5.8059,[351]5.8061,[352]5.8067,[353]5.8004,[354]5.8006,[355]5.8046,[356]5.8073,[357]5.8054,[358]5.8137,[359]5.8160,[360]5.8129,[361]5.8122,[362]5.8191,[363]5.8302,[364]5.8362,[365]5.8415,[366]5.8438,[367]5.8517,[368]5.8485,[369]5.8497,[370]5.8511,[371]5.8455,[372]5.8507,[373]5.8552,[374]5.8536,[375]5.8522,[376]5.8597,[377]5.8560,[378]5.8589,[379]5.8634,[380]5.8564,[381]5.8535,[382]5.8493,[383]5.8474,[384]5.8462,[385]5.8456,[386]5.8459,[387]5.8456,[388]5.8407,[389]5.8360,[390]5.8304,[391]5.8234,[392]5.8184,[393]5.8189,[394]5.8216,[395]5.8195,[396]5.8118,[397]5.8192,[398]5.8235,[399]5.8304,[400]5.8302,[401]5.8323,[402]5.8331,[403]5.8352,[404]5.8414,[405]5.8324,[406]5.8285,[407]5.8279,[408]5.8297,[409]5.8407,[410]5.8512,[411]5.8608,[412]5.8757,[413]5.8872,[414]5.8930,[415]5.8987,[416]5.9059,[417]5.9177,[418]5.9216,[419]5.9269,[420]5.9356,[421]5.9463,[422]5.9508,[423]5.9562,[424]5.9661,[425]5.9739,[426]5.9804,[427]5.9840,[428]5.9912,[429]5.9961,[430]6.0027,[431]6.0171,[432]6.0211,[433]6.0194,[434]6.0155,[435]6.0166,[436]6.0192,[437]6.0284,[438]6.0358,[439]6.0318,[440]6.0297,[441]6.0249,[442]6.0239,[443]6.0256,[444]6.0263,[445]6.0243,[446]6.0262,[447]6.0286,[448]6.0320,[449]6.0295,[450]6.0295,[451]6.0258,[452]6.0119,[453]6.0027,[454]5.9967,[455]5.9971,[456]6.0019,[457]6.0034,[458]6.0016,[459]6.0015,[460]6.0094,[461]6.0058,[462]6.0043,[463]6.0077,[464]6.0072,[465]6.0053,[466]5.9982,[467]6.0000,[468]6.0003,[469]6.0020,[470]6.0028,[471]5.9991,[472]6.0028,[473]5.9972,[474]5.9983,[475]5.9922,[476]5.9948,[477]5.9878,[478]5.9866,[479]5.9917,[480]5.9965,[481]5.9991,[482]5.9949,[483]5.9910,[484]5.9926,[485]5.9912,[486]5.9867,[487]5.9858,[488]5.9839,[489]5.9796,[490]5.9779,[491]5.9756,[492]5.9703,[493]5.9672,[494]5.9657,[495]5.9639,[496]5.9602,[497]5.9553,[498]5.9536,[499]5.9493,[500]5.9405,[501]5.9349,[502]5.9349,[503]5.9336,[504]5.9252,[505]5.9276,[506]5.9285,[507]5.9232,[508]5.9191,[509]5.9190,[510]5.9216,[511]5.9259,[512]5.9294,[513]5.9311,[514]5.9370,[515]5.9324,[516]5.9320,[517]5.9321,[518]5.9315,[519]5.9340,[520]5.9364,[521]5.9375,[522]5.9401,[523]5.9405,[524]5.9456,[525]5.9491,[526]5.9504,[527]5.9518,[528]5.9475,[529]5.9479,[530]5.9436,[531]5.9416,[532]5.9465,[533]5.9484,[534]5.9477,[535]5.9509,[536]5.9455,[537]5.9443,[538]5.9487,[539]5.9499,[540]5.9516,[541]5.9525,[542]5.9529,[543]5.9551,[544]5.9563,[545]5.9550,[546]5.9561,[547]5.9522,[548]5.9473,[549]5.9467,[550]5.9442,[551]5.9415,[552]5.9402,[553]5.9372,[554]5.9347,[555]5.9314,[556]5.9309,[557]5.9334,[558]5.9300,[559]5.9306,[560]5.9304,[561]5.9305,[562]5.9284,[563]5.9282,[564]5.9328,[565]5.9339,[566]5.9342,[567]5.9312,[568]5.9312,[569]5.9295,[570]5.9323,[571]5.9329,[572]5.9332,[573]5.9333,[574]5.9294,[575]5.9283,[576]5.9284,[577]5.9267,[578]5.9249,[579]5.9247,[580]5.9191,[581]5.9161,[582]5.9152,[583]5.9164,[584]5.9171,[585]5.9098,[586]5.9032,[587]5.9031,[588]5.9070,[589]5.9120,[590]5.9150,[591]5.9168,[592]5.9156,[593]5.9115,[594]5.9118,[595]5.9098,[596]5.9135,[597]5.9110,[598]5.9077,[599]5.9094,[600]5.9082,[601]5.9067,[602]5
.9065,[603]5.9084,[604]5.9097,[605]5.9129,[606]5.9147,[607]5.9137,[608]5.9099,[609]5.9108,[610]5.9151,[611]5.9137,[612]5.9164,[613]5.9133,[614]5.9085,[615]5.9019,[616]5.9045,[617]5.8987,[618]5.8937,[619]5.8886,[620]5.8758,[621]5.8695,[622]5.8672,[623]5.8689,[624]5.8691,[625]5.8702,[626]5.8697,[627]5.8725,[628]5.8730,[629]5.8733,[630]5.8764,[631]5.8819,[632]5.8876,[633]5.8872,[634]5.8902,[635]5.8914,[636]5.8888,[637]5.8854,[638]5.8878,[639]5.8838,[640]5.8849,[641]5.8853,[642]5.8910,[643]5.8926,[644]5.8935,[645]5.8921,[646]5.8958,[647]5.8925,[648]5.8932,[649]5.8938,[650]5.8975,[651]5.9017,[652]5.9023,[653]5.9057,[654]5.8995,[655]5.8986,

llama_print_timings: load time = 2575.04 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 326699.83 ms / 335360 tokens ( 0.97 ms per token, 1026.51 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 362017.90 ms

$ ./perplexity -m models/Llama-2-7B-GGML/llama-2-7b.ggmlv3.q5_K_M.bin -f wikitext-2-raw/wiki.test.raw -t 1 -ngl 99 -eps 1e-5
main: build = 901 (3855ea3)
main: seed = 1690211637
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6
llama.cpp: loading model from models/Llama-2-7B-GGML/llama-2-7b.ggmlv3.q5_K_M.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_head_kv = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 1.0e-05
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 17 (mostly Q5_K - Medium)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 388.03 MB (+ 256.00 MB per state)
llama_model_load_internal: allocating batch_size x (512 kB + n_ctx x 128 B) = 288 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 32 repeating layers to GPU
llama_model_load_internal: offloading non-repeating layers to GPU
llama_model_load_internal: offloading v cache to GPU
llama_model_load_internal: offloading k cache to GPU
llama_model_load_internal: offloaded 35/35 layers to GPU
llama_model_load_internal: total VRAM used: 5019 MB
llama_new_context_with_model: kv self size = 256.00 MB

system_info: n_threads = 1 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 0.55 seconds per pass - ETA 6 minutes
[1]4.1567,[2]4.7072,[3]5.3515,[4]5.9350,[5]6.0629,[6]5.9819,[7]6.1495,[8]6.2252,[9]6.5530,[10]6.7395,[11]6.9634,[12]7.0142,[13]6.9389,[14]7.0150,[15]7.2388,[16]6.8983,[17]6.7813,[18]6.7748,[19]6.4558,[20]6.4521,[21]6.3791,[22]6.2070,[23]6.1766,[24]6.0832,[25]6.0679,[26]5.9150,[27]5.7333,[28]5.6348,[29]5.5509,[30]5.3983,[31]5.3618,[32]5.3818,[33]5.3358,[34]5.3650,[35]5.3804,[36]5.4048,[37]5.4011,[38]5.3999,[39]5.4128,[40]5.4637,[41]5.4864,[42]5.5241,[43]5.4856,[44]5.5406,[45]5.5517,[46]5.5296,[47]5.5532,[48]5.5325,[49]5.5333,[50]5.4999,[51]5.5007,[52]5.4920,[53]5.5386,[54]5.5250,[55]5.5102,[56]5.5400,[57]5.5589,[58]5.5867,[59]5.6074,[60]5.6571,[61]5.6546,[62]5.7159,[63]5.7503,[64]5.7567,[65]5.7990,[66]5.8100,[67]5.8282,[68]5.8484,[69]5.8833,[70]5.9221,[71]5.9492,[72]5.9842,[73]6.0369,[74]6.0463,[75]6.0560,[76]6.0727,[77]6.0872,[78]6.0767,[79]6.1035,[80]6.1025,[81]6.1173,[82]6.1225,[83]6.0745,[84]6.0653,[85]6.0618,[86]6.0447,[87]5.9872,[88]5.9627,[89]5.9474,[90]5.9405,[91]5.9601,[92]5.9592,[93]5.9611,[94]5.9606,[95]5.9899,[96]5.9885,[97]5.9808,[98]5.9749,[99]5.9642,[100]5.9668,[101]5.9891,[102]5.9845,[103]6.0017,[104]6.0107,[105]6.0124,[106]6.0290,[107]6.0310,[108]6.0453,[109]6.0441,[110]6.0405,[111]6.0596,[112]6.0798,[113]6.0820,[114]6.0828,[115]6.0919,[116]6.0798,[117]6.0838,[118]6.1086,[119]6.1288,[120]6.1614,[121]6.1781,[122]6.1999,[123]6.2407,[124]6.2580,[125]6.2498,[126]6.2857,[127]6.3202,[128]6.3456,[129]6.3305,[130]6.3397,[131]6.3346,[132]6.3258,[133]6.3114,[134]6.3190,[135]6.3169,[136]6.3059,[137]6.3001,[138]6.2839,[139]6.2771,[140]6.2742,[141]6.2492,[142]6.2453,[143]6.2177,[144]6.1988,[145]6.1897,[146]6.1794,[147]6.1860,[148]6.1874,[149]6.1806,[150]6.1810,[151]6.1873,[152]6.1803,[153]6.1668,[154]6.1593,[155]6.1668,[156]6.1645,[157]6.1792,[158]6.1817,[159]6.1837,[160]6.1862,[161]6.1990,[162]6.1714,[163]6.1595,[164]6.1347,[165]6.1039,[166]6.0762,[167]6.0389,[168]6.0080,[169]5.9940,[170]5.9830,[171]5.9582,[172]5.9441,[173]5.9283,[174]5.9003,[175]5.8797,[176]5.8662,[177]5.8461,[178]5.8247,[179]5.8090,[180]5.7998,[181]5.7811,[182]5.7632,[183]5.7489,[184]5.7473,[185]5.7394,[186]5.7414,[187]5.7461,[188]5.7448,[189]5.7598,[190]5.7600,[191]5.7781,[192]5.7963,[193]5.8143,[194]5.8279,[195]5.8501,[196]5.8641,[197]5.8830,[198]5.8980,[199]5.9004,[200]5.9040,[201]5.8967,[202]5.9125,[203]5.9213,[204]5.9189,[205]5.9300,[206]5.9337,[207]5.9323,[208]5.9414,[209]5.9440,[210]5.9502,[211]5.9601,[212]5.9660,[213]5.9758,[214]5.9771,[215]5.9803,[216]5.9945,[217]6.0111,[218]6.0246,[219]6.0246,[220]6.0218,[221]6.0149,[222]6.0124,[223]6.0041,[224]5.9948,[225]5.9896,[226]6.0096,[227]6.0142,[228]6.0199,[229]6.0252,[230]6.0209,[231]6.0354,[232]6.0241,[233]6.0086,[234]5.9932,[235]5.9696,[236]5.9635,[237]5.9514,[238]5.9536,[239]5.9400,[240]5.9291,[241]5.9318,[242]5.9338,[243]5.9301,[244]5.9191,[245]5.9153,[246]5.9043,[247]5.8936,[248]5.8862,[249]5.8826,[250]5.8857,[251]5.8774,[252]5.8730,[253]5.8636,[254]5.8579,[255]5.8474,[256]5.8304,[257]5.8192,[258]5.8110,[259]5.8116,[260]5.8035,[261]5.7979,[262]5.7922,[263]5.7860,[264]5.7615,[265]5.7609,[266]5.7577,[267]5.7512,[268]5.7592,[269]5.7583,[270]5.7594,[271]5.7656,[272]5.7687,[273]5.7689,[274]5.7705,[275]5.7768,[276]5.7829,[277]5.7985,[278]5.8075,[279]5.8176,[280]5.8203,[281]5.8310,[282]5.8358,[283]5.8506,[284]5.8600,[285]5.8694,[286]5.8824,[287]5.8818,[288]5.8880,[289]5.8803,[290]5.8644,[291]5.8501,[292]5.8341,[293]5.8203,[294]5.8202,[295]5.8195,[296]5.8237,[297]5.8217,[298]5.8232,[299]5.8192,[300]5.8080,[301]5.8068,[302]5.7991,[303]5.7913,[304]5.7824,[305]5.7786,[30
6]5.7663,[307]5.7681,[308]5.7691,[309]5.7532,[310]5.7492,[311]5.7434,[312]5.7435,[313]5.7375,[314]5.7343,[315]5.7184,[316]5.7133,[317]5.6985,[318]5.6790,[319]5.6921,[320]5.7052,[321]5.7103,[322]5.7064,[323]5.7007,[324]5.7000,[325]5.7113,[326]5.7120,[327]5.7141,[328]5.7176,[329]5.7234,[330]5.7257,[331]5.7377,[332]5.7338,[333]5.7415,[334]5.7349,[335]5.7287,[336]5.7305,[337]5.7290,[338]5.7291,[339]5.7243,[340]5.7203,[341]5.7273,[342]5.7292,[343]5.7335,[344]5.7332,[345]5.7334,[346]5.7305,[347]5.7330,[348]5.7362,[349]5.7392,[350]5.7372,[351]5.7376,[352]5.7377,[353]5.7318,[354]5.7306,[355]5.7345,[356]5.7374,[357]5.7349,[358]5.7433,[359]5.7459,[360]5.7434,[361]5.7430,[362]5.7501,[363]5.7612,[364]5.7673,[365]5.7723,[366]5.7747,[367]5.7827,[368]5.7798,[369]5.7811,[370]5.7827,[371]5.7779,[372]5.7835,[373]5.7877,[374]5.7863,[375]5.7851,[376]5.7925,[377]5.7891,[378]5.7921,[379]5.7966,[380]5.7891,[381]5.7861,[382]5.7812,[383]5.7793,[384]5.7783,[385]5.7772,[386]5.7778,[387]5.7778,[388]5.7728,[389]5.7683,[390]5.7623,[391]5.7556,[392]5.7508,[393]5.7510,[394]5.7534,[395]5.7513,[396]5.7437,[397]5.7509,[398]5.7550,[399]5.7621,[400]5.7613,[401]5.7633,[402]5.7641,[403]5.7661,[404]5.7724,[405]5.7635,[406]5.7597,[407]5.7591,[408]5.7611,[409]5.7722,[410]5.7828,[411]5.7920,[412]5.8070,[413]5.8182,[414]5.8241,[415]5.8298,[416]5.8369,[417]5.8488,[418]5.8528,[419]5.8581,[420]5.8664,[421]5.8769,[422]5.8807,[423]5.8863,[424]5.8959,[425]5.9040,[426]5.9106,[427]5.9141,[428]5.9214,[429]5.9265,[430]5.9332,[431]5.9473,[432]5.9511,[433]5.9496,[434]5.9458,[435]5.9473,[436]5.9500,[437]5.9594,[438]5.9669,[439]5.9631,[440]5.9610,[441]5.9563,[442]5.9553,[443]5.9569,[444]5.9578,[445]5.9560,[446]5.9581,[447]5.9604,[448]5.9641,[449]5.9617,[450]5.9619,[451]5.9583,[452]5.9442,[453]5.9348,[454]5.9286,[455]5.9291,[456]5.9340,[457]5.9355,[458]5.9337,[459]5.9335,[460]5.9415,[461]5.9379,[462]5.9363,[463]5.9395,[464]5.9386,[465]5.9366,[466]5.9296,[467]5.9311,[468]5.9311,[469]5.9330,[470]5.9337,[471]5.9300,[472]5.9337,[473]5.9280,[474]5.9290,[475]5.9228,[476]5.9249,[477]5.9179,[478]5.9168,[479]5.9218,[480]5.9266,[481]5.9291,[482]5.9249,[483]5.9212,[484]5.9226,[485]5.9213,[486]5.9169,[487]5.9160,[488]5.9140,[489]5.9099,[490]5.9083,[491]5.9057,[492]5.9003,[493]5.8970,[494]5.8949,[495]5.8929,[496]5.8892,[497]5.8844,[498]5.8827,[499]5.8783,[500]5.8698,[501]5.8643,[502]5.8644,[503]5.8626,[504]5.8543,[505]5.8564,[506]5.8569,[507]5.8517,[508]5.8474,[509]5.8474,[510]5.8496,[511]5.8542,[512]5.8575,[513]5.8590,[514]5.8648,[515]5.8600,[516]5.8596,[517]5.8598,[518]5.8593,[519]5.8617,[520]5.8641,[521]5.8652,[522]5.8680,[523]5.8685,[524]5.8737,[525]5.8769,[526]5.8782,[527]5.8795,[528]5.8748,[529]5.8752,[530]5.8709,[531]5.8690,[532]5.8741,[533]5.8761,[534]5.8751,[535]5.8784,[536]5.8730,[537]5.8717,[538]5.8763,[539]5.8776,[540]5.8795,[541]5.8805,[542]5.8811,[543]5.8834,[544]5.8846,[545]5.8833,[546]5.8845,[547]5.8806,[548]5.8757,[549]5.8752,[550]5.8727,[551]5.8702,[552]5.8689,[553]5.8653,[554]5.8630,[555]5.8600,[556]5.8596,[557]5.8620,[558]5.8585,[559]5.8588,[560]5.8587,[561]5.8589,[562]5.8564,[563]5.8562,[564]5.8608,[565]5.8620,[566]5.8625,[567]5.8596,[568]5.8598,[569]5.8584,[570]5.8612,[571]5.8620,[572]5.8626,[573]5.8624,[574]5.8587,[575]5.8572,[576]5.8571,[577]5.8556,[578]5.8541,[579]5.8541,[580]5.8486,[581]5.8458,[582]5.8450,[583]5.8462,[584]5.8463,[585]5.8392,[586]5.8326,[587]5.8324,[588]5.8363,[589]5.8415,[590]5.8444,[591]5.8462,[592]5.8449,[593]5.8409,[594]5.8413,[595]5.8394,[596]5.8431,[597]5.8407,[598]5.8375,[599]5.8394,[600]5.8383,[601]5.8369,[602]5
.8367,[603]5.8384,[604]5.8398,[605]5.8430,[606]5.8448,[607]5.8438,[608]5.8398,[609]5.8408,[610]5.8447,[611]5.8436,[612]5.8458,[613]5.8430,[614]5.8384,[615]5.8320,[616]5.8346,[617]5.8289,[618]5.8240,[619]5.8190,[620]5.8061,[621]5.7998,[622]5.7976,[623]5.7995,[624]5.7997,[625]5.8007,[626]5.8001,[627]5.8029,[628]5.8036,[629]5.8039,[630]5.8071,[631]5.8124,[632]5.8180,[633]5.8176,[634]5.8206,[635]5.8216,[636]5.8187,[637]5.8149,[638]5.8173,[639]5.8134,[640]5.8146,[641]5.8149,[642]5.8207,[643]5.8223,[644]5.8233,[645]5.8219,[646]5.8255,[647]5.8218,[648]5.8224,[649]5.8231,[650]5.8266,[651]5.8310,[652]5.8317,[653]5.8352,[654]5.8291,[655]5.8283,

llama_print_timings: load time = 3436.11 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 324490.57 ms / 335360 tokens ( 0.97 ms per token, 1033.50 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 359577.39 ms
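
For completeness, a hypothetical sketch of how a library user might set the epsilon after this change, assuming it is exposed as a field on `llama_context_params` (the exact field name and plumbing are assumptions based on the PR title, not taken from the diff):

```cpp
#include "llama.h"

int main() {
    llama_context_params params = llama_context_default_params();
    // Assumed field: the PR makes rms_norm_eps a parameter; treat the exact
    // API location and name as an assumption here.
    params.rms_norm_eps = 1e-5f; // llama 2; the 1e-6f default matches llama v1
    return 0;
}
```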

@ggerganov (Member) commented Jul 24, 2023
Thanks - will take a look tmrw

@ggerganov (Member) left a comment

Reviewed on the phone 👍

slaren merged commit 41c6741 into master Jul 24, 2023
slaren deleted the rms-norm-eps-param branch July 24, 2023 15:57
Successfully merging this pull request may close these issues.

#2373: [Bug report] Performance deterioration of LLaMA-2 model due to hardcoded rms_norm_eps