0.99 rebase #4


Merged 375 commits on Jun 12, 2024
Changes from all commits (375 commits)
3d032ec
server : add `n_discard` parameter (#6300)
kaetemi Mar 26, 2024
deb7240
embedding : adjust `n_ubatch` value (#6296)
mscheong01 Mar 26, 2024
d25b1c3
quantize : be able to override metadata by key (#6321)
ikawrakow Mar 26, 2024
e097633
convert-hf : fix exception in sentencepiece with added tokens (#6320)
pcuenca Mar 26, 2024
55c1b2a
IQ1_M: 1.75 bpw quantization (#6302)
ikawrakow Mar 26, 2024
557410b
llama : greatly reduce output buffer memory usage (#6122)
compilade Mar 26, 2024
32c8486
wpm : portable unicode tolower (#6305)
cebtenzzre Mar 26, 2024
a4f569e
[SYCL] fix no file in win rel (#6314)
NeoZhangJianyu Mar 27, 2024
0642b22
server: public: use relative routes for static files (#6325)
EZForever Mar 27, 2024
1740d6d
readme : add php api bindings (#6326)
mcharytoniuk Mar 27, 2024
2ab4f00
llama2c : open file as binary (#6332)
ggerganov Mar 27, 2024
e562b97
common : change --no-penalize-nl to --penalize-nl (#6334)
CISC Mar 27, 2024
cbc8343
Make IQ1_M work for QK_K = 64 (#6327)
ikawrakow Mar 27, 2024
e82f9e2
[SYCL] Fix batched impl for NVidia GPU (#6164)
AidanBeltonS Mar 27, 2024
1e13987
embedding : show full embedding for single prompt (#6342)
howlger Mar 27, 2024
3a03459
make : whitespace
ggerganov Mar 27, 2024
e5b89a4
ggml : fix bounds checking of zero size views (#6347)
slaren Mar 27, 2024
53c7ec5
nix: ci: dont test cuda and rocm (for now)
SomeoneSerge Mar 27, 2024
a016026
server: continuous performance monitoring and PR comment (#6283)
phymbert Mar 27, 2024
25f4a61
[SYCL] fix set main gpu crash (#6339)
NeoZhangJianyu Mar 28, 2024
d0e2f64
doc: fix typo in MobileVLM-README.md (#6181)
ZiangWu-77 Mar 28, 2024
f6a0f5c
nix: .#widnows: init
hutli Feb 15, 2024
22a462c
nix: package: don't introduce the dependency on python
SomeoneSerge Mar 26, 2024
e9f17dc
nix: .#windows: proper cross-compilation set-up
SomeoneSerge Mar 26, 2024
dbb03e2
only using explicit blas if hostPlatform is allowed
hutli Mar 27, 2024
c873976
using blas.meta.available to check host platform
hutli Mar 27, 2024
d39b308
nix: moved blas availability check to package inputs so it is still o…
hutli Mar 27, 2024
d2d8f38
nix: removed unnessesary indentation
hutli Mar 27, 2024
6902cb7
server : stop gracefully on SIGTERM (#6348)
EZForever Mar 28, 2024
cfc4d75
doc: fix outdated default value of batch size (#6336)
Sunt-ing Mar 28, 2024
28cb9a0
ci: bench: fix master not schedule, fix commit status failed on exter…
phymbert Mar 28, 2024
0308f5e
llama : fix command-r inference when omitting outputs (#6367)
compilade Mar 28, 2024
66ba560
llava : fix MobileVLM (#6364)
ZiangWu-77 Mar 28, 2024
be55134
convert : refactor vocab selection logic (#6355)
cebtenzzre Mar 28, 2024
5106ef4
[SYCL] Revisited & updated SYCL build documentation (#6141)
OuadiElfarouki Mar 28, 2024
bfe7daf
readme : add notice for UI list
ggerganov Mar 28, 2024
b75c381
convert : allow conversion of Mistral HF models (#6144)
pcuenca Mar 29, 2024
057400a
llama : remove redundant reshape in build_kv_store (#6369)
danbev Mar 29, 2024
8093987
cmake : add explicit metal version options (#6370)
mattjcly Mar 29, 2024
b910287
readme : add project (#6356)
zhouwg Mar 29, 2024
cfde806
ci : fix BGE wget (#6383)
ggerganov Mar 29, 2024
0695747
[Model] Add support for xverse (#6301)
hxer7963 Mar 29, 2024
d48ccf3
sync : ggml (#6351)
ggerganov Mar 29, 2024
ba0c7c7
Vulkan k-quant mmq and ggml-backend offload functionality (#6155)
0cc4m Mar 29, 2024
f7fc5f6
split: allow --split-max-size option (#6343)
ngxson Mar 29, 2024
c342d07
Fedora build update (#6388)
Man2Dev Mar 29, 2024
37e7854
ci: bench: fix Resource not accessible by integration on PR event (#6…
phymbert Mar 30, 2024
c50a82c
readme : update hot topics
ggerganov Mar 31, 2024
226e819
ci: server: verify deps are coherent with the commit (#6409)
phymbert Apr 1, 2024
33a5244
compare-llama-bench.py: fix long hexsha args (#6424)
JohannesGaessler Apr 1, 2024
f87f7b8
flake.lock: Update (#6402)
ggerganov Apr 1, 2024
5260486
[SYCL] Disable iqx on windows as WA (#6435)
airMeng Apr 3, 2024
08a0c02
ggml : mul_mat_id use the same tensor for all the experts (#6387)
slaren Apr 3, 2024
076b086
readme : update hot topics
ggerganov Apr 3, 2024
1ff4d9f
Add OpenChat, Alpaca, Vicuna chat templates (#6397)
kaizau Apr 3, 2024
db214fa
Missing tokenizer.model error during gguf conversion (#6443)
overtunned Apr 3, 2024
e69945d
security : create policy (#6354)
joycebrum Apr 3, 2024
154d4ee
readme : add feature-rich rust bindings (#6465)
francis2tm Apr 3, 2024
5d4f12e
server: add cURL support to `server.Dockerfile` (#6461)
elepedus Apr 3, 2024
9f62c01
ci : update checkout, setup-python and upload-artifact to latest (#6456)
EwoutH Apr 3, 2024
bb43cf7
llama : add SEA-LION support (#6448)
bryanSwk Apr 3, 2024
60cdf40
server : handle exception on wrong type in request (#6452)
JH23X Apr 3, 2024
5fb1574
A few small fixes to server's README docs (#6428)
fat-tire Apr 3, 2024
72d73af
convert : fix for lint error complaining of bare except (#6470)
HanClinto Apr 4, 2024
1a43c72
server : add option to disable KV offload (#6468)
jxy Apr 4, 2024
4399f13
server : remove obsolete --memory-f32 option
ggerganov Apr 4, 2024
9b84ae1
examples : add GBNF validator program (#5948)
HanClinto Apr 4, 2024
4bcd6b9
common: remove duplicate check for curl (#6471)
danbev Apr 4, 2024
7a2c926
ci: bench: add more ftype, fix triggers and bot comment (#6466)
phymbert Apr 4, 2024
a74401f
Correct README link (#6458)
limitedAtonement Apr 4, 2024
8120efe
ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6…
phymbert Apr 4, 2024
2e66913
server: allow penalizing repetition of newlines on server webpage (#6…
sha224 Apr 4, 2024
c666ba2
build CI: Name artifacts (#6482)
EwoutH Apr 4, 2024
7dda1b7
ci: exempt master branch workflows from getting cancelled (#6486)
mscheong01 Apr 4, 2024
0a1d889
server: add cURL support to server Dockerfiles (#6474)
elepedus Apr 4, 2024
b660a57
readme : fix typo (#6481)
junnjiee Apr 4, 2024
a307375
readme : add Dot to UI list (#6487)
alexpinel Apr 4, 2024
1b496a7
[SYCL] Fixed minor bug when enabling FP16 for non intel targets (#6464)
OuadiElfarouki Apr 5, 2024
87e21bb
bench : make n_batch and n_ubatch configurable in Batched bench (#6500)
Sunt-ing Apr 5, 2024
d0f5dee
readme : update UI list (#6503)
hugo53 Apr 5, 2024
a8bd14d
gguf.py : add licence and version to gguf writer (#6504)
mofosyne Apr 5, 2024
75cd4c7
ci: bench: support sse and fix prompt processing time / server: add t…
phymbert Apr 6, 2024
57dd02c
Tests: Added integration tests for GBNF parser (#6472)
HanClinto Apr 6, 2024
b66aec6
backend : fix typo in scheduler documentation (ggml/781)
danbev Apr 3, 2024
54ea069
sync : ggml
ggerganov Apr 6, 2024
d4f220a
support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS…
NeoZhangJianyu Apr 7, 2024
9472bce
Run make to build the project (#6457)
limitedAtonement Apr 7, 2024
43e8995
scripts : sync ggml-cuda folder
ggerganov Apr 7, 2024
f77261a
ggml: bypass code incompatible with CUDA < 11.1 (whisper/2020)
primenko-v Apr 4, 2024
c372477
sync : ggml
ggerganov Apr 7, 2024
e0717e7
Add GritLM as supported models. (#6513)
dranger003 Apr 7, 2024
b909236
flake.lock: Update (#6517)
ggerganov Apr 7, 2024
855f544
Change Windows AMD example to release build to make inference much fa…
thebaron88 Apr 7, 2024
d752327
Adding KodiBot to UI list (#6535)
firatkiral Apr 8, 2024
87fb5b4
remove row=1 cond (#6532)
abhilash1910 Apr 8, 2024
beea6e1
llama : save and restore kv cache for single seq id (#6341)
kaetemi Apr 8, 2024
e3c337d
llama : support negative ith in llama_get_ API (#6519)
TheFlipbook Apr 8, 2024
b73e564
quantize : fix precedence of cli args (#6541)
ggerganov Apr 8, 2024
cecd8d3
Comment explaining a decision (#6531)
kunnis Apr 8, 2024
cc4a954
llama : fix attention layer count sanity check (#6550)
ggerganov Apr 8, 2024
e11a899
license : update copyright notice + add AUTHORS (#6405)
ggerganov Apr 9, 2024
5dc9dd7
llama : add Command R Plus support (#6491)
RefractAI Apr 9, 2024
400d5d7
server : detect search query to start webchat (#6554)
Mardak Apr 9, 2024
c4a3a4f
sync : ggml
ggerganov Apr 9, 2024
1b67731
BERT tokenizer fixes (#6498)
cebtenzzre Apr 9, 2024
ba5e134
readme: fix typo in amdgpu target name (#6573)
Sejsel Apr 9, 2024
b231b37
readme : update UI list (#6560)
ylsdamxssjxxdd Apr 10, 2024
29122d3
readme : fix ROCm link (#6579)
artem-zinnatullin Apr 10, 2024
67fac4b
docs : how to add a model (#6565)
phymbert Apr 10, 2024
65c64dc
convert.py : add consolidated.safetensors for mixtral 8x22b (#6587)
slaren Apr 10, 2024
4f407a0
llama : add model types for mixtral (#6589)
slaren Apr 10, 2024
b3a96f2
minor layout improvements (#6572)
rsoika Apr 10, 2024
8228b66
gguf : add option to not check tensor data (#6582)
danbev Apr 10, 2024
b804b1e
eval-callback: Example how to use eval callback for debugging (#6576)
phymbert Apr 11, 2024
f4183af
scripts : add --outdir option to hf.sh (#6600)
danbev Apr 11, 2024
1bbdaf6
ci: download artifacts to release directory (#6612)
Hugi-R Apr 11, 2024
cbaadc9
grammars: 1.5x faster inference w/ complex grammars (vector reserves …
ochafik Apr 11, 2024
a474f50
Refactor Error Handling for CUDA (#6575)
nneubacher Apr 11, 2024
f7001cc
As suggested by @slaren, disabling Metal for test to fix CI build on …
HanClinto Apr 11, 2024
04a5ac2
Optimization: eliminate addition of redundant stacks when advancing g…
HanClinto Apr 12, 2024
9ed2737
ci : disable Metal for macOS-latest-cmake-x64 (#6628)
ggerganov Apr 12, 2024
81da18e
eval-callback: use ggml_op_desc to pretty print unary operator name (…
phymbert Apr 12, 2024
dee7f8d
Correct free memory and total memory. (#6630)
MasterYi1024 Apr 12, 2024
ef21ce4
imatrix : remove invalid assert (#6632)
ggerganov Apr 12, 2024
5c4d767
chore: Fix markdown warnings (#6625)
reneleonhardt Apr 12, 2024
91c7360
llama : add gguf_remove_key + remove split meta during quantize (#6591)
zj040045 Apr 12, 2024
24ee66e
server : coherent log output for KV cache full (#6637)
phymbert Apr 12, 2024
4cc120c
infill : add download instructions for model (#6626)
danbev Apr 12, 2024
fbbc030
metal : unify mul_mv_id kernels (#6556)
slaren Apr 12, 2024
ab9a324
JSON schema conversion: ⚡️ faster repetitions, min/maxLength for stri…
ochafik Apr 12, 2024
4bd0f93
model: support arch `DbrxForCausalLM` (#6515)
phymbert Apr 13, 2024
b5e7285
CUDA: fix matrix multiplication logic for tests (#6667)
JohannesGaessler Apr 13, 2024
de17e3f
fix memcpy() crash, add missed cmd in guide, fix softmax (#6622)
NeoZhangJianyu Apr 14, 2024
a4ec34e
convert : enable the `--use-temp-file` cli flag (#6645)
jac-jim Apr 14, 2024
e689fc4
[bug fix] convert github repository_owner to lowercase (#6673)
jaeminSon Apr 14, 2024
8800226
Fix --split-max-size (#6655)
CISC Apr 14, 2024
422c2af
Added support for GGML_OP_CLAMP in Metal (#6662)
dave-fl Apr 14, 2024
f184dd9
flake.lock: Update (#6669)
ggerganov Apr 14, 2024
04fbc5f
Add Command R chat template (#6650)
jc19chaoj Apr 14, 2024
1958f7e
llama : add missing kv clear in llama_beam_search (#6664)
dwrensha Apr 14, 2024
17e98d4
fix mul_mat_id() for new input, make the ut pass (#6682)
NeoZhangJianyu Apr 15, 2024
7fc16a2
swift : linux support (#6590)
spprichard Apr 15, 2024
3272896
server : revert "minor layout improvements" (#6684)
phymbert Apr 15, 2024
132f557
llama : fix restoring the number of outputs from state files (#6687)
compilade Apr 15, 2024
9ec8635
add detection of Xeon PHI: Knights Corner.
julialongtin Mar 12, 2024
a83e2ca
handle the case that we have no glibc on the PHI.
julialongtin Mar 12, 2024
5c0d49c
instead of checking on glibc, check on SYS_getcpu
julialongtin Mar 12, 2024
366279e
try to detect the PHI cross compiler in make.
julialongtin Mar 12, 2024
7fb8d47
try to detect the PHI cross compiler in make.
julialongtin Mar 12, 2024
429d69f
try to implement one intrinsic
julialongtin Mar 13, 2024
b5ea05f
use right type, and define GGML_F32_VEC_ZERO.
julialongtin Mar 13, 2024
7fce3f6
import intrinsics.
julialongtin Mar 13, 2024
192e4ad
implement F32 dot products.
julialongtin Mar 16, 2024
83be3db
Update ggml.c
julialongtin Mar 16, 2024
114e7dd
Update ggml.c
julialongtin Mar 16, 2024
c70b5f2
Update ggml.c
julialongtin Mar 16, 2024
d7d679e
merge from upstream
julialongtin Mar 17, 2024
a56a6f3
add a benchmark / test binary.
julialongtin Mar 17, 2024
d095d8e
Update ggml-phi-knc.c
julialongtin Mar 17, 2024
5a9d2f5
remove intrinsics import, and use upConv to save 12 bytes of memory t…
julialongtin Mar 20, 2024
a06fa4b
use the same header as ggml.c, and remove some warnings.
julialongtin Mar 20, 2024
bb73cb3
formatting changes.
julialongtin Mar 20, 2024
a48d3b9
spacing changes.
julialongtin Mar 21, 2024
c9730c0
be more specific about the length of our list of run amounts.
julialongtin Mar 21, 2024
669ce9b
begin work on targeting dot_q5_K_q8_K.
julialongtin Mar 23, 2024
3edaaca
import stdint.h for sizeSt.
julialongtin Mar 23, 2024
62e3543
import stdio.h for size_t.
julialongtin Mar 23, 2024
8703abe
pull in ggml specific types.
julialongtin Mar 23, 2024
a7f8abe
tell ggml-common.h to export what we want.
julialongtin Mar 23, 2024
aee550a
force to compile.
julialongtin Mar 23, 2024
a015d84
allow using code from ggml-phi-knc-dot_q5_K_q8_K.c
julialongtin Mar 23, 2024
7f5adf3
attempt to speed up float clearing.
julialongtin Mar 23, 2024
b3ec86e
first fixes.
julialongtin Mar 23, 2024
ff29b65
formatting improvement.
julialongtin Mar 23, 2024
2f0a949
promote aux16 into a vector.
julialongtin Mar 23, 2024
66d26d4
promote aux16 into a vector.
julialongtin Mar 23, 2024
84093a6
promote aux16 into a vector. (part three)
julialongtin Mar 23, 2024
e99f3a9
fix typo.
julialongtin Mar 23, 2024
656bf28
copy right block.
julialongtin Mar 23, 2024
2870bfc
add missing variable.
julialongtin Mar 23, 2024
7a00422
try to use vectorized zeroing function.
julialongtin Mar 23, 2024
5c010f7
expand mask, and align memory.
julialongtin Mar 23, 2024
ed639a6
use better memory save operator.
julialongtin Mar 23, 2024
31b8a5a
use quotes properly.
julialongtin Mar 23, 2024
45c94bd
promote aux16 to a vector.
julialongtin Mar 23, 2024
3c29fd5
add missing address of operators.
julialongtin Mar 23, 2024
10237df
promote aux32 to a vector.
julialongtin Mar 23, 2024
da69ed5
add I32 vector memory clearing.
julialongtin Mar 23, 2024
e3468e0
attempt our first FMA.
julialongtin Mar 23, 2024
d34e0ff
use proper mov operator, and pass addresses.
julialongtin Mar 23, 2024
0c01d07
perform 16 operations at a time.
julialongtin Mar 24, 2024
98c9b69
better comments, and fix some small errors.
julialongtin Mar 24, 2024
3cdfc9c
spacing changes, eliminate dead references to k1 or zero, and use the…
julialongtin Mar 24, 2024
3fef54f
fix our reference to src in the second place, and use a more accurate…
julialongtin Mar 24, 2024
1c182a3
promote aux8 into a vector.
julialongtin Mar 24, 2024
e579af1
loosen alignment requirements for zeros, add missing function, and pr…
julialongtin Mar 24, 2024
2a47e5f
separate filling aux16 from consuming aux16 by making it an array of …
julialongtin Mar 24, 2024
20c2bc5
fix vector sizes.
julialongtin Mar 25, 2024
33cc1d8
massively rewrite assembly routines.
julialongtin Apr 2, 2024
90498c1
minor changes.
julialongtin Apr 2, 2024
3cf6eb0
formatting.
julialongtin Apr 2, 2024
3ff0924
indent headers consistently.
julialongtin Apr 3, 2024
aeb5ae8
formatting changes.
julialongtin Apr 3, 2024
ded4da4
add Makefile rule for generation .s file, for manual inspection.
julialongtin Apr 3, 2024
f84859a
whoops. missing tab.
julialongtin Apr 3, 2024
b8abefb
use GGML_F32_EPR, and remove some dead code.
julialongtin Apr 3, 2024
fb83cd9
reformat, and label what these files are.
julialongtin Apr 3, 2024
d966ac2
replace tabs with spaces.
julialongtin Apr 3, 2024
c3d438b
further optimizations. 0.99 tokens per second.
julialongtin Apr 22, 2024
e37b7f8
fix some small errors.
julialongtin Apr 22, 2024
4fb1547
fix an offset error, and get rid of tabs.
julialongtin Apr 22, 2024
dc1f639
comment and spacing fixes.
julialongtin Apr 24, 2024
0124f7a
use or, instead of and. bug fix?
julialongtin Apr 24, 2024
9a799eb
spacing and capitalization changes.
julialongtin Apr 25, 2024
54f181d
spacing and capitalization changes. Fix the register list of GGML_5bi…
julialongtin Apr 26, 2024
1c2fdc3
minor spacing and comment changes.
julialongtin May 9, 2024
9fa06f4
add batch fp16<->fp32 conversion functions.
julialongtin May 9, 2024
c39fa8b
remove a warning.
julialongtin May 9, 2024
2cf193e
fix typo
julialongtin May 9, 2024
664a602
use different restrict syntax, to make g++ happy.
julialongtin May 9, 2024
6e0258a
broadcast a single int8, instead of 4 of them.
julialongtin May 10, 2024
b1c9622
Use a vectorized assembly function to handle remaining chunks less th…
julialongtin May 10, 2024
a14fe02
use vbroadcastss in place of vbroadcast32x4.
julialongtin May 10, 2024
d8d574c
perform better prefetches, and invert the test of our clear flag for …
julialongtin May 10, 2024
204bc1f
remove useless prefetches.
julialongtin May 10, 2024
f555f9d
spacing and comment changes.
julialongtin May 10, 2024
dda250f
move sub earlier, and move the compare of iterations to outside, and …
julialongtin May 10, 2024
270204e
fix loop.
julialongtin May 10, 2024
9a1a53b
use values inside of the loop as soon as we have them.
julialongtin May 10, 2024
f3b86eb
correct a comment, and use jz when comparing to zero.
julialongtin May 10, 2024
4097cde
comment clarification.
julialongtin May 10, 2024
511ad80
change from handling three iterations per loop to four.
julialongtin May 11, 2024
47ca67a
subtract the correct amount.
julialongtin May 11, 2024
1b7ca0b
look at the right final memory location.
julialongtin May 11, 2024
4d94831
add missing jump.
julialongtin May 11, 2024
fc23c22
spacing changes.
julialongtin May 11, 2024
a273a9e
spacing changes.
julialongtin May 11, 2024
9f3623f
introduce r10 and r11, for vloadunpackhd.
julialongtin May 11, 2024
9aa34c8
rename label 1 to 3.
julialongtin May 11, 2024
eefa650
rename some labels.
julialongtin May 11, 2024
0c0137e
relabel some other labels.
julialongtin May 11, 2024
50887fc
fill and increment r12 and r13.
julialongtin May 11, 2024
257c06b
add missing vector.
julialongtin May 11, 2024
3d39d61
make the offset of q4 available.
julialongtin May 11, 2024
420e9db
minor comment fixes.
julialongtin May 11, 2024
084e368
load from identical addresses for low and high side.
julialongtin May 11, 2024
7925fb1
make offset available in a register.
julialongtin May 11, 2024
bd22e9d
do 2 rounds of 4, instead of 4 rounds of 2. and properly offset unall…
julialongtin May 11, 2024
aede2f5
spacing changes.
julialongtin May 12, 2024
ded062c
Merge branch 'master' into 0.99-rebase
julialongtin Jun 12, 2024
1 change: 1 addition & 0 deletions .clang-tidy
Original file line number Diff line number Diff line change
@@ -12,6 +12,7 @@ Checks: >
-readability-implicit-bool-conversion,
-readability-magic-numbers,
-readability-uppercase-literal-suffix,
-readability-simplify-boolean-expr,
clang-analyzer-*,
-clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling,
performance-*,
8 changes: 5 additions & 3 deletions .devops/full-cuda.Dockerfile
@@ -12,7 +12,7 @@ FROM ${BASE_CUDA_DEV_CONTAINER} as build
ARG CUDA_DOCKER_ARCH=all

RUN apt-get update && \
apt-get install -y build-essential python3 python3-pip git
apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev

COPY requirements.txt requirements.txt
COPY requirements requirements
@@ -26,8 +26,10 @@ COPY . .

# Set nvcc architecture
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
# Enable cuBLAS
ENV LLAMA_CUBLAS=1
# Enable CUDA
ENV LLAMA_CUDA=1
# Enable cURL
ENV LLAMA_CURL=1

RUN make

5 changes: 5 additions & 0 deletions .devops/full-rocm.Dockerfile
@@ -40,6 +40,11 @@ ENV LLAMA_HIPBLAS=1
ENV CC=/opt/rocm/llvm/bin/clang
ENV CXX=/opt/rocm/llvm/bin/clang++

# Enable cURL
ENV LLAMA_CURL=1
RUN apt-get update && \
apt-get install -y libcurl4-openssl-dev

RUN make

ENTRYPOINT ["/app/.devops/tools.sh"]
5 changes: 4 additions & 1 deletion .devops/full.Dockerfile
@@ -3,7 +3,7 @@ ARG UBUNTU_VERSION=22.04
FROM ubuntu:$UBUNTU_VERSION as build

RUN apt-get update && \
apt-get install -y build-essential python3 python3-pip git
apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev

COPY requirements.txt requirements.txt
COPY requirements requirements
@@ -15,6 +15,9 @@ WORKDIR /app

COPY . .

ENV LLAMA_CURL=1


RUN make

ENV LC_ALL=C.utf8
2 changes: 1 addition & 1 deletion .devops/llama-cpp-clblast.srpm.spec
@@ -1,5 +1,5 @@
# SRPM for building from source and packaging an RPM for RPM-based distros.
# https://fedoraproject.org/wiki/How_to_create_an_RPM_package
# https://docs.fedoraproject.org/en-US/quick-docs/creating-rpm-packages
# Built and maintained by John Boero - boeroboy@gmail.com
# In honor of Seth Vidal https://www.redhat.com/it/blog/thank-you-seth-vidal

@@ -1,5 +1,5 @@
# SRPM for building from source and packaging an RPM for RPM-based distros.
# https://fedoraproject.org/wiki/How_to_create_an_RPM_package
# https://docs.fedoraproject.org/en-US/quick-docs/creating-rpm-packages
# Built and maintained by John Boero - boeroboy@gmail.com
# In honor of Seth Vidal https://www.redhat.com/it/blog/thank-you-seth-vidal

@@ -12,7 +12,7 @@
# 4. OpenCL/CLBLAST support simply requires the ICD loader and basic opencl libraries.
# It is up to the user to install the correct vendor-specific support.

Name: llama.cpp-cublas
Name: llama.cpp-cuda
Version: %( date "+%%Y%%m%%d" )
Release: 1%{?dist}
Summary: CPU Inference of LLaMA model in pure C/C++ (no CUDA/OpenCL)
@@ -32,24 +32,24 @@ CPU inference for Meta's Lllama2 models using default options.
%setup -n llama.cpp-master

%build
make -j LLAMA_CUBLAS=1
make -j LLAMA_CUDA=1

%install
mkdir -p %{buildroot}%{_bindir}/
cp -p main %{buildroot}%{_bindir}/llamacppcublas
cp -p server %{buildroot}%{_bindir}/llamacppcublasserver
cp -p simple %{buildroot}%{_bindir}/llamacppcublassimple
cp -p main %{buildroot}%{_bindir}/llamacppcuda
cp -p server %{buildroot}%{_bindir}/llamacppcudaserver
cp -p simple %{buildroot}%{_bindir}/llamacppcudasimple

mkdir -p %{buildroot}/usr/lib/systemd/system
%{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llamacublas.service
%{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llamacuda.service
[Unit]
Description=Llama.cpp server, CPU only (no GPU support in this build).
After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target

[Service]
Type=simple
EnvironmentFile=/etc/sysconfig/llama
ExecStart=/usr/bin/llamacppcublasserver $LLAMA_ARGS
ExecStart=/usr/bin/llamacppcudaserver $LLAMA_ARGS
ExecReload=/bin/kill -s HUP $MAINPID
Restart=never

@@ -67,10 +67,10 @@ rm -rf %{buildroot}
rm -rf %{_builddir}/*

%files
%{_bindir}/llamacppcublas
%{_bindir}/llamacppcublasserver
%{_bindir}/llamacppcublassimple
/usr/lib/systemd/system/llamacublas.service
%{_bindir}/llamacppcuda
%{_bindir}/llamacppcudaserver
%{_bindir}/llamacppcudasimple
/usr/lib/systemd/system/llamacuda.service
%config /etc/sysconfig/llama

%pre
2 changes: 1 addition & 1 deletion .devops/llama-cpp.srpm.spec
@@ -1,5 +1,5 @@
# SRPM for building from source and packaging an RPM for RPM-based distros.
# https://fedoraproject.org/wiki/How_to_create_an_RPM_package
# https://docs.fedoraproject.org/en-US/quick-docs/creating-rpm-packages
# Built and maintained by John Boero - boeroboy@gmail.com
# In honor of Seth Vidal https://www.redhat.com/it/blog/thank-you-seth-vidal

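The .srpm.spec files above compute the package `Version` from the build date via an RPM shell macro, `%( date "+%%Y%%m%%d" )`, where the doubled percent signs are RPM escaping for `%`. Outside a spec file, the same computation is plain shell — a sketch for illustration only, not part of the PR:

```shell
# Equivalent of the spec's Version macro: today's date as YYYYMMDD.
# The %% in the spec file is RPM escaping; plain shell uses a single %.
version=$(date "+%Y%m%d")
echo "$version"
```

Because the version is the build date, each rebuild produces a monotonically increasing version string without manual version bumps.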
4 changes: 2 additions & 2 deletions .devops/main-cuda.Dockerfile
@@ -20,8 +20,8 @@ COPY . .

# Set nvcc architecture
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
# Enable cuBLAS
ENV LLAMA_CUBLAS=1
# Enable CUDA
ENV LLAMA_CUDA=1

RUN make

48 changes: 35 additions & 13 deletions .devops/nix/package.nix
@@ -4,13 +4,14 @@
config,
stdenv,
mkShell,
runCommand,
cmake,
ninja,
pkg-config,
git,
python3,
mpi,
openblas, # TODO: Use the generic `blas` so users could switch between alternative implementations
blas,
cudaPackages,
darwin,
rocmPackages,
@@ -23,7 +23,7 @@
useOpenCL
useRocm
useVulkan
],
] && blas.meta.available,
useCuda ? config.cudaSupport,
useMetalKit ? stdenv.isAarch64 && stdenv.isDarwin && !useOpenCL,
useMpi ? false, # Increases the runtime closure size by ~700M
@@ -35,7 +36,8 @@
# It's necessary to consistently use backendStdenv when building with CUDA support,
# otherwise we get libstdc++ errors downstream.
effectiveStdenv ? if useCuda then cudaPackages.backendStdenv else stdenv,
enableStatic ? effectiveStdenv.hostPlatform.isStatic
enableStatic ? effectiveStdenv.hostPlatform.isStatic,
precompileMetalShaders ? false
}@inputs:

let
@@ -65,10 +67,15 @@ let
strings.optionalString (suffices != [ ])
", accelerated with ${strings.concatStringsSep ", " suffices}";

executableSuffix = effectiveStdenv.hostPlatform.extensions.executable;

# TODO: package the Python in this repository in a Nix-like way.
# It'd be nice to migrate to buildPythonPackage, as well as ensure this repo
# is PEP 517-compatible, and ensure the correct .dist-info is generated.
# https://peps.python.org/pep-0517/
#
# TODO: Package up each Python script or service appropriately, by making
# them into "entrypoints"
llama-python = python3.withPackages (
ps: [
ps.numpy
@@ -87,6 +94,11 @@ let
]
);

xcrunHost = runCommand "xcrunHost" {} ''
mkdir -p $out/bin
ln -s /usr/bin/xcrun $out/bin
'';

# apple_sdk is supposed to choose sane defaults, no need to handle isAarch64
# separately
darwinBuildInputs =
@@ -150,13 +162,18 @@ effectiveStdenv.mkDerivation (
postPatch = ''
substituteInPlace ./ggml-metal.m \
--replace '[bundle pathForResource:@"ggml-metal" ofType:@"metal"];' "@\"$out/bin/ggml-metal.metal\";"

# TODO: Package up each Python script or service appropriately.
# If we were to migrate to buildPythonPackage and prepare the `pyproject.toml`,
# we could make those *.py into setuptools' entrypoints
substituteInPlace ./*.py --replace "/usr/bin/env python" "${llama-python}/bin/python"
substituteInPlace ./ggml-metal.m \
--replace '[bundle pathForResource:@"default" ofType:@"metallib"];' "@\"$out/bin/default.metallib\";"
'';

# With PR#6015 https://github.com/ggerganov/llama.cpp/pull/6015,
# `default.metallib` may be compiled with Metal compiler from XCode
# and we need to escape sandbox on MacOS to access Metal compiler.
# `xcrun` is used find the path of the Metal compiler, which is varible
# and not on $PATH
# see https://github.com/ggerganov/llama.cpp/pull/6118 for discussion
__noChroot = effectiveStdenv.isDarwin && useMetalKit && precompileMetalShaders;

nativeBuildInputs =
[
cmake
@@ -173,6 +190,8 @@
]
++ optionals (effectiveStdenv.hostPlatform.isGnu && enableStatic) [
glibc.static
] ++ optionals (effectiveStdenv.isDarwin && useMetalKit && precompileMetalShaders) [
xcrunHost
];

buildInputs =
Expand All @@ -181,6 +200,7 @@ effectiveStdenv.mkDerivation (
++ optionals useMpi [ mpi ]
++ optionals useOpenCL [ clblast ]
++ optionals useRocm rocmBuildInputs
++ optionals useBlas [ blas ]
++ optionals useVulkan vulkanBuildInputs;

cmakeFlags =
Expand All @@ -191,7 +211,7 @@ effectiveStdenv.mkDerivation (
(cmakeBool "CMAKE_SKIP_BUILD_RPATH" true)
(cmakeBool "LLAMA_BLAS" useBlas)
(cmakeBool "LLAMA_CLBLAST" useOpenCL)
(cmakeBool "LLAMA_CUBLAS" useCuda)
(cmakeBool "LLAMA_CUDA" useCuda)
(cmakeBool "LLAMA_HIPBLAS" useRocm)
(cmakeBool "LLAMA_METAL" useMetalKit)
(cmakeBool "LLAMA_MPI" useMpi)
@@ -216,14 +236,16 @@
# Should likely use `rocmPackages.clr.gpuTargets`.
"-DAMDGPU_TARGETS=gfx803;gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx940;gfx941;gfx942;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102"
]
++ optionals useMetalKit [ (lib.cmakeFeature "CMAKE_C_FLAGS" "-D__ARM_FEATURE_DOTPROD=1") ]
++ optionals useBlas [ (lib.cmakeFeature "LLAMA_BLAS_VENDOR" "OpenBLAS") ];
++ optionals useMetalKit [
(lib.cmakeFeature "CMAKE_C_FLAGS" "-D__ARM_FEATURE_DOTPROD=1")
(cmakeBool "LLAMA_METAL_EMBED_LIBRARY" (!precompileMetalShaders))
];

# TODO(SomeoneSerge): It's better to add proper install targets at the CMake level,
# if they haven't been added yet.
postInstall = ''
mv $out/bin/main $out/bin/llama
mv $out/bin/server $out/bin/llama-server
mv $out/bin/main${executableSuffix} $out/bin/llama${executableSuffix}
mv $out/bin/server${executableSuffix} $out/bin/llama-server${executableSuffix}
mkdir -p $out/include
cp $src/llama.h $out/include/
'';
11 changes: 8 additions & 3 deletions .devops/server-cuda.Dockerfile
@@ -12,21 +12,26 @@ FROM ${BASE_CUDA_DEV_CONTAINER} as build
ARG CUDA_DOCKER_ARCH=all

RUN apt-get update && \
apt-get install -y build-essential git
apt-get install -y build-essential git libcurl4-openssl-dev

WORKDIR /app

COPY . .

# Set nvcc architecture
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
# Enable cuBLAS
ENV LLAMA_CUBLAS=1
# Enable CUDA
ENV LLAMA_CUDA=1
# Enable cURL
ENV LLAMA_CURL=1

RUN make

FROM ${BASE_CUDA_RUN_CONTAINER} as runtime

RUN apt-get update && \
apt-get install -y libcurl4-openssl-dev

COPY --from=build /app/server /server

ENTRYPOINT [ "/server" ]
7 changes: 5 additions & 2 deletions .devops/server-intel.Dockerfile
@@ -4,7 +4,7 @@ FROM intel/oneapi-basekit:$ONEAPI_VERSION as build

ARG LLAMA_SYCL_F16=OFF
RUN apt-get update && \
apt-get install -y git
apt-get install -y git libcurl4-openssl-dev

WORKDIR /app

@@ -16,11 +16,14 @@ RUN mkdir build && \
echo "LLAMA_SYCL_F16 is set" && \
export OPT_SYCL_F16="-DLLAMA_SYCL_F16=ON"; \
fi && \
cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx ${OPT_SYCL_F16} && \
cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_CURL=ON ${OPT_SYCL_F16} && \
cmake --build . --config Release --target server

FROM intel/oneapi-basekit:$ONEAPI_VERSION as runtime

RUN apt-get update && \
apt-get install -y libcurl4-openssl-dev

COPY --from=build /app/build/bin/server /server

ENV LC_ALL=C.utf8
5 changes: 5 additions & 0 deletions .devops/server-rocm.Dockerfile
@@ -40,6 +40,11 @@ ENV LLAMA_HIPBLAS=1
ENV CC=/opt/rocm/llvm/bin/clang
ENV CXX=/opt/rocm/llvm/bin/clang++

# Enable cURL
ENV LLAMA_CURL=1
RUN apt-get update && \
apt-get install -y libcurl4-openssl-dev

RUN make

ENTRYPOINT [ "/app/server" ]
6 changes: 5 additions & 1 deletion .devops/server-vulkan.Dockerfile
@@ -11,12 +11,16 @@ RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key
apt update -y && \
apt-get install -y vulkan-sdk

# Install cURL
RUN apt-get update && \
apt-get install -y libcurl4-openssl-dev

# Build it
WORKDIR /app
COPY . .
RUN mkdir build && \
cd build && \
cmake .. -DLLAMA_VULKAN=1 && \
cmake .. -DLLAMA_VULKAN=1 -DLLAMA_CURL=1 && \
cmake --build . --config Release --target server

# Clean up
7 changes: 6 additions & 1 deletion .devops/server.Dockerfile
@@ -3,16 +3,21 @@ ARG UBUNTU_VERSION=22.04
FROM ubuntu:$UBUNTU_VERSION as build

RUN apt-get update && \
apt-get install -y build-essential git
apt-get install -y build-essential git libcurl4-openssl-dev

WORKDIR /app

COPY . .

ENV LLAMA_CURL=1

RUN make

FROM ubuntu:$UBUNTU_VERSION as runtime

RUN apt-get update && \
apt-get install -y libcurl4-openssl-dev

COPY --from=build /app/server /server

ENV LC_ALL=C.utf8