
Tool calls support improvements #11

Open · wants to merge 144 commits into base: xsn/tool_call

Commits (144)
cddae48
Correct typo run_llama2.sh > run-llama2.sh (#9149)
shou692199 Aug 30, 2024
0ab30f8
llama : fix llama_split_mode enum values in main_gpu document (#9057)
kou Aug 30, 2024
49271ef
llama : fix typo in xcda_array_view comment [no ci] (#9132)
danbev Aug 31, 2024
ea5d747
sgemm : improved Q4_0 and Q8_0 performance via 4xN and Mx4 gemm (#8908)
Srihari-mcw Aug 31, 2024
a47667c
nix: fix CUDA build - replace deprecated autoAddOpenGLRunpathHook
enolan Aug 22, 2024
8f1d81a
llama : support RWKV v6 models (#8980)
MollySophia Sep 1, 2024
c6d4cb4
llama : minor style
ggerganov Sep 2, 2024
9c1ba55
build(nix): Package gguf-py (#5664)
ditsuke Sep 2, 2024
b60074f
llama-cli : remove duplicated log message (#9275)
nbcsm Sep 2, 2024
6e7d133
server : refactor multitask handling (#9274)
ngxson Sep 2, 2024
f771d06
ggml : add pthread includes on FreeBSD (#9258)
yurivict Sep 2, 2024
048de84
docker : fix missing binaries in full-cuda image (#9278)
slaren Sep 2, 2024
f148516
src: make tail invalid when kv cell is intersection for mamba (#9249)
kylo5aby Sep 2, 2024
48baa61
server : test script : add timeout for all requests (#9282)
ngxson Sep 2, 2024
b69a480
readme : refactor API section + remove old hot topics
ggerganov Sep 3, 2024
8962422
llama-bench : add JSONL (NDJSON) output mode (#9288)
akx Sep 3, 2024
7605ae7
flake.lock: Update (#9261)
ggerganov Sep 3, 2024
9379d3c
readme : rename result_format to response_format (#9300)
iscy Sep 4, 2024
82e3b03
rpc : make RPC servers come first in the device list (#9296)
rgerganov Sep 4, 2024
c8671ae
Fix broken links in docker.md (#9306)
carlory Sep 4, 2024
5910ea9
[SYCL] Fix DMMV dequantization (#9279)
OuadiElfarouki Sep 4, 2024
581c305
ggml : AVX2 support for Q4_0_8_8 (#8713)
Srihari-mcw Sep 4, 2024
bdf314f
llama-bench : fix NUL terminators in CPU name (#9313)
slaren Sep 5, 2024
4db0478
cuda : fix defrag with quantized KV (#9319)
slaren Sep 5, 2024
1031771
CMake fix: host for msvc compiler can only be x86 or x64 (#8624)
Xarbirus Sep 5, 2024
32b2ec8
Update build.yml (#9184)
awatuna Sep 5, 2024
9bc6db2
ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151)
compilade Sep 6, 2024
8ebe8dd
Improve Vulkan shader build system (#9239)
mtavenrath Sep 6, 2024
4a1411b
server : fix missing lock (#9334)
ngxson Sep 6, 2024
409dc4f
ggml : fix build break for the vulkan-debug (#9265)
cyzero-kim Sep 6, 2024
815b1fb
batched-bench : add `--output-format jsonl` option (#9293)
akx Sep 6, 2024
134bc38
llama-bench : log benchmark progress (#9287)
akx Sep 6, 2024
9b2c24c
server : simplify state machine for slot (#9283)
ngxson Sep 6, 2024
6c89eb0
ci : disable rocm image creation (#9340)
slaren Sep 7, 2024
947538a
ggml : fix missing `cpu_set_t` on emscripten (#9336)
ngxson Sep 7, 2024
df270ef
llama : refactor sampling v2 (#9294)
ggerganov Sep 7, 2024
e32d081
ggml : always check bounds on get_rows operations (#9354)
slaren Sep 7, 2024
1b9ae51
common : refactor arg parser (#9308)
ngxson Sep 7, 2024
e536426
llamafile : disable sgemm for batch-size 1 (#9330)
netrunnereve Sep 7, 2024
faf69d4
llama : sanitize invalid tokens (#9357)
ggerganov Sep 7, 2024
f12295b
llama : fix empty ring buffer push (#9358)
ggerganov Sep 7, 2024
a5b5d9a
llama.android : fix build (#9350)
ggerganov Sep 7, 2024
fbb7fcf
llama : set attrs of mislabelled EOT/EOM tokens (#9348)
bakkot Sep 8, 2024
efe6a83
ggml : fix cont with transposed tensors when one dimension is 1 (ggml…
smeso Aug 28, 2024
51d964a
cuda : mark BF16 CONT as unsupported
ggerganov Aug 28, 2024
d2d3200
cann : add Ascend NPU support (whisper/2336)
MengqingCao Aug 9, 2024
ba1cf84
cann : fix doxy (ggml/0)
ggerganov Aug 28, 2024
dbbebca
ggml: fix ggml_graph_cpy undefined behavior (ggml/943)
JohannesGaessler Aug 31, 2024
202084d
tests: add gradient tests for all backends (ggml/932)
JohannesGaessler Sep 3, 2024
9cb9260
vulkan: correctly report support for OP_CONT (ggml/946)
smeso Sep 6, 2024
406c1a3
vulkan: add dryrun support to sin and cos ops (ggml/947)
smeso Sep 6, 2024
60a3107
scripts : option to increase git patch context
ggerganov Sep 8, 2024
385decb
sync : ggml
ggerganov Sep 8, 2024
a876861
metal : update support condition for im2col + fix warning (#0)
ggerganov Sep 8, 2024
00b02bb
imatrix : fix arg parser for imatrix (#9366)
ngxson Sep 8, 2024
eae5971
llama : sanitize tokens in the upper bound (#9359)
slaren Sep 8, 2024
2a358fb
[SYCL] add check malloc result on device (#9346)
NeoZhangJianyu Sep 8, 2024
19f4a7b
llama : refactor samplers internal implementation (#9370)
slaren Sep 8, 2024
a249843
common : restore --n-gpu-layers (#9371)
slaren Sep 8, 2024
3f7ccfd
common : bring back missing args, add env var duplication check (#9375)
ngxson Sep 8, 2024
e079bff
cuda : fix FA Q src index (1 -> 0) (#9374)
ggerganov Sep 8, 2024
daa9623
Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend …
mtavenrath Sep 8, 2024
b2e89a3
Arm AArch64: Documentation updates (#9321)
eddnjjn Sep 9, 2024
54f376d
rpc : update README [no ci] (#9320)
rgerganov Sep 9, 2024
5ed0875
readme : add LLMUnity to UI projects (#9381)
amakropoulos Sep 9, 2024
8e6e2fb
CUDA: fix variable name conflict for Windows build (#9382)
JohannesGaessler Sep 9, 2024
38ca6f6
readme : update hot topics
ggerganov Sep 9, 2024
5fb5e24
llama : minor sampling refactor (2) (#9386)
slaren Sep 9, 2024
5fac4d5
ggml : vector length agnostic SVE support (#9290)
Vithulep Sep 9, 2024
293bebe
rpc : fix segfault with nkvo (#9389)
rgerganov Sep 9, 2024
bfe76d4
common : move arg parser code to `arg.cpp` (#9388)
ngxson Sep 9, 2024
fb3f249
make : do not run llama-gen-docs when building (#9399)
slaren Sep 10, 2024
0b4ac75
RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387)
MollySophia Sep 10, 2024
83008b7
llama : update llm_build_copy_mask_state comment [no ci] (#9385)
danbev Sep 10, 2024
00ba2ff
metal : fix compile warning with GGML_METAL_NDEBUG (#0)
ggerganov Sep 10, 2024
49006c6
llama : move random seed generation to the samplers (#9398)
slaren Sep 10, 2024
8d300bd
enable --special arg for llama-server (#9419)
matteoserva Sep 10, 2024
6cd4e03
arg : bring back missing ifdef (#9411)
ngxson Sep 10, 2024
cb9c933
flake.lock: Update (#9360)
ggerganov Sep 10, 2024
51b6038
sycl : update support conditions (#9394)
Alcpz Sep 11, 2024
b34e023
musa: remove Clang builtins mapping (#9421)
yeahdongcn Sep 11, 2024
d2b496b
batched-bench : remove unused code (#9305)
ggerganov Sep 11, 2024
5af118e
CUDA: fix --split-mode row race condition (#9413)
JohannesGaessler Sep 11, 2024
67155ab
feat: Implements retrying logic for downloading models using --model-…
farbodbj Sep 11, 2024
5bb2c5d
files : remove accidentally added `lora_test` submodule (#9430)
ngxson Sep 11, 2024
0996c55
llava : correct args for minicpmv-cli (#9429)
ngxson Sep 11, 2024
8db003a
py : support converting local models (#7547)
EvilFreelancer Sep 11, 2024
1b28061
llama : skip token bounds check when evaluating embeddings (#9437)
slaren Sep 11, 2024
449ccfb
Add Jais to list of supported models (#9439)
fmz Sep 12, 2024
df4b794
cann: Fix error when running a non-exist op (#9424)
Dou-Git Sep 12, 2024
c9c8575
enhance run script to be easy to change the parameters (#9448)
NeoZhangJianyu Sep 12, 2024
d6a04f8
ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)
ggerganov Sep 12, 2024
2b00fa7
riscv : modify Makefile and add a RISCV_VECT to print log info (#9442)
Tameem-10xE Sep 12, 2024
39f852f
py : add special tokens in hf_converter for RWKV v6 (#9428)
MollySophia Sep 12, 2024
ff76e18
cmake : fixed the order of linking libraries for llama-quantize (#9450)
Xarbirus Sep 12, 2024
3c26a16
ci : bump actions/checkout to v4 (#9377)
trivikr Sep 12, 2024
c837981
py : add Phi-1.5/Phi-2 tokenizer (#9361)
daminho Sep 12, 2024
4dc4f5f
ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329)
no1wudi Sep 12, 2024
2a82511
cmake : fix for builds without `GGML_CDEF_PUBLIC` (#9338)
Xarbirus Sep 12, 2024
d4c3c10
lora : raise error if lm_head is ignored (#9103)
ngxson Sep 12, 2024
e665744
llava : fix the script error in MobileVLM README (#9054)
fengerhu1 Sep 12, 2024
e6b7801
cann: Add host buffer type for Ascend NPU (#9406)
Dou-Git Sep 12, 2024
7820364
server : Add option to return token pieces in /tokenize endpoint (#9108)
mathijshenquet Sep 12, 2024
bd35cb0
feat: remove a sampler from a chain (#9445)
giladgd Sep 13, 2024
0abc6a2
llama : llama_perf + option to disable timings during decode (#9355)
ggerganov Sep 13, 2024
feff4aa
server : add loading html page while model is loading (#9468)
ngxson Sep 13, 2024
befaf11
llama : make cell_id const in inp_s_mask block (#9470)
danbev Sep 14, 2024
1f4111e
cmake : use list(APPEND ...) instead of set() + dedup linker (#9463)
ggerganov Sep 14, 2024
dcdcee3
server: add data: [DONE] to /chat/completions stream response (#9459)
VoidIsVoid Sep 14, 2024
822b632
ggml : ggml_type_name return "NONE" for invalid values (#9458)
ykhrustalev Sep 14, 2024
7596487
cmake : try to fix sycl+intel build (#9487)
Xarbirus Sep 15, 2024
d6b37c8
readme : update tools list (#9475)
OLSecret Sep 15, 2024
3c7989f
py : add "LLaMAForCausalLM" conversion support (#9485)
csabakecskemeti Sep 15, 2024
6988da9
cmake : correct order of sycl flags (#9497)
Xarbirus Sep 15, 2024
e6deac3
gguf-split : add basic checks (#9499)
slaren Sep 15, 2024
6262d13
common : reimplement logging (#9418)
ggerganov Sep 15, 2024
90a2fff
flake.lock: Update (#9488)
ggerganov Sep 16, 2024
c4965a6
metal : handle zero-sized allocs (#9466)
ggerganov Sep 16, 2024
441b72b
main : option to disable context shift (#9484)
VJHack Sep 16, 2024
95ca851
llama : support MiniCPM3 (#9322)
CarryFun Sep 16, 2024
0aadac1
llama : support OLMoE (#9462)
2015aroras Sep 16, 2024
5c3d0f1
ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422)
netrunnereve Sep 16, 2024
19514d6
cmake : do not hide GGML options + rename option (#9465)
ggerganov Sep 16, 2024
d54c21d
convert : identify missing model files (#9397)
compilade Sep 16, 2024
a6a3a5c
ggml : link MATH_LIBRARY not by its full path (#9339)
Xarbirus Sep 16, 2024
acb2c32
llama : rename n_embed to n_embd in rwkv6_time_mix (#9504)
danbev Sep 16, 2024
23e0d70
ggml : move common CPU backend impl to new header (#9509)
slaren Sep 16, 2024
37f3a38
llama : add llama_n_head() (#9512)
Xarbirus Sep 17, 2024
0d2ec43
llama : support IBM Granite architecture (#9412)
gabe-l-hart Sep 17, 2024
503147a
unicode : add <algorithm> (#9508)
ykhrustalev Sep 17, 2024
0226613
threadpool : skip polling for unused threads (#9461)
max-krasnyansky Sep 17, 2024
8344ef5
llama : fix n_vocab init for 'no_vocab' case (#9511)
Xarbirus Sep 17, 2024
8b836ae
arg : add env variable for parallel (#9513)
bertwagner Sep 17, 2024
7be099f
llama-bench: correct argument parsing error message (#9524)
Xarbirus Sep 17, 2024
faf67b3
[SYCL]set context default value to avoid memory issue, update guide (…
NeoZhangJianyu Sep 18, 2024
f799155
server : fix OpenSSL build (remove obsolete `LOG_INFO`) (#9529)
EZForever Sep 18, 2024
8a30835
server : match OAI structured output response (#9527)
VJHack Sep 18, 2024
6443ddd
llama : use reserve/emplace_back in sampler_sample (#9534)
danbev Sep 18, 2024
0d2f22e
scripts : verify py deps at the start of compare (#9520)
ggerganov Sep 18, 2024
64c6af3
ggml : fix n_threads_cur initialization with one thread (#9538)
slaren Sep 18, 2024
eca0fab
imatrix : disable prompt escape by default (#9543)
CISC Sep 19, 2024
6026da5
server : clean-up completed tasks from waiting list (#9531)
ggerganov Sep 19, 2024
d3830ad
Tool calls support improvements (support null content in messages, ha…
mario7421 Sep 20, 2024
b67b817
Merge branch 'master' into tool-call-improvements
mario7421 Sep 22, 2024
2 changes: 1 addition & 1 deletion .devops/full-cuda.Dockerfile
@@ -27,7 +27,7 @@ RUN if [ "${CUDA_DOCKER_ARCH}" != "default" ]; then \
         export CMAKE_ARGS="-DCMAKE_CUDA_ARCHITECTURES=${CUDA_DOCKER_ARCH}"; \
     fi && \
     cmake -B build -DGGML_CUDA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
-    cmake --build build --config Release --target llama-cli -j$(nproc) && \
+    cmake --build build --config Release -j$(nproc) && \
     cp build/bin/* .

 ENTRYPOINT ["/app/.devops/tools.sh"]
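
Note: with this change (upstream #9278), the image builds the full target set instead of only llama-cli, so the other tools (llama-server, llama-quantize, etc.) are present again. A quick way to verify the rebuilt image, sketched under the assumption that the binaries are copied into /app (the image tag here is arbitrary):

    # Build the full CUDA image from the repository root
    docker build -t llama-full-cuda -f .devops/full-cuda.Dockerfile .
    # List the copied binaries; more than just llama-cli should appear
    docker run --rm --entrypoint ls llama-full-cuda /app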
53 changes: 46 additions & 7 deletions .devops/nix/devshells.nix
@@ -1,13 +1,52 @@
 { inputs, ... }:

 {
   perSystem =
-    { config, lib, ... }:
+    {
+      config,
+      lib,
+      system,
+      ...
+    }:
     {
       devShells =
-        lib.concatMapAttrs
-          (name: package: {
-            ${name} = package.passthru.shell;
-            ${name + "-extra"} = package.passthru.shell-extra;
-          })
-          config.packages;
+        let
+          pkgs = import inputs.nixpkgs { inherit system; };
+          stdenv = pkgs.stdenv;
+          scripts = config.packages.python-scripts;
+        in
+        lib.pipe (config.packages) [
+          (lib.concatMapAttrs (
+            name: package: {
+              ${name} = pkgs.mkShell {
+                name = "${name}";
+                inputsFrom = [ package ];
+                shellHook = ''
+                  echo "Entering ${name} devShell"
+                '';
+              };
+              "${name}-extra" =
+                if (name == "python-scripts") then
+                  null
+                else
+                  pkgs.mkShell {
+                    name = "${name}-extra";
+                    inputsFrom = [
+                      package
+                      scripts
+                    ];
+                    # Extra packages that *may* be used by some scripts
+                    packages = [
+                      pkgs.python3Packages.tiktoken
+                    ];
+                    shellHook = ''
+                      echo "Entering ${name} devShell"
+                      addToSearchPath "LD_LIBRARY_PATH" "${lib.getLib stdenv.cc.cc}/lib"
+                    '';
+                  };
+            }
+          ))
+          (lib.filterAttrs (name: value: value != null))
+        ];
     };
 }
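
Note: each package in config.packages now gets a plain mkShell, plus an "-extra" variant that additionally pulls in the gguf-py scripts and tiktoken; the redundant python-scripts-extra shell is filtered out. A hedged usage sketch (shell names mirror the package attribute names, so "default" is an assumption about what config.packages exposes):

    # Enter the basic dev shell for a package
    nix develop .#default
    # Enter the extended shell, which also provides the Python scripts and tiktoken
    nix develop .#default-extra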
18 changes: 8 additions & 10 deletions .devops/nix/nixpkgs-instances.nix
@@ -26,16 +26,14 @@
       config.cudaSupport = true;
       config.allowUnfreePredicate =
         p:
-        builtins.all
-          (
-            license:
-            license.free
-            || builtins.elem license.shortName [
-              "CUDA EULA"
-              "cuDNN EULA"
-            ]
-          )
-          (p.meta.licenses or [ p.meta.license ]);
+        builtins.all (
+          license:
+          license.free
+          || builtins.elem license.shortName [
+            "CUDA EULA"
+            "cuDNN EULA"
+          ]
+        ) (p.meta.licenses or [ p.meta.license ]);
     };
     # Ensure dependencies use ROCm consistently
     pkgsRocm = import inputs.nixpkgs {
36 changes: 36 additions & 0 deletions .devops/nix/package-gguf-py.nix
@@ -0,0 +1,36 @@
+{
+  lib,
+  llamaVersion,
+  numpy,
+  tqdm,
+  sentencepiece,
+  pyyaml,
+  poetry-core,
+  buildPythonPackage,
+  pytestCheckHook,
+}:
+
+buildPythonPackage {
+  pname = "gguf";
+  version = llamaVersion;
+  pyproject = true;
+  nativeBuildInputs = [ poetry-core ];
+  propagatedBuildInputs = [
+    numpy
+    tqdm
+    sentencepiece
+    pyyaml
+  ];
+  src = lib.cleanSource ../../gguf-py;
+  pythonImportsCheck = [
+    "numpy"
+    "gguf"
+  ];
+  nativeCheckInputs = [ pytestCheckHook ];
+  doCheck = true;
+  meta = with lib; {
+    description = "Python package for writing binary files in the GGUF format";
+    license = licenses.mit;
+    maintainers = [ maintainers.ditsuke ];
+  };
+}
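
Note: packaging gguf-py as a buildPythonPackage lets Nix build it like any other Python library, running pytest (via pytestCheckHook) and the declared import checks during the build. A minimal sketch, assuming the flake exposes the derivation under the attribute name gguf-py (the name is an assumption, not confirmed by this diff):

    # Build the library; pytestCheckHook and pythonImportsCheck run as part of the build
    nix build .#gguf-py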