
Conversation

@kalomaze (Owner)

No description provided.

LostRuins and others added 30 commits November 9, 2023 21:33
* server : allow continue edit on completion mode

* server : handle abort case in runCompletion

* server : style improvement
…ml-org#3981)

* gguf-py: Refactor and add file reading support (a usage sketch follows this commit series)

* Replay changes from ggml-org#3871

Credit to @cebtenzzre for that pull

* Various type annotation fixes.

* sort imports with isort (again)

* Fix missing return statement in add_tensor

* style cleanup with flake8

* fix NamedTuple and Enum usage

* Fix an issue with state init in GGUFReader

Move examples to an examples/ directory

Clean up examples

Add an example of modifying keys in a GGUF file

Update documentation with info on examples

Try to support people importing gguf/gguf.py directly

* Damagage is not a word.

* Clean up gguf-py/examples/modify_gguf.py whitespace

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Update gguf-py/examples/modify_gguf.py formatting

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Update gguf-py/gguf/gguf_reader.py type hint

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Make examples executable, formatting changes

* Add more information to GGUFReader and examples comments

* Include a gguf Python package version bump

* Add convert-gguf-endian.py script

* cleanup

* gguf-py : bump minor version

* Reorganize scripts

* Make GGUFReader endian detection less arbitrary

* Add JSON dumping support to gguf-dump.py

Which I kind of regret now

* A few gguf-dump.py cleanups

* Murder accidental tuple in gguf-py/scripts/gguf-dump.py

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* cleanup

* constants : remove unneeded type annotations

* fix python 3.8 compat

* Set up gguf- scripts in pyproject.toml

* And include scripts/__init__.py, derp

* convert.py: We can't currently support Q8_0 on big endian.

* gguf-py: SpecialVocab: Always try available sources for special token ids

gguf-py: SpecialVocab: Try to load merges from merges.txt if not in tokenizer.json

gguf-py: SpecialVocab: Add 'add_bos_token' type bools to GGUF metadata

* cleanup

* Promote add_X_token to GGUF metadata for BOS and EOS

---------

Co-authored-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
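
The gguf-py refactor in the series above introduces a `GGUFReader` class. A minimal reading sketch, assuming a local GGUF file (the `model.gguf` path is a placeholder):

```python
#!/usr/bin/env python3
# Minimal sketch: list metadata keys and tensor descriptions from a
# GGUF file using gguf-py's GGUFReader. "model.gguf" is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("model.gguf")  # memory-maps the file read-only

# Metadata fields: name -> ReaderField
for name, field in reader.fields.items():
    print(name)

# Tensors: name, shape, and quantization type
for tensor in reader.tensors:
    print(tensor.name, tensor.shape, tensor.tensor_type)
```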
* Fix gguf-convert-endian script

* Bump version and update description
* typos

* Update examples/parallel/README.md

Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>

---------

Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
* gguf-py: gguf_writer: Use BytesIO to build metadata

* Use bytearray instead

Bump gguf-py package version
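
The switch above trades a file-like `BytesIO` for a plain growable `bytearray` when staging metadata. A minimal sketch of the pattern, assuming a simple length-prefixed layout; gguf_writer's real key/value encoding is more involved:

```python
import struct

# Stage little-endian metadata in a growable in-memory buffer.
buf = bytearray()

def write_string(s: str) -> None:
    data = s.encode("utf-8")
    buf.extend(struct.pack("<Q", len(data)))  # uint64 length prefix
    buf.extend(data)

def write_u32(v: int) -> None:
    buf.extend(struct.pack("<I", v))

write_string("general.architecture")
write_string("llama")
write_u32(32)  # e.g. a block count

print(len(buf), "bytes staged in memory")
```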
…ml-org#4041)

* Add ReLU and SQR CUDA ops to fix Persimmon offloading

* Persimmon loader: More helpful error on CUDA/ROCM when offloading too many layers
* sync : ggml (backend v2) (wip)

* sync : migrate examples and llama.cpp to dynamic graphs (wip)

* sync : update tests + fix max op params to 64

ggml-ci

* sync : ggml-cuda

ggml-ci

* llama : fix save/load state context size

ggml-ci

* sync : try to fix build on tvOS

* sync : pass custom graph sizes in training examples

* sync : update graph copies to new ggml API

* sync : update sync-ggml.sh with new files

* scripts : fix header in sync script

* train : fix context size calculations

* llama : increase inference graph size up to 4096 nodes

* train : allocate grads for backward graphs

* train : allocate grads for gb_tmp
* add safetensors to convert.py help message

* Check for single-file safetensors model

* Update convert.py "model" option help message

* revert convert.py help message change
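
A sketch of the kind of check the commit above adds; the `model.safetensors` filename convention is standard for single-file Hugging Face exports, but the helper itself is hypothetical, not convert.py's exact code:

```python
from pathlib import Path
from typing import Optional

def find_single_safetensors(model_dir: str) -> Optional[Path]:
    """Return the lone safetensors file if the model is single-file.

    Hypothetical helper: single-file HF exports are named
    'model.safetensors'; sharded ones use 'model-00001-of-*.safetensors'.
    """
    candidate = Path(model_dir) / "model.safetensors"
    return candidate if candidate.is_file() else None

print(find_single_safetensors("."))
```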
* Add support for stablelm-3b-4e1t
* Supports GPU offloading of (n-1) layers
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
LostRuins and others added 28 commits December 14, 2023 16:43
Revert "Fixes "Not enough space in the context's memory pool" encountered on certain models, which seems to be caused by some imprecision related to the automatic casting of floating point values"

This reverts commit 34b3dac.
* Fixes "Not enough space in the context's memory pool" encountered on certain models, which seems to be caused by some imprecision related to the automatic casting of floating point values

* do not cast to size_t, instead just use doubles

* ggml : add ggml_row_size(), deprecate ggml_type_sizef()

* ggml : fix row size compute to avoid overflows

* tests : fix sizey -> sizez

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
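
The imprecision these commits describe is easy to reproduce: pushing a byte count through a 32-bit float drops low-order bits once the value outgrows the 24-bit mantissa, while integer (or double) arithmetic stays exact. A Python illustration of the failure class, with made-up block numbers rather than a real ggml type:

```python
import numpy as np

# Hypothetical quantized row: 18 bytes per 32-element block
# (example numbers, not a real ggml type).
n_elements = 2**27 + 32
bytes_per_block, block_size = 18, 32

exact = n_elements // block_size * bytes_per_block        # integer math
lossy = int(np.float32(n_elements / block_size * bytes_per_block))

print(exact, lossy, exact - lossy)  # float32 is already off by a few bytes
```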
* ggml : use ggml_row_size where possible

ggml-ci

* ggml : move ggml_nbytes_split to ggml-cuda.cu
* ggml : group mul_mat_id rows by matrix (cpu only)

* remove mmid parameters from mm forward

* store row groups in wdata and calculate only once in GGML_TASK_INIT

ggml-ci
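
The row grouping described above amounts to collecting all rows routed to the same expert and issuing one matmul per expert instead of one per row. A numpy sketch of the idea, not ggml's actual scheduling:

```python
import numpy as np

n_rows, n_experts, d_in, d_out = 8, 4, 16, 32
x = np.random.randn(n_rows, d_in).astype(np.float32)
experts = [np.random.randn(d_in, d_out).astype(np.float32)
           for _ in range(n_experts)]
row_to_expert = np.random.randint(0, n_experts, size=n_rows)

out = np.empty((n_rows, d_out), dtype=np.float32)
# Group rows by expert id so each expert weight matrix is used in a
# single matmul over its group rather than once per row.
for e in range(n_experts):
    idx = np.nonzero(row_to_expert == e)[0]
    if idx.size:
        out[idx] = x[idx] @ experts[e]
```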
* Add API key authentication for enhanced server-client security

* server : to snake_case

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
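
With the server launched with an API key, clients are expected to present it on every request. A sketch using `requests`, assuming the server's usual `/completion` endpoint and a bearer-token header; the key value is a placeholder:

```python
import requests

# Placeholder key; pass the same value the server was started with.
API_KEY = "my-secret-key"

resp = requests.post(
    "http://localhost:8080/completion",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"prompt": "Hello", "n_predict": 16},
)
resp.raise_for_status()
print(resp.json()["content"])
```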
Make my experimental branch support Mixtral
3.9 Temp for Greedy Dynamic Temp
2.2 Temp for HHI Dynamic Temp

1.84 Temp for Entropy Dynamic Temp (this one remains the same as before)
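
For context on the entropy variant: the idea is to scale the sampling temperature by how uncertain the model already is. A sketch of one plausible mapping, assuming the 1.84 figure above is the peak temperature; the branch's exact formula may differ:

```python
import numpy as np

def entropy_dynamic_temp(logits: np.ndarray, max_temp: float = 1.84) -> float:
    """Scale temperature by normalized Shannon entropy of the next-token
    distribution. A sketch of the idea behind 'Entropy Dynamic Temp';
    max_temp mirrors the 1.84 figure above."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    ent = -(probs * np.log(probs + 1e-12)).sum()
    max_ent = np.log(len(probs))         # entropy of a uniform distribution
    return max_temp * ent / max_ent      # confident dist -> low temp

logits = np.array([5.0, 2.0, 1.0, 0.5])
print(entropy_dynamic_temp(logits))
```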
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
# Conflicts:
#	ggml.c
#	ggml.h
#	requirements.txt
#	tests/test-quantize-perf.cpp
* lora : add support for non-llama models

ggml-ci

* avoid leaking ggml_context on failure
cleanup

ggml-ci

* lora : allow 1d tensors

* lora : include embd and output layers in size calculation

* fix style
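
For reference, merging a LoRA adapter into a weight reduces to adding a scaled low-rank product; the commits above extend which tensors are eligible and how the size is computed. A generic numpy sketch of the merge math, not llama.cpp's loader:

```python
import numpy as np

def apply_lora(w: np.ndarray, a: np.ndarray, b: np.ndarray,
               alpha: float, rank: int) -> np.ndarray:
    """Merge a LoRA delta into a weight: W' = W + (alpha / rank) * B @ A.

    Standard LoRA merge math; the loader itself handles tensor layout,
    quantization, and eligibility."""
    return w + (alpha / rank) * (b @ a)

rank, alpha = 8, 16.0
w = np.zeros((64, 64), dtype=np.float32)
a = np.random.randn(rank, 64).astype(np.float32)
b = np.random.randn(64, rank).astype(np.float32)
print(apply_lora(w, a, b, alpha, rank).shape)
```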
* CUDA: make MoE tensors contiguous for batch size>1

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
@kalomaze changed the base branch from concedo to alternate_colab on December 23, 2023
@kalomaze closed this on December 23, 2023