forked from LostRuins/koboldcpp
b2266 #91
Merged
* Fix issues during StableLM models conversion
* Fix hard coded layer_norm_eps
* Support layer_norm_eps for LlavaStableLM
  Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* Add missing parenthesis
  Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* Support rotary_factor for LlavaStableLM
  Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* fix typo
* Add StableLMEpochForCausalLM for safety
  Co-authored-by: compilade <113953597+compilade@users.noreply.github.com>
* Add StableLMEpochForCausalLM for safety 2
  Co-authored-by: compilade <113953597+compilade@users.noreply.github.com>
---------
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: compilade <113953597+compilade@users.noreply.github.com>

* code : normalize enum names
  ggml-ci
* code : cont
* code : cont

…ible endpoint (#5708)
* server: monitoring - add /metrics prometheus compatible endpoint
* server: concurrency issue, when 2 tasks are waiting for results, only one calling thread is notified
* server: metrics - move to a dedicated struct

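As a rough illustration of what a Prometheus-compatible `/metrics` response body looks like, the sketch below renders a few values in the Prometheus text exposition format (`# HELP`, `# TYPE`, then `name value`). The `Metric` class, the `render_prometheus` helper, and the metric names are hypothetical and are not taken from the server code in this PR.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str         # hypothetical metric name, not one exported by the server
    help_text: str    # one-line description shown on the HELP line
    metric_type: str  # Prometheus type, e.g. "counter" or "gauge"
    value: float


def render_prometheus(metrics):
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for m in metrics:
        lines.append(f"# HELP {m.name} {m.help_text}")
        lines.append(f"# TYPE {m.name} {m.metric_type}")
        lines.append(f"{m.name} {m.value}")
    # The format is line-oriented; end the body with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        Metric("demo_prompt_tokens_total", "Prompt tokens processed.", "counter", 42),
        Metric("demo_slots_busy", "Slots currently processing.", "gauge", 1),
    ]
    print(render_prometheus(demo), end="")
```

Keeping the values in a dedicated structure, as the last bullet above describes for the server, makes the rendering step a simple loop over that structure.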
* server: logs - always use JSON logger, add thread_id in message, log task_id and slot_id
* server : skip GH copilot requests from logging
* server : change message format of server_log()
* server : no need to repeat log in comment
* server : log style consistency
* server : fix compile warning
* server : fix tests regex patterns on M2 Ultra
* server: logs: PR feedback on log level
* server: logs: allow to choose log format in json or plain text
* server: tests: output server logs in text
* server: logs switch init logs to server logs macro
* server: logs ensure JSON value does not raise an error
* server: logs reduce level VERBOSE to VERB to max 4 chars
* server: logs lower case as other log messages
* server: logs avoid static in general
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server: logs PR feedback: change text log format to: LEVEL [function_name] message | additional=data
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

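The text log format named in the last bullet, `LEVEL [function_name] message | additional=data`, can be sketched as follows. This is only an illustration of the line shape: the `format_log` helper, the JSON key names, and the choice to JSON-encode the extra values are assumptions, not the server's actual implementation.

```python
import json


def format_log(fmt, level, function, message, **extra):
    """Format one log record as JSON or as plain text.

    Text shape: LEVEL [function_name] message | key=value ...
    The JSON key names used here are hypothetical.
    """
    if fmt == "json":
        record = {"level": level, "function": function, "msg": message, **extra}
        return json.dumps(record)
    line = f"{level} [{function}] {message}"
    if extra:
        # Assumption: extra fields are JSON-encoded so strings stay quoted.
        fields = " ".join(f"{key}={json.dumps(value)}" for key, value in extra.items())
        line += f" | {fields}"
    return line


if __name__ == "__main__":
    print(format_log("text", "INFO", "main", "model loaded", slot_id=0, task_id=7))
    print(format_log("json", "VERB", "main", "model loaded", slot_id=0, task_id=7))
```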
fix nvcc version is empty
* [ggml-quants] Provide ggml_vqtbl1q_u8 for 64bit compatibility
  vqtbl1q_u8 is not part of arm v7 neon library
* [android-example] Remove abi filter after arm v7a fix
* [github-workflows] Do not skip Android armeabi-v7a build

The system prompt is now decoded in batches.
* server : fix off-by-one n_past when start of prompt matches whole cache
  The tokens right after the matching part would otherwise skip a pos value.

* llama : refactor k-shift implementation
  ggml-ci
* llama : rename llama_kv_cache_seq_shift to llama_kv_cache_seq_add
* llama : cont k-shift refactoring + normalize type names
  ggml-ci
* minor : fix MPI builds
* llama : reuse n_rot from the build context
  ggml-ci
* llama : revert enum name changes from this PR
  ggml-ci
* llama : update llama_rope_type
* llama : add comment about rope values
* llama : fix build
* passkey : apply kv cache updates explicitly
  ggml-ci
* llama : change name to llama_kv_cache_update()
* llama : add llama_kv_cache_seq_pos_max()
* passkey : fix llama_kv_cache_seq_pos_max() usage
* llama : some llama_kv_cell simplifications
* llama : add llama_kv_cache_compress (EXPERIMENTAL)
* llama : add alternative KV cache merging (EXPERIMENTAL)
* llama : add llama_kv_cache_defrag
* llama : comments
* llama : remove llama_kv_cache_compress
  will add in a separate PR
  ggml-ci
* llama : defragment via non-overlapping moves
* llama : ggml_graph based defrag implementation
  ggml-ci
* llama : switch the loop order in build_defrag
* llama : add comments

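The idea behind "defragment via non-overlapping moves" can be illustrated with a toy model: if a cache has `n_used` occupied cells, every hole among the first `n_used` slots can be filled from an occupied cell at index `n_used` or later, so no move's source is another move's destination. The sketch below is a simplified illustration under that assumption; `plan_defrag` and `apply_moves` are hypothetical helpers and do not reproduce the llama.cpp implementation (in particular, cell order is not preserved here).

```python
def plan_defrag(cells):
    """Return (src, dst) moves that pack used cells into the first n_used slots.

    `cells` is a list where None marks an empty cell. Every src is >= n_used
    and every dst is < n_used, so the moves never overlap.
    """
    n_used = sum(cell is not None for cell in cells)
    moves = []
    src = len(cells) - 1
    for dst in range(n_used):
        if cells[dst] is not None:
            continue  # this slot is already occupied
        # Walk back from the tail to the next occupied cell.
        while cells[src] is None:
            src -= 1
        moves.append((src, dst))
        src -= 1
    return moves


def apply_moves(cells, moves):
    """Apply the planned moves to a copy of `cells`; order does not matter."""
    out = list(cells)
    for src, dst in moves:
        out[dst] = out[src]
        out[src] = None
    return out


if __name__ == "__main__":
    cache = ["a", None, "b", None, None, "c", "d"]
    plan = plan_defrag(cache)
    print(plan)
    print(apply_moves(cache, plan))
```

Because sources and destinations come from disjoint index ranges, the moves could in principle be executed in any order or batched together.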
…5718)
* server: docs - refresh and tease a little bit more the http server
* Rephrase README.md server doc
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/server/README.md
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/server/README.md
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update README.md
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server: tests - longer inference timeout for CI
    
Nexesenex added 47 commits that referenced this pull request, each with the same message:

To complement the token_embd.weight and output.weight: attn_v.weight, attn_k.weight, attn_q.weight, attn_output.weight, attn_qkv.weight, ffn_gate, ffn_down, ffn_up

Dates of the referencing commits (number of commits per date in parentheses):
Dec 22, 2024 (1)
Aug 11, 2025 (1)
Aug 12, 2025 (2)
Aug 13, 2025 (2)
Aug 14, 2025 (1)
Aug 15, 2025 (6)
Aug 18, 2025 (1)
Aug 21, 2025 (2)
Aug 22, 2025 (1)
Oct 5, 2025 (1)
Oct 7, 2025 (2)
Oct 9, 2025 (2)
Oct 11, 2025 (4)
Oct 12, 2025 (1)
Oct 13, 2025 (2)
Oct 16, 2025 (2)
Oct 18, 2025 (1)
Oct 19, 2025 (1)
Oct 20, 2025 (1)
Oct 21, 2025 (3)
Oct 22, 2025 (3)
Oct 23, 2025 (1)
Oct 24, 2025 (1)
Oct 25, 2025 (3)
Oct 27, 2025 (1)
Oct 28, 2025 (1)
  
No description provided.