[pull] master from ggerganov:master #135

Closed · wants to merge 13 commits

Commits on Jul 27, 2024

  1. cann: Fix Multi-NPU execution error (#8710)

    * cann: fix multi-npu exec error
    
    * cann: update comment for ggml_backend_cann_supports_buft
    wangshuai09 authored Jul 27, 2024
    Commit bfb4c74
  2. common : add --no-warmup option for main/llama-cli (#8712)

    This commit adds a --no-warmup option for llama-cli.
    
    The motivation for this is that it can be convenient to skip the
    warmup llama_decode call when debugging.
    
    Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
    danbev authored Jul 27, 2024
    Commit 9d03d08
  3. llama : add function for model-based max number of graph nodes (#8622)

    * llama : model-based max number of graph nodes
    
    ggml-ci
    
    * llama : disable 405B max_nodes path due to lack of complaints
    
    ggml-ci
    ggerganov authored Jul 27, 2024
    Commit 92090ec
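As a hedged illustration of a "model-based max number of graph nodes" heuristic: the constants below (an 8192-node floor and a per-tensor multiplier of 5) are assumptions for the sketch, not necessarily the values merged in #8622; the idea is that very large models (e.g. a 405B parameter split across many tensors) need a node budget that grows with the tensor count rather than a fixed cap.

```python
def max_graph_nodes(n_tensors: int, base: int = 8192, nodes_per_tensor: int = 5) -> int:
    """Illustrative cap on compute-graph nodes, scaled by model size.

    Small models keep the fixed `base` budget; models with many tensors
    get proportionally more nodes so graph construction never overflows.
    The constants are hypothetical, not llama.cpp's actual values.
    """
    return max(base, nodes_per_tensor * n_tensors)
```

A small model with 100 tensors stays at the floor, while a model with 10,000 tensors gets a 50,000-node budget.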
  4. llama : add support for llama 3.1 rope scaling factors (#8676)

    * Add llama 3.1 rope scaling factors to llama conversion and inference
    
    This commit generates the rope factors on conversion and adds them to the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope operation, improving results for context windows above 8192.
    
    * Update convert_hf_to_gguf.py
    
    Co-authored-by: compilade <git@compilade.net>
    
    * address comments
    
    * address comments
    
    * Update src/llama.cpp
    
    Co-authored-by: compilade <git@compilade.net>
    
    * Update convert_hf_to_gguf.py
    
    Co-authored-by: compilade <git@compilade.net>
    
    ---------
    
    Co-authored-by: compilade <git@compilade.net>
    jmorganca and compilade authored Jul 27, 2024
    Commit b5e9546
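A sketch of how such per-frequency rope scaling factors can be generated at conversion time, following the published Llama 3.1 scheme: each RoPE frequency is classified by wavelength, with high-frequency (short-wavelength) pairs left untouched, low-frequency pairs scaled by the full factor, and a smooth ramp in between. The parameter defaults below (factor 8, low/high frequency factors 1 and 4, original context 8192, base 500000) are the commonly published Llama 3.1 values, but treat the exact function shape as an assumption rather than the merged implementation.

```python
import math

def llama31_rope_factors(head_dim: int = 128, rope_base: float = 500000.0,
                         factor: float = 8.0, low_freq_factor: float = 1.0,
                         high_freq_factor: float = 4.0, orig_ctx: int = 8192):
    """One scaling factor per RoPE frequency pair (head_dim // 2 values).

    At inference each frequency is divided by its factor, so long
    wavelengths are stretched by `factor` to cover longer contexts.
    """
    low_freq_wavelen = orig_ctx / low_freq_factor
    high_freq_wavelen = orig_ctx / high_freq_factor
    factors = []
    for i in range(0, head_dim, 2):
        freq = rope_base ** (-i / head_dim)
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            factors.append(1.0)            # high-frequency band: unchanged
        elif wavelen > low_freq_wavelen:
            factors.append(factor)         # low-frequency band: fully scaled
        else:
            # smooth interpolation between the two bands
            smooth = (orig_ctx / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor)
            factors.append(1.0 / ((1.0 - smooth) / factor + smooth))
    return factors
```

The resulting list would be stored as a tensor at conversion and consumed by the rope operation at inference.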
  5. ggml : remove unnecessary UNUSED macro call (ggml/880)

    This commit removes an UNUSED macro call that is not needed as the
    variable n0 is used in the code and will not produce a warning.
    
    Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
    danbev authored and ggerganov committed Jul 27, 2024
    Commit c12b6e8
  6. Commit d2b851b
  7. vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893)
    
    This prevents invalid frees when destroying a partially initialized
    vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer
    when running out of device memory.
    
    Co-authored-by: Tony Wasserka <neobrain@users.noreply.github.com>
    2 people authored and ggerganov committed Jul 27, 2024
    Commit 203b7f1
  8. ggml: add support for float16 input tensors in pooling operations (ggml/895)
    
    * Add support for float16 tensors in 1d pooling operations
    
    * Add support for float16 input tensors in 2d pooling operations
    
    * code cleanup
    
    remove unnecessary casting during srow ptr initialization
    
    ---------
    
    Co-authored-by: vanaka11 <vanaka1189@gmail.com>
    2 people authored and ggerganov committed Jul 27, 2024
    Commit 9f77d89
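The pooling change above (including the srow pointer cleanup) boils down to one numeric pattern: read float16 elements, widen each to float32 for accumulation so small values don't lose precision, and narrow back to float16 only once at the end. The sketch below shows that pattern for 1-D average pooling in NumPy; it is an illustration of the technique, not the ggml kernel itself.

```python
import numpy as np

def avg_pool1d_f16(src: np.ndarray, k: int) -> np.ndarray:
    """1-D average pooling over a float16 row, accumulating in float32.

    Widening each element before summing avoids the precision loss of
    chained float16 additions; the result is narrowed once per output.
    """
    assert src.dtype == np.float16
    n_out = src.shape[0] // k
    dst = np.empty(n_out, dtype=np.float16)
    for i in range(n_out):
        acc = np.float32(0.0)
        for j in range(k):
            acc += np.float32(src[i * k + j])  # widen each f16 input element
        dst[i] = acc / np.float32(k)           # narrow back to f16 at the end
    return dst
```

The same widen-accumulate-narrow structure applies to the 2-D case, with one accumulator per output cell.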
  9. ggml : loop tiling optimizations for scalar path (ggml/898)

    Apply a loop tiling technique to the generic path, which provides
    performance upside for ISAs with enough registers to take advantage
    of it. Also helps the compiler optimize this path.
    heshpdx authored and ggerganov committed Jul 27, 2024
    Commit a05ca93
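Loop tiling in a scalar kernel typically means splitting the inner loop into fixed-size blocks with independent accumulators, so an ISA with enough registers can keep the partial sums live and the compiler can unroll or vectorize the block. A minimal sketch of the structure, using a dot product (Python only illustrates the shape of the transformation; the payoff exists in compiled code):

```python
def dot_tiled(a, b, tile: int = 4) -> float:
    """Dot product with the inner loop tiled into blocks of `tile`.

    Independent per-lane accumulators break the serial dependency chain
    of a single running sum, which is what lets registers/SIMD lanes
    work in parallel in a compiled scalar path.
    """
    n = len(a)
    accs = [0.0] * tile            # one accumulator per lane in the tile
    i = 0
    while i + tile <= n:
        for t in range(tile):
            accs[t] += a[i + t] * b[i + t]
        i += tile
    total = sum(accs)              # reduce the lanes
    for j in range(i, n):          # scalar tail when n % tile != 0
        total += a[j] * b[j]
    return total
```

Choosing the tile width to match the available registers (or SIMD width) is the tuning knob the commit message alludes to.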
  10. sync : ggml

    ggml-ci
    ggerganov committed Jul 27, 2024
    Commit ae7985c
  11. ggml : add missing semicolon (#0)

    ggml-ci
    ggerganov committed Jul 27, 2024
    Commit 345c8c0
  12. Commit 56f20aa
  13. Commit 5e2727f