starcoder : add GPU offloading #3827

Merged

Conversation

No description provided.
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request on Oct 28, 2023:

* starcoder : do not GPU split 1D bias tensors
* starcoder : offload layers to GPU

ggml-ci
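
In essence, whole transformer layers move to the GPU while 1-D bias tensors stay whole on a single backend instead of being row-split across devices. A minimal sketch of that rule, with hypothetical names (not the actual llama.cpp code):

```c
// Hypothetical sketch of the offload rule, not the llama.cpp source:
// offloaded layers go to the GPU; 2-D weight matrices may be row-split
// across devices, but 1-D tensors (biases, norms) must stay whole on
// one backend.
enum backend_kind { BACKEND_CPU, BACKEND_GPU, BACKEND_GPU_SPLIT };

static enum backend_kind pick_backend(int layer_is_offloaded, int n_dims) {
    if (!layer_is_offloaded) {
        return BACKEND_CPU;
    }
    // splitting a 1-D bias row-wise across GPUs is exactly what the
    // first bullet above forbids
    return n_dims > 1 ? BACKEND_GPU_SPLIT : BACKEND_GPU;
}
```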
    
    
wsxiaoys added a commit to wsxiaoys/llama.cpp that referenced this pull request on Nov 4, 2023.
    
ggerganov pushed a commit that referenced this pull request on Nov 5, 2023.
    
github-actions bot pushed a commit to KerfuffleV2/ggml-sys-bleedingedge that referenced this pull request on Nov 9, 2023:
== Relevant log messages from source repo:
commit 875fb42871a0f5a88fbe31a0b5edd697b84038e4
Author: slaren <slarengh@gmail.com>
Date:   Wed Nov 8 13:15:14 2023 +0100
    ggml-alloc : fix backend assignments of views (#3982)
commit e9c1cecb9d7d743d30b4a29ecd56a411437def0a
Author: xaedes <xaedes@gmail.com>
Date:   Tue Nov 7 09:04:51 2023 +0100
    ggml : fix backward rope after YaRN (#3974)
    * fix backward process of rope
    rope backward process was broken after YaRN RoPE (#2268) implementation, due to missing changes in backward functions.
    the code for the backward process is nearly identical to the forward process:
    the only difference is the sign of the sin-values.
    to avoid future regressions remove the near-duplicate backward functions and reuse the forward code:
    for this a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`.
    the sin-values will be negated when forward is false.
    * fix finetune rope call to use correct default attn_factor of 1.0f
    * remove unused `ggml_rope_xpos_back`
    it is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all parameters without having to switch between different rope_back variants.
    * fix comments explaining the sine sign in ggml_forward_rope
    * add missing function arguments in declaration
    * fix function argument type in declaration
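
The core of the fix is easy to sketch: one rotation routine serves both passes, and only the sine sign differs. A minimal, hypothetical helper (not the actual ggml functions; the `forward` flag mirrors the argument added to `ggml_compute_forward_rope_f32`/`ggml_compute_forward_rope_f16`):

```c
#include <math.h>
#include <stdbool.h>

// Rotate one (x0, x1) pair by theta; the backward pass reuses the
// forward code with the sine negated, i.e. it rotates by -theta.
static void rope_rotate_pair(float * x0, float * x1, float theta, bool forward) {
    const float cos_theta = cosf(theta);
    const float sin_theta = forward ? sinf(theta) : -sinf(theta); // the only difference
    const float v0 = *x0;
    const float v1 = *x1;
    *x0 = v0 * cos_theta - v1 * sin_theta;
    *x1 = v0 * sin_theta + v1 * cos_theta;
}
```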
commit 46876d2a2c92e60579dc732cdb8cbd243b06f317
Author: Meng Zhang <meng@tabbyml.com>
Date:   Mon Nov 6 22:49:08 2023 -0800
    cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)
    * prototyping support for running on CPU in a GGML_USE_CUBLAS=on build
    * doc: add comments to ggml_cublas_loaded()
    * fix defined(...)
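
The idea reduces to a runtime check before taking the GPU path; a hedged sketch in which only `ggml_cublas_loaded()` comes from the commit and the rest is illustrative:

```c
#include <stdbool.h>

// Declared by the CUDA backend; true only if a CUDA device was
// actually initialized at startup.
bool ggml_cublas_loaded(void);

static void mul_mat_dispatch(void) {
    if (ggml_cublas_loaded()) {
        // GPU path: safe to call into cuBLAS
    } else {
        // CPU fallback: a GGML_USE_CUBLAS=ON binary still runs on
        // machines without a usable GPU instead of aborting
    }
}
```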
commit 2833a6f63c1b87c7f4ac574bcf7a15a2f3bf3ede
Author: slaren <slarengh@gmail.com>
Date:   Sun Nov 5 18:45:16 2023 +0100
    ggml-cuda : fix f16 mul mat (#3961)
    * ggml-cuda : fix f16 mul mat
    ggml-ci
    * silence common.cpp warning (bonus)
commit 132d25b8a62ea084447e0014a0112c1b371fb3f8
Author: Jared Van Bortel <cebtenzzre@gmail.com>
Date:   Sun Nov 5 10:08:57 2023 -0500
    cuda : fix disabling device with --tensor-split 1,0 (#3951)
    Co-authored-by: slaren <slarengh@gmail.com>
commit 3d48f42efcd05381221654376e9f6f69d76af739
Author: Meng Zhang <meng@tabbyml.com>
Date:   Sun Nov 5 04:40:08 2023 -0800
    llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)
    as done in ggml-org/llama.cpp#3827
commit c41ea36eaa3548776de4cb3d5d49b925cd3fc0f2
Author: Eve <139727413+netrunnereve@users.noreply.github.com>
Date:   Sun Nov 5 08:03:09 2023 +0000
    cmake : MSVC instruction detection (fixed up #809) (#3923)
    * Add detection code for avx
    * Only check hardware when option is ON
    * Modify per code review suggestions
    * Build locally will detect CPU
    * Fixes CMake style to use lowercase like everywhere else
    * cleanup
    * fix merge
    * linux/gcc version for testing
    * msvc combines avx2 and fma into /arch:AVX2 so check for both
    * cleanup
    * msvc only version
    * style
    * Update FindSIMD.cmake
    ---------
    Co-authored-by: Howard Su <howard0su@gmail.com>
    Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>
commit 48ade94538fa509465d71023e49d07aab0ec8cd5
Author: slaren <slarengh@gmail.com>
Date:   Sun Nov 5 08:12:13 2023 +0100
    cuda : revert CUDA pool stuff (#3944)
    * Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)"
    This reverts commit 629f917cd6b96ba1274c49a8aab163b1b189229d.
    * Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)"
    This reverts commit d6069051de7165a4e06662c89257f5d2905bb156.
    ggml-ci
commit d9b33fe95bd257b36c84ee5769cc048230067d6f
Author: Peter Sugihara <peter@campsh.com>
Date:   Fri Nov 3 12:18:18 2023 -0700
    metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938)
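
The "round up to 16" is the usual power-of-two alignment idiom; a generic sketch (illustrative, not the actual ggml-metal code):

```c
#include <stddef.h>

// Round n up to the next multiple of a power-of-two alignment,
// e.g. align_up(37, 16) == 48.
static size_t align_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}
```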
commit 5ba37461711095c0284233dbd14f0d9010cdbf56
Author: Xiao-Yong Jin <jinxiaoyong@gmail.com>
Date:   Fri Nov 3 13:00:31 2023 -0500
    ggml-metal: fix yarn rope (#3937)
commit abb77e7319aabc0b5cfb7c22da690a692489b6b7
Author: slaren <slarengh@gmail.com>
Date:   Fri Nov 3 12:13:09 2023 +0100
    ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)
commit 05816027d649f977468fc804cdb54e99eac246d1
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Fri Nov 3 09:24:00 2023 +0200
    common : YAYF (yet another YARN fix) (#3925)
    ggml-ci
commit 3fdbe6b66b7b5c6ad3b2f245cbad1517c27ff776
Author: cebtenzzre <cebtenzzre@gmail.com>
Date:   Fri Nov 3 02:31:58 2023 -0400
    llama : change yarn_ext_factor placeholder to -1 (#3922)
commit 629f917cd6b96ba1274c49a8aab163b1b189229d
Author: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
Date:   Thu Nov 2 13:58:22 2023 -0600
    cuda : add ROCM aliases for CUDA pool stuff (#3918)
commit c7743fe1c1cbda5a886362aa371480360580fdf0
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Thu Nov 2 20:32:11 2023 +0200
    cuda : fix const ptrs warning causing ROCm build issues (#3913)
commit d6069051de7165a4e06662c89257f5d2905bb156
Author: Oleksii Maryshchenko <oleksii.maryshchenko@gmail.com>
Date:   Thu Nov 2 18:10:39 2023 +0100
    cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)
    * Using cuda memory pools for async alloc/dealloc.
    * If the CUDA device doesn't support memory pools, then use the old implementation.
    * Removed redundant cublasSetStream
    ---------
    Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>
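
A hedged sketch of the strategy this commit describes (note it was reverted shortly after in #3944): the function name is hypothetical, the CUDA runtime calls are real.

```c
#include <stddef.h>
#include <cuda_runtime.h>

// Prefer stream-ordered allocation from the device's default memory
// pool when supported; otherwise fall back to synchronous cudaMalloc.
static void * device_alloc(size_t size, cudaStream_t stream, int device) {
    int pools_supported = 0;
    cudaDeviceGetAttribute(&pools_supported, cudaDevAttrMemoryPoolsSupported, device);

    void * ptr = NULL;
    if (pools_supported) {
        cudaMallocAsync(&ptr, size, stream); // async alloc, served from the pool
    } else {
        cudaMalloc(&ptr, size);              // old implementation
    }
    return ptr;
}
```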
commit 4ff1046d75e64f0e556d8dcd930ea25c23eb8b18
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Thu Nov 2 16:22:30 2023 +0200
    gguf : print error for GGUFv1 files (#3908)
commit 21958bb393a654591ed26f339791b752d58f5c8b
Author: slaren <slarengh@gmail.com>
Date:   Thu Nov 2 13:10:33 2023 +0100
    cmake : disable LLAMA_NATIVE by default (#3906)
commit 2756c4fbffab097736d5116007872d86456a544a
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Thu Nov 2 11:20:21 2023 +0200
    gguf : remove special-case code for GGUFv1 (#3901)
    ggml-ci
commit 1efae9b7dca2a5cc5aa21c1997b538022964ea19
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Thu Nov 2 09:54:18 2023 +0200
    llm : prevent from 1-D tensors being GPU split (#3697)
commit b12fa0d1c13596869c512f49a526b979c94787cc
Author: cebtenzzre <cebtenzzre@gmail.com>
Date:   Thu Nov 2 02:50:16 2023 -0400
    build : link against build info instead of compiling against it (#3879)
    * cmake : fix build when .git does not exist
    * cmake : simplify BUILD_INFO target
    * cmake : add missing dependencies on BUILD_INFO
    * build : link against build info instead of compiling against it
    * zig : make build info a .cpp source instead of a header
    Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>
    * cmake : revert change to CMP0115
    ---------
    Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>
commit 4d719a6d4e74b9a98e75f826f865f3153717d54b
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Thu Nov 2 08:35:10 2023 +0200
    cuda : check if this fixes Pascal card regression (#3882)
commit 183b3fac6c28e65d23ac0230c1dd6fb84bf0154d
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Thu Nov 2 08:33:37 2023 +0200
    metal : fix build errors and kernel sig after #2268 (#3898)
    
    
brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this pull request on Nov 17, 2023.
    
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request on Nov 23, 2023:

* starcoder : do not GPU split 1D bias tensors
* starcoder : offload layers to GPU

ggml-ci
    
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request on Nov 23, 2023.
    
brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this pull request on Nov 30, 2023.
    
YuMJie pushed a commit to YuMJie/powerinfer that referenced this pull request on Oct 25, 2024.