Commit
llama : do not quantize expert gating tensors
ggerganov committed Dec 10, 2023
1 parent 6cfb31f commit d1259b7
Showing 1 changed file with 3 additions and 0 deletions.
llama.cpp: 3 additions & 0 deletions
@@ -8443,6 +8443,9 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
     quantize &= params->quantize_output_tensor || name != "output.weight";
     quantize &= !params->only_copy;
 
+    // do not quantize expert gating tensors
+    quantize &= name.find("ffn_gate_inp.weight") == std::string::npos;
+
     enum ggml_type new_type;
     void * new_data;
     size_t new_size;
