llama : support GEGLU for jina-bert-v2 #14090

Merged: 5 commits into master on Jun 10, 2025

Conversation

@CISC (Collaborator) commented on Jun 9, 2025

Clean up conversion of jina-bert-v2 models now that we have LLM_FFN_GEGLU; this lets us easily support both old and newly converted models.

Removed ggml_cont in LLM_FFN_SWIGLU and LLM_FFN_GEGLU, as they are unnecessary.

Edit: Tested without ggml_cont on CPU and CUDA with jina-reranker-v1 (GEGLU) and Phi-3.5-mini-instruct (SWIGLU).
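
For reference, the split-and-gate pattern behind LLM_FFN_GEGLU looks roughly like this in ggml terms (a minimal sketch, not the exact llama.cpp helper; ffn_geglu_sketch is an illustrative name, and split_point is assumed to be half the row width, as in the diff below):

```cpp
#include "ggml.h"

// Sketch of GEGLU gating: the fused up/gate projection produces rows of width
// 2*split_point; one half is passed through GELU and gates the other half.
static ggml_tensor * ffn_geglu_sketch(ggml_context * ctx0, ggml_tensor * cur, int64_t split_point) {
    ggml_tensor * x0 = ggml_view_2d(ctx0, cur, split_point, cur->ne[1], cur->nb[1], 0);
    ggml_tensor * x1 = ggml_view_2d(ctx0, cur, split_point, cur->ne[1], cur->nb[1],
                                    split_point * ggml_element_size(cur));

    // The ggml_cont calls discussed in this PR would sit here; their only purpose
    // is to make the strided views contiguous before the unary op.
    x0 = ggml_gelu(ctx0, x0);      // activation on one half
    return ggml_mul(ctx0, x0, x1); // gate the other half
}
```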

@CISC requested a review from ggerganov on Jun 9, 2025 at 19:12
@github-actions bot added the python (python script changes) label on Jun 9, 2025
Comment on lines 666 to 667
```cpp
ggml_tensor * x0 = ggml_view_2d(ctx0, cur, split_point, cur->ne[1], cur->nb[1], 0);
ggml_tensor * x1 = ggml_view_2d(ctx0, cur, split_point, cur->ne[1], cur->nb[1], split_point * ggml_element_size(cur));
```
ggerganov (Member) commented:

Removing the conts works because the ops will be offloaded back to the CPU if the tensor is not contiguous:

```cpp
switch (op->op) {
    case GGML_OP_UNARY:
        switch (ggml_get_unary_op(op)) {
            case GGML_UNARY_OP_ABS:
            case GGML_UNARY_OP_SGN:
            case GGML_UNARY_OP_NEG:
            case GGML_UNARY_OP_STEP:
            case GGML_UNARY_OP_GELU:
            case GGML_UNARY_OP_SILU:
            case GGML_UNARY_OP_RELU:
            case GGML_UNARY_OP_SIGMOID:
            case GGML_UNARY_OP_HARDSIGMOID:
            case GGML_UNARY_OP_HARDSWISH:
            case GGML_UNARY_OP_GELU_ERF:
            case GGML_UNARY_OP_GELU_QUICK:
            case GGML_UNARY_OP_TANH:
            case GGML_UNARY_OP_EXP:
                return ggml_is_contiguous(op->src[0]);
            default:
                return false;
        }
        break;
```

For now let's keep the conts and expand the TODO to reference this comment, so that we eventually update the backend implementations to support non-contiguous inputs for these ops.

CISC (Collaborator, Author) replied:

@ggerganov Just to make sure I'm on the right track here: I started looking into this yesterday, and rather than (or maybe in addition to) making these ops handle non-contiguous data, it seems much simpler (and more efficient) to add dedicated GEGLU/SWIGLU ops.
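
To make the idea concrete (purely an illustrative scalar loop, not the kernel ggml actually ships): a fused GEGLU op reads both halves of each row directly and writes a contiguous result, so no ggml_cont or extra copy is needed even when the input views are strided:

```cpp
#include <cmath>
#include <cstdint>

// Illustrative fused GEGLU over one row: src holds [x0 | x1] of length 2*n,
// dst receives gelu(x0) * x1 (exact GELU via erf).
static void geglu_row(const float * src, float * dst, int64_t n) {
    for (int64_t i = 0; i < n; ++i) {
        const float x0 = src[i];
        const float x1 = src[i + n];
        const float g  = 0.5f * x0 * (1.0f + std::erf(x0 / std::sqrt(2.0f)));
        dst[i] = g * x1;
    }
}
```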

ggerganov (Member) replied:

Yes, good point.

@CISC requested a review from ggerganov on Jun 10, 2025 at 10:13
@CISC merged commit 3678b83 into master on Jun 10, 2025
1 check passed
@CISC deleted the cisc/jina-bert-v2-geglu branch on Jun 10, 2025 at 16:02
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jun 10, 2025
* origin/master:
llama : support GEGLU for jina-bert-v2 (ggml-org#14090)
vulkan: force device 0 in CI (ggml-org#14106)
Fixed spec timings to: accepted/tested instead of accepted/drafted (ggml-org#14104)
sync : ggml
ggml : fix weak alias win32 (whisper/0)
Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (ggml-org#14099)
rpc : nicer error messages for RPC server crash (ggml-org#14076)
sync : ggml
Add in-build ggml::ggml ALIAS library (ggml/1260)
metal : use less stack memory in FA kernel (ggml-org#14088)
kv-cache : fix shift and defrag logic (ggml-org#14081)
llama : allow building all tests on windows when not using shared libs (ggml-org#13980)