@roycho96 (Contributor)

Summary

Add Liger kernel support for EXAONE4 models (LG AI Research's EXAONE 4.0 series).

Changes

  • Add src/liger_kernel/transformers/model/exaone4.py with fused linear cross entropy forward
  • Add apply_liger_kernel_to_exaone4() function in monkey_patch.py (see the sketch after this list)
  • Register in __init__.py
  • Add revert_liger_kernel_to_exaone4() in test/utils.py
  • Add convergence tests in test/convergence/bf16/ and test/convergence/fp32/
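
For orientation, here is a minimal sketch of the monkey-patch pattern the new entry point follows, modeled on the existing apply_liger_kernel_to_* helpers. It is an illustration only: the Exaone4RMSNorm/Exaone4MLP names, the module path, and the keyword handling are assumptions, not the merged implementation.

from functools import partial

from liger_kernel.transformers.rms_norm import LigerRMSNorm
from liger_kernel.transformers.swiglu import LigerSwiGLUMLP
from transformers.models.exaone4 import modeling_exaone4  # module path assumed


def apply_liger_kernel_to_exaone4_sketch(rms_norm: bool = True, swiglu: bool = True) -> None:
    """Illustrative only: swap HF EXAONE4 modules for Liger fused equivalents."""
    if rms_norm:
        # in_place=False is required for EXAONE4; see the RMSNorm note below.
        modeling_exaone4.Exaone4RMSNorm = partial(LigerRMSNorm, in_place=False)
    if swiglu:
        modeling_exaone4.Exaone4MLP = LigerSwiGLUMLP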

Supported Kernels

  • RMSNorm (including QK-Norm in attention)
  • SwiGLU MLP
  • RoPE
  • Fused Linear Cross Entropy
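
A minimal usage sketch, assuming the new function is exported from liger_kernel.transformers (per the __init__.py registration above) and, like the other apply_liger_kernel_to_* helpers, is called before the model is instantiated; the checkpoint name is only an example:

import torch
from transformers import AutoModelForCausalLM

from liger_kernel.transformers import apply_liger_kernel_to_exaone4

# Patch the HF EXAONE4 modeling code with the Liger kernels listed above,
# then load the model as usual.
apply_liger_kernel_to_exaone4()

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-4.0-32B",  # example EXAONE 4.0 checkpoint
    torch_dtype=torch.bfloat16,
)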

Note on in_place=False for RMSNorm

EXAONE4 requires in_place=False for RMSNorm due to its attention implementation pattern:

# EXAONE4 pattern - separate assignment
query_states = self.q_proj(hidden_states).view(...).transpose(1, 2)
query_states = self.q_norm(query_states)  # reassignment to same variable

# vs Qwen3 pattern - chained
query_states = self.q_norm(self.q_proj(hidden_states).view(...)).transpose(1, 2)

The view/transpose operations return tensors that share storage with the projection output. An in-place RMSNorm therefore overwrites values that autograd still needs for the backward pass, corrupting the autograd graph and producing NaN gradients.
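
As a PyTorch-only sketch of the general hazard (not the Liger kernel path): when the in-place write goes through ops autograd can see, the version-counter check catches it; a fused kernel writing into the same storage bypasses that check, which is consistent with the silent NaN gradients described above.

import torch

x = torch.randn(2, 4, 4, requires_grad=True)
y = torch.sigmoid(x)                     # sigmoid saves its output for backward
q = y.view(2, 2, 2, 4).transpose(1, 2)   # view/transpose share y's storage
q.mul_(2.0)                              # stand-in for an in-place normalization

# RuntimeError: a variable needed for gradient computation has been modified
# by an inplace operation -- the saved sigmoid output was clobbered via `q`.
y.sum().backward()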

Testing Done

  • Hardware Type: H100
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@shimizust (Collaborator) left a comment:

Thanks for the contribution!

@shimizust merged commit 13e1bbe into linkedin:main on Jan 7, 2026 (3 of 7 checks passed).