
fix: use jnp.exp(log_probs) instead of softmax(log_probs) in compute_entropy_from_logits#1387

Open
kuishou68 wants to merge 1 commit into google:main from kuishou68:fix/issue-1386-entropy-softmax-vs-exp

Conversation

@kuishou68

Summary

Fixes a bug in compute_entropy_from_logits in tunix/rl/ppo/ppo_helpers.py.

Closes #1386

Problem

Line 164 applies jax.nn.softmax(log_probs) to convert log-probabilities to probabilities. This is mathematically incorrect:

  • softmax(log_softmax(x)) ≠ softmax(x)
  • Only exp(log_softmax(x)) = softmax(x) (i.e., the true probability distribution)

Applying softmax to already log-normalized values produces a different distribution, leading to silently incorrect entropy values during PPO training.
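The identity in the second bullet above can be checked numerically. A standalone sketch (illustrative only, not code from the repository):

```python
import jax
import jax.numpy as jnp

logits = jnp.array([2.0, 1.0, 0.1])
log_probs = jax.nn.log_softmax(logits)

# exp undoes the log: exp(log_softmax(x)) recovers the softmax distribution.
probs = jnp.exp(log_probs)
print(jnp.allclose(probs, jax.nn.softmax(logits)))  # True

# The result is a valid probability distribution (sums to 1).
print(abs(float(jnp.sum(probs)) - 1.0) < 1e-5)  # True
```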

Fix

Replace jax.nn.softmax(log_probs) with jnp.exp(log_probs) on line 164:

# Before (wrong):
probs = jax.nn.softmax(log_probs)

# After (correct):
probs = jnp.exp(log_probs)
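For context, a minimal sketch of an entropy computation using the corrected conversion. The function name and signature here are hypothetical (assuming the helper takes raw logits); this is not the actual tunix/rl/ppo/ppo_helpers.py code:

```python
import jax
import jax.numpy as jnp

def entropy_from_logits(logits):
    # Hypothetical sketch of the corrected pattern, not the tunix implementation.
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    probs = jnp.exp(log_probs)  # the fix: exp, not another softmax
    return -jnp.sum(probs * log_probs, axis=-1)

logits = jnp.array([[2.0, 1.0, 0.1],
                    [0.0, 0.0, 0.0]])  # second row: uniform distribution
ent = entropy_from_logits(logits)

# Uniform over 3 classes has maximal entropy log(3); a peaked row has less.
print(jnp.allclose(ent[1], jnp.log(3.0)))  # True
print(bool(ent[0] < ent[1]))               # True
```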

Impact

The compute_entropy_from_logits function is called in ppo_learner.py to compute token_entropy. The bug causes incorrect entropy values, which can silently affect PPO training metrics and dynamics.



Development

Successfully merging this pull request may close these issues.

[Bug] compute_entropy_from_logits uses softmax(log_probs) instead of exp(log_probs), producing incorrect entropy
