
Update JVP rule for abs to fix behavior for complex infinite inputs #26086

Open · wants to merge 1 commit into base: main

Conversation

@dfm (Collaborator) commented Jan 24, 2025

As discussed in #25681, the gradients of abs don't have the correct behavior at complex infinities. Combining the discussion in that issue with the notation from here, the JVP rule can be re-written as:

```
f(z) = abs(z) = abs(x + j y)
t = atan2(y, x)
df = Re[(cos(t) - j sin(t)) * (dx + j dy)]
```

(Note that this isn't quite the same result as what @pearu reports in #25681, but it's what I found when working it through myself, and it has the correct behavior.)
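For concreteness, the rule above can be sketched outside of JAX (a NumPy illustration with a hypothetical helper name, not the actual implementation in lax):

```python
import numpy as np

def abs_jvp(z, dz):
    # t = atan2(y, x); df = Re[(cos(t) - j sin(t)) * dz]
    t = np.arctan2(np.imag(z), np.real(z))
    return np.cos(t) * np.real(dz) + np.sin(t) * np.imag(dz)

# Finite at a complex infinity, where x / |z| would evaluate to inf / inf:
print(abs_jvp(complex(np.inf, 0.0), 1.0 + 0.0j))  # -> 1.0
```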

This is straightforward to implement, and at the cost of a performance hit (3 new trig functions), we get stable JVPs throughout the complex plane. I don't expect this is a performance critical computation in many applications, so I think it's probably worth updating the implementation here, but I'd love to hear otherwise if people disagree!

I should note that this also changes the gradient at complex zero (as discussed in #10515 (comment)) to give grad(abs)(0+0j) = 1+0j. I think this is sensible behavior, but there's some chance that this breaks some downstream behavior. (@mattjj may want to comment, having thought about this before!)

@dfm dfm self-assigned this Jan 24, 2025
@dfm dfm requested review from pearu and jakevdp January 24, 2025 18:06
@dfm dfm added the pull ready Ready for copybara import and testing label Jan 24, 2025
@pearu (Collaborator) left a comment

Note that the parameters to the test function are not being used. Otherwise, looks good to me! Thanks, @dfm !

"Note that this isn't quite the same result as what @pearu reports in #25681, but it's what I found when working it through myself, and it has the correct behavior."

I believe we still get the same results. The only difference is atan2(y, x) (this PR) versus atan2(x, y) (the unconventional usage in my comment on the issue); that changes the form of the final formula, but the results ought to be the same.
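The equivalence is easy to check numerically (a quick illustration, not part of the PR): cos(atan2(y, x)) and sin(atan2(x, y)) both reduce to x / hypot(x, y), so the two formulations agree despite the swapped arguments.

```python
import numpy as np

# cos(atan2(y, x)) and sin(atan2(x, y)) both reduce to x / hypot(x, y),
# so the two formulations agree in every quadrant.
for x, y in [(3.0, 4.0), (-3.0, 4.0), (3.0, -4.0), (-1.0, -1.0)]:
    assert np.isclose(np.cos(np.arctan2(y, x)), np.sin(np.arctan2(x, y)))
    assert np.isclose(np.cos(np.arctan2(y, x)), x / np.hypot(x, y))
```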

Comment on lines 6307 to 6308:

```python
x = jax.lax.complex(jnp.inf, 0.0).astype(dtype)
expected = jax.lax.complex(1.0, 0.0).astype(dtype)
```

Should this read

Suggested change:

```python
x = input_parts
expected = grad_parts
```

or similar?

@dfm (Collaborator, Author) replied:

Doh! Good catch. Thank you!

@dfm dfm force-pushed the abs-complex-grad branch from 786c82e to 3bb69bd Compare January 24, 2025 19:32
@jakevdp (Collaborator) commented Jan 24, 2025

I worry a bit about the performance hit here. I wonder if we could replace cos(atan2(y, x)) with x / sqrt(x ** 2 + y ** 2) and sin(atan2(y, x)) with y / sqrt(x ** 2 + y ** 2) and use custom_jvp or something similar to handle the (0, 0) case?

@jakevdp (Collaborator) commented Jan 24, 2025

I also wonder if using multiple sinusoidal evaluations to essentially recover abs(x) at x != 0 would cause accuracy issues in some domains.

@dfm (Collaborator, Author) commented Jan 24, 2025

"I wonder if we could replace cos(atan2(y, x)) with x / sqrt(x ** 2 + y ** 2) and sin(atan2(y, x)) with y / sqrt(x ** 2 + y ** 2)"

Sorry - I should have been clearer! What you're suggesting here is actually what the existing implementation does, and it causes the problems seen in #25681 because at x+iy = inf + i0 you'll get x / sqrt(x ** 2 + y ** 2) = inf / inf, even though the JVP is well defined. The behavior at x+iy = 0 was not the target of this PR, just a side effect!

I couldn't think of any other way to get numerically stable gradients in these limits without this change, unless we special-cased infinite inputs. But in that case, I think we'd need to add quite a few branches to cover all the possible permutations...
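The failure mode is easy to reproduce with NumPy (a standalone repro, not the lax code itself):

```python
import numpy as np

x, y = np.float64(np.inf), np.float64(0.0)

# Existing quotient form: inf / inf -> nan, even though the limit is 1.
print(x / np.sqrt(x**2 + y**2))  # nan

# Trig form from this PR: atan2(0, inf) = 0, so cos(...) gives 1 directly.
print(np.cos(np.arctan2(y, x)))  # 1.0
```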

@dfm dfm changed the title Update JVP rule for abs Update JVP rule for abs to fix behavior for complex infinite inputs Jan 24, 2025
@jakevdp (Collaborator) commented Jan 24, 2025

I see – that makes sense.

I still worry about potential accuracy issues, especially since we're computing in float32 most of the time, and trig rounding errors could compound.

What if we use a lax.select that chooses between the two approaches depending on the domain of the inputs?
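The select idea could look something like the following sketch (hypothetical helper name, with np.where standing in for lax.select; the (0, 0) case would still need separate handling):

```python
import numpy as np

def cos_theta_select(x, y):
    # Hypothetical hybrid: quotient form on the finite domain, trig form
    # only where an input is infinite. The safe_* values keep nans out of
    # the branch that gets discarded.
    finite = np.isfinite(x) & np.isfinite(y)
    safe_x = np.where(finite, x, 1.0)
    safe_y = np.where(finite, y, 0.0)
    quad = safe_x / np.hypot(safe_x, safe_y)
    trig = np.cos(np.arctan2(y, x))
    return np.where(finite, quad, trig)
```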

@pearu (Collaborator) commented Jan 24, 2025

I think the select alternative to atan2/sin/cos could be reasonable: when either real(z) or imag(z) is infinite, the result depends only on the signs of the real and imaginary parts, so the corresponding values can be tabulated. There are 8 values (corresponding to the 8 infinity cases) plus one for the finite case, for each of the real and imaginary parts of the result; so in total there would be 16 select expressions. Most likely some of these expressions could be combined using symmetries.

Benchmarks would tell which approach is going to be better performance-wise.
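A rough sketch of the sign-tabulation idea (hypothetical helper, ignoring the symmetry merging mentioned above; the origin would still need separate handling):

```python
import numpy as np

def unit_direction(x, y):
    # When either part is infinite, only the signs matter:
    #   (+/-inf, finite) -> (+/-1, 0)
    #   (finite, +/-inf) -> (0, +/-1)
    #   (+/-inf, +/-inf) -> (+/-1, +/-1) / sqrt(2)
    xi, yi = np.isinf(x), np.isinf(y)
    sx, sy = np.sign(x), np.sign(y)
    # Replace infinite inputs with finite placeholders so the quotient
    # branch never produces inf / inf.
    fx = np.where(xi | yi, 1.0, x)
    fy = np.where(xi | yi, 0.0, y)
    r = np.hypot(fx, fy)
    c = np.where(xi & yi, sx / np.sqrt(2),
                 np.where(xi, sx, np.where(yi, 0.0, fx / r)))
    s = np.where(xi & yi, sy / np.sqrt(2),
                 np.where(yi, sy, np.where(xi, 0.0, fy / r)))
    return c, s
```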

@jakevdp (Collaborator) commented Jan 24, 2025

Just a quick accuracy check:

```python
import numpy as np
import jax.numpy as jnp

rng = np.random.default_rng(0)

x = jnp.array(rng.normal(0, 10, 10000), dtype='float32')
y = jnp.array(rng.normal(0, 10, 10000), dtype='float32')

val_trig = jnp.cos(jnp.arctan2(y, x))
val_quad = x / jnp.hypot(x, y)

x = np.array(x, dtype='float64')
y = np.array(y, dtype='float64')
val_true = np.cos(np.arctan2(y, x))

print("trig approach:      max rtol=", max(abs(val_trig - val_true) / val_true))
print("quadratic approach: max rtol=", max(abs(val_quad - val_true) / val_true))
```

```
trig approach:      max rtol= 0.0002238464
quadratic approach: max rtol= 1.9903023e-07
```

The relative accuracy degrades from 2E-7 to 2E-4. I think that's bad enough that we'll want to avoid using the trig approach alone across the whole domain.

@dfm (Collaborator, Author) commented Jan 24, 2025

Good point about the accuracy, @jakevdp! It's worth noting that this degradation happens only close to the origin, so one option would be to switch when either the real or imag part of the input passes some minimum threshold. But, I'll also take a look at explicitly special casing the infinities. @pearu's point about symmetries is a good one. I'll give this another go next week. Thanks both!!
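The threshold switch could be sketched like this (hypothetical helper; the cutoff value is purely illustrative, and a real implementation would presumably need a dtype-dependent choice):

```python
import numpy as np

def cos_theta_thresh(x, y, cutoff=1e-3):
    # Hypothetical: quotient form away from the origin and infinities,
    # trig form otherwise. The safe_r placeholder keeps the discarded
    # branch free of divide-by-zero.
    r = np.hypot(x, y)
    use_quad = np.isfinite(r) & (r > cutoff)
    safe_r = np.where(use_quad, r, 1.0)
    return np.where(use_quad, x / safe_r, np.cos(np.arctan2(y, x)))
```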

@jakevdp (Collaborator) commented Jan 24, 2025

Another question worth thinking about: how does this affect the second derivative at 0 and infinity? Does the trig version result in correct higher-order derivatives at these values?
