CI: 05/22/25 upstream sync #432


Open
wants to merge 1,605 commits into base: rocm-main

Conversation

rocm-repo-management-api-2[bot]

Daily sync with upstream

emilyfertig and others added 30 commits May 9, 2025 11:49
PiperOrigin-RevId: 756850393
…mmutable inside `jax.Array` is immutable and `ShapeDtypeStruct` duck-types as `jax.Array`, but immutability was never enforced.

**If this change breaks your code, update it to use `sds.update(...)`.**

PiperOrigin-RevId: 756852248
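A minimal migration sketch for the note above, assuming `ShapeDtypeStruct.update(...)` accepts the same field names as the constructor (the exact keyword names are an assumption, not confirmed by the commit):

```python
import jax
import jax.numpy as jnp

# ShapeDtypeStruct is now treated as immutable, mirroring jax.Array.
sds = jax.ShapeDtypeStruct((4, 8), jnp.float32)

# Instead of mutating attributes in place, build an updated copy.
# (Keyword names below are assumed to match the constructor fields.)
new_sds = sds.update(shape=(8, 8))
print(new_sds.shape, new_sds.dtype)
```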
PiperOrigin-RevId: 756874608
…ent the unreduced rule. Currently that's only `add`.

PiperOrigin-RevId: 756902404
PiperOrigin-RevId: 756914581
PiperOrigin-RevId: 756989842
* fix leaking of internal symbolic zeros in returned cotangents
* fix a bug around symbolic zero output tangents
PiperOrigin-RevId: 757170420
For recursive config definitions, Bazel requires the single-token notation `--config=value`.

This is necessary to ensure that all SMEM reads issued from the current WG
have completed before we schedule the copy (which acts as an SMEM write)!

PiperOrigin-RevId: 757647993
Our current allocation scheme on GPU is unsafe in the presence of multiple threads
that might take diverging control paths. We work around this problem using our
favorite trick: we simply forbid it!

With this change, `run_scoped(..., collective_axes="wg")` means that the same
allocation will be returned in all programs that differ only in the `wg` axis.
What's more, this call is a user promise that the allocation is a collective that
will be executed by all threads along that axis. Executing it on only a subset is
undefined behavior and in our current Mosaic GPU implementation might lead to deadlocks
due to barriers.

Note that nothing changes for single-threaded kernels, where `run_scoped` is always
allowed.

PiperOrigin-RevId: 757734362
PiperOrigin-RevId: 757735827
PTX docs are a bit confusing because the type is called e4m3, but
[its description](https://docs.nvidia.com/cuda/parallel-thread-execution/#alternate-floating-point-data-formats)
indicates that it is actually e4m3fn (no infs, limited NaNs).

PiperOrigin-RevId: 757741649
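For reference, the `fn` ("finite") variant is what JAX exposes as `jnp.float8_e4m3fn`; a quick way to inspect its finite range (plain dtype introspection, unrelated to the PTX lowering itself):

```python
import jax.numpy as jnp

# float8_e4m3fn has no infinities and a limited NaN encoding; its finite range
# can be inspected via finfo.
info = jnp.finfo(jnp.float8_e4m3fn)
print(info.max)   # largest finite value
print(info.tiny)  # smallest positive normal value
```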
This should be useful for kernels such as FlashAttention since row-wise
reductions can be performed entirely without any communication with other threads.

PiperOrigin-RevId: 757746207
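To illustrate what "row-wise reductions" means here, a plain `jax.numpy` sketch of the per-row softmax statistics a FlashAttention-style kernel maintains (this only illustrates the math, not the Mosaic GPU layout added by this change):

```python
import jax.numpy as jnp

# Each row's max and sum are computed independently of every other row, so a
# layout that keeps a full row within one thread needs no cross-thread
# communication for these reductions.
scores = jnp.ones((8, 128), jnp.float32)           # per-row attention scores
row_max = jnp.max(scores, axis=-1, keepdims=True)  # running max per row
probs = jnp.exp(scores - row_max)
row_sum = jnp.sum(probs, axis=-1, keepdims=True)   # normalizer per row
out = probs / row_sum
print(out.shape)
```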
…ntended platforms

Fixes: jax-ml#28594

Currently `lax.platform_dependent` allows specifying code that behaves
differently when lowered for different platforms. However, this function
operates in a confusing way: it creates a branch on the platform, but
lowers all branches for the **current** lowering platforms.

For example, in the following code:
```python
lax.platform_dependent(x,
                       cpu=for_cpu, tpu=for_tpu)
```

If we lower for CPU, we lower both `for_cpu` and `for_tpu`
for CPU (!), but only the branch corresponding to `for_cpu`
will actually run.

This is a problem if, e.g., `for_tpu` does not have a lowering for CPU:
we will get an error during lowering, even though there should be no error
at all, because that branch is not actually needed.

We add a new test `test_platform_dependent_with_primitive_with_lowering_error`
to demonstrate this.

The solution implemented here is Solution A from jax-ml#28594: we
add a `branches_platform` param to the `cond` primitive, which is
propagated by all transformations. This param is used only for the
conditionals arising from `lax.platform_dependent`.
During lowering we drop the branches corresponding to the platforms
that are not relevant.
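A small usage sketch of the API in question (`for_cpu` and `for_tpu` are placeholder callables, not code from this change):

```python
import jax
import jax.numpy as jnp
from jax import lax

def for_cpu(x):
  return x * 2.0

def for_tpu(x):
  return x + 1.0

def f(x):
  # With this change, only the branches whose platforms can actually be the
  # lowering platform are lowered; the others are dropped during lowering.
  return lax.platform_dependent(x, cpu=for_cpu, tpu=for_tpu, default=for_cpu)

print(jax.jit(f)(jnp.float32(3.0)))
```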
dfm and others added 28 commits May 21, 2025 11:15
…sts depend on NVIDIA CUDA wheels hermetically.

The flag is enabled by default.

To disable the dependency, pass `add_pypi_cuda_wheel_deps=False` in the Bazel options.

PiperOrigin-RevId: 761568590
This will make it easier to track down unexpected path mismatches in the future.

PiperOrigin-RevId: 761584888
…for unpacked types and native tiling on TPUv5

PiperOrigin-RevId: 761676578
Creating smaller build rules enforces better organized dependency graphs in the JAX project, helps pytype propagate annotations correctly, and leads to improved build and iteration times.

This refactor required moving the definitions of a few private utilities from pjit and pxla, because these files are part of the larger jax build target.

PiperOrigin-RevId: 761689391
…full explicit mode

PiperOrigin-RevId: 761708753
Creating smaller build rules enforces better organized dependency graphs in the JAX project, helps pytype propagate annotations correctly, and leads to improved build and iteration times.

This required moving some internal utilities out of dispatch.py, which is part of the main JAX build rule. I chose api_util.py because they seem to fit there.

PiperOrigin-RevId: 761722054
Next steps:
  - non-tile aligned
  - Clean up fn and utilize it for general changeTiling

PiperOrigin-RevId: 761731600
…fo__ guards after 0.6.1 release.

PiperOrigin-RevId: 761737523
…normal` and other APIs implementing the `Initializer` protocol. Previously it took `key, shape, dtype`; now it also accepts an optional `out_sharding` parameter.

PiperOrigin-RevId: 761742909
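For context, a sketch of the `Initializer` call convention being extended (the new `out_sharding` argument is omitted here because it requires a mesh; its exact spelling follows the commit note above):

```python
import jax
import jax.numpy as jnp

# An Initializer is called as init(key, shape, dtype); out_sharding is the new,
# optional extra parameter and is not passed in this minimal example.
init = jax.nn.initializers.normal(stddev=0.02)
key = jax.random.key(0)
w = init(key, (128, 256), jnp.float32)
print(w.shape, w.dtype)
```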
PiperOrigin-RevId: 761745409
rocm-repo-management-api-2 bot requested a review from a team as a code owner May 22, 2025 06:02
rocm-repo-management-api-2 bot enabled auto-merge (rebase) May 22, 2025 06:02