tag:github.com,2008:https://github.com/pytorch/FBGEMM/releases Release notes from FBGEMM 2025-06-27T19:33:52Z tag:github.com,2008:Repository/150154628/v1.3.0-rc1 2025-06-27T19:33:52Z v1.3.0-rc1 <p>FBGEMM v1.3.0-rc1</p> q10 tag:github.com,2008:Repository/150154628/v1.2.0 2025-05-19T01:47:06Z FBGEMM v1.2.0 Release Notes <h1>Highlights</h1> <h3>TBE GPU</h3> <ul> <li>Added support for <code>int64_t</code> table indices and offsets in TBE inference</li> <li>Improved TBE benchmark utilities with the introduction of the Embeddings Estimator and Generator (EEG)</li> </ul> <h3>TBE CPU</h3> <ul> <li>Added Fused8BitRowwiseQuantizedSBFloatToFloatOrHalf operator</li> <li>Make FloatToFloat16 conversion 75x faster using SVE2 instructions</li> <li>Added FP32 GEMM kernels</li> </ul> <h3>TBE SSD</h3> <ul> <li>Fix OOM issues during init</li> <li>Improvements to L1 and L2 flush</li> </ul> <h3>Gen AI Ops</h3> <ul> <li>GenAI ops are now separately packaged into FBGEMM GenAI package for easier build and installation</li> <li>Various FP8 grouped GEMM optimizations</li> <li>BF16I4 preshuffled grouped GEMM</li> <li>BF16 stacked grouped GEMM</li> <li>F8I4 grouped GEMM optimizations</li> <li>Added nccl_alltoall function</li> </ul> <h3>ROCm</h3> <ul> <li>Added preliminary ROCm OSS build support for GenAI ops</li> </ul> <h3>Better Engineering</h3> <ul> <li>Added build support for CUDA 12.8</li> <li>Introduced a set of utilities to harden CUDA kernel launches against a set of runtime errors</li> </ul> <h1>Software Requirements</h1> <p>FBGEMM_GPU v1.2.0 has been tested and known to work on the following setups:</p> <ul> <li><strong>PyTorch</strong>: v2.7</li> <li><strong>CUDA</strong>: v11.8, 12.6, 12.8</li> <li><strong>Python</strong>: v3.9, 3.10, 3.11, 3.12, 3.13</li> </ul> <p>It is recommended to prepare an isolated environment for installing and running FBGEMM_GPU (instructions <a href="https://pytorch.org/FBGEMM/fbgemm_gpu/development/InstallationInstructions.html" rel="nofollow">here</a>) 
and FBGEMM-GenAI (instructions <a href="https://pytorch.org/FBGEMM/fbgemm_genai/development/InstallationInstructions.html" rel="nofollow">here</a>).</p> <h2>Availability</h2> <p>FBGEMM_GPU and FBGEMM GenAI can be fetched directly from PyPI:</p> <div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="# FBGEMM_GPU - CUDA (only the CUDA 12.6 variant is available) pip install fbgemm-gpu==1.2.0 # FBGEMM_GPU - CPU pip install fbgemm-gpu-cpu==1.2.0 # FBGEMM GenAI pip install fbgemm-gpu-genai==1.2.0"><pre><span class="pl-c"><span class="pl-c">#</span> FBGEMM_GPU - CUDA (only the CUDA 12.6 variant is available)</span> pip install fbgemm-gpu==1.2.0 <span class="pl-c"><span class="pl-c">#</span> FBGEMM_GPU - CPU</span> pip install fbgemm-gpu-cpu==1.2.0 <span class="pl-c"><span class="pl-c">#</span> FBGEMM GenAI</span> pip install fbgemm-gpu-genai==1.2.0</pre></div> <p>Alternatively, it can be fetched from PyTorch PIP:</p> <div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="# FBGEMM_GPU - CUDA pip install fbgemm-gpu==1.2.0 --index-url https://download.pytorch.org/whl/cu118/ pip install fbgemm-gpu==1.2.0 --index-url https://download.pytorch.org/whl/cu126/ pip install fbgemm-gpu==1.2.0 --index-url https://download.pytorch.org/whl/cu128/ # FBGEMM_GPU - CPU pip install fbgemm-gpu==1.2.0 --index-url https://download.pytorch.org/whl/cpu # FBGEMM GenAI pip install fbgemm-gpu==1.2.0 --index-url https://download.pytorch.org/whl/cpu"><pre><span class="pl-c"><span class="pl-c">#</span> FBGEMM_GPU - CUDA</span> pip install fbgemm-gpu==1.2.0 --index-url https://download.pytorch.org/whl/cu118/ pip install fbgemm-gpu==1.2.0 --index-url https://download.pytorch.org/whl/cu126/ pip install fbgemm-gpu==1.2.0 --index-url https://download.pytorch.org/whl/cu128/ <span class="pl-c"><span class="pl-c">#</span> FBGEMM_GPU - CPU</span> pip install 
fbgemm-gpu==1.2.0 --index-url https://download.pytorch.org/whl/cpu <span class="pl-c"><span class="pl-c">#</span> FBGEMM GenAI </span> pip install fbgemm-gpu==1.2.0 --index-url https://download.pytorch.org/whl/cpu</pre></div> <h1>Changes</h1> <h2>CPU</h2> <h3>GEMM</h3> <ul> <li>[Improvement] Improve Fused8BitRowwiseQuantizedSBFloatToFloatOrHalfNeon by 5%-15% (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2938693289" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3860" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3860/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3860">#3860</a>)</li> <li>[New] Use enum to select floating point format in FbgemmEmbedding APIs (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2929175654" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3842" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3842/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3842">#3842</a>)</li> <li>[New] Add generic IEEE754 truncation code (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2921345830" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3820" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3820/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3820">#3820</a>)</li> <li>[New] Enable KleidiAI for FP32 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2921106783" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3818" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3818/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3818">#3818</a>)</li> <li>[Improvement] Move float conversion functions from Types.h into 
new FloatConversion.h (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2892662954" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3760" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3760/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3760">#3760</a>)</li> <li>[Fix] Use kleidiAI on static builds (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2915338679" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3806" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3806/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3806">#3806</a>)</li> <li>[Fix] Fix KleidiAI FP16 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2896314038" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3769" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3769/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3769">#3769</a>)</li> <li>[Improvement] Pull ARM's matrix transpose PR (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2831403017" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3660" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3660/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3660">#3660</a>)</li> <li>[New] Add NEON implementation of Fused8BitRowwiseQuantizedSBFloatToFloatOrHalf (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2861437946" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3707" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3707/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3707">#3707</a>)</li> 
<li>[Improvement] avoid extra copy in PackedGemmMatrixB constructor (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2852188868" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3691" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3691/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3691">#3691</a>)</li> <li>[Improvement] Remove FENV pragma (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2817361725" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3629" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3629/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3629">#3629</a>)</li> <li>[Improvement] Make FloatToFloat16 conversion 75x faster using SVE2 instructions (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2816549280" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3626" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3626/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3626">#3626</a>)</li> <li>[New] add a new constructor to PackedGemmMatrixB (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2803658239" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3598" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3598/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3598">#3598</a>)</li> <li>[New] Move FP32 kernels to OSS (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2785628618" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3568" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3568/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3568">#3568</a>)</li> </ul> <h2>GenAI</h2> <h3>GenAI Ops</h3> <ul> <li>[Improvement] Performance Optimization: Improved TileShape Configuration for Large Llama Shapes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2907112708" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3790" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3790/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3790">#3790</a>) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2978577403" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3942" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3942/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3942">#3942</a>)</li> <li>[New] Add harness for comms benchmark (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2974607109" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3936" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3936/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3936">#3936</a>)</li> <li>[Improvement] Refactoring of NoPE (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2926654744" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3840" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3840/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3840">#3840</a>)</li> <li>[Improvement] support fp16 dtypes for input weight and bias (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2973370309" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3931" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3931/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3931">#3931</a>)</li> <li>[Fix] fix fp8 kv cache dequantize kernels (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2957345041" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3896" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3896/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3896">#3896</a>)</li> <li>[Improvement] scatter_add 0 size support (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2939027468" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3861" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3861/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3861">#3861</a>)</li> <li>[Improvement] Retuned CK GMM fp8/bf16 with perf fixes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2932937437" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3851" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3851/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3851">#3851</a>)</li> <li>[Improvement] Enable groupwise scales for F8I4 Grouped Gemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2950296490" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3884" 
data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3884/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3884">#3884</a>)</li> <li>[Fix] Fix empty input view. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2947762578" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3880" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3880/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3880">#3880</a>)</li> <li>[New] FP8 Rowwise Dequant Kernel (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2944032610" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3873" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3873/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3873">#3873</a>)</li> <li>[New] <code>torch.ops.fbgemm.gather_scale_dense_tokens</code> for oss. 
(<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2936253249" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3855" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3855/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3855">#3855</a>)</li> <li>[Improvement] Replace rms_norm as norm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2929086748" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3841" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3841/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3841">#3841</a>)</li> <li>[Improvement] Move DeepGemm scale transpose to quantize (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2925972337" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3834" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3834/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3834">#3834</a>)</li> <li>[Improvement] follow up to reflect rowwise scale inputs for x, w in <code>quantize_ops</code> scripts (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2926529305" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3839" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3839/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3839">#3839</a>)</li> <li>[New] add rowwise scaling support (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2921535147" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3822" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3822/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3822">#3822</a>)</li> <li>[Improvement] update to tune for small <code>m</code>s and quantized gemv (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2862067115" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3712" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3712/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3712">#3712</a>)</li> <li>[New] Add Preshuffled FP8 x INT4 Grouped Gemm Kernel (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2911427694" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3800" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3800/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3800">#3800</a>)</li> <li>[New] FBGEMM Add Columnwise Weight Scaling to F8I4 GEMM (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2895881556" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3766" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3766/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3766">#3766</a>)</li> <li>[Improvement] update the sorting kernel for bf16 ck fmoe kernel (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2919320917" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3817" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3817/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3817">#3817</a>)</li> <li>[Fix] fix volatile synchronization with acquire/relax (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2874364505" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3728" 
data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3728/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3728">#3728</a>)</li> <li>[Improvement] Force determinism by unswizzle (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2874364460" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3727" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3727/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3727">#3727</a>)</li> <li>[New] add fp8 kv nope (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2904925321" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3786" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3786/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3786">#3786</a>)</li> <li>[Improvement] move common op to vector utils (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2892642543" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3759" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3759/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3759">#3759</a>)</li> <li>[Improvement] Gather/Scatter. 
(<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2885284049" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3743" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3743/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3743">#3743</a>)</li> <li>[Improvement] reduce scatter supports last dim (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2874250427" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3726" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3726/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3726">#3726</a>)</li> <li>[Improvement] Add custom reduce scatter to llama_comms (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2877482946" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3730" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3730/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3730">#3730</a>)</li> <li>[New] Adds shapes information to enable torch.compile. 
(<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2872804914" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3724" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3724/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3724">#3724</a>)</li> <li>[Improvement] avoid propagation of NaN (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2870464111" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3723" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3723/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3723">#3723</a>)</li> <li>[New] <code>torch.ops.fbgemm.scatter_add_along_first_dim</code>. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2867843053" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3720" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3720/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3720">#3720</a>)</li> <li>[New] <code>torch.ops.fbgemm.gather_along_first_dim</code>. 
(<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2867513775" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3719" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3719/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3719">#3719</a>)</li> <li>[New] Paged Attention Support (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2855240427" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3698" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3698/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3698">#3698</a>)</li> <li>[New] custom reduce scatter (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2851726521" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3686" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3686/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3686">#3686</a>)</li> <li>[Fix] Recover custom collective test (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2851726551" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3687" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3687/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3687">#3687</a>)</li> <li>[Improvement] update sweep_utils.py to test more precision gemv kernel (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2844464868" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3678" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3678/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3678">#3678</a>)</li> <li>[New] add fp8fp8 fast_gemv_quantized (<a 
class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2844415149" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3677" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3677/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3677">#3677</a>)</li> <li>[New] add mixed precision fp8 fast_gemv_quantized kernel (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2844002944" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3675" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3675/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3675">#3675</a>)</li> <li>[Improvement] adjust interface (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2839397442" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3669" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3669/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3669">#3669</a>)</li> <li>[Improvement] CK MoE: cherry-pick <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1744140414" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/1808" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/1808/hovercard" href="https://github.com/pytorch/FBGEMM/pull/1808">#1808</a> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2807663191" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3609" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3609/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3609">#3609</a>)</li> <li>[Improvement] fix llm shapes in quantize bench and add ldm shapes (<a class="issue-link js-issue-link" 
data-error-text="Failed to load title" data-id="2808201555" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3611" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3611/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3611">#3611</a>)</li> <li>[Improvement] Return if no data to allreduce (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2796003114" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3586" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3586/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3586">#3586</a>)</li> <li>[Improvement] llm decode shapes fp8 rowwise gemm tuning (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2784718955" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3565" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3565/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3565">#3565</a>)</li> <li>[Improvement] Make zero_start_index_M optional for dynamic BF16 Grouped Gemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2771722601" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3553" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3553/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3553">#3553</a>)</li> <li>[New] Add nccl_alltoall function (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2771516487" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3551" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3551/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3551">#3551</a>)</li> <li>[New] Add fused_moe kernel to 
ck_extension (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2749557439" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3518" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3518/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3518">#3518</a>)</li> </ul> <h3>GEMM</h3> <ul> <li>[Improvement] Update cutlass version to 3.8V2 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2898679786" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3772" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3772/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3772">#3772</a>)</li> <li>[Improvement] Update Cutlass to V3.8-2 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2895887632" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3767" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3767/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3767">#3767</a>)</li> <li>[Improvement] fp8_gemm (non_persistent): adding optimal configs for 8k &amp; 16k shapes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2895677362" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3764" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3764/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3764">#3764</a>)</li> <li>[New] new tuning for fp8 rowwise (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2892166499" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3756" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3756/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3756">#3756</a>)</li> <li>[Improvement] Add DeepGEMM blockwise GEMM in quantize bench (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2885663533" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3746" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3746/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3746">#3746</a>)</li> <li>[Improvement] Enable DeepGEMM in quantize bench (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2885307394" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3745" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3745/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3745">#3745</a>)</li> <li>[Improvement] reduce overhead for f8f8bf16_rowwise_grouped_dynamic on amd (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2884985948" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3742" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3742/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3742">#3742</a>)</li> <li>[Improvement] Performance Optimization: Optimized TileShape Configuration for f8 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2813215494" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3617" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3617/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3617">#3617</a>) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2879652101" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3735" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3735/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3735">#3735</a>)</li> <li>[Improvement] Performance Optimization: Optimized TileShape Configuration for bf16 and Mixed Formats (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2800373940" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3591" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3591/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3591">#3591</a>) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2861701511" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3710" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3710/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3710">#3710</a>)</li> <li>[Improvement] adding an option to skip zeroing output tensor for f8f8bf16_rowwise_grouped_dynamic (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2849736047" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3685" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3685/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3685">#3685</a>)</li> <li>[Improvement] Update CK (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2859479109" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3701" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3701/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3701">#3701</a>)</li> <li>[Fix] Fix CUDA kernel index data type in deeplearning/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu +10 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" 
data-id="2929858568" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3844" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3844/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3844">#3844</a>)</li> <li>[New] Make F8I4 grouped GEMM process M_sizes with INT32 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2933271077" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3853" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3853/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3853">#3853</a>)</li> <li>[Improvement] Skip empty groups in FP8 Stacked Gemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2939048886" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3862" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3862/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3862">#3862</a>)</li> <li>[New] Enable preshuffled mixed dtype Cutlass Gemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2869920393" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3722" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3722/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3722">#3722</a>)</li> <li>[Improvement] [CUTLASS] Minor Cutlass change to fix CI (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2901455219" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3779" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3779/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3779">#3779</a>)</li> <li>[Improvement] Clean up cutlass FP8 Grouped Gemm Kernel Setup (<a 
class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2939135243" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3864" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3864/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3864">#3864</a>)</li> <li>[New] Modernize bf16 cutlass grouped gemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2953757356" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3889" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3889/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3889">#3889</a>)</li> <li>[Improvement] [CUTLASS] Include new cutlass support for groupwise mixed dtype grouped gemm. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2950541695" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3885" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3885/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3885">#3885</a>)</li> <li>[New] Add DEEPGEMM Masked API. 
(<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2983575996" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3949" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3949/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3949">#3949</a>)</li> <li>[Improvement] Use Int64 Indexing in Grouped Gemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2973055220" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3930" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3930/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3930">#3930</a>)</li> <li>[Improvement] Add correctness testing for shuffled mixed dtype GEMMs. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2973513644" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3932" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3932/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3932">#3932</a>)</li> <li>[New] BF16I4 Preshuffled Grouped Gemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2967482102" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3917" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3917/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3917">#3917</a>)</li> <li>[New] Preshuffled BF16I4 Gemm Kernel (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2964868029" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3913" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3913/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3913">#3913</a>)</li> <li>[New] Enable rowwise 
scaling for DeepGemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2944218323" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3874" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3874/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3874">#3874</a>)</li> <li>[New] bf16 stacked group gemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2951176588" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3888" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3888/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3888">#3888</a>)</li> <li>[New] F8I4 Grouped Gemm Optimization for Sparse M (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2933435844" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3854" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3854/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3854">#3854</a>)</li> </ul> <h3>FP8</h3> <ul> <li>[Fix] FBGEMM fp8 ck GEMM fix for irregular GEMM shapes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2955075194" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3894" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3894/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3894">#3894</a>)</li> <li>[Fix] fix stacked version fp8 rowwise group gemm registration in quantize_bench (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2960909843" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3902" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3902/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3902">#3902</a>)</li> <li>[Fix] A hotfix for FBGEMM fp8 rowwise with irregular gemm sizes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2948406173" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3883" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3883/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3883">#3883</a>)</li> <li>[Improvement] Transpose FP8 GEMM inputs for better tuning (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2939145810" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3866" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3866/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3866">#3866</a>)</li> <li>[New] Enable FP8 Triton dequantized block-wise kernel (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2905867035" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3788" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3788/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3788">#3788</a>)</li> <li>[Improvement] Refactor stacked version of FP8 Grouped Gemm for reduced overhead (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2858745202" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3699" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3699/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3699">#3699</a>)</li> <li>[Improvement] changing config for fp8 gemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2839063279" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3668" 
data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3668/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3668">#3668</a>)</li> <li>[Improvement] Add option to disable fast_accumulation for fp8 gemm. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2864093934" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3714" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3714/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3714">#3714</a>)</li> <li>[New] Add cublas FP8 tensorwise GEMM in fbgemm quantize bench (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2852697872" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3693" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3693/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3693">#3693</a>)</li> <li>[Improvement] write_k_back for fp8 ROPE (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2846552976" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3679" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3679/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3679">#3679</a>)</li> <li>[Improvement] Moves utility functions into a standalone file. 
(<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2841429987" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3671" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3671/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3671">#3671</a>)</li> <li>[Fix] Fix f8f8bf16_lite quantize op input in <code>quantize_and_compute</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2838706375" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3667" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3667/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3667">#3667</a>)</li> <li>[Improvement] Optimize zero fill (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2836811629" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3666" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3666/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3666">#3666</a>)</li> <li>[Improvement] FP8 Grouped Gemm Optimization (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2828945416" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3655" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3655/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3655">#3655</a>)</li> <li>[New] Add sweep_utils.py script to tune heuristics (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2829061572" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3656" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3656/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3656">#3656</a>)</li> 
<li>[Improvement] Loosen unit test <code>atol</code> and <code>rtol</code> tolerances to eliminate unit test flakiness (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2834110951" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3664" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3664/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3664">#3664</a>)</li> <li>[New] Port oss f16_fast_gemv into fbcode (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2808135597" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3610" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3610/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3610">#3610</a>)</li> <li>[New] fp8 rowwise regular gemm tuning for llm new shapes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2828496302" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3654" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3654/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3654">#3654</a>)</li> <li>[Improvement] k_norm in rope for fp8 kv cache (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2818919807" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3633" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3633/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3633">#3633</a>)</li> <li>[Improvement] Fix zero_start_index_M argument for triton rowwise quantize (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2819707316" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3639" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3639/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3639">#3639</a>)</li> <li>[Fix] Fix handling of dynamic FP8 grouped gemm on Nvidia (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2811832504" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3616" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3616/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3616">#3616</a>)</li> <li>[Improvement] Improve FP8 grouped GEMM perf via tileshape and cooperative (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2825937922" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3653" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3653/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3653">#3653</a>)</li> <li>[Improvement] Refactor FP8 grouped GEMM with dynamic and static versions (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2780802212" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3561" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3561/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3561">#3561</a>)</li> <li>[New] Support FP8 grouped GEMM with rowwise scaling (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2780796265" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3560" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3560/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3560">#3560</a>)</li> <li>[Fix] [CUTLASS] Use custom copy of cutlass to enable FP8 Grouped Gemm. 
(<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2824191047" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3649" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3649/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3649">#3649</a>)</li> <li>[Fix] kv_dq zero initialization to avoid NaNs from FA3 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2818919759" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3632" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3632/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3632">#3632</a>)</li> <li>[Improvement] amd fp8 rowwise batched gemm tuning (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2816369945" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3624" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3624/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3624">#3624</a>)</li> <li>[Improvement] Improve handling for FP8 grouped gemm without zero_start_index_M (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2811763114" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3615" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3615/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3615">#3615</a>)</li> <li>[New] amd fp8 rowwise gemm prefill shape tuning (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2807594389" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3607" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3607/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3607">#3607</a>)</li> <li>[New] Enable fast FP8 GEMM for memory bound (resubmit) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2807635363" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3608" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3608/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3608">#3608</a>)</li> <li>[Improvement] Make zero_start_index_M optional for dynamic FP8 grouped gemm on AMD (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2805548255" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3604" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3604/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3604">#3604</a>)</li> <li>[Improvement] Enable fast FP8 GEMM for memory bound (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2791454842" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3577" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3577/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3577">#3577</a>)</li> <li>[Improvement] more fp8 tuning for decode and not need to pad (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2791069520" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3576" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3576/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3576">#3576</a>)</li> 
</ul> <h3>Triton</h3> <ul> <li>[Improvement] Uses <code>FastAccum=True</code> by default for Triton GroupedGEMM. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2967600987" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3919" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3919/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3919">#3919</a>)</li> <li>[Improvement] Handle 0 inputs for gmm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2960836624" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3901" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3901/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3901">#3901</a>)</li> <li>[New] Triton GroupedGEMM. WS. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2964798029" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3912" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3912/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3912">#3912</a>)</li> <li>[Improvement] No recompilation caused by varying sequence lengths. 
(<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2961538793" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3903" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3903/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3903">#3903</a>)</li> <li>[Improvement] Enable bufferops for non-persistent fp8 rowwise GEMM (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2957491136" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3898" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3898/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3898">#3898</a>)</li> <li>[Improvement] Makes <code>use_fast_accum</code> configurable. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2923628857" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3829" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3829/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3829">#3829</a>)</li> <li>[Fix] Fix triton group gemm for tp4 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2895414058" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3762" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3762/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3762">#3762</a>)</li> <li>[Improvement] Reduce tuning. 
(<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2888239459" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3754" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3754/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3754">#3754</a>)</li> <li>[Improvement] [fbgemm_gpu] Upgrade Triton to latest (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2879885671" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3736" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3736/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3736">#3736</a>)</li> <li>[New] GroupedGEMM for AMD. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2875823927" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3729" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3729/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3729">#3729</a>)</li> <li>[Improvement] GroupedGEMM interface takes <code>m_sizes</code> instead of <code>m_offsets</code>. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2854884214" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3696" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3696/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3696">#3696</a>)</li> <li>[Fix] Numerical Fix. 
(<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2851742255" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3688" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3688/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3688">#3688</a>)</li> <li>[New] Adds Triton based GroupedGEMM implementation. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2843825065" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3674" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3674/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3674">#3674</a>)</li> <li>[Improvement] Add optional zero_start_index_M argument to triton fp8 rowwise quantization (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2816837430" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3628" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3628/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3628">#3628</a>)</li> <li>[Improvement] Make the scale match the shape of quantized value with N-D tensors (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2674508149" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3396" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3396/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3396">#3396</a>)</li> </ul> <h2>TBE</h2> <h3>TBE GPU</h3> <ul> <li>[Improvement] Fix flaky TBE unit tests (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2978171080" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3938" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3938/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3938">#3938</a>)</li> <li>[Fix] Fix get_infos_metadata meta dispatch (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2980807572" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3946" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3946/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3946">#3946</a>)</li> <li>[Improvement] Change set_learning_rate_tensor (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2978771158" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3945" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3945/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3945">#3945</a>)</li> <li>[Improvement] Cleanups to <code>StochasticRoundingRNGState</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2967989863" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3922" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3922/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3922">#3922</a>)</li> <li>[New] Unifying TBE API using List (Frontend) - reland (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2921431206" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3821" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3821/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3821">#3821</a>)</li> <li>[Improvement] Add tests for bounds_check_indices v2 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2967636335" data-permission-text="Title is private" 
data-url="https://github.com/pytorch/FBGEMM/issues/3920" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3920/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3920">#3920</a>)</li> <li>[Improvement] Use bounds_check_indices v2 on ROCm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2967404221" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3916" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3916/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3916">#3916</a>)</li> <li>[Fix] Partial revert D70855331 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2970291373" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3925" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3925/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3925">#3925</a>)</li> <li>[Fix] Add a workaround for stochastic rounding for AMD GPUs (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2962386294" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3908" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3908/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3908">#3908</a>)</li> <li>[New] AdagradW (fbgemm frontend) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2930492039" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3850" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3850/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3850">#3850</a>)</li> <li>[New] AdagradW (fbgemm backend) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2921844970" data-permission-text="Title is private" 
data-url="https://github.com/pytorch/FBGEMM/issues/3827" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3827/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3827">#3827</a>)</li> <li>[Fix] Fix IMA in TBE grad indices kernel for int32 indices (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2945673284" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3877" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3877/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3877">#3877</a>)</li> <li>[Improvement] Use PackedAccessor64 for index_remappings in pruned_array_lookup (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2941558866" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3870" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3870/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3870">#3870</a>)</li> <li>[Improvement] Add overflow_safe_int_t for addressing the int overflow problem (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2945176600" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3875" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3875/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3875">#3875</a>)</li> <li>[Improvement] Add torch.jit.script to unit tests (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2939534035" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3869" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3869/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3869">#3869</a>)</li> <li>[Improvement] Replace LR access with wrapper (<a class="issue-link js-issue-link" 
data-error-text="Failed to load title" data-id="2930269369" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3849" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3849/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3849">#3849</a>)</li> <li>[Fix] Fix CUDA kernel index data type in deeplearning/fbgemm/fbgemm_gpu/bench/verify_fp16_stochastic_benchmark.cu +10 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2929864384" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3845" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3845/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3845">#3845</a>)</li> <li>[Improvement] Allow FBGEMM_TBE_BOUNDS_CHECK_MODE to take effect when using mode 4,5,6 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2926433885" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3838" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3838/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3838">#3838</a>)</li> <li>[Improvement] replace device param with bounds_check_warning of inputs_to_device function (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2925044950" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3831" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3831/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3831">#3831</a>)</li> <li>[Improvement] Packed bag parameters tuning (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2915281470" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3805" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3805/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3805">#3805</a>)</li> <li>[Improvement] Symintify max_B and max_D (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2915811427" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3807" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3807/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3807">#3807</a>)</li> <li>[Improvement] make lazy init tunable (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2918520917" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3811" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3811/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3811">#3811</a>)</li> <li>[Improvement] Log feature gate statuses in TBE init (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2908149259" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3792" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3792/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3792">#3792</a>)</li> <li>[Fix] Backout Unifying TBE API using List (Frontend) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2911979328" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3803" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3803/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3803">#3803</a>)</li> <li>[Improvement] Migrate TBE benchmark utilities over to TBE, pt 5b (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2911821242" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3802" 
data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3802/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3802">#3802</a>)</li> <li>[Improvement] Migrate TBE benchmark utilities over to TBE, pt 4 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2908467570" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3794" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3794/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3794">#3794</a>)</li> <li>[Fix] fix bounds check v2 mode with vbe input (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2892516871" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3758" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3758/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3758">#3758</a>)</li> <li>[Improvement] Migrate TBE benchmark utilities over to TBE, pt 3 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2904148689" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3785" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3785/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3785">#3785</a>)</li> <li>[Fix] Fix prev_iter (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2903697013" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3784" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3784/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3784">#3784</a>)</li> <li>[Improvement] Migrate TBE benchmark utilities over to TBE, pt 2 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2902436841" data-permission-text="Title is private" 
data-url="https://github.com/pytorch/FBGEMM/issues/3783" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3783/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3783">#3783</a>)</li> <li>[Improvement] Enable int32_t support for reshape_vbe_offsets (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2902431326" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3782" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3782/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3782">#3782</a>)</li> <li>[Improvement] Migrate TBE benchmark utilities over to TBE (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2902085693" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3781" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3781/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3781">#3781</a>)</li> <li>[New] Unifying TBE API using List (Frontend) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2862019443" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3711" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3711/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3711">#3711</a>)</li> <li>[Improvement] Implement inference bag packing along D (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2767421390" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3541" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3541/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3541">#3541</a>) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2898301353" data-permission-text="Title is private" 
data-url="https://github.com/pytorch/FBGEMM/issues/3771" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3771/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3771">#3771</a>)</li> <li>[Improvement] Add test for VBE CPU (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2901021664" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3778" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3778/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3778">#3778</a>)</li> <li>[Improvement] Change the TBE bounds check to match the TBE implementation. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2898752681" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3773" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3773/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3773">#3773</a>)</li> <li>[Improvement] Implement generate_vbe_metadata cpu (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2864414531" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3715" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3715/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3715">#3715</a>)</li> <li>[New] Compute <code>info_B_num_bits</code> from T to make it a constant (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2886249460" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3748" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3748/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3748">#3748</a>)</li> <li>[Improvement] Move <code>execute_backward_adagrad</code> into a class (<a class="issue-link js-issue-link" 
data-error-text="Failed to load title" data-id="2885298806" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3744" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3744/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3744">#3744</a>)</li> <li>[Improvement] Add more helper methods for TBE benchmarking (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2885868615" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3747" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3747/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3747">#3747</a>)</li> <li>[Improvement] Add barrier to test regression hypothesis (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2882793336" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3741" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3741/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3741">#3741</a>)</li> <li>[Improvement] annotate tensors in schema for PT2 interface (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2882361035" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3738" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3738/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3738">#3738</a>)</li> <li>[Fix] 
Create cpu iterator irrespective of optimizer choice (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2851819638" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3689" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3689/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3689">#3689</a>)</li> <li>[Fix] Fix the TBE cache_precision to fp32 when on ROCm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2843303601" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3672" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3672/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3672">#3672</a>)</li> <li>[New] Unifying TBE API using List (Backend) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2781765001" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3563" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3563/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3563">#3563</a>)</li> <li>[New] Updating split_table_batched_embeddings_ops_training.py (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2809020209" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3613" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3613/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3613">#3613</a>)</li> <li>[New] Support INT4 Dequant onto GPU for Seq INT TBE look up (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2794243138" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3584" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3584/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3584">#3584</a>)</li> <li>[Fix] Fix calling <code>numel</code> on symbolic shapes issue (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2813908665" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3621" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3621/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3621">#3621</a>)</li> <li>[Fix] fix pre_iter fp32 inaccuracy issue (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2814401722" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3623" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3623/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3623">#3623</a>)</li> <li>[Improvement] directly pass update_util as int flag without syncing iter (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2805125532" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3602" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3602/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3602">#3602</a>)</li> <li>[New] Basic TBE input dump framework (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2802596264" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3593" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3593/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3593">#3593</a>)</li> <li>[New] Add support for <code>int32_t</code> indices in TBE training (2K/N) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2794002063" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3583" 
data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3583/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3583">#3583</a>)</li> <li>[New] Add support for <code>int32_t</code> indices in TBE training (2H/N) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2767011693" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3539" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3539/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3539">#3539</a>)</li> <li>[Misc] Enable v2 forward test for ROCm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2789385076" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3573" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3573/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3573">#3573</a>)</li> <li>[Fix] Fix bug in ROCm optimized forward pass (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2804666065" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3599" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3599/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3599">#3599</a>)</li> <li>[New] Add support for <code>int32_t</code> indices in TBE training (2I/N) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2773938618" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3556" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3556/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3556">#3556</a>)</li> <li>[New] Add support for <code>int32_t</code> indices in TBE training (2F/N) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2658159127" 
data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3376" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3376/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3376">#3376</a>)</li> <li>[Fix] Back out "Optimized backward pass for ROCm devices (pt 2)" (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2796197443" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3587" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3587/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3587">#3587</a>)</li> <li>[Fix] Revert D65620886 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2793857422" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3582" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3582/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3582">#3582</a>)</li> <li>[New] Add support for <code>int32_t</code> indices in TBE training (2G/N) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2659864189" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3377" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3377/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3377">#3377</a>)</li> <li>[New] Add new optimizer state <code>row_counter</code> for Adam [Frontend] (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2778795048" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3558" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3558/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3558">#3558</a>)</li> <li>[New] Add support for <code>int32_t</code> indices in TBE training (2E/N) 
(<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2657974720" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3375" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3375/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3375">#3375</a>)</li> <li>[Improvement] Do not call <code>scalar_type</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2672880898" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3394" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3394/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3394">#3394</a>)</li> <li>[Fix] Remove torch.jit.script (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2781702462" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3562" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3562/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3562">#3562</a>)</li> <li>[New] Add support for <code>int32_t</code> indices in TBE training (2D/N) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2657688995" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3374" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3374/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3374">#3374</a>)</li> <li>[New] Add support for <code>int32_t</code> indices in TBE training (3/N) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2657360987" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3372" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3372/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3372">#3372</a>)</li> <li>[New] Add support for <code>int32_t</code> indices in TBE training (2B/N) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2657287735" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3371" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3371/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3371">#3371</a>)</li> <li>[Improvement] Optimized backward pass for ROCm devices (pt 2) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2748648700" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3511" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3511/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3511">#3511</a>)</li> </ul> <h3>TBE SSD</h3> <ul> <li>[Improvement] Reduce bulk init time and fix OOM (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2922363940" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3828" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3828/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3828">#3828</a>)</li> <li>[Improvement] Engage KVTensor aware checkpoint load paths. 
(<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2864965649" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3718" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3718/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3718">#3718</a>)</li> <li>[Fix] uncomment accidentally commented out unittest (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2864611794" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3716" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3716/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3716">#3716</a>)</li> <li>[Improvement] sync wait before L1 and L2 flush (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2861649044" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3709" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3709/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3709">#3709</a>)</li> <li>[Improvement] return right away if keys is empty (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2831010982" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3658" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3658/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3658">#3658</a>)</li> <li>[Improvement] Adding a couple more APIs to KVTensorWrapper to bring parity with torch::Tensor (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2822369030" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3645" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3645/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3645">#3645</a>)</li> <li>[Improvement] Move embedding_rocksdb_wrapper to its own header. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2814108219" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3622" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3622/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3622">#3622</a>)</li> <li>[Fix] Fix autodeps for torch/custom_class.h and use it in kv_tensor_wrapper_cpu (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2804889711" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3600" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3600/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3600">#3600</a>)</li> <li>[Improvement] put KVTensorWrapper in its own header (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2791004004" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3575" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3575/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3575">#3575</a>)</li> </ul> <h2>Other Ops</h2> <h3>Inplace Ops</h3> <ul> <li>[Fix] Fix CUDA kernel index data type in deeplearning/fbgemm/fbgemm_gpu/src/embedding_inplace_ops/embedding_inplace_update.cu +10 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2929864493" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3846" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3846/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3846">#3846</a>)</li> </ul> <h3>Permute Ops</h3> <ul> <li>[New] support permute_multi_embedding_function on torch.export (<a class="issue-link 
js-issue-link" data-error-text="Failed to load title" data-id="2957378737" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3897" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3897/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3897">#3897</a>)</li> <li>[Improvement] do not call permute on empty tensor (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2861401138" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3705" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3705/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3705">#3705</a>)</li> </ul> <h3>Quantize Ops</h3> <ul> <li>[New] implement packed quantize row / dequantize row API (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2966538019" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3915" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3915/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3915">#3915</a>)</li> <li>[New] Add small M support (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2847512161" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3682" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3682/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3682">#3682</a>)</li> <li>[Improvement] Provide helper functions for int4 quantization (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2899247738" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3775" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3775/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3775">#3775</a>)</li> 
<li>[Improvement] test fp8fp8bf16/bf16fp8bf16_fast_gemv is torch compileable (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2916117382" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3809" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3809/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3809">#3809</a>)</li> <li>[Improvement] Eliminate MemCpyDtoH overhead for quantized fast_gemv kernel (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2874100442" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3725" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3725/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3725">#3725</a>)</li> <li>[Improvement] Add abstract impl for Fused8BitRowwiseQuantizedToFloatOrHalf et al. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1010281782" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/715" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/715/hovercard" href="https://github.com/pytorch/FBGEMM/pull/715">#715</a>) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2820006033" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3640" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3640/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3640">#3640</a>)</li> <li>[New] Fold ops registration code (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2819252098" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3634" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3634/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3634">#3634</a>)</li> <li>[Fix] Fix type to assert group size (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2805111673" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3601" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3601/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3601">#3601</a>)</li> </ul> <h3>Sparse Ops</h3> <ul> <li>[Improvement] nested dispatching of segment_csr on cpu/gpu (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2948022358" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3881" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3881/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3881">#3881</a>)</li> <li>[Improvement] Fix faketensor error when dev_weights is undefined. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2890027271" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3755" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3755/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3755">#3755</a>)</li> <li>[Improvement] Set default value to null (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2879280983" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3732" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3732/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3732">#3732</a>)</li> <li>[New] Support histogram_binning_calibration for export (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2830897397" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3657" 
data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3657/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3657">#3657</a>)</li> <li>[Fix] fix data type of block_bucketize_pos in block_bucketize_sparse_features (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2796851280" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3589" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3589/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3589">#3589</a>)</li> <li>[Fix] Fix specialization issue in keyed_jagged_index_select_dim1_forward_cuda (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2791478550" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3578" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3578/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3578">#3578</a>)</li> </ul> <h3>SLL Ops</h3> <ul> <li>[Improvement] Re-organize SLL ops, pt 9 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2834747069" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3665" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3665/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3665">#3665</a>)</li> <li>[Improvement] Re-organize SLL ops, pt 8 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2833947865" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3663" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3663/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3663">#3663</a>)</li> <li>[Improvement] Re-organize SLL ops, pt 7 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2824206322" 
data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3650" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3650/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3650">#3650</a>)</li> <li>[Improvement] Re-organize SLL ops, pt 6 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2822494309" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3647" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3647/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3647">#3647</a>)</li> <li>[Improvement] Re-organize SLL ops, pt 5 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2822482341" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3646" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3646/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3646">#3646</a>)</li> <li>[Improvement] Re-organize SLL ops, pt 4 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2822077908" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3644" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3644/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3644">#3644</a>)</li> <li>[Improvement] Re-organize SLL ops, pt 3 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2825092344" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3652" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3652/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3652">#3652</a>)</li> <li>[Improvement] Re-organize SLL ops, pt 2 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2821818709" 
data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3643" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3643/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3643">#3643</a>)</li> <li>[Improvement] Re-organize SLL ops, pt 1 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2821814635" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3642" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3642/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3642">#3642</a>)</li> <li>[Improvement] Fold ops registration code, pt 3 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2821666377" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3641" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3641/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3641">#3641</a>)</li> <li>[Improvement] Fold ops registration code, pt 2 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2819356316" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3635" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3635/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3635">#3635</a>)</li> </ul> <h2>Benchmarks</h2> <ul> <li>[Fix] [fbgemm_gpu] Fix CPU benchmark scripts (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2978556281" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3941" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3941/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3941">#3941</a>)</li> <li>[New] Enable multi-processing in CPU TBE micro-benchmarks (<a class="issue-link js-issue-link" 
data-error-text="Failed to load title" data-id="2887898276" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3753" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3753/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3753">#3753</a>)</li> <li>[Improvement] Improve VBE benchmark (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2939168335" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3867" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3867/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3867">#3867</a>)</li> <li>[Improvement] Clean up stochastic rounding benchmarks (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2945277641" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3876" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3876/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3876">#3876</a>)</li> <li>[Fix] Fix EEG indices estimator op (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2932940493" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3852" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3852/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3852">#3852</a>)</li> <li>[New] Expose EEG indices estimation to Python (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2926079186" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3836" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3836/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3836">#3836</a>)</li> <li>[Improvement] Support MTIA for device and device-with-spec (<a class="issue-link 
js-issue-link" data-error-text="Failed to load title" data-id="2925456862" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3832" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3832/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3832">#3832</a>)</li> <li>[Improvement] fix benchmark logging after script reorganization (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2921251822" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3819" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3819/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3819">#3819</a>)</li> <li>[Improvement] Cleanups for the EEG-based TBE benchmark CLI, pt 2 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2918592083" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3815" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3815/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3815">#3815</a>)</li> <li>[New] [fbgemm_gpu] Add benchmark workflows (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2862081594" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3713" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3713/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3713">#3713</a>)</li> <li>[Improvement] Add cache-precision arg to TBE device_with_spec bench (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2907522510" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3791" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3791/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3791">#3791</a>)</li> 
<li>[New] Migrate EEG-based TBE benchmark code to OSS (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2901501573" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3780" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3780/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3780">#3780</a>)</li> <li>[New] Migrate TBE EEG Python code to OSS (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2898876281" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3774" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3774/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3774">#3774</a>)</li> <li>[New] Migrate TBE EEG C++ code to OSS (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2895728145" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3765" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3765/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3765">#3765</a>)</li> <li>[New] Add <code>TBEDataConfig</code> to Python side (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2882592167" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3739" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3739/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3739">#3739</a>)</li> <li>[Improvement] Add Quantize benchmark (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2861432567" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3706" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3706/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3706">#3706</a>)</li> <li>[Improvement] Adding iterations to benchmark script (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2861583660" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3708" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3708/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3708">#3708</a>)</li> <li>[Improvement] Small modifications to quantize_bench script (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2849167769" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3684" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3684/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3684">#3684</a>)</li> <li>[Improvement] Add option to set cache precision in TBE benchmark (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2831391614" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3659" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3659/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3659">#3659</a>)</li> <li>[Improvement] Add tracing option to quantize bench (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2824485435" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3651" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3651/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3651">#3651</a>)</li> <li>[Improvement] Add preprocess stage to quantize bench operators (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2824110056" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3648" 
data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3648/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3648">#3648</a>)</li> <li>[Improvement] Pre-convert indices/offsets in TBE bench (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2802844125" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3595" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3595/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3595">#3595</a>)</li> <li>[Improvement] Allow reusing input data in TBE benchmark (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2802835686" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3594" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3594/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3594">#3594</a>)</li> <li>[Improvement] Profile with kineto and warmup for more accurate benchmarking (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2793113024" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3580" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3580/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3580">#3580</a>) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2794244692" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3585" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3585/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3585">#3585</a>)</li> </ul> <h2>Better Engineering</h2> <h3>Builds</h3> <ul> <li>[Improvement] [fbgemm_gpu] Reduce OSS build sizes for non-GenAI FBGEMM_GPU (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2983530054" 
data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3948" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3948/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3948">#3948</a>)</li> <li>[New] [fbgemm_gpu] Add Scripts for Generating Release Reports (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2844009959" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3676" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3676/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3676">#3676</a>)</li> <li>[New] [AMD] Add CK to dependencies to enable AMD build. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2970939391" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3929" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3929/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3929">#3929</a>)</li> <li>[Fix] [fbgemm_gpu] Fix Nova package labeling for GenAI (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2973581135" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3933" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3933/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3933">#3933</a>)</li> <li>[Fix] [fbgemm_gpu] Update Nova jobs (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2954014503" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3890" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3890/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3890">#3890</a>)</li> <li>[Improvement] update hipify_torch (<a class="issue-link js-issue-link" data-error-text="Failed to load 
title" data-id="2967544207" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3918" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3918/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3918">#3918</a>)</li> <li>[Improvement] [fbgemm_gpu] Fix setup scripts for OSS ROCm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2964100703" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3909" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3909/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3909">#3909</a>)</li> <li>[Fix] [fbgemm_gpu] Fix undefined symbol error (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2959239954" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3900" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3900/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3900">#3900</a>)</li> <li>[Improvement] Add FB python sources into genai CMakeLists.txt (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2950906606" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3886" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3886/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3886">#3886</a>)</li> <li>[Improvement] [fbgemm_gpu] Update CMakeLists.txt for experimental/genai (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2943971306" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3872" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3872/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3872">#3872</a>)</li> <li>[Improvement] [fbgemm_gpu] Increase timeout 
for Nova jobs (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2941727835" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3871" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3871/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3871">#3871</a>)</li> <li>[New] [fbgemm_gpu] Update Nova CI configuration to support B200 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2939438824" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3868" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3868/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3868">#3868</a>)</li> <li>[Fix] [fbgemm_gpu] Fix bash line to work with macOS builds (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2939122570" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3863" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3863/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3863">#3863</a>)</li> <li>[Improvement] Add option to set build parallelism in OSS workflows (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2936867665" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3859" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3859/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3859">#3859</a>)</li> <li>[New] [fbgemm_gpu] Support newer CUDA architectures in OSS (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2930127737" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3848" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3848/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3848">#3848</a>)</li> <li>[Improvement] [fbgemm_gpu] Increase CUDA test timeout (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2917997938" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3810" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3810/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3810">#3810</a>)</li> <li>[Fix] [fbgemm] Fix compilation issues with GCC 14.1 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2915165762" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3804" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3804/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3804">#3804</a>)</li> <li>[Improvement] [fbgemm_gpu] Limit the number of ROCm hardware targets (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2909362012" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3797" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3797/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3797">#3797</a>)</li> <li>[Improvement] Fix clang vla warning (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2354662823" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/2736" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/2736/hovercard" href="https://github.com/pytorch/FBGEMM/pull/2736">#2736</a>)</li> <li>[Fix] [fbgemm_gpu] Fix PT2 wrapper registrations (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2868054557" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3721" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3721/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3721">#3721</a>)</li> <li>[Fix] [fbgemm_gpu] Fix python docs not being visible (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2864905479" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3717" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3717/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3717">#3717</a>)</li> <li>[New] [fbgemm_gpu] Nova job update to support building against CUDA 12.8 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2861388608" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3704" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3704/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3704">#3704</a>)</li> <li>[New] [fbgemm_gpu] Add CUDA 12.8 build support (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2859127154" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3700" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3700/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3700">#3700</a>)</li> <li>[Improvement] [fbgemm_gpu] Test genai op registration (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2852359028" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3692" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3692/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3692">#3692</a>)</li> <li>[Improvement] [fbgemm_gpu] Break down <code>fbgemm_gpu_tbe_training_backward</code> module further, pt 3 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2854659705" 
data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3694" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3694/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3694">#3694</a>)</li> <li>[Improvement] [fbgemm_gpu] Save built docs as GHA artifact (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2854832421" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3695" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3695/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3695">#3695</a>)</li> <li>[Improvement] Rename sources to avoid internal build issue (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2854917187" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3697" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3697/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3697">#3697</a>)</li> <li>[Improvement] [fbgemm_gpu] Break down CMake module further, pt 2 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2847024282" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3681" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3681/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3681">#3681</a>)</li> <li>[Fix] Fix linting CI error introduced in D69213404 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2849125496" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3683" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3683/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3683">#3683</a>)</li> <li>[Improvement] [fbgemm_gpu] Break down CMake module further (<a 
class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2843717182" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3673" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3673/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3673">#3673</a>)</li> <li>[New] [fbgemm_gpu] GitHub PR scraper (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2831444978" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3661" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3661/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3661">#3661</a>)</li> <li>[New] [fbgemm_gpu] Add macro support for NVCC and HIPCC specific flags (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2819581343" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3636" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3636/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3636">#3636</a>)</li> <li>[Improvement] add AMD specific includes in cuda_prelude.h (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2810213066" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3614" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3614/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3614">#3614</a>)</li> <li>[Fix] [fbgemm_gpu] Fix CMakeLists.txt for experimental/gemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2802557144" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3592" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3592/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3592">#3592</a>)</li> <li>[Improvement] [fbgemm_gpu] Upgrade GitHub actions (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2793522552" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3581" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3581/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3581">#3581</a>)</li> <li>[Improvement] add patchelf as a required package in fbgemm_gpu/requirements.txt (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2790683570" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3574" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3574/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3574">#3574</a>)</li> <li>[New] [fbgemm_gpu] Add build support for AMD MI300 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2784875328" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3566" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3566/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3566">#3566</a>)</li> <li>[Improvement] [fbgemm_gpu] Expand test timeout for ROCm pip install workflow (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2778568792" data-permission-text="Title is private" 
data-url="https://github.com/pytorch/FBGEMM/issues/3557" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3557/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3557">#3557</a>)</li> <li>[Improvement] [fbgemm_gpu] Update triton version for OSS (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2773723522" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3555" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3555/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3555">#3555</a>)</li> <li>[Fix] [fbgemm_gpu] Fix versioning scheme for ROCm releases (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2773448239" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3554" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3554/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3554">#3554</a>)</li> <li>[Misc] [fbgemm_gpu] Properly disable TBE SSD tests in OSS (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2771188084" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3548" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3548/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3548">#3548</a>)</li> </ul> <h3>Documentation</h3> <ul> <li>[New] [fbgemm_gpu] Add docs for GenAI package (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2961768057" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3905" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3905/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3905">#3905</a>)</li> <li>[Improvement] [fbgemm_gpu] Update docs (<a class="issue-link js-issue-link" 
data-error-text="Failed to load title" data-id="2954450820" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3891" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3891/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3891">#3891</a>)</li> <li>[Misc] Add comments to TBE inference PackedMode (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2906839320" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3789" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3789/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3789">#3789</a>)</li> <li>[Misc] [fbgemm_gpu] Update ROCm and CUDA versions in docs (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2785981157" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3569" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3569/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3569">#3569</a>)</li> <li>[New] [fbgemm_gpu] Add documentation for Feature Gates (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2882653903" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3740" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3740/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3740">#3740</a>)</li> <li>[Improvement] Remove erroneous comment (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2879299307" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3733" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3733/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3733">#3733</a>)</li> <li>[Fix] [fbgemm_gpu] Minor doc fix (<a 
class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2813722087" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3618" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3618/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3618">#3618</a>)</li> </ul> <h3>Utils</h3> <ul> <li>[New] Better kernel launch utilities (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2965215361" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3914" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3914/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3914">#3914</a>)</li> <li>[Improvement] Add ability to save and load data into HostDeviceBufferPair (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2957498042" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3899" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3899/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3899">#3899</a>)</li> <li>[New] Add abstractions for writing out data (flesh out D71147675, pt 1) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2936289530" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3856" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3856/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3856">#3856</a>)</li> <li>[New] Add feature gate for HIP-based backward kernel (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2926040300" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3835" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3835/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3835">#3835</a>)</li> <li>[Misc] Optionally use env vars for config lookup (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2908634119" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3795" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3795/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3795">#3795</a>)</li> <li>[Fix] Updates and fixes to tensor_accessor.h (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2788133852" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3571" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3571/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3571">#3571</a>)</li> </ul> q10 tag:github.com,2008:Repository/150154628/v1.2.0-rc1 2025-04-10T05:12:01Z v1.2.0-rc1 <p>FBGEMM v1.2.0-rc1</p> q10 tag:github.com,2008:Repository/150154628/v1.1.2-rc1 2025-04-05T07:18:14Z v1.1.2-rc1 <p>[v1.1.2] Ad-hoc release of FBGEMM GenAI to support the newer INT4 ker…</p> q10 tag:github.com,2008:Repository/150154628/v1.1.2 2025-04-04T19:02:32Z v1.1.2: BF16I4 Preshuffled Grouped Gemm (#3917) <p>Summary:<br> X-link: <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2967482908" data-permission-text="Title is private" data-url="https://github.com/facebookresearch/FBGEMM/issues/1006" data-hovercard-type="pull_request" data-hovercard-url="/facebookresearch/FBGEMM/pull/1006/hovercard" href="https://github.com/facebookresearch/FBGEMM/pull/1006">facebookresearch/FBGEMM#1006</a></p> <p>Pull Request resolved: <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2967482102" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3917" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3917/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3917">#3917</a></p> <p>This diff adds a preshuffled variant of BF16I4 Grouped Gemm. Notably, cutlass does not currently support zero points for grouped gemm, so this kernel must be used without them. That said, the accuracy of the kernel appears reasonable and the performance is very compelling.</p> <p>Reviewed By: jiawenliu64</p> <p>Differential Revision: D72337760</p> <p>fbshipit-source-id: a2cf9e913d095da42f1cf88a5c08dbbe1f2794c9</p> jwfromm tag:github.com,2008:Repository/150154628/v1.1.1-rc1 2025-04-01T19:18:30Z v1.1.1-rc1 <p>[v1.1.1] This is a release of latest FBGEMM_GPU (with kernels unavail…</p> q10 tag:github.com,2008:Repository/150154628/v1.1.0 2025-04-27T08:22:39Z FBGEMM_GPU v1.1.0 Release Notes <h1>Highlights</h1> <h3>TBE GPU</h3> <ul> <li>Introduced support for int32_t indices in TBE training</li> <li>Extended TBE support for larger embedding dimensions</li> <li>Made the learning rate a tensor value</li> <li>Improved index bounds checking</li> </ul> <h3>TBE CPU</h3> <ul> <li>Improved ARM support with SVE implementations for matrix multiplication and float matrix transpose</li> <li>Improved the EmbeddingSpMDMAutovec API</li> <li>Migrated FP32 ops to OSS</li> </ul> <h3>TBE SSD</h3> <ul> <li>Enabled VBE in SSD-TBE</li> <li>Async initialization of RocksDB SSD tensors and padding before writing to RocksDB</li> <li>Improved checking of index bounds and other constraints</li> </ul> <h3>Gen AI Ops</h3> <ul> <li>Custom allgather now supports multiple dtypes, with dtype checking to prevent silent failures</li> </ul> <h3>ROCm</h3> <ul> <li>Added CK FP8 Batched GEMM and Rowwise GEMM kernels along with heuristic tuning</li> <li>Fixed CK FP8 rowwise quantization for some GEMM shapes</li> <li>Introduced HIP-specific optimizations to the TBE forward and backward passes</li> </ul> <h3>SLL ops</h3> <ul> <li>Migrated Sequence Learning Library 
(SLL) ops to OSS</li> </ul> <h3>Better Engineering</h3> <ul> <li>Restructured the build to produce multiple smaller shared libraries instead of a single large binary</li> <li>New and improved tests and benchmarks</li> <li>Improved ROCm build variant support</li> <li>Added build support for CUDA 12.6 and Python 3.13</li> </ul> <h1>Software Requirements</h1> <p>FBGEMM_GPU v1.1.0 has been tested and is known to work on the following setups:</p> <ul> <li><strong>PyTorch</strong>: v2.6</li> <li><strong>CUDA</strong>: v11.8, 12.4, 12.6</li> <li><strong>Python</strong>: v3.9, 3.10, 3.11, 3.12, 3.13</li> </ul> <p>It is recommended to prepare an isolated environment for installing and running FBGEMM_GPU, such as Conda and/or Docker.</p> <h2>Availability</h2> <p>FBGEMM_GPU can be fetched directly from PyPI:</p> <div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="# FBGEMM_GPU CUDA variant (only the CUDA 12.4 variant is available) pip install fbgemm-gpu==1.1.0 # FBGEMM_GPU CPU variant pip install fbgemm-gpu-cpu==1.1.0"><pre><span class="pl-c"><span class="pl-c">#</span> FBGEMM_GPU CUDA variant (only the CUDA 12.4 variant is available)</span> pip install fbgemm-gpu==1.1.0 <span class="pl-c"><span class="pl-c">#</span> FBGEMM_GPU CPU variant</span> pip install fbgemm-gpu-cpu==1.1.0</pre></div> <p>Alternatively, it can be fetched from PyTorch PIP:</p> <div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="# FBGEMM_GPU CUDA variant pip install fbgemm-gpu==1.1.0 --index-url https://download.pytorch.org/whl/cu118/ pip install fbgemm-gpu==1.1.0 --index-url https://download.pytorch.org/whl/cu124/ pip install fbgemm-gpu==1.1.0 --index-url https://download.pytorch.org/whl/cu126/ # FBGEMM_GPU CPU variant pip install fbgemm-gpu==1.1.0 --index-url https://download.pytorch.org/whl/cpu"><pre><span class="pl-c"><span class="pl-c">#</span> FBGEMM_GPU 
CUDA variant</span> pip install fbgemm-gpu==1.1.0 --index-url https://download.pytorch.org/whl/cu118/ pip install fbgemm-gpu==1.1.0 --index-url https://download.pytorch.org/whl/cu124/ pip install fbgemm-gpu==1.1.0 --index-url https://download.pytorch.org/whl/cu126/ <span class="pl-c"><span class="pl-c">#</span> FBGEMM_GPU CPU variant</span> pip install fbgemm-gpu==1.1.0 --index-url https://download.pytorch.org/whl/cpu</pre></div> <h1>Changes</h1> <h2>Table Batched Embedding (TBE) operators</h2> <h3>For GPU</h3> <ul> <li>[New] Add support for <code>int32_t</code> indices in TBE training (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2659864189" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3377" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3377/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3377">#3377</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2657974720" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3375" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3375/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3375">#3375</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2657688995" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3374" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3374/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3374">#3374</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2657360987" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3372" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3372/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3372">#3372</a>, <a 
class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2657287735" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3371" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3371/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3371">#3371</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2634217247" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3324" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3324/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3324">#3324</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2606223226" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3267" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3267/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3267">#3267</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2603959364" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3264" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3264/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3264">#3264</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2603416752" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3263" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3263/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3263">#3263</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2598528704" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3257" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3257/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3257">#3257</a>)</li> <li>[New] Add support for int64_t indices and offsets in TBE inference (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2598379220" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3254" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3254/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3254">#3254</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2574445655" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3233" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3233/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3233">#3233</a>)</li> <li>[New] Extend TBE support for larger embedding dimensions (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2719524151" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3462" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3462/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3462">#3462</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2721614446" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3467" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3467/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3467">#3467</a>)</li> <li>[New] Make <code>learning rate</code> tensor (Backend) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2620068121" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3287" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3287/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3287">#3287</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2629793647" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3310" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3310/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3310">#3310</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2636820859" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3332" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3332/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3332">#3332</a>)</li> <li>[New] Add PTA checks to embedding_bounds_check kernels (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2631456409" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3318" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3318/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3318">#3318</a>)</li> <li>[Fix] Fix PackedTensorAccessor for batch_index_select (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2615440001" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3281" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3281/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3281">#3281</a>)</li> <li>[Fix] Set cache_precision = weights_precision in TBE if it is not explicitly set (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2657187836" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3370" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3370/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3370">#3370</a>)</li> <li>[Fix] Fix pt2_wrapper registration for unified TBE interface (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2579821275" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3238" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3238/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3238">#3238</a>)</li> <li>[Fix] Fix PT2 compliant opcheck tests (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2681855460" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3404" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3404/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3404">#3404</a>)</li> <li>[Fix] Fix FBGEMM_GPU_MEMCHECK in Split optimizers (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2692557258" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3416" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3416/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3416">#3416</a>)</li> <li>[Fix] Fix learning rate as tensor for PT2 compile (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2684828334" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3407" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3407/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3407">#3407</a>)</li> <li>[New] Add new optimizer state <code>row_counter</code> for Adam [Frontend] (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2778795048" data-permission-text="Title is private" 
data-url="https://github.com/pytorch/FBGEMM/issues/3558" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3558/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3558">#3558</a>)</li> <li>[New] Add new optimizer state <code>row_counter</code> for Adam [Backend] (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2642580726" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3342" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3342/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3342">#3342</a>)</li> <li>[Fix] Back out "Add support for int64_t indices and offsets in TBE inference [7C/N]" (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2600757682" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3258" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3258/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3258">#3258</a>)</li> <li>[Fix] Back out "Add support for int64_t indices and offsets in TBE inference [8/N]" (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2598460090" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3255" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3255/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3255">#3255</a>)</li> <li>[Fix] Fix global weight decay Faketensor test (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2642471177" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3341" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3341/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3341">#3341</a>)</li> <li>[Fix] Fix pt2_wrapper registration for unified TBE 
interface (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2579760158" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3237" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3237/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3237">#3237</a>)</li> <li>[Fix] Fix "Cannot call numel() on tensor with symbolic sizes/strides" (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2656329724" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3368" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3368/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3368">#3368</a>)</li> <li>[Fix] Fix grid size overflow in generate_vbe_metadata (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2726390236" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3484" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3484/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3484">#3484</a>)</li> <li>[Fix] Fix an integer overflow in permute_multi_embedding() (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2720869590" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3465" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3465/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3465">#3465</a>)</li> <li>[Fix] Fix the sync point caused by iter_cpu.item() (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2680908343" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3401" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3401/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3401">#3401</a>)</li> <li>[Fix] Hot fix to skip VBE CPU reshaping for MTIA (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2721371209" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3466" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3466/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3466">#3466</a>)</li> <li>[Fix] address mem over used during flushing (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2719362041" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3460" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3460/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3460">#3460</a>)</li> <li>[Improvement] Add <code>iter</code> singular value into TBE optimizer state (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2570862251" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3228" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3228/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3228">#3228</a>)</li> <li>[Improvement] V2 fwd modified warps (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2787575814" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3570" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3570/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3570">#3570</a>)</li> <li>[Improvement] Add enable_async_update into tbe signature and config (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2712959779" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3431" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3431/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3431">#3431</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2719392227" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3461" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3461/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3461">#3461</a>)</li> <li>[Improvement] Adjust kNumThreads for bounds_check_indices_kernel (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2625959642" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3299" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3299/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3299">#3299</a>)</li> <li>[Improvement] Reduce registers in bounds_check_indices (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2625953017" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3298" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3298/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3298">#3298</a>)</li> <li>[Improvement] Mark unified autograd function traceable (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2660408265" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3378" 
data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3378/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3378">#3378</a>)</li> <li>[Improvement] Improve bounds_check_indices for VBE (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2668999515" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3388" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3388/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3388">#3388</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2667302495" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3386" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3386/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3386">#3386</a>)</li> <li>[Improvement] Do not call <code>scalar_type</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2672880898" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3394" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3394/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3394">#3394</a>)</li> <li>[Improvement] optimizer 1d -- EMA in place (fbgemm part) (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2681258766" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3402" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3402/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3402">#3402</a>)</li> <li>[Improvement] Clean up nbit_forward tests (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2619314382" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3286" 
data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3286/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3286">#3286</a>)</li> <li>[Improvement] Remove unused-variable in some generated code (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2635573139" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3327" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3327/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3327">#3327</a>)</li> <li>[Improvement] Limit grid size of bounds_check_indices (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2615449996" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3282" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3282/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3282">#3282</a>)</li> <li>[Improvement] Support config based bound check version via extended modes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2696191213" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3418" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3418/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3418">#3418</a>)</li> <li>[Improvement] Use int64_t index for SplitOptimizer grad (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2716234158" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3447" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3447/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3447">#3447</a>)</li> <li>[Improvement] Remove unused arg from generate_vbe_metadata frontend (<a class="issue-link js-issue-link" data-error-text="Failed to load title" 
data-id="2716472804" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3453" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3453/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3453">#3453</a>)</li> <li>[Improvement] Add generate_vbe_metadata test (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2726313642" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3483" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3483/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3483">#3483</a>)</li> <li>[Improvement] Support config based bound check version via extended modes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2718625660" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3454" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3454/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3454">#3454</a>)</li> <li>[Improvement] make <code>iter</code> PT2 compatible (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2595870157" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3253" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3253/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3253">#3253</a>)</li> <li>[Improvement] Add meta function for PT2 wrappers (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2581982850" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3240" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3240/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3240">#3240</a>)</li> <li>[Improvement] Nesterov (<a class="issue-link js-issue-link" 
data-error-text="Failed to load title" data-id="2572080179" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3232" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3232/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3232">#3232</a>)</li> </ul> <h3>For CPU</h3> <ul> <li>[New] Introduce SVE function for matrix multiplication (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2649979264" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3348" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3348/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3348">#3348</a>)</li> <li>[New] Add sve implementation for float matrix transpose (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2696779013" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3421" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3421/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3421">#3421</a>)</li> <li>[New] autovec specialization framework (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2670425565" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3393" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3393/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3393">#3393</a>)</li> <li>[New] Move FP32 kernels to OSS (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2785628618" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3568" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3568/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3568">#3568</a>)</li> <li>[Improvement] Pull in PR for Kleidi-based 
FP16 kernel (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2743415641" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3507" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3507/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3507">#3507</a>)</li> <li>[Improvement] Use local buffer where possible (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2627992794" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3304" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3304/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3304">#3304</a>)</li> <li>[Improvement] Refactor GenerateEmbeddingXXX functions (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2628001517" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3307" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3307/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3307">#3307</a>)</li> <li>[Improvement] Increase local_storage size to 512 floats (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2653885730" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3357" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3357/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3357">#3357</a>)</li> <li>[Improvement] Adjust EmbeddingSpMDMAutovec API (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2654088348" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3366" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3366/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3366">#3366</a>)</li> 
<li>[Improvement] Split loops to work around loop vectorizer weakness (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2684034913" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3406" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3406/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3406">#3406</a>)</li> <li>[Improvement] Do an early check that data_size is not negative (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2627995672" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3305" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3305/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3305">#3305</a>)</li> <li>[Improvement] Fix strict aliasing violation, code cleanup (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2627999892" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3306" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3306/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3306">#3306</a>)</li> </ul> <h3>SSD TBE Operators</h3> <ul> <li>[New] Enable VBE in SSD-TBE (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2587230388" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3247" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3247/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3247">#3247</a>)</li> <li>[Improvement] put KVTensorWrapper in its own header (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2791004004" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3575" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3575/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3575">#3575</a>)</li> <li>[Improvement] Moving KVTensorWrapper to a header file to be used in ModelStore checkpointing code (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2612324500" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3276" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3276/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3276">#3276</a>)</li> <li>[Improvement] Async initialization of RocksDB SSD tensors (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2753184856" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3520" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3520/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3520">#3520</a>)</li> <li>[Improvement] pad before writing to RocksDB (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2583244029" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3245" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3245/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3245">#3245</a>)</li> <li>[Improvement] use RocksDB iterator to read key range from ssd embedding (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2731604263" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3495" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3495/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3495">#3495</a>)</li> <li>[Improvement] Log total duration spent prefetching (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2728425903" 
data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3487" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3487/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3487">#3487</a>)</li> <li>[Improvement] Address memory overuse during flushing (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2719362041" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3460" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3460/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3460">#3460</a>)</li> <li>[Improvement] Create move TBE to right device, and set Cache Load in TBE class (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2713880686" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3438" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3438/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3438">#3438</a>)</li> <li>[Improvement] Unit test for new move tbe from device/cache_load method (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2713880484" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3437" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3437/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3437">#3437</a>)</li> <li>[Improvement] make L2/rocksdb update async optional (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2712958285" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3429" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3429/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3429">#3429</a>)</li> <li>[Improvement] Drop RoPE when filling KV cache (<a 
class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2645641762" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3346" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3346/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3346">#3346</a>)</li> <li>[Improvement] Remove setting total_cache_hash_size as buffer (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2715709052" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3441" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3441/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3441">#3441</a>)</li> <li>[Improvement] Add meta registrations for kv_cache operators (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2715866992" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3442" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3442/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3442">#3442</a>)</li> <li>[Improvement] remove output dtype restriction in SSD TBE (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2754485818" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3524" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3524/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3524">#3524</a>)</li> <li>[Improvement] change pmt require grad to false when detached (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2754486047" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3525" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3525/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3525">#3525</a>)</li> <li>[Improvement] add more attributes to PartiallyMaterializedTensor (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2627846721" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3300" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3300/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3300">#3300</a>)</li> <li>[Improvement] skip broken inference test that uses ssd TBE (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2731602895" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3494" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3494/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3494">#3494</a>)</li> <li>[Improvement] "coro =&gt; fut" (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2712959378" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3430" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3430/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3430">#3430</a>)</li> <li>[Improvement] Reland of D65489998 Optimize sharding performance of embeddings (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2771429138" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3549" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3549/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3549">#3549</a>)</li> <li>[Improvement] Remove torch.jit.script (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2781702462" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3562" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3562/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3562">#3562</a>)</li> </ul> <h2>GenAI Support and Operators</h2> <ul> <li>[New] Add nccl_alltoall function (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2771516487" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3551" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3551/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3551">#3551</a>)</li> <li>[New] custom allgather support multiple dtypes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2732259504" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3498" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3498/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3498">#3498</a>)</li> <li>[Improvement] Make sure fake tensor functions return on proper device (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2600757682" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3258" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3258/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3258">#3258</a>)</li> <li>[Improvement] Add CPU registrations to custom operators (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2603320451" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3262" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3262/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3262">#3262</a>)</li> <li>[Improvement] Check src &amp; dst dtypes in allgather to prevent silent failures (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2753467557" 
data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3523" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3523/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3523">#3523</a>)</li> <li>[Improvement] Better shape function registration (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2579760158" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3237" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3237/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3237">#3237</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2641939032" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3340" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3340/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3340">#3340</a>)</li> <li>[Improvement] Package re-organization improvements (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2768378149" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3546" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3546/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3546">#3546</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2593262318" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3251" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3251/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3251">#3251</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2696211884" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3419" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3419/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3419">#3419</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2606714725" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3268" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3268/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3268">#3268</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2748752047" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3512" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3512/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3512">#3512</a>)</li> </ul> <h3>FP8 and other Quantization support</h3> <ul> <li>[New] New autotune config for M=4 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2612436127" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3277" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3277/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3277">#3277</a>)</li> <li>[New] MoE FP8 grouped GEMM (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2633975805" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3321" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3321/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3321">#3321</a>)</li> <li>[New] Add shape check on GroupedGEMM kernel (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2716305378" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3449" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3449/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3449">#3449</a>)</li> <li>[New] Tuning for fp8 gemm with emu1.7 shapes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2713748157" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3436" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3436/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3436">#3436</a>)</li> <li>[Improvement] more fp8 tuning for decode and not need to pad (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2791069520" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3576" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3576/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3576">#3576</a>)</li> <li>[Improvement] llm decode shapes fp8 rowwise gemm tuning (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2784718955" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3565" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3565/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3565">#3565</a>)</li> <li>[Improvement] Split FP8 Grouped Gemm into dynamic and static version (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2768220068" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3543" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3543/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3543">#3543</a>)</li> <li>[Improvement] Warp-specialized FP8 rowwise GEMM kernel (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2761168080" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3532" 
data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3532/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3532">#3532</a>)</li> <li>[Improvement] Add Cutlass FP8 Grouped Gemm to Quantize Bench (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2756792559" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3530" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3530/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3530">#3530</a>)</li> <li>[Improvement] Fixed FBGEMM fp8 rowwise for irregular shapes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2729106575" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3491" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3491/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3491">#3491</a>)</li> <li>[Improvement] Properly define preallocated output as mutable in fp8 rowwise gemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2724112019" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3476" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3476/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3476">#3476</a>)</li> <li>[Improvement] Fix FP8 Rowwise Gemm Compilation with Auto-functionalize V2 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2719098142" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3457" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3457/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3457">#3457</a>)</li> <li>[Improvement] Support zero-size inputs in FP8 cuda quantize kernel (<a class="issue-link js-issue-link" data-error-text="Failed to 
load title" data-id="2716259892" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3448" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3448/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3448">#3448</a>)</li> <li>[Improvement] update FP8 GEMM tuning for emu1.7 7B shapes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2669703744" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3391" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3391/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3391">#3391</a>)</li> <li>[Improvement] Customize FP8 grouped GEMM for non-zero calculation for token choice MoE (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2663488921" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3383" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3383/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3383">#3383</a>)</li> <li>[Improvement] Support FP8 grouped GEMM with cudagraph (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2657511253" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3373" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3373/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3373">#3373</a>)</li> <li>[Improvement] Refactor FP8 grouped GEMM to prepare cudagraph support (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2656659088" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3369" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3369/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3369">#3369</a>)</li> 
<li>[Improvement] Improve FP8 BMM heuristic for large shapes and MoE E2E performance (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2642993783" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3344" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3344/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3344">#3344</a>)</li> <li>[Improvement] retune some of the EMU1.6 7B FP8 GEMM shapes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2636145254" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3328" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3328/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3328">#3328</a>)</li> <li>[Improvement] Make FP8 BMM output contiguous (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2607304060" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3270" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3270/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3270">#3270</a>)</li> <li>[Improvement] Tune FP8 rowwise bmm tile heuristic (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2598504763" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3256" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3256/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3256">#3256</a>)</li> <li>[Improvement] more FP8 GEMM tuning for LDM shapes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2691467166" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3414" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3414/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3414">#3414</a>)</li> <li>[Improvement] Split up <code>f8f8bf16_rowwise_batched.cu</code> (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2661268495" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3381" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3381/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3381">#3381</a>)</li> <li>[Improvement] use sym int in quantize.cpp for f8f8bf16_rowwise_meta (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2685082450" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3410" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3410/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3410">#3410</a>)</li> <li>[Improvement] Remove triton.ops dependency from fbgemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2636415427" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3329" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3329/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3329">#3329</a>)</li> <li>[Improvement] Improve performance of prefill mode FP8 Grouped Gemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2753342989" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3522" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3522/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3522">#3522</a>)</li> <li>[Improvement] support quantize_fp8_row for up to 4d non contiguous tensor (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2743527641" 
data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3508" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3508/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3508">#3508</a>)</li> <li>[Improvement] Back out "support quantize_fp8_row for up to 4d non contiguous tensor" (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2740210144" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3505" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3505/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3505">#3505</a>)</li> <li>[Improvement] Make the scale match the shape of quantized value with N-D tensors (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2674508149" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3396" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3396/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3396">#3396</a>)</li> <li>[Improvement] Fix out-of-bound load in row scaling (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2755349742" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3527" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3527/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3527">#3527</a>)</li> </ul> <h3>ROCm</h3> <ul> <li>[New] More CK FP8 rowwise GEMM instances and tuning (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2718683498" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3455" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3455/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3455">#3455</a>)</li> <li>[New] Setup for 
ck fp8 batched gemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2634103519" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3322" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3322/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3322">#3322</a>)</li> <li>[New] CK FP8 Batched Gemm Heuristic Tuning (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2639612115" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3336" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3336/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3336">#3336</a>)</li> <li>[New] CK FP8 Grouped Gemm Support (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2630670046" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3316" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3316/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3316">#3316</a>)</li> <li>[New] Enable v2 forward test for ROCm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2789385076" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3573" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3573/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3573">#3573</a>)</li> <li>[New] Add fused_moe kernel to ck_extension (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2749557439" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3518" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3518/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3518">#3518</a>)</li> <li>[Improvement] Implement Vec2 
load/store for ROCm devices (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2691278431" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3413" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3413/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3413">#3413</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2723953452" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3475" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3475/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3475">#3475</a>)</li> <li>[Improvement] Manual loop unroll for rocm inference (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2714507182" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3439" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3439/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3439">#3439</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2683047255" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3405" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3405/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3405">#3405</a>)</li> <li>[Improvement] Optimized backward pass for ROCm devices (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2655869405" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3367" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3367/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3367">#3367</a>)</li> <li>[Improvement] Add manual loop unroll for rocm devices in fwd pass (<a class="issue-link 
js-issue-link" data-error-text="Failed to load title" data-id="2629471609" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3309" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3309/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3309">#3309</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2645041400" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3345" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3345/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3345">#3345</a>)</li> <li>[Improvement] [ROCm] debug v2 kernel for ROCm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2606109531" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3266" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3266/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3266">#3266</a>)</li> <li>[Improvement] Optimized backward pass for ROCm devices (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2748648700" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3511" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3511/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3511">#3511</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2728437941" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3488" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3488/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3488">#3488</a>)</li> <li>[Improvement] FP8 Rowwise compile fix followup for AMD (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2724184186" 
data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3478" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3478/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3478">#3478</a>)</li> <li>[Improvement] Use output zero fill into grouped gemm kernel setup (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2766839241" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3537" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3537/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3537">#3537</a>)</li> <li>[Improvement] [ROCm] Remove the duplicated ROCm version print as it has been done in PyTorch (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2636449919" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3330" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3330/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3330">#3330</a>)</li> <li>[Improvement] Small cleanup of CK kernels (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2615061318" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3278" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3278/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3278">#3278</a>)</li> <li>[Improvement] Cherry-pick CK PR <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1618132747" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/1636" data-hovercard-type="issue" data-hovercard-url="/pytorch/FBGEMM/issues/1636/hovercard" href="https://github.com/pytorch/FBGEMM/issues/1636">#1636</a> for fp8 GEMM rowwise for 70B Prefill (<a class="issue-link js-issue-link" 
data-error-text="Failed to load title" data-id="2749097024" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3517" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3517/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3517">#3517</a>)</li> <li>[Improvement] Heuristic Tuning for CK FP8 Grouped Gemm (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2653660393" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3356" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3356/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3356">#3356</a>)</li> <li>[Improvement] Temporarily disable nbit_forward_test on OSS rocm clang (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2716045964" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3445" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3445/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3445">#3445</a>)</li> <li>[Fix] Fix CK FP8 rowwise quantization for some GEMM shapes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2728421557" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3486" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3486/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3486">#3486</a>)</li> </ul> <h3>SLL</h3> <ul> <li>[Improvement] Migrate SLL ops to OSS (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2728225442" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3485" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3485/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3485">#3485</a>, <a 
class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2724248891" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3479" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3479/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3479">#3479</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2719334443" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3459" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3459/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3459">#3459</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2719022476" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3456" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3456/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3456">#3456</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2708681972" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3428" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3428/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3428">#3428</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2719137088" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3458" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3458/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3458">#3458</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2653455119" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3354" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3354/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3354">#3354</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2650870510" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3352" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3352/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3352">#3352</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2650784563" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3351" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3351/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3351">#3351</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2650570427" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3350" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3350/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3350">#3350</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2647019394" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3347" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3347/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3347">#3347</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2636640770" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3331" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3331/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3331">#3331</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2723384922" data-permission-text="Title is private" 
data-url="https://github.com/pytorch/FBGEMM/issues/3472" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3472/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3472">#3472</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2724978448" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3482" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3482/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3482">#3482</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2723921707" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3474" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3474/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3474">#3474</a>)</li> <li>[Improvement] Fix specialization issue in keyed_jagged_index_select_dim1_forward_cuda (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2791478550" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3578" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3578/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3578">#3578</a>)</li> <li>[Improvement] Align sll function names (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2722175953" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3471" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3471/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3471">#3471</a>)</li> <li>[Improvement] Break up SLL test files (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2771444292" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3550" 
data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3550/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3550">#3550</a>)</li> <li>[Improvement] Register jagged ops to CompositeImplicitAutograd (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2673976984" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3395" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3395/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3395">#3395</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2592971496" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3249" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3249/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3249">#3249</a>)</li> </ul> <h2>Sparse Operators</h2> <h3>Sparse Ops</h3> <ul> <li>[Improvement] Register fake tensor impl for fbgemm::all_to_one_device (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2633675640" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3320" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3320/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3320">#3320</a>)</li> <li>[Improvement] Code cleanups to sparse bucketize and sparse block bucketize kernels (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2625586596" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3296" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3296/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3296">#3296</a>,<br> <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2625584910" data-permission-text="Title is private" 
data-url="https://github.com/pytorch/FBGEMM/issues/3295" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3295/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3295">#3295</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2627940312" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3302" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3302/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3302">#3302</a>)</li> <li>[Improvement] Update impl_abstract in sparse ops (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2629893500" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3311" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3311/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3311">#3311</a>)</li> <li>[Improvement] Cleanup stray testing line (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2651023767" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3353" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3353/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3353">#3353</a>)</li> <li>[Improvement] Print the node infos when CUDA p2p init fails (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2669510198" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3390" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3390/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3390">#3390</a>)</li> <li>[Improvement] Add large my_size support in _block_bucketize_pooled_sparse_features_cuda_kernel2 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2625273124" 
data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3294" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3294/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3294">#3294</a>)</li> <li>[Improvement] Kernel support for multiple buckets per rank (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2634142159" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3323" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3323/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3323">#3323</a>)</li> <li>[Improvement] Add CPU group_index_select fwd and bwd impl (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2609366112" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3273" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3273/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3273">#3273</a>)</li> <li>[Improvement] Skip check_all_same_device if only CPU and meta tensors appear (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2581989765" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3241" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3241/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3241">#3241</a>)</li> <li>[Improvement] create pack_segments_v2 with additional pad_minf and presence_mask functionality (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2708333026" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3427" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3427/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3427">#3427</a>)</li> </ul> <h2>Quantization 
Operators</h2> <h3>Quantize Ops</h3> <ul> <li>[Improvement] Add meta dispatch for FusedNBitRowwiseQuantizedSBHalfToFloatOrHalf (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2590191490" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3248" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3248/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3248">#3248</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2571595108" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3231" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3231/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3231">#3231</a>)</li> <li>[Improvement] Add torch checks for QuantizedCommCodec (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2600944073" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3260" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3260/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3260">#3260</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2669404471" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3389" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3389/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3389">#3389</a>)</li> <li>[Fix] Fix index overflow for superlarge inputs (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2751763526" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3519" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3519/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3519">#3519</a>)</li> </ul> 
<h3>MX4 Ops</h3> <ul> <li>[Improvement] MX4 group size configuration for pyper (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2749023984" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3516" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3516/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3516">#3516</a>)</li> <li>[Fix] Various illegal memory access fixes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2571198467" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3229" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3229/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3229">#3229</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2746457095" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3509" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3509/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3509">#3509</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2650272992" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3349" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3349/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3349">#3349</a>)</li> </ul> <h2>Better Engineering</h2> <h3>Benchmarks and tests</h3> <ul> <li>[New] Add a benchmark for VBE (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2720774891" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3464" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3464/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3464">#3464</a>)</li> <li>[New] Add 
Machete to fbgemm quantize bench (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2600759617" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3259" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3259/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3259">#3259</a>)</li> <li>[Improvement] Improve bounds check indices benchmark (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2615450307" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3283" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3283/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3283">#3283</a>)</li> <li>[Improvement] Add trace for nbit_device (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2622943118" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3292" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3292/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3292">#3292</a>)</li> <li>[Improvement] Use cudagraph for autotune (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2622542674" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3291" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3291/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3291">#3291</a>)</li> <li>[Improvement] Improve benchmark accuracy with warmups and kineto profiling (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2794244692" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3585" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3585/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3585">#3585</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2793113024" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3580" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3580/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3580">#3580</a>)</li> <li>[Fix] Fix test error (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2724391577" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3480" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3480/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3480">#3480</a>)</li> <li>[Fix] Disable SLL test in OSS (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2768349708" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3545" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3545/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3545">#3545</a>)</li> </ul> <h3>Build / CI improvements</h3> <ul> <li>[New] Add build support for CUDA 12.6 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2677191920" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3398" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3398/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3398">#3398</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2762385724" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3533" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3533/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3533">#3533</a>, <a class="issue-link js-issue-link" 
data-error-text="Failed to load title" data-id="2738994531" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3503" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3503/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3503">#3503</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2713372162" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3434" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3434/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3434">#3434</a>)</li> <li>[New] Add build support for Python 3.13 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2737193713" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3502" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3502/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3502">#3502</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2756646019" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3529" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3529/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3529">#3529</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2773723522" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3555" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3555/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3555">#3555</a>)</li> <li>[New] Modularize the OSS CMake build (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2663565596" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3385" 
data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3385/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3385">#3385</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2670361066" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3392" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3392/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3392">#3392</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2684972960" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3408" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3408/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3408">#3408</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2692707166" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3417" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3417/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3417">#3417</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2716125783" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3446" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3446/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3446">#3446</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2716336226" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3450" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3450/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3450">#3450</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2716407738" 
data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3451" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3451/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3451">#3451</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2729456336" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3492" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3492/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3492">#3492</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2736642127" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3500" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3500/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3500">#3500</a>)</li> <li>[Improvement] Add CUTLASS 3.6 compatibility (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2627977948" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3303" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3303/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3303">#3303</a>)</li> <li>[Improvement] Misc CMake build fixes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2748825269" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3513" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3513/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3513">#3513</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2662746812" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3382" data-hovercard-type="pull_request" 
data-hovercard-url="/pytorch/FBGEMM/pull/3382/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3382">#3382</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2724124936" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3477" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3477/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3477">#3477</a>)</li> <li>[Improvement] Update ManyLinux support to ManyLinux 2.28 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2753296076" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3521" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3521/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3521">#3521</a>)</li> <li>[Improvement] Update Triton (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2731900397" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3497" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3497/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3497">#3497</a>)</li> <li>[Improvement] Various build fixes and workflow improvements for ROCm jobs (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2784875328" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3566" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3566/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3566">#3566</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2778568792" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3557" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3557/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3557">#3557</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2773448239" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3554" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3554/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3554">#3554</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2769430305" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3547" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3547/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3547">#3547</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2736760710" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3501" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3501/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3501">#3501</a>)</li> <li>[Improvement] Various GitHub workflow improvements (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2713249359" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3432" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3432/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3432">#3432</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2756858832" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3531" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3531/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3531">#3531</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2733695433" data-permission-text="Title is private" 
data-url="https://github.com/pytorch/FBGEMM/issues/3499" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3499/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3499">#3499</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2748828654" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3514" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3514/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3514">#3514</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2768092315" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3542" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3542/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3542">#3542</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2595534289" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3252" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3252/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3252">#3252</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2793522552" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3581" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3581/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3581">#3581</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2766960141" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3538" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3538/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3538">#3538</a>, <a class="issue-link js-issue-link" 
data-error-text="Failed to load title" data-id="2766760583" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3536" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3536/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3536">#3536</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2716125783" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3446" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3446/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3446">#3446</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2625780961" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3297" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3297/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3297">#3297</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2582322503" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3243" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3243/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3243">#3243</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2582101802" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3242" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3242/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3242">#3242</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2579821275" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3238" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3238/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3238">#3238</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2580103795" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3239" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3239/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3239">#3239</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2587011542" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3246" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3246/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3246">#3246</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2579366587" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3236" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3236/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3236">#3236</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2756466159" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3528" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3528/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3528">#3528</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2571527608" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3230" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3230/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3230">#3230</a>)</li> <li>[Improvement] Various documentation fixes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2639978592" data-permission-text="Title is private" 
data-url="https://github.com/pytorch/FBGEMM/issues/3339" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3339/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3339">#3339</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2637175097" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3333" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3333/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3333">#3333</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2582389093" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3244" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3244/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3244">#3244</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2653982421" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3365" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3365/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3365">#3365</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2621881928" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3289" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3289/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3289">#3289</a>)</li> <li>[Improvement] Improvements to documentation regarding compatibility (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2785981157" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3569" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3569/hovercard" 
href="https://github.com/pytorch/FBGEMM/pull/3569">#3569</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2615343301" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3280" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3280/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3280">#3280</a>)</li> <li>[Improvement] Update package requirements.txt (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2790683570" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3574" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3574/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3574">#3574</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2721846464" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3469" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3469/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3469">#3469</a>)</li> <li>[Improvement] Increase time-out for CUDA OSS CI (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2571527608" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3230" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3230/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3230">#3230</a>)</li> <li>[Improvement] Add backwards compatibility checks for v1.1.0 release (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2728685091" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3489" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3489/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3489">#3489</a>)</li> 
<li>[Improvement] Disable certain tests in OSS (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2715896998" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3443" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3443/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3443">#3443</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2771188084" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3548" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3548/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3548">#3548</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2609287748" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3272" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3272/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3272">#3272</a>)</li> </ul> <h3>Misc Cleanups</h3> <ul> <li>[Improvement] Lint fixes (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2593193707" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3250" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3250/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3250">#3250</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2739055105" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3504" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3504/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3504">#3504</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2713487106" data-permission-text="Title is private" 
data-url="https://github.com/pytorch/FBGEMM/issues/3435" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3435/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3435">#3435</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2676147858" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3397" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3397/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3397">#3397</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2624630711" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3293" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3293/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3293">#3293</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2668833811" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3387" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3387/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3387">#3387</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2713310219" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3433" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3433/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3433">#3433</a>)</li> <li>[Improvement] Remove unused variables (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2638527033" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3335" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3335/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3335">#3335</a>, <a 
class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2653598396" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3355" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3355/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3355">#3355</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2653925381" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3359" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3359/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3359">#3359</a>)</li> <li>[Improvement] Use type-safe utilities from c10 (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2618896033" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3285" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3285/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3285">#3285</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2653925203" data-permission-text="Title is private" data-url="https://github.com/pytorch/FBGEMM/issues/3358" data-hovercard-type="pull_request" data-hovercard-url="/pytorch/FBGEMM/pull/3358/hovercard" href="https://github.com/pytorch/FBGEMM/pull/3358">#3358</a>)</li> </ul> q10 tag:github.com,2008:Repository/150154628/v1.1.0-rc3 2025-01-29T22:07:32Z v1.1.0-rc3 <p>v1.1.0-rc3</p> q10 tag:github.com,2008:Repository/150154628/v1.1.0-rc2 2025-01-16T00:21:57Z v1.1.0-rc2 <p>v1.1.0-rc2</p> q10 tag:github.com,2008:Repository/150154628/v1.1.0-rc1 2025-01-07T19:17:29Z v1.1.0-rc1 <p>v1.1.0-rc1</p> q10