@Ali-Tehrani
Summary:
Adds KV-ZCH support to the benchmarking platform.

Several things were added to make it work with KV-ZCH:

  • Added eviction policies.
  • Added `KeyValueParam`, whose parameters are added to the TBE `fused_params`, which are then passed to `SSDTableBatchedEmbeddingBags`. See `_populate_ssd_tbe_params` in batched_embedding_kernel and `add_params_from_parameter_sharding` in distributed/utils.py.
  • Added creation of `CacheParams` with `prefetch_pipeline=True` to avoid the warning shown below.

NOTE: The `prefetch_pipeline` attribute of `CacheParams` is set to True because the following warning is raised without it:
{F1983388476,width=300,height=200}
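As a rough illustration of the flow described above (sharding-level parameters being folded into `fused_params` before they reach the embedding kernel), here is a minimal, self-contained sketch. It does not use TorchRec itself: `CacheParamsSketch`, `KeyValueParamsSketch`, `add_params_sketch`, and every field other than `prefetch_pipeline` are hypothetical stand-ins, and the real `add_params_from_parameter_sharding` may behave differently.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class CacheParamsSketch:
    """Hypothetical stand-in for CacheParams; only the field named in this PR."""

    prefetch_pipeline: Optional[bool] = None


@dataclass
class KeyValueParamsSketch:
    """Hypothetical stand-in for KeyValueParam; field names are illustrative."""

    storage_dir: Optional[str] = None
    cache_size_mb: Optional[int] = None


def add_params_sketch(
    fused_params: Dict[str, Any],
    cache_params: Optional[CacheParamsSketch],
    kv_params: Optional[KeyValueParamsSketch],
) -> Dict[str, Any]:
    """Copy every explicitly set (non-None) field into a new fused_params dict."""
    merged = dict(fused_params)  # leave the caller's dict untouched
    for params in (cache_params, kv_params):
        if params is None:
            continue
        for f in fields(params):
            value = getattr(params, f.name)
            if value is not None:
                merged[f.name] = value
    return merged


# Illustrative usage; the "learning_rate" key and the path are made up.
fused = add_params_sketch(
    {"learning_rate": 0.1},
    CacheParamsSketch(prefetch_pipeline=True),
    KeyValueParamsSketch(storage_dir="/tmp/kv_zch"),
)
print(fused)
```

Unset fields are skipped so that kernel defaults are not overwritten; whether TorchRec applies the same rule is an assumption of this sketch.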

Update on November 11, 2025:

  • The line `pipeline.progress(iter(bench_inputs))` is commented out in `benchmark_train_pipeline.py` because it conflicts with `pipeline.reset()`. With the line in place, the forward pass raises an error when using `pipeline="prefetch"` with KV-ZCH.

Reviewed By: TroyGarden

Differential Revision: D86677315

@meta-codesync
Contributor

meta-codesync bot commented Nov 12, 2025

@Ali-Tehrani has exported this pull request. If you are a Meta employee, you can view the originating Diff in D86677315.

@meta-cla (bot) added the CLA Signed label on Nov 12, 2025.
Ali-Tehrani pushed a commit to Ali-Tehrani/torchrec that referenced this pull request Nov 12, 2025
Labels

CLA Signed, fb-exported, meta-exported
