Add logic to stream weights in EmbeddingKVDB #4058

Open: wants to merge 2 commits into base: main

@chouxi chouxi commented Apr 30, 2025

Summary:
X-link: pytorch/torchrec#2930

Gated by enable_raw_embedding_streaming.
Add logic in EmbeddingKVDB to send the passed-in tensors to the TrainingParameterServerService thrift service.
The following arguments are passed in:

  • table_names: used to get the table FQN when streaming.
  • table_offsets: used to get the global row id across TBEs.
  • table_sizes: the size of each table in the TBE, used to infer which table a specific row belongs to.
  • ps_server_port: the port on which the local TrainingParameterServerService runs; tensors are streamed to it.
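
As a rough illustration of how table_names, table_sizes, and table_offsets could work together, the sketch below maps a row index local to one TBE to its table name and global row id. It is not the code in this PR: the function name `locate_row` is hypothetical, and it assumes that table_sizes holds per-table row counts in TBE order and that table_offsets holds each table's first global row id.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_row(local_row, table_names, table_sizes, table_offsets):
    """Map a TBE-local row index to (table name, global row id).

    Assumptions (not confirmed by the PR): table_sizes[i] is the row count
    of table i inside this TBE, and table_offsets[i] is the global row id
    of that table's first row.
    """
    # Exclusive end position of each table inside the TBE.
    ends = list(accumulate(table_sizes))
    if not ends or not 0 <= local_row < ends[-1]:
        raise IndexError(f"row {local_row} is outside this TBE")
    # Index of the first table whose end is strictly greater than local_row.
    table_idx = bisect_right(ends, local_row)
    table_start = ends[table_idx] - table_sizes[table_idx]
    row_in_table = local_row - table_start
    return table_names[table_idx], table_offsets[table_idx] + row_in_table


# Example with three hypothetical tables of 4, 2, and 3 rows.
names = ["t0", "t1", "t2"]
sizes = [4, 2, 3]
offsets = [100, 200, 300]
print(locate_row(4, names, sizes, offsets))
```

A binary search over the cumulative sizes keeps the lookup logarithmic in the number of tables; a linear scan would be equally valid for small TBEs.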

This change creates a new thread, weights_stream_thread_, in EmbeddingKVDB to stream the weights out of trainers asynchronously.
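
The asynchronous hand-off described here can be sketched as a producer/consumer queue drained by a background thread. This is a Python stand-in rather than the C++ implementation in this PR: the class `WeightStreamer`, its methods, and the `send_fn` callback (a placeholder for the thrift call to TrainingParameterServerService) are all hypothetical names chosen for illustration.

```python
import queue
import threading


class WeightStreamer:
    """Hypothetical sketch: hand weights to a background thread for sending."""

    def __init__(self, send_fn, enable_raw_embedding_streaming):
        self._enabled = enable_raw_embedding_streaming
        self._send_fn = send_fn  # placeholder for the real thrift client call
        self._queue = queue.Queue()
        self._thread = None
        if self._enabled:
            # The worker thread is only created when the gate is on.
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()

    def stream(self, indices, weights):
        """Enqueue a batch without blocking the caller; no-op when gated off."""
        if not self._enabled:
            return False
        self._queue.put((indices, weights))
        return True

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: shut down
                break
            self._send_fn(*item)

    def close(self):
        """Flush pending batches and stop the worker thread."""
        if self._thread is not None:
            self._queue.put(None)
            self._thread.join()
            self._thread = None


sent = []
streamer = WeightStreamer(lambda idx, w: sent.append((idx, w)), True)
streamer.stream([1, 2], [[0.1], [0.2]])
streamer.close()
```

The training path only pays the cost of an enqueue, while the potentially slow network send happens on the worker thread.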

Differential Revision: D73792631

chouxi added 2 commits April 30, 2025 16:01
…ytorch#4053)

Summary:
X-link: facebookresearch/FBGEMM#1138


X-link: pytorch/torchrec#2928

As titled, plumb this option all the way through to gate the upcoming raw embedding streaming changes in SSDTBE.

Differential Revision: D73691088
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D73792631

netlify bot commented Apr 30, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | 0eb120e |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6812abd5b7a4d3000880bec6 |
| 😎 Deploy Preview | https://deploy-preview-4058--pytorch-fbgemm-docs.netlify.app |
