
[core][distributed] support variable length object in shm broadcast #5768


Closed
wants to merge 6 commits into from

Conversation

youkaichao
Member

The shm transport introduced in #5399 reserves a fixed chunk size (1 MB) for each object, which is not flexible and wastes shared memory space.

This PR adds variable-length support. The initialization arguments are now the max total bytes and the max number of objects (essentially the queue size). The enqueue operation blocks only when either limit is reached.

In most cases it should not block: we only have a few broadcasts in vLLM, and the workers read them before the next broadcast, so the queue size will not grow dramatically.

Variable-length support is more complicated than I thought; the high-watermark and low-watermark logic is quite difficult to get right. I added a stress test that sends over 400 MB of data in 10K messages, which should give us enough confidence that this implementation is correct.
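To illustrate the scheme described above, here is a minimal single-process sketch of a ring buffer that packs variable-length pickled objects and refuses to enqueue only when either the total byte budget or the object-count budget is exhausted. The class name `VarLenRingBuffer` and its methods are hypothetical, not the PR's actual classes; the real implementation sits on top of shared memory with per-reader synchronization, which is omitted here.

```python
# Minimal sketch: a variable-length ring buffer with a byte budget and an
# object-count budget. `head` is the low watermark (oldest live byte) and
# `tail` is the high watermark (next byte to write); both wrap around.
import pickle
from collections import deque


class VarLenRingBuffer:
    def __init__(self, max_total_bytes: int, max_num_objects: int):
        self.buf = bytearray(max_total_bytes)  # stand-in for shared memory
        self.capacity = max_total_bytes
        self.max_num_objects = max_num_objects
        self.head = 0          # low watermark: next byte to be freed
        self.tail = 0          # high watermark: next byte to be written
        self.used = 0          # bytes currently occupied
        self.queue = deque()   # (offset, length) of pending messages

    def _write(self, data: bytes) -> int:
        """Copy `data` at the tail, wrapping around the end of the buffer."""
        offset = self.tail
        first = min(len(data), self.capacity - offset)
        self.buf[offset:offset + first] = data[:first]
        self.buf[0:len(data) - first] = data[first:]
        self.tail = (offset + len(data)) % self.capacity
        self.used += len(data)
        return offset

    def enqueue(self, obj) -> bool:
        """Try to append one object; return False if the caller must wait,
        i.e. either the byte budget or the object budget is full."""
        data = pickle.dumps(obj)
        if len(self.queue) >= self.max_num_objects:
            return False
        if self.used + len(data) > self.capacity:
            return False
        self.queue.append((self._write(data), len(data)))
        return True

    def dequeue(self):
        """Pop the oldest object and advance the low watermark."""
        offset, length = self.queue.popleft()
        first = min(length, self.capacity - offset)
        data = bytes(self.buf[offset:offset + first]) + bytes(self.buf[:length - first])
        self.head = (offset + length) % self.capacity
        self.used -= length
        return pickle.loads(data)


if __name__ == "__main__":
    ring = VarLenRingBuffer(max_total_bytes=1 << 20, max_num_objects=8)
    assert ring.enqueue({"shape": (4, 128), "dtype": "float16"})
    assert ring.dequeue() == {"shape": (4, 128), "dtype": "float16"}
```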

@njhill
Member

njhill commented Jun 23, 2024

reserves a fixed chunk size (1 MB) for each object, which is not flexible and wastes shared memory space.

@youkaichao 1MB doesn't sound like much? Could we avoid the additional complexity until it becomes clear that it's needed?

@youkaichao
Member Author

@njhill I agree this is not urgent now. But the point is not resource waste; the most important thing is how large an object we can broadcast. Currently we only broadcast shape/dtype information, but in the future we'd like to broadcast the whole input and separate the driver & TP 0 worker. At that point the objects we broadcast will be large (containing input IDs, block tables, etc.), and it is difficult to guess the upper bound.

@Yard1
Collaborator

Yard1 commented Jun 24, 2024

I also think this optimization is premature. RAM is cheap; we can easily set this buffer to 10 or even 100 MB. This seems like a lot of complexity for little gain. We should revisit this only if it becomes an issue in the future.

If we cannot guess the upper bound, I think simply reinitializing the buffer when a message that's too big is encountered, with e.g. the desired size + 50% (up to some upper limit), would be enough.
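For illustration, a minimal sketch of the grow-on-demand policy suggested here: keep a fixed-size buffer and, only when an incoming message does not fit, reallocate it at 1.5x the required size up to a hard cap. The names `GrowableBuffer` and `reallocate` are hypothetical, not vLLM APIs; the 10 MB initial size and 100 MB cap just echo the figures mentioned above.

```python
# Sketch of "reinitialize the buffer when a message is too big":
# grow to 1.5x the needed size, but never beyond a hard upper limit.
MAX_BUFFER_BYTES = 100 * 1024 * 1024  # upper limit (e.g. 100 MB)


class GrowableBuffer:
    def __init__(self, initial_bytes: int = 10 * 1024 * 1024):
        self.buf = bytearray(initial_bytes)

    def reallocate(self, needed: int) -> None:
        """Grow the buffer to needed * 1.5, capped at MAX_BUFFER_BYTES."""
        new_size = min(int(needed * 1.5), MAX_BUFFER_BYTES)
        if needed > new_size:
            raise ValueError(f"message of {needed} bytes exceeds the hard cap")
        self.buf = bytearray(new_size)

    def write(self, data: bytes) -> None:
        if len(data) > len(self.buf):
            self.reallocate(len(data))
        self.buf[:len(data)] = data


buf = GrowableBuffer()
buf.write(b"x" * (20 * 1024 * 1024))   # does not fit in 10 MB, grows to ~30 MB
assert len(buf.buf) == 30 * 1024 * 1024
```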

@youkaichao
Member Author

Thanks for your opinions ❤️ I'm also afraid this might introduce more bugs, so I'm fine with the criticism!

We might need this when we use it for all the batched RPC calls proposed in this RFC, especially with async scheduling and many queued RPC calls. But we don't need it at present; let's revisit this later.
