Closed
Description
Is `ParallelConfig.pipeline_parallel_size` intended for multi-GPU setups, and can it be set to the number of GPU cards? Does it relate to processing multiple prompts and generating multiple results in parallel? For example, with 2 GPU cards and 7 requests, will the 7 requests be distributed across the 2 GPUs simultaneously, and how is that allocation done?

Also, what do the `max_num_batched_tokens` and `max_num_seqs` parameters in `SchedulerConfig` represent? How should they be set to preserve longer context?
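For the second question, the two limits can be pictured as per-step budgets: each scheduling step admits waiting requests into a batch until either the sequence count (`max_num_seqs`) or the token count (`max_num_batched_tokens`) budget is exhausted. The sketch below is an illustrative toy model of that admission logic, not vLLM's actual scheduler code; the function name and the request lengths are made up for the example.

```python
# Toy model of continuous-batching admission under the two SchedulerConfig
# budgets. NOT vLLM's implementation -- just the budgeting idea.

def schedule_step(waiting, max_num_seqs, max_num_batched_tokens):
    """Admit waiting requests (given as prompt-token counts) into one batch.

    Stops as soon as either budget would be exceeded; returns the batch
    and the requests deferred to a later step.
    """
    batch, tokens = [], 0
    for req_tokens in waiting:
        if len(batch) >= max_num_seqs:
            break  # sequence budget exhausted
        if tokens + req_tokens > max_num_batched_tokens:
            break  # token budget exhausted
        batch.append(req_tokens)
        tokens += req_tokens
    return batch, waiting[len(batch):]

# 7 requests with varying prompt lengths; budget of 4 seqs / 2048 tokens.
waiting = [512, 256, 1024, 128, 768, 300, 64]
batch, remaining = schedule_step(waiting, max_num_seqs=4,
                                 max_num_batched_tokens=2048)
print(batch)      # -> [512, 256, 1024, 128]
print(remaining)  # -> [768, 300, 64] (deferred to later steps)
```

Under this picture, preserving longer context is mainly a matter of raising the token budget (and the model's context length limit) so that long prompts still fit in a step, at the cost of admitting fewer sequences at once.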