[Core] Extend chaos testing utility to more closely replicate spot instance preemptions

### Description

We're running a large-scale batch inference job on spot instances and trying to use as many GPUs as possible. We've observed this pattern for preemptions:
![image](https://github.com/ray-project/ray/assets/26107013/22ab6921-cdb3-41d9-ae93-1dee817e72b5)

To more closely replicate this pattern, it'd be helpful if the chaos testing utility could:
1) Kill many nodes at once (e.g., ~200 above)
2) Prevent the cluster from scaling back up (to mimic spot instances being unavailable)

### Use case

See above.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Core] Extend chaos testing utility to more closely replicate spot instance preemptions #46367

Description

Use case

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Core] Extend chaos testing utility to more closely replicate spot instance preemptions #46367

Description

Description

Use case

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions