Description
Co-Author @kkhatua
#8879 - [RFC] High Level Vision for Core Search in OpenSearch
#11061 - [RFC] Query Sandboxing for search requests
A common challenge in managing resources on OpenSearch clusters has been keeping runaway queries in check. Search Backpressure makes it possible to avoid running out of resources, but it offers no way to protect tenants who might be unfairly penalized. This RFC proposes a mechanism that lets admin users organize tenants into groups (aka Sandboxes) and limit the cumulative resource utilization of each group. We outline how a sandbox is enforced, but the details will likely be covered in a separate RFC due to their complexity.
What is a Sandbox?
A sandbox is a logical construct designed for running search requests within virtual limits. The sandboxing service will track and aggregate a node's resource usage per sandbox and, based on the mode of containment, limit or terminate tasks of a sandbox that is exceeding its limits.
A sandbox's definition is stored in cluster state, hence the limits that need to be enforced are available to nodes across the cluster.
Tenets
- A sandbox can provide constraints on one or more resources.
- A user may hint a preference for a sandbox
- A request is assigned to exactly one sandbox
- A request that matches multiple sandboxes is assigned to the sandbox with the highest similarity score
- A request that matches no sandbox, or matches multiple sandboxes with the same similarity score, is assigned to a system-defined catch-all sandbox called genPop (general population)
- The mapping attributes of a sandbox must be unique compared to those of other sandboxes
- The cumulative sandbox thresholds for a resource cannot exceed 100%
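The resolution tenets above can be sketched in a few lines. This is only an illustration under stated assumptions: the RFC does not define the similarity score, so the sketch assumes it is the number of sandbox attributes a request satisfies, and that a sandbox matches only when all of its attributes are satisfied. The function names, the list form of the attribute conditions, and the sample sandboxes are hypothetical, and user hints are not modeled.

```python
from typing import Dict, List

GEN_POP = "genPop"  # system-defined catch-all sandbox named in the tenets


def similarity(request_attrs: Dict[str, str], sandbox_attrs: Dict[str, List[str]]) -> int:
    """Assumed score: count of sandbox attributes the request satisfies.

    Assumption: a sandbox matches only if every attribute is satisfied;
    any miss yields a score of 0.
    """
    matched = 0
    for key, allowed_values in sandbox_attrs.items():
        if request_attrs.get(key) not in allowed_values:
            return 0
        matched += 1
    return matched


def resolve_sandbox(request_attrs: Dict[str, str],
                    sandboxes: Dict[str, Dict[str, List[str]]]) -> str:
    """Pick the single best-matching sandbox, falling back to genPop."""
    scores = {name: similarity(request_attrs, attrs) for name, attrs in sandboxes.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return GEN_POP  # no sandbox matched
    winners = [name for name, score in scores.items() if score == best]
    # A tie on the highest score also goes to genPop, as per the tenets.
    return winners[0] if len(winners) == 1 else GEN_POP


# Hypothetical sandboxes used only for illustration.
SANDBOXES = {
    "analytics": {"user_name": ["alice"], "indices_name": ["logs"]},
    "ops": {"user_name": ["alice"]},
    "dashboards": {"role": ["viewer"]},
}
```

With these sample sandboxes, a request from alice against the logs index resolves to analytics, while a request that ties between ops and dashboards falls back to genPop.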
Schemas
Sandbox Definition
The following is an abstract example of a sandbox's definition, which is broadly broken into 4 essential elements:
{
  "name": "<nameOfSandbox>",
  "attributes": {
    "attributeA": "<condition1,condition2>",
    ...
  },
  "resources": {
    "resourceX": {
      "allocation": "<floatValue:(0-1)>",
      "threshold": "<floatValue:(0-1)>"
    },
    ...
  },
  "enforcement": "<monitor|soft|hard>"
}
- name : A unique string identifier for the sandbox. Internally, a UUID is used.
- attributes : A collection of unique attributes that a request must match to be governed by this sandbox. We are considering user_name, indices_name, role, and custom_id.
- resources : The set of unique resources for which this sandbox is being defined. We are considering jvm and cpu.
- enforcement : How the sandbox is enforced. This can be monitor, soft, or hard.
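A validation pass over sandbox definitions might look like the sketch below. It is not the proposed implementation: the function names are hypothetical, the (0-1) range is assumed to be inclusive, and numeric values are assumed to arrive as numbers rather than strings. The second function checks the tenet that thresholds for a resource cannot cumulatively exceed 100%.

```python
from typing import Any, Dict, List

ENFORCEMENT_MODES = {"monitor", "soft", "hard"}


def validate_sandbox(sandbox: Dict[str, Any]) -> List[str]:
    """Return the problems found in a single sandbox definition."""
    errors = []
    if not sandbox.get("name"):
        errors.append("name is required")
    resources = sandbox.get("resources", {})
    if not resources:
        errors.append("at least one resource is required")
    for resource, limits in resources.items():
        for field in ("allocation", "threshold"):
            value = limits.get(field)
            # Assumption: the (0-1) range in the schema is inclusive.
            if not isinstance(value, (int, float)) or not 0 <= value <= 1:
                errors.append(f"{resource}.{field} must be a number in [0, 1]")
    if sandbox.get("enforcement") not in ENFORCEMENT_MODES:
        errors.append("enforcement must be one of monitor, soft, hard")
    return errors


def validate_cumulative(sandboxes: List[Dict[str, Any]]) -> List[str]:
    """Check that per-resource thresholds do not sum to more than 100%."""
    totals: Dict[str, float] = {}
    for sandbox in sandboxes:
        for resource, limits in sandbox.get("resources", {}).items():
            totals[resource] = totals.get(resource, 0.0) + limits.get("threshold", 0.0)
    return [
        f"{resource} thresholds sum to {total:.2f}, above 1.0"
        for resource, total in totals.items()
        if total > 1.0 + 1e-9  # small tolerance for floating-point sums
    ]
```

Returning a list of messages instead of raising on the first problem lets an admin see every issue in a definition at once.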
Resource Definition
For each resource, a cluster-level schema is also required. The following is an abstract example:
{
  "resources": {
    "resourceX": {
      "threshold": {
        "rejection" : "<floatValue:(0-1)>",
        "cancellation" : "<floatValue:(0-1)>"
      },
      "enforcement": "<monitor|soft|hard>"
    },
    ...
  }
}
- resources : The limits of a resource within which all the sandboxes must cumulatively operate before requests get rejected or cancelled. The rejection threshold must not exceed the cancellation threshold (rejection <= cancellation).
- enforcement : How the containment of sandboxes for this resource is managed. This can be monitor, soft, or hard, and can potentially override a sandbox's enforcement.
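One way to load the cluster-level resource definition is to parse it into an immutable value that enforces the rejection <= cancellation rule at construction time. The class and function names below are hypothetical, and the inclusive [0, 1] range is an assumption; this is a sketch, not the proposed implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict

ENFORCEMENT_MODES = ("monitor", "soft", "hard")


@dataclass(frozen=True)
class ResourceLimits:
    rejection: float
    cancellation: float
    enforcement: str

    def __post_init__(self) -> None:
        for label, value in (("rejection", self.rejection),
                             ("cancellation", self.cancellation)):
            # Assumption: thresholds are fractions in the inclusive range [0, 1].
            if not 0 <= value <= 1:
                raise ValueError(f"{label} must be within [0, 1], got {value}")
        if self.rejection > self.cancellation:
            raise ValueError("rejection must not exceed cancellation")
        if self.enforcement not in ENFORCEMENT_MODES:
            raise ValueError(f"unknown enforcement mode: {self.enforcement}")


def parse_resource_definition(document: Dict[str, Any]) -> Dict[str, ResourceLimits]:
    """Turn the cluster-level document into validated per-resource limits."""
    parsed = {}
    for name, spec in document["resources"].items():
        parsed[name] = ResourceLimits(
            rejection=float(spec["threshold"]["rejection"]),
            cancellation=float(spec["threshold"]["cancellation"]),
            enforcement=spec["enforcement"],
        )
    return parsed
```

A definition with rejection 0.85 and cancellation 0.95 parses cleanly, while swapping the two values raises a ValueError.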
High Level Flow
A request landing on a coordinator node will first need to be mapped to a sandbox, as per the tenets. Once the request has been mapped to a sandbox, all child tasks spawned from the request inherit that sandbox, irrespective of the node on which they run.
[Figure: HLF]
Sandbox Resolution
The sandbox resolution happens on the coordinator node and will persist with all the tasks (and child tasks) of that request for the entirety of its lifecycle.
[Figure: SCR-20240215-mfff]
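The inheritance described above, where a sandbox is resolved once on the coordinator and then carried by every child task, can be illustrated with a small model. The Task class, the action strings, and the node names are hypothetical and exist only for this sketch; they are not OpenSearch APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    action: str
    node: str
    sandbox: Optional[str] = None
    children: List["Task"] = field(default_factory=list)

    def spawn_child(self, action: str, node: str) -> "Task":
        """Create a child task that inherits the parent's sandbox.

        The sandbox is copied rather than re-resolved, so the assignment
        made on the coordinator holds on whichever node the child runs.
        """
        child = Task(action=action, node=node, sandbox=self.sandbox)
        self.children.append(child)
        return child


# Resolution happens once, on the coordinator node.
root = Task(action="search", node="coordinator-1", sandbox="analytics")
shard_task = root.spawn_child("search[query]", node="data-2")
fetch_task = shard_task.spawn_child("search[fetch]", node="data-3")
```

Because the child copies the sandbox from its parent, tasks two or more levels deep still carry the sandbox chosen for the original request.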
Thresholds Enforcement
As the sandboxes are enforced at a node level, for each resource, there are 2 thresholds defined cluster-wide:
- rejection : Reject new tasks if this threshold is breached
- cancellation : Cancel inflight tasks if this threshold is breached
[Figure: thresholds life cycle]
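The two thresholds translate into a simple decision for each resource on a node. The sketch below assumes that "breached" means usage strictly above the threshold and that breaching the cancellation threshold also implies rejecting new tasks, since rejection <= cancellation. The names are hypothetical, and the effect of the monitor, soft, and hard modes is left out because their semantics are not specified here.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    reject_new_tasks: bool
    cancel_inflight_tasks: bool


def decide(usage: float, rejection: float, cancellation: float) -> Decision:
    """Map current resource usage to the actions the thresholds call for."""
    if rejection > cancellation:
        raise ValueError("rejection must not exceed cancellation")
    # Assumption: a threshold is breached when usage is strictly above it.
    return Decision(
        reject_new_tasks=usage > rejection,
        cancel_inflight_tasks=usage > cancellation,
    )
```

For thresholds of 0.85 and 0.95, usage of 0.90 leads to rejecting new tasks only, and usage of 0.97 leads to both rejecting new tasks and cancelling inflight ones.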
Each sandbox-level threshold is always proportional to the corresponding node-level threshold.
The following is an example where:
- 2 sandboxes have 20% and 25% thresholds for a resource, and the rest (100% - 20% - 25% = 55%) is part of the general population (aka genPop) sandbox.
- If a node exceeds 85% usage for a resource, it will actively reject new tasks destined for the overflowing sandboxes.
- If the resource usage exceeds 95%, the node will actively cancel inflight tasks on the overflowing sandboxes.
[Figure: all sandboxes together]
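The arithmetic in this example can be worked through in code. The sketch below takes one reading of "proportional": each sandbox's effective limit is its share multiplied by the node-level threshold, so a 20% sandbox gets 0.20 x 0.85 = 0.17 of the resource before rejection applies. That reading, the sandbox names, and the function names are all assumptions made for illustration.

```python
from typing import Dict, List

GEN_POP = "genPop"


def genpop_share(sandbox_shares: Dict[str, float]) -> float:
    """Whatever the defined sandboxes leave over belongs to genPop."""
    remaining = 1.0 - sum(sandbox_shares.values())
    if remaining < -1e-9:  # tolerance for floating-point sums
        raise ValueError("sandbox thresholds cumulatively exceed 100%")
    return max(remaining, 0.0)


def scaled_limits(sandbox_shares: Dict[str, float], node_threshold: float) -> Dict[str, float]:
    """Assumed proportional rule: limit = sandbox share * node-level threshold."""
    shares = dict(sandbox_shares)
    shares[GEN_POP] = genpop_share(sandbox_shares)
    return {name: share * node_threshold for name, share in shares.items()}


def overflowing(usage: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Names of the sandboxes whose usage is above their scaled limit."""
    return sorted(name for name, used in usage.items() if used > limits.get(name, 0.0))


# Hypothetical sandboxes with the 20% and 25% thresholds from the example.
shares = {"sandboxA": 0.20, "sandboxB": 0.25}
rejection_limits = scaled_limits(shares, node_threshold=0.85)
cancellation_limits = scaled_limits(shares, node_threshold=0.95)

# Node usage of 0.90 in total, which is above the 85% rejection threshold.
usage = {"sandboxA": 0.30, "sandboxB": 0.20, GEN_POP: 0.40}
over_rejection = overflowing(usage, rejection_limits)
```

Under this reading, only sandboxA is overflowing in the sample usage, so only its new tasks would be rejected while sandboxB and genPop continue unaffected.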
Additional Context
This RFC aims to discuss ideas at a high level. More details are provided in this Google Doc. Anyone with the link has comment access, and we would love to gather feedback.