Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limiting concurrent query tasks in datanode #4638

Open
sunng87 opened this issue Aug 28, 2024 · 8 comments
Open

Limiting concurrent query tasks in datanode #4638

sunng87 opened this issue Aug 28, 2024 · 8 comments
Labels
C-enhancement Category Enhancements

Comments

@sunng87
Copy link
Member

sunng87 commented Aug 28, 2024

What type of enhancement is this?

Performance

What does the enhancement do?

As suggested by field user, it would be helpful to add a semaphore or something like that to control the queries executed in datanode in parallel.

This can help us to improve quality of service for datanode, when there are a bunch of cpu or memory intensive queries coming concurrently. Experienced users can use this mechanism to protect their datanodes from OOM or overload, based on their load pattern.

Implementation challenges

No response

@sunng87 sunng87 added the C-enhancement Category Enhancements label Aug 28, 2024
@lyang24
Copy link
Contributor

lyang24 commented Aug 29, 2024

This is very Interesting. Are we referring to limit the query concurrency in the context of the limit applies to all of the datanodes or just for a single datanode?

@sunng87
Copy link
Member Author

sunng87 commented Aug 29, 2024

Just for single datanode. The rational is to protect it from overloaded.

@lyang24
Copy link
Contributor

lyang24 commented Aug 29, 2024

I was thinking about implementing a multi queue type of data structure in mito engine to manage workload.
This struct can support managing different tasks (queries, compactions, data rebalancing ...) to datanode. This struct should be able to support add, release, cancel tasks, and maybe support updating concurrency limit in real time (maybe in the sql layer).

would something like this make sense

#[derive(Debug, Clone)]
struct Permit {
    valid: bool,
}

#[derive(Debug, Clone)]
struct Task {
    priority: f64,
    queue_type: i32,
    heap_idx: usize,
    permit_c: Option<mpsc::Sender<Permit>>,
}

struct MultiQueue {
    mu: Arc<Mutex<()>>, 
    concurrency_limit: usize,
    runs_available: usize,
    queue_type_mapping: HashMap<i32, i32>,
    last_queue_index: i32,
    outstanding: Vec<BinaryHeap<Task>>,
}

impl MultiQueue {
   // add new task, Error when heap is full
   fn add(queueType: u32, priority: u8) Error
   // cancel existing task
   fn cancel(t: &Task)
   // when a task is done release the slot
   fn release(t: &Permit)
   // update concurrency limit in runtime
   fn update_limit(new_limit: usize)
}

@sunng87
Copy link
Member Author

sunng87 commented Sep 3, 2024

My idea was straight-forward to include a tokio semaphore on datanode's query interface.
@waynexia WDYT

@waynexia
Copy link
Member

waynexia commented Sep 4, 2024

How about including a db-scope semaphore so that we can limit inflight queries at the db level?

@sunng87
Copy link
Member Author

sunng87 commented Sep 4, 2024

@waynexia in a distributed setup, it can be difficult to make db-scoped bulkhead accurate. I would suggest to keep it simple as the final solution to protect datanode.

@lyang24
Copy link
Contributor

lyang24 commented Sep 9, 2024

could i give this a try?
I think i can add a new optional opt in datanode config to create the Option and put the struct in the RegionServerInner struct, and on handle_request consult the semaphore for permits?

@sunng87
Copy link
Member Author

sunng87 commented Sep 9, 2024

Sounds good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category Enhancements
Projects
None yet
Development

No branches or pull requests

3 participants