
Use priority queue in ctl::TaskQueue #379

Open
gavv opened this issue May 24, 2020 · 17 comments
Labels
algorithms (Algorithms and data structures), easy hacks (The solution is expected to be straightforward even if you are new to the project), help wanted (An important and awaited task but we have no human resources for it yet), performance


@gavv
Member

gavv commented May 24, 2020

Problem

ctl::TaskQueue is a thread-safe task queue that allows scheduling tasks for immediate or delayed execution.

Tasks for immediate execution are stored in a lock-free FIFO. Tasks for delayed execution are stored in a sorted linked list. Thus, while we have O(1) pop, the insert operation into a sorted list is O(N), which does not scale to large numbers of tasks.

The typical data structure used in such cases is a priority queue implemented using some heap variation.
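To illustrate the gap, here is a standalone sketch (not project code; it uses std containers rather than the intrusive structures discussed below): inserting into a sorted list requires a linear scan, while a heap push only sifts up.

```cpp
#include <algorithm>
#include <functional>
#include <list>
#include <vector>

// Sorted-list insert: linear scan to find the position, O(N).
void sorted_insert(std::list<int>& l, int v) {
    std::list<int>::iterator it = l.begin();
    while (it != l.end() && *it < v) {
        ++it;
    }
    l.insert(it, v);
}

// Min-heap insert: append and sift up, O(log N).
void heap_insert(std::vector<int>& h, int v) {
    h.push_back(v);
    std::push_heap(h.begin(), h.end(), std::greater<int>());
}

// Min-heap pop: O(log N); the minimum is at h.front().
int heap_pop(std::vector<int>& h) {
    std::pop_heap(h.begin(), h.end(), std::greater<int>());
    int v = h.back();
    h.pop_back();
    return v;
}
```

Both structures yield elements in order; the difference is where the cost lands (insert vs. pop), which is exactly the trade-off discussed in the comments below.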

Solution

  1. Select a priority queue algorithm meeting our requirements.
  2. Add an implementation to roc_core and cover it with unit tests.
  3. Use it instead of core::List in ctl::TaskQueue.
  4. Add benchmarks for schedule_at and reschedule_at and compare results for old and new implementations.

Requirements

Requirements to the priority queue implementation:

  • It should be intrusive, i.e. it should not perform allocations by itself, but instead should use data embedded into nodes. The same approach is used for core::List, core::MpscQueue, and core::SharedPtr. The limitation of such approach is that a node can be added only to one container, but this is okay for us.

  • It should have O(1) top/pop.

  • Ideally, it should have amortized O(1) insert, but O(log n) is also acceptable.

  • It should have O(1) membership check (check whether the element belongs to the queue). This can be implemented by just storing a pointer to the containing queue in each node, like we do for core::List.

  • Whenever possible, the implementation should be compact and readable.

  • It should NOT be thread-safe. Access to it is already serialized by ctl::TaskQueue.
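The membership-check requirement from the list above can be sketched as follows (names are hypothetical, not actual roc_core API): each node stores a back-pointer to its owning container, so `contains()` is a single comparison.

```cpp
#include <cassert>
#include <cstddef>

struct IntrusivePq;  // forward declaration of the (hypothetical) container

// Intrusive node: ownership data is embedded in the node itself, so the
// container never allocates. A node can belong to at most one queue.
struct PqNode {
    IntrusivePq* owner;  // non-null iff the node is currently in a queue
    PqNode() : owner(NULL) {}
};

struct IntrusivePq {
    // O(1) membership check: just compare the back-pointer.
    bool contains(const PqNode& n) const { return n.owner == this; }

    // A real push/pop would also update heap linkage; here we only
    // demonstrate the ownership tracking.
    void attach(PqNode& n) {
        assert(n.owner == NULL);  // a node may be in only one container
        n.owner = this;
    }
    void detach(PqNode& n) {
        assert(n.owner == this);
        n.owner = NULL;
    }
};
```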

Notes

Other TaskQueue optimizations:

@gavv added the performance and help wanted labels May 24, 2020
@gavv added the algorithms label May 25, 2020
@Ewaolx

Ewaolx commented Jul 4, 2020

Hi, it seems like all variants of the priority queue have an O(log n) delete operation (pop), and some variants have an O(1) insert operation. So, does this require an O(1) delete operation?

@gavv
Member Author

gavv commented Jul 4, 2020

@Ewaolx Indeed, thanks for noticing. According to this thread, when choosing between various heaps and trees, we'll have to pay O(log N) either for insert() or for pop(), which makes sense.

In our case, ctl::TaskQueue is used to schedule background low-priority work from real-time threads. Hence, if we should choose between faster insert() or faster pop(), insert() is preferred, since it's usually called when a task is scheduled from real-time thread, and pop() is usually called from TaskQueue background thread.

@Ewaolx

Ewaolx commented Jul 4, 2020

Sounds good. In this case, we could use something like a Fibonacci heap, which gives us O(1) insert and O(log N) pop. Also, I see that a sorted linked list (core::List) is currently used in ctl::TaskQueue, so that would be replaced by the heap, and consequently we'd have heap insert and delete instead of the linked list's, right?

@gavv
Member Author

gavv commented Jul 5, 2020

In this case, we could use something like a Fibonacci heap, which gives us O(1) insert and O(log N) pop.

Looks like a good option! Are you going to work on this?

Besides theoretical big-O complexity, we should also ensure that actual task insertion performance won't suffer in the common case, where there are not many pending tasks. So we'll need a small benchmark for TaskQueue scheduling.

A quick look at the Fibonacci heap algorithm suggests that it should be okay.

Also, I see that a sorted linked list (core::List) is currently used in ctl::TaskQueue, so that would be replaced by the heap, and consequently we'd have heap insert and delete instead of the linked list's, right?

Yes. IIRC, we'll need to replace three methods: fetch_sleeping_task_, insert_sleeping_task_, remove_sleeping_task_.

@Ewaolx

Ewaolx commented Jul 5, 2020

Okay, got it. Thanks for the update.

Yes, I can start working on this. I see roc_ctl is in the develop branch, not in master. Are the guidelines for building and running the same for the develop branch?

@gavv
Member Author

gavv commented Jul 5, 2020

Great! The develop branch is quite ahead of master and the guidelines have changed. They're not published on the web site (it's built from master), but you can find them in git:

@MBkkt

MBkkt commented Jul 19, 2020

something like Fibonnaci heap

I don't recommend using this algorithm; it's good in theory, but in practice it has a huge constant factor. I advise you to try a regular binary heap, and only if the performance turns out to be insufficient should you look for optimizations.

@gavv
Member Author

gavv commented Jul 19, 2020

@MBkkt thanks for the notice!

I haven't worked much with heaps myself, but quick googling and even the Wikipedia article on the Fibonacci heap confirm your point.

I feel that a binary heap isn't a great fit for us because it has O(log n) insertion, while in our specific case insertion speed is the most critical, since tasks are inserted from latency-sensitive threads.

Wikipedia and this page also suggest the Pairing heap as a better (faster in practice) alternative to the Fibonacci heap. Actually, even theoretically it seems better for our case, because Pairing heap insertion is O(1) even in the worst case, while Fibonacci heap insertion is only amortized O(1). The implementation of the Pairing heap also looks simpler, arguably.
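For reference, the core pairing heap operations can be sketched as follows (a minimal illustration, not the proposed roc_core implementation; a real intrusive version would also keep a parent/previous link so arbitrary nodes can be removed for rescheduling):

```cpp
#include <cstdint>

// Min-ordered pairing heap node, keyed by deadline.
struct PhNode {
    uint64_t deadline;
    PhNode* child;    // leftmost child
    PhNode* sibling;  // next sibling in the child list
    explicit PhNode(uint64_t d) : deadline(d), child(0), sibling(0) {}
};

// Merge two heaps: the larger root becomes the leftmost child of the
// smaller one. O(1) worst case -- this is why insertion is O(1).
PhNode* ph_merge(PhNode* a, PhNode* b) {
    if (!a) return b;
    if (!b) return a;
    if (b->deadline < a->deadline) {
        PhNode* t = a;
        a = b;
        b = t;
    }
    b->sibling = a->child;
    a->child = b;
    return a;
}

// Insert is just a merge with a single-node heap: O(1) worst case.
PhNode* ph_insert(PhNode* root, PhNode* n) {
    return ph_merge(root, n);
}

// Two-pass pairing of a sibling list: pair left-to-right, then merge the
// pairs right-to-left (via the recursion). Gives O(log n) amortized pop.
PhNode* ph_merge_pairs(PhNode* list) {
    if (!list || !list->sibling) return list;
    PhNode* a = list;
    PhNode* b = a->sibling;
    PhNode* rest = b->sibling;
    a->sibling = 0;
    b->sibling = 0;
    return ph_merge(ph_merge(a, b), ph_merge_pairs(rest));
}

// Remove the minimum (the root) and rebuild the heap from its children.
PhNode* ph_pop(PhNode* root) {
    PhNode* children = root->child;
    root->child = 0;
    return ph_merge_pairs(children);
}
```

Note that the structure is naturally intrusive: all linkage lives in the node, and no operation allocates, which matches the requirements listed in the issue.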

@Ewaolx what do you think? How far did you get on the Fibonacci heap implementation?

@MBkkt

MBkkt commented Jul 20, 2020

I understand the concern, but what is the order of the number n? Are you sure the insert will be the bottleneck? In general, given that you want an intrusive queue, probably a binary heap is really not suitable.

@Ewaolx

Ewaolx commented Jul 21, 2020

@gavv Sorry for the late reply, but as you mentioned, I also think that if we want strictly O(1) insertion, then a binary heap would not work. Also, the pairing heap sounds like a good option if we are concerned about the Fibonacci heap's practical performance. We can go ahead with the pairing heap in this case.

Let me know what you think. Thanks.

@gavv
Member Author

gavv commented Aug 6, 2020

I understand the concern, but what is the order of the number n? Are you sure the insert will be the bottleneck? In general, given that you want an intrusive queue, probably a binary heap is really not suitable.

Usually the number of tasks should be low. However, the key point here is that it can grow uncontrollably during load peaks, and at the same time we don't want latency to grow uncontrollably.

We don't optimize every single piece of code, but try to follow design principles which would keep us from ending up with code that we're unable to optimize at all.

One of the principles chosen in this project is that delays in one thread (whatever their reason) should not cause delays in other threads. Such a design decouples threads in terms of latency and delays, so when it comes to performance problems, you can review threads in isolation, which is much simpler.

TaskQueue is used for communication between pipeline and control threads. If it's not O(1), then delays in the control thread will lead to accumulating tasks, thus causing longer insertion times, thus probably causing delays in pipeline threads. And probably not, who knows. You'd have to profile it each time you're debugging glitches.

So it's very attractive to just use an O(1) data structure and be 100% sure that TaskQueue insertion always takes fixed time and the control thread is guaranteed not to affect the pipeline thread. Then, if you hear glitches, you can just profile the pipeline thread in isolation.

As for the specific numbers, as mentioned in the issue, it would be nice to add benchmarks and measure insertion times on different queue sizes.
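As a rough illustration of how such a measurement could look (a standalone std::chrono sketch, not the project's benchmark harness; it times the sorted-list scan in isolation rather than schedule_at itself):

```cpp
#include <chrono>
#include <cstddef>
#include <list>

// Time a single insert into a sorted list that already holds n elements.
// The new value lands in the middle, so the scan covers roughly n/2 nodes.
long long sorted_insert_ns(std::size_t n) {
    std::list<long> l;
    for (std::size_t i = 0; i < n; i++) {
        l.push_back(static_cast<long>(i) * 2);  // even numbers, sorted
    }
    const long v = static_cast<long>(n);  // odd value, falls in the middle

    const std::chrono::steady_clock::time_point t0 =
        std::chrono::steady_clock::now();
    std::list<long>::iterator it = l.begin();
    while (it != l.end() && *it < v) {
        ++it;  // the O(N) scan being measured
    }
    l.insert(it, v);
    const std::chrono::steady_clock::time_point t1 =
        std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0)
        .count();
}
```

Running this for several queue sizes (say, 1K, 10K, 100K) and repeating the same measurement against the heap-based implementation would show how insertion time grows with the number of pending tasks.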

@gavv
Member Author

gavv commented Aug 6, 2020

@gavv Sorry for the late reply, but as you mentioned, I also think that if we want strictly O(1) insertion, then a binary heap would not work. Also, the pairing heap sounds like a good option if we are concerned about the Fibonacci heap's practical performance. We can go ahead with the pairing heap in this case.

Let me know what you think. Thanks.

Yeah, let's try a pairing heap then.

@ashishsiyag

Hey @Ewaolx, are you working on the pairing heap implementation?

@Hassan-A

Hello @gavv, this seems inactive. Mind if I try as well?

@gavv
Member Author

gavv commented Sep 23, 2020

Mind if I try as well?

Hi, sure.

@divyavit

divyavit commented Oct 1, 2020

Hi, I was interested in working on this and wanted to give it a try. Has it already been done?

@Hassan-A

Hassan-A commented Oct 2, 2020

Hi @divyavit , still in progress.

@gavv unassigned Ewaolx Mar 7, 2021
@gavv added the easy hacks label Sep 22, 2023
@gavv added this to Roc Toolkit Jul 6, 2024
@gavv moved this to Help wanted in Roc Toolkit Jul 6, 2024