-
-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace List with Pairing Heap in Task Queue #584
base: develop
Are you sure you want to change the base?
Conversation
Nice, thanks for PR! We will review it in upcoming days.
I think we can merge this PR without benchmarks too, but in case you're interested, here are details. Example of benchmarks for control task queue is here: https://github.com/roc-streaming/roc-toolkit/blob/master/src/tests/roc_ctl/bench_task_queue_contention.cpp That specific benchmark simulates case of high contention, when many threads schedule tasks simultaneously. It allows to verify that schedule time (time spend in schedule call) is not growing because of contention. In benchmark for this task we would need to test another case: schedule many tasks with deadline sequentially, to verify that schedule time is not growing because of having many queued tasks. We should see that it was growing linearly before this PR and now schedule time is O(1) (and hopefully is not high). There are also some docs on benchmark invocation here: https://roc-streaming.org/toolkit/docs/building/developer_cookbook.html |
@@ -0,0 +1,349 @@ | |||
/* | |||
* Copyright (c) 2015 Roc Streaming authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: please update copyrights in newly added files to 2023.
//! element ownership when it's added to the pairing heap and release ownership when it's | ||
//! removed from the pairing heap. | ||
template <class T, template <class TT> class OwnershipPolicy = RefCountedOwnership> | ||
class PairingHeap : public NonCopyable<> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface of this class is quite low-level as it reflects the underlying algorithm.
I suggest to use different approach: provide much more narrow interface which is actually needed for the user.
What we need in roc_ctl is a classic priority queue with just three operations:
- push (insert element according to its priority)
- top (look up element with highest priority)
- pop (remove element with highest priority)
So I suggest to rename this class to PriorityQueue and hide all internals behind these high-level operations.
Benefits:
- interface will be much simpler to understand and use, and there will be less bugs in usage
- interface will depend on algorithm properties, but not on specific algorithm; we will state that it has cheap push and more expensive pop, but we won't attach caller to specific data structure and can change it if needed
To achieve this, we'll need to tell PriorityQueue how it can get priority of T. We can use the same approach as we use in hashmap: T will have to implement necessary methods, for example:
// get object priority
Prio prio() const;
// compare two priorities (return -1, 0, or 1)
static int prio_cmp(Prio prio1, Prio prio2);
(See also comments in Hashmap).
I think it would be pretty convenient, since this is intrusive data structure, and we anyway place special requirements on elements (inherit from node class).
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gavv I will try! 😅
questions:
* pop (remove element with highest priority)
Are we assuming we never remove / change deadline of elements other than the top one? Since there is ControlTaskQueue::cancel_task_
function, I wasn't too sure.
static int prio_cmp(Prio prio1, Prio prio2);
If this returns 0 I'm assuming priority is equal. Do we need to make sure which task goes first? or can we do either one. This came up because when I was testing with:
TEST(task_queue, schedule_at_same_deadline) {...}
it assumes the task that was scheduled first would be in front of the other one. However I don't think we can guarantee this using a heap because of how heap merge works.
Example of benchmarks for control task queue is here: https://github.com/roc-streaming/roc-toolkit/blob/master/src/tests/roc_ctl/bench_task_queue_contention.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good questions,
Are we assuming we never remove / change deadline of elements other than the top one? Since there is ControlTaskQueue::cancel_task_ function, I wasn't too sure.
My bad, we indeed need to remove tasks. The interface then can be:
void push(T&)
T* top()
void pop()
bool contains(T&)
void remove(T&)
BTW, it would be better to use naming similar to List and Hashmap:
void insert(T&)
T* front()
void pop_front()
bool contains(T&)
void remove(T&)
If this returns 0 I'm assuming priority is equal.
Yep, like in strcmp() and memcmp(). Another option is to make it static bool prio_less(Prio prio1, Prio prio2)
and return true if prio1 < prio2
. Choose whatever you think fits better.
Do we need to make sure which task goes first? or can we do either one.
I think it would be useful to have this guarantee, because it will prevent some easy to make bugs in user code.
Imagine there is a code that calculates next deadline for some event (e.g. periodic timer), and then runs some logic that schedules a few tasks for that deadline, like:
next_deadline_ = ...;
...
if (need_x) {
schedule_at(next_deadline_, task_x);
...
if (also_need_y) {
schedule_at(next_deadline_, task_y);
}
}
It would be easy to forget that order of task_x and task_y is not guaranteed, especially given that it's guaranteed for regular schedule (without deadline).
However I don't think we can guarantee this using a heap because of how heap merge works.
But how do you handle it currently? I guess it works somehow because tests pass on this PR?
Also, if needed, we could upgrade PairingHeap (PriorityQueue) so that each node will contain a list of nodes with the same priority. When you add node, and another node with same prio already exists, it will be added to that list. And the list will maintain insertion order.
my contention test freezes here:
This indicates a bug in benchmarks I think. They work on my machine and a few others where I tested them. (BTW they take quite a lot of time).
It would be great if you can debug the reason of the freeze because I can't reproduce it. Anyway it should be done separately from this PR I think.
Added hacktoberfest-accepted just in case you need it. |
🤖 The latest upstream change made this pull request unmergeable. Please resolve the merge conflicts. |
Hi, I'm preparing 0.3.0 release and have rebased develop on master (this workflow is described here). Please reset develop in your fork to up-to-date version and rebase your PR on that. (You can use |
Hi, I've rebased PR on fresh develop to keep it alive. |
🤖 The latest upstream change made this pull request unmergeable. Please resolve the merge conflicts. |
Rebased on fresh develop. |
#379
Tried to keep it as similar to List as possible.
Not sure how to do the benchmarks