Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace List with Pairing Heap in Task Queue #584

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

Hassan-A
Copy link

@Hassan-A Hassan-A commented Oct 1, 2023

#379
Tried to keep it as similar to List as possible.

Not sure how to do the benchmarks

"4. Add benchmarks for schedule_at and reschedule_at and compare results for old and new implementations."

@Hassan-A Hassan-A marked this pull request as ready for review October 1, 2023 09:52
@gavv
Copy link
Member

gavv commented Oct 1, 2023

Nice, thanks for PR! We will review it in upcoming days.

Not sure how to do the benchmarks

I think we can merge this PR without benchmarks too, but in case you're interested, here are details.

Example of benchmarks for control task queue is here: https://github.com/roc-streaming/roc-toolkit/blob/master/src/tests/roc_ctl/bench_task_queue_contention.cpp

That specific benchmark simulates case of high contention, when many threads schedule tasks simultaneously. It allows to verify that schedule time (time spend in schedule call) is not growing because of contention.

In benchmark for this task we would need to test another case: schedule many tasks with deadline sequentially, to verify that schedule time is not growing because of having many queued tasks. We should see that it was growing linearly before this PR and now schedule time is O(1) (and hopefully is not high).

There are also some docs on benchmark invocation here: https://roc-streaming.org/toolkit/docs/building/developer_cookbook.html

@gavv gavv added the ready for review Pull request can be reviewed label Oct 1, 2023
@@ -0,0 +1,349 @@
/*
* Copyright (c) 2015 Roc Streaming authors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: please update copyrights in newly added files to 2023.

//! element ownership when it's added to the pairing heap and release ownership when it's
//! removed from the pairing heap.
template <class T, template <class TT> class OwnershipPolicy = RefCountedOwnership>
class PairingHeap : public NonCopyable<> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface of this class is quite low-level as it reflects the underlying algorithm.

I suggest to use different approach: provide much more narrow interface which is actually needed for the user.

What we need in roc_ctl is a classic priority queue with just three operations:

  • push (insert element according to its priority)
  • top (look up element with highest priority)
  • pop (remove element with highest priority)

So I suggest to rename this class to PriorityQueue and hide all internals behind these high-level operations.

Benefits:

  • interface will be much simpler to understand and use, and there will be less bugs in usage
  • interface will depend on algorithm properties, but not on specific algorithm; we will state that it has cheap push and more expensive pop, but we won't attach caller to specific data structure and can change it if needed

To achieve this, we'll need to tell PriorityQueue how it can get priority of T. We can use the same approach as we use in hashmap: T will have to implement necessary methods, for example:

   // get object priority
   Prio prio() const;

   // compare two priorities (return -1, 0, or 1)
   static int prio_cmp(Prio prio1, Prio prio2);

(See also comments in Hashmap).

I think it would be pretty convenient, since this is intrusive data structure, and we anyway place special requirements on elements (inherit from node class).

What do you think?

Copy link
Author

@Hassan-A Hassan-A Oct 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gavv I will try! 😅

questions:

* pop (remove element with highest priority) 

Are we assuming we never remove / change deadline of elements other than the top one? Since there is ControlTaskQueue::cancel_task_ function, I wasn't too sure.

 static int prio_cmp(Prio prio1, Prio prio2);

If this returns 0 I'm assuming priority is equal. Do we need to make sure which task goes first? or can we do either one. This came up because when I was testing with:
TEST(task_queue, schedule_at_same_deadline) {...}
it assumes the task that was scheduled first would be in front of the other one. However I don't think we can guarantee this using a heap because of how heap merge works.

Example of benchmarks for control task queue is here: https://github.com/roc-streaming/roc-toolkit/blob/master/src/tests/roc_ctl/bench_task_queue_contention.cpp

my contention test freezes here:
contention
is it just my side?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good questions,

Are we assuming we never remove / change deadline of elements other than the top one? Since there is ControlTaskQueue::cancel_task_ function, I wasn't too sure.

My bad, we indeed need to remove tasks. The interface then can be:

  • void push(T&)
  • T* top()
  • void pop()
  • bool contains(T&)
  • void remove(T&)

BTW, it would be better to use naming similar to List and Hashmap:

  • void insert(T&)
  • T* front()
  • void pop_front()
  • bool contains(T&)
  • void remove(T&)

If this returns 0 I'm assuming priority is equal.

Yep, like in strcmp() and memcmp(). Another option is to make it static bool prio_less(Prio prio1, Prio prio2) and return true if prio1 < prio2. Choose whatever you think fits better.

Do we need to make sure which task goes first? or can we do either one.

I think it would be useful to have this guarantee, because it will prevent some easy to make bugs in user code.

Imagine there is a code that calculates next deadline for some event (e.g. periodic timer), and then runs some logic that schedules a few tasks for that deadline, like:

next_deadline_ = ...;

...

if (need_x) {
   schedule_at(next_deadline_, task_x);

   ...

   if (also_need_y) {
      schedule_at(next_deadline_, task_y);
   }
}

It would be easy to forget that order of task_x and task_y is not guaranteed, especially given that it's guaranteed for regular schedule (without deadline).

However I don't think we can guarantee this using a heap because of how heap merge works.

But how do you handle it currently? I guess it works somehow because tests pass on this PR?

Also, if needed, we could upgrade PairingHeap (PriorityQueue) so that each node will contain a list of nodes with the same priority. When you add node, and another node with same prio already exists, it will be added to that list. And the list will maintain insertion order.

my contention test freezes here:

This indicates a bug in benchmarks I think. They work on my machine and a few others where I tested them. (BTW they take quite a lot of time).

It would be great if you can debug the reason of the freeze because I can't reproduce it. Anyway it should be done separately from this PR I think.

@gavv gavv added needs revision Pull request should be revised by its author contribution A pull-request by someone else except maintainers and removed ready for review Pull request can be reviewed labels Oct 2, 2023
@gavv
Copy link
Member

gavv commented Oct 14, 2023

Added hacktoberfest-accepted just in case you need it.

@github-actions github-actions bot added the needs rebase Pull request has conflicts and should be rebased label Nov 19, 2023
Copy link

🤖 The latest upstream change made this pull request unmergeable. Please resolve the merge conflicts.

@gavv
Copy link
Member

gavv commented Nov 19, 2023

Hi, I'm preparing 0.3.0 release and have rebased develop on master (this workflow is described here).

Please reset develop in your fork to up-to-date version and rebase your PR on that. (You can use git rebase --onto or just cherry-pick your commits.)

@github-actions github-actions bot removed the needs rebase Pull request has conflicts and should be rebased label Dec 22, 2023
@gavv
Copy link
Member

gavv commented Dec 22, 2023

Hi, I've rebased PR on fresh develop to keep it alive.

Copy link

🤖 The latest upstream change made this pull request unmergeable. Please resolve the merge conflicts.

@github-actions github-actions bot added the needs rebase Pull request has conflicts and should be rebased label May 14, 2024
@github-actions github-actions bot removed the needs rebase Pull request has conflicts and should be rebased label May 30, 2024
@gavv
Copy link
Member

gavv commented May 30, 2024

Rebased on fresh develop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution A pull-request by someone else except maintainers hacktoberfest-accepted needs revision Pull request should be revised by its author
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants