Order of task processing #319

Open
tsmgeek opened this issue Aug 23, 2021 · 14 comments

Comments

@tsmgeek

tsmgeek commented Aug 23, 2021

I have read that jobs are dequeued for processing in the order they are submitted to the server (within a single priority).

Does this also apply to tasks which are run in parallel across multiple workers?

I am queuing lots of tasks and waiting for them to finish, but they are always processed in the reverse of the order in which they were added.

I know the order will differ slightly based on processing time, but this is about the general order in which tasks are handed to workers, not the order of completion.

Is this expected behavior in the server, or a problem caused by the PECL library that submits the jobs to gearmand?

A: unique ID set on the task
B: counter on the worker

          A,  B
COMPLETE: 11, 1: !dlroW olleH
COMPLETE: 10, 2: Hello World!
COMPLETE: 9, 3: !dlroW olleH
COMPLETE: 8, 4: Hello World!
COMPLETE: 7, 5: !dlroW olleH
COMPLETE: 6, 6: Hello World!
COMPLETE: 5, 7: !dlroW olleH
COMPLETE: 4, 8: Hello World!
COMPLETE: 3, 9: !dlroW olleH
COMPLETE: 2, 10: Hello World!
COMPLETE: 1, 11: !dlroW olleH
@esabol
Member

esabol commented Aug 23, 2021

What version of gearmand are you using?

> Does this also apply to tasks which are run in parallel across multiple workers?

What do you get if you only have one worker?

> Is this expected in the server or a problem caused by the PECL library that should be submitting the jobs to gearmand?

I think that's unlikely, but you could try submitting the jobs using something other than PHP to eliminate that as a possibility.

@tsmgeek
Author

tsmgeek commented Aug 23, 2021

gearmand 1.1.18-161-ga95d1c1
php-pecl-gearman 2.1.0

I'm one version behind on gearmand. Without a changelog it's not clear what has actually been fixed. I would be surprised if this had not affected anyone else before, so I'm hoping it's a PECL issue.

I'll write up a Python client test script to rule out PECL being the issue.

@SpamapS
Member

SpamapS commented Aug 24, 2021 via email

@tsmgeek
Author

tsmgeek commented Aug 24, 2021

I've checked, and with a single client and a single worker the processing order is exactly the reverse of the order in which the tasks were added.

@SpamapS I do not need the order to be perfect, but it should at least be roughly the order in which I added them, give or take.
Are you saying that the PECL library is posting them to Gearman in reverse order, so I should post this issue on the PECL side?

@esabol
Member

esabol commented Aug 24, 2021

I think what @SpamapS is saying is that the order that jobs are assigned to workers is FIFO, but the order that the job results are returned to the client is unspecified and not guaranteed. You should not rely on that order being consistent.

Regardless, opening an issue with php/pecl-networking-gearman seems premature, imho.

Here's what I would do to test this:

  1. Kill all the workers.
  2. Have a client submit a job and then sleep one second. The input value (i.e. the payload) for the job should include a counter.
  3. Repeat step 2 nine more times, incrementing the counter each time, until 10 jobs have been submitted and are sitting in the queue. You should use a separate client process for each job submitted, resulting in 10 clients waiting for their results and 10 jobs sitting in the gearmand queue.
  4. Start N worker(s) registered to accept these jobs. The worker should write to a PID-specific log file the date/time it accepted the job and what the job payload was. Then, it should sleep 1 second before it returns the job result. Include the same date/time that was logged in the job result.
  5. Note the order in which the clients receive the job results. Compare with the order in the workers' logs.

Do the above for N=1, 5, and 10.
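
The procedure above can be rehearsed without a Gearman installation. The following is a minimal Python sketch that uses an in-process `queue.Queue` as a stand-in for the gearmand queue; `run_trial` and `worker` are names made up for illustration, and nothing here touches the real Gearman protocol. It only shows the shape of the comparison: queue 10 numbered jobs while no workers are running, start N workers, and compare the order in which jobs were accepted with the order in which they were submitted.

```python
import queue
import threading
import time


def run_trial(num_workers, num_jobs=10, work_seconds=0.01):
    """Queue jobs first, then start workers; return payloads in acceptance order."""
    jobs = queue.Queue()  # FIFO stand-in for the job server's queue
    for counter in range(1, num_jobs + 1):
        jobs.put(counter)  # the payload carries the counter, as in step 2

    accepted = []  # shared "log" of (worker_id, payload), in acceptance order
    log_lock = threading.Lock()

    def worker(worker_id):
        while True:
            with log_lock:
                # Dequeue and log under one lock so the log matches dequeue order.
                try:
                    payload = jobs.get_nowait()
                except queue.Empty:
                    return
                accepted.append((worker_id, payload))
            time.sleep(work_seconds)  # simulate the job taking some time

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [payload for _, payload in accepted]


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, run_trial(n))
```

Against a real server, the interesting result would be an acceptance order that differs from the submission order even for N=1.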

@tsmgeek
Author

tsmgeek commented Sep 22, 2021

I added output results to the PECL ticket I created.
They show that tasks in a batch (using runTasks) are always processed in the reverse of the order in which they were added.
I.e. if I add six tasks (1,2,3,4,5,6) and then call runTasks, they are run on the workers in the order (6,5,4,3,2,1). If I add multiple workers the order is still reversed, but you can see it shift (6,4,5,2,3,1) depending on how quickly each worker responds; in my situation many of the tasks take roughly the same time to process.
Yes, I know it's a FIFO buffer, but either the task batch is being submitted to the server in reverse order, or the task array items are being popped off the end of the array and sent instead of being worked through from the start.
I do not expect perfect ordering, but if I add 100 tasks I would like processing to bear some resemblance to the order I submitted them in.
With smaller batches of 2/3/4 etc. it does not matter, but as I get to 50/100/200/500 it does.

@esabol
Member

esabol commented Sep 22, 2021

I don’t think it was clear from your initial description that you were submitting a batch of multiple tasks from a single client in one fell swoop. That’s not a typical Gearman use-case, I think. I think the typical use-case is having multiple clients submitting multiple jobs to the job server one job at a time.

Possible workaround: Try submitting the jobs from the client (one at a time) as background tasks instead. Refer to https://www.php.net/manual/en/gearmanclient.dobackground.php for details.

When you call addTask() in PHP, it’s probably adding the task to some internal data structure, presumably a linked list or a buffer. It’s probably adding to the front of the list instead of to the end of the list. I would not be opposed to changing that to append to the end of the list, assuming that functionality is in libgearman. This is strictly speculation, of course. I haven’t looked at the code yet.
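
To make that speculation concrete, here is a minimal Python sketch of a client that prepends each task to its internal list and later walks the list from the head when sending. `FakeServer` and `PrependingClient` are hypothetical names; this is not the libgearman code, only a model of the suspected behavior. The server queue is strictly FIFO, yet workers would still see the batch in reverse.

```python
from collections import deque


class FakeServer:
    """Stand-in job server with a strictly FIFO queue."""

    def __init__(self):
        self.queue = deque()

    def submit(self, payload):
        self.queue.append(payload)

    def next_job(self):
        return self.queue.popleft()


class PrependingClient:
    """Hypothetical client that adds each new task at the FRONT of its list."""

    def __init__(self, server):
        self.server = server
        self.tasks = deque()

    def add_task(self, payload):
        self.tasks.appendleft(payload)  # head insertion, the suspected cause

    def run_tasks(self):
        while self.tasks:
            self.server.submit(self.tasks.popleft())  # send starting at the head


server = FakeServer()
client = PrependingClient(server)
for n in range(1, 7):
    client.add_task(n)
client.run_tasks()

dequeued = [server.next_job() for _ in range(6)]
print(dequeued)  # [6, 5, 4, 3, 2, 1]
```

In this model, switching `appendleft` to `append` in `add_task` is enough to deliver 1 through 6 in order.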

@tsmgeek
Author

tsmgeek commented Sep 23, 2021

The test is from a single client. In production this can happen from multiple clients at the same time, but I am really only concerned with ordering within a single client, since what one client does has no bearing on the others.

The PECL maintainers say they are just calling gearman_client_add_task / gearman_client_run_tasks directly, with no queueing within the extension itself, which pushes the responsibility over to the gearman library.

I am not up on C, so I can only go by what the extension developer says.
https://github.com/php/pecl-networking-gearman/blob/69d6b78374fc9914906ecaea0fe919b6903cd526/php_gearman_client.c#L655

The workaround using background tasks is not ideal, as it means tracking all the job handles and checking that all of them are done, etc.
In testing I stored all the tasks in an array, reversed it, and then added the tasks in reverse to prove the issue, but that is a hack that starts to become a mess and would mean quite a few changes in our code.
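
The reversing hack described above might look roughly like the following Python sketch. `LifoBatchClient` is a hypothetical stand-in for a client whose batch is sent last-added-first, and `OrderedBatch` is an illustrative wrapper; neither is part of PECL or libgearman. The wrapper buffers tasks and hands them to the client in reverse, so the client's own reversal cancels out.

```python
class LifoBatchClient:
    """Hypothetical client that sends a batch last-added-first."""

    def __init__(self):
        self._pending = []
        self.sent = []  # order in which tasks reach the (FIFO) server

    def add_task(self, payload):
        self._pending.append(payload)

    def run_tasks(self):
        while self._pending:
            self.sent.append(self._pending.pop())  # pops from the end


class OrderedBatch:
    """Buffers tasks, then feeds them to the client in reverse order."""

    def __init__(self, client):
        self._client = client
        self._buffer = []

    def add_task(self, payload):
        self._buffer.append(payload)

    def run_tasks(self):
        for payload in reversed(self._buffer):
            self._client.add_task(payload)
        self._buffer.clear()
        self._client.run_tasks()


client = LifoBatchClient()
batch = OrderedBatch(client)
for n in range(1, 6):
    batch.add_task(n)
batch.run_tasks()
print(client.sent)  # [1, 2, 3, 4, 5]
```

The wrapper bakes in an assumption about the client's send order, so it would invert the order again if that behavior ever changed.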

@esabol
Member

esabol commented Sep 23, 2021

> Workaround using background tasks is not ideal as it means tracking all the job handles and checking all of them are done etc.

Yeah, it's more complicated and will take more PHP code, but I'm fairly sure it would work the way you want and it would be the most expedient solution, entirely within your control.

The other workaround option is to just call runTasks() immediately after adding each task, of course, but I take it you don't want to do that for some reason.
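
For a sense of the extra bookkeeping the background-job workaround involves, here is a minimal Python sketch. `FakeJobServer`, `submit_background`, `is_done`, and `work_one` are hypothetical stand-ins, not the PECL or libgearman API; with the real extension the submission step would be `doBackground()` plus some job-status check. Jobs are submitted one at a time, so they enter the queue in submission order, and the client keeps every handle and polls until all are finished.

```python
import itertools
from collections import deque


class FakeJobServer:
    """Hypothetical in-memory job server with a FIFO queue."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._queue = deque()
        self._done = set()
        self.processed = []  # payloads in the order a worker picked them up

    def submit_background(self, payload):
        handle = f"H:{next(self._ids)}"
        self._queue.append((handle, payload))
        return handle  # returns immediately; the job runs later

    def is_done(self, handle):
        return handle in self._done

    def work_one(self):
        """Simulate one worker finishing the job at the head of the queue."""
        if self._queue:
            handle, payload = self._queue.popleft()
            self.processed.append(payload)
            self._done.add(handle)


server = FakeJobServer()
handles = [server.submit_background(n) for n in range(1, 6)]

# Client-side tracking: poll until every handle reports done.
pending = set(handles)
while pending:
    server.work_one()  # with a real server, workers run on their own
    pending = {h for h in pending if not server.is_done(h)}

print(server.processed)  # [1, 2, 3, 4, 5]
```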

@SpamapS
Member

SpamapS commented Sep 23, 2021 via email

@tsmgeek
Author

tsmgeek commented Sep 23, 2021

I've been using it for nearly 10 years and I know how the worker/client model works. We have a system with a hundred workers doing image/video processing, various file transfer jobs and other background tasks.

I do not have a hard dependence on order. It would just be nice if, when submitting 100 tasks as a batch, they came back in some sort of order. Sure, if I have 100 workers then the order will be anything at all, but if I have only 2 or 5 workers it would be good if they came back in an order similar to how they were queued.

I am not sure if something is being lost in translation: the tasks are processed 100% in reverse of how they were added before the batch was submitted. I am not asking for anything other than pushing the tasks to the server in the order they were added when running the task batch, so that workers take them off the queue in roughly the same order. I am aware that processing time means results can come back out of order.

client - addTask - 1
client - addTask - 2
...
client - addTask - 99

worker 1 - task 99
worker 2 - task 98
worker 1 - task 97
worker 1 - task 96
worker 2 - task 95
worker 3 - task 94
worker 2 - task 93
etc

instead of
worker 1 - task 1
worker 2 - task 2
worker 1 - task 3
worker 1 - task 4
worker 2 - task 5
worker 3 - task 6
etc

@SpamapS
Member

SpamapS commented Sep 23, 2021 via email

@octavn

octavn commented May 12, 2022

I agree with the original poster that jobs should be assigned to workers in the general order in which they came in. And that's what we see (FIFO behavior) with background jobs on Gearman 1.1.18.

This order makes more sense when you have long queues that need hours to process. Here's our story:

We use Gearman to process tens of thousands of recordings per day, but one day we had to process hundreds of thousands. We also use two priorities, high and low (and only one function). Because there was a flood of recordings, Gearman kept shipping the high-priority jobs to workers first. Because workers could not keep up, the low-priority jobs stacked up. All this is normal and expected. Once the flood ended, workers quickly picked up and finished any high-priority jobs and started work on the low-priority ones. At this point there were, let's say, about 24 hours' worth of (low-priority) recordings/jobs waiting to be processed. From our processing_start timestamp, we can see that Gearman shipped the oldest jobs from this stack first. This was the expected behavior. In a FILO order, the first recordings/jobs would have been further penalized, but with FIFO everyone in the low-priority queue got the same treatment.
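
The behavior described here, high priority first and FIFO within each priority, can be modeled in a few lines. This is a Python sketch with a hypothetical `PriorityFifo` class, not gearmand's actual data structure.

```python
from collections import deque


class PriorityFifo:
    """One FIFO queue per priority; higher priorities are drained first."""

    PRIORITIES = ("high", "low")

    def __init__(self):
        self._queues = {p: deque() for p in self.PRIORITIES}

    def submit(self, priority, job):
        self._queues[priority].append(job)

    def next_job(self):
        for p in self.PRIORITIES:
            if self._queues[p]:
                return self._queues[p].popleft()
        return None  # nothing queued at any priority


q = PriorityFifo()
q.submit("low", "low-1")
q.submit("high", "high-1")
q.submit("low", "low-2")
q.submit("high", "high-2")

order = [q.next_job() for _ in range(4)]
print(order)  # ['high-1', 'high-2', 'low-1', 'low-2']
```

Within the low-priority queue the oldest job is always served first, which is why no early arrival is penalized further once the high-priority flood ends.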

@SpamapS
Copy link
Member

SpamapS commented Jun 3, 2022

Let's be really clear though: Jobs are assigned to workers in the order they are given to the server.

However, the task system in libgearman is an abstraction above jobs, and sends these "tasks" as jobs. It sends them all at one time, and it happens to send them LIFO.

This order isn't really defined in the docs. It only says that the task is added to the client structure, and ... well now .. I found a funny doc bug:

26a8bde

Now, with that fixed, the way it is intended to work is that a bunch of tasks are added and then sent to the servers all at once. Making it FIFO would be a feature change and I'm not against it but it deserves a proper reason. If you want to send them in a particular order, you can now, just use gearman_client_do_background.

Anyway, this isn't a bug, but I will leave it here as an incomplete enhancement request. If anyone wants to make clear what the purpose is, and write up the patch, it will of course be considered.
