Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I-controller for wait_until lag #413

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
Open

I-controller for wait_until lag #413

wants to merge 14 commits into from

Conversation

soyerefsane
Copy link
Collaborator

This PR implements an I-controller for wait_until function internal lag, as discussed in #332. Currently the I-controller's gain is only configured for Linux based platforms, so the other platforms zero out the calculated error control value. The following is the pseudocode of the updated wait_until function:

wait_until(requested_time) {
    static control_value = 0
    adjusted_time = requested time - control value

    if (now > requested_time) return
    else if (now > adjusted_time && now < requested_time) {
        control_value = control_value - K_i * (now - adjusted_time)
        //busy wait and return

    //call lf_cond_timedwait assuming no physical action, it returns at some physical time t'
    control_value = control_value + K_i * (t' - requested_time)
    
    if (now > requested_time)
        // busy wait
    return

I have experimented with different K_i values, and it turns out K_i = 1 causes the lag to oscillate between 60 usec and ~0, K_i < 1 just delays the start of the oscillation, K_i = 2 causes unstable oscillation, and K_i = 1.5 stabilizes the lag. I'm currently not sure why 1.5 works, so if there is any insight you have it would be highly appreciated.

@soyerefsane soyerefsane added the feature New feature label Apr 23, 2024
@soyerefsane soyerefsane self-assigned this Apr 23, 2024
@soyerefsane
Copy link
Collaborator Author

Some helpful graphs with different constant values:

Without any lag controller, but with SCHED_FIFO and real-time priorities
Untitled

When the constant multiplier is 1:
Untitled-2
One can see oscillations better when we look at time vs lag graph:
Untitled

When the constant multiplier is 0.1 (the oscillation starts late):
Untitled-2

When the constant multiplier is 2:
Untitled-2

When the constant multiplier is 1.5:
Untitled

@erlingrj
Copy link
Collaborator

This looks like promising work I have a few questions:

  • Should this type of optimization always be enabled, or is it something that we could enable with a target property? It is only relevant for real-time systems and it comes at a cost.
  • I think it would be better to move the controller implementation out of wait_until function. I think it should be behind a more logical API maybe in clock.c?
  • I am finding this part a little hard to follow
else if (now > adjusted_time && now < requested_time) {
        control_value = control_value - K_i * (now - adjusted_time)
        //busy wait and return

A scenario that would trigger this is
now=90msec
requested_time=100 msec (e.g. in 10 msec)
control_value=20 msec
adjusted_time= 80 msec.

I don't see why we would want to update the control_value here, we do not have any information regarding the "wakeup overhead" which is what we want to measure and account for.

@soyerefsane
Copy link
Collaborator Author

Thanks for the feedback @erlingrj

  • I'm happy to put this under a target flag; the newly implemented thread policy property would probably be a good candidate. Though I don't really see what's the cost of employing it for everything.
  • For moving it out of the wait_until function, I don't see a very good reason to move it, to be honest. The only other place I can think of moving the control algorithm is to the lf_cond_timedwait function (as it's the underlying sleep function). However, I think it makes more inherent sense with the wait_until function since it also fully gets rid of the MIN_SLEEP_DURATION logic from the wait_until function -- which was arbitrarily set.
  • I can see how the else if case is a little harder to digest and unintuitive -- I also didn't originally have it. The logic behind it is as follows:
    • If adjusted_time < now && requested_time > now, it doesn't make sense to call the underlying sleep function, because the control value implies that if we call the underlying sleep function, we will see a lag. Therefore in this else if case, we would want to busy wait. [This was the original algorithm I had -- busy waiting under the else if case without updating the control value]
    • Why do we want to update the control value? Consider the scenario where control_value gets really large (due to an outlier wait time). Then without any updates for the control value, it's possible that for the rest of the program, you would just keep falling to busy_wait without utilizing the underlying sleep schedule. This is not a desired behavior. Therefore, Edward suggested reducing the control_value by the amount it overshot the current physical time.
      • Constantly busy-waiting also causes the OS scheduler to take over control later (since we are not calling the sleep function ourselves -- which causes larger side lags.
      • Though I agree that within the example that you had written down, there is no inherent reason to change the control value.

@erlingrj
Copy link
Collaborator

erlingrj commented Apr 25, 2024

  • The cost of this optimization is that you wake up early and busy-wait which are cycles that could be spent on something else.
  • I am not talking about where the control value should be applied, but where the control logic should be implemented. I think it makes sense to move the implementation to a separate function that only computes the new control value based on the error. I suggest to create a more generic PID controller module that we could also use for other purposes in the runtime
  • OK, this makes sense. Probably there should be something like a moving average filter on the errors and outlier rejection. We probably don't want an unfortunate page fault to trigger us to wake up milliseconds early next time.

@soyerefsane
Copy link
Collaborator Author

  • Yes, this makes sense. I think I can easily put this under the suggested thread-policy target property` and make it only used when the underlying scheduler is set to SCHED_FIFO or SCHED_RR.
  • I see your point on abstracting the I controller away. At the same time, I'm not sure how this would work with the else if case also updating the control value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants