Skip to content
This repository has been archived by the owner on Nov 29, 2023. It is now read-only.

Optionally recompute the evaluation cascade asynchronously #881

Merged

Conversation

eyelidlessness
Copy link
Contributor

@eyelidlessness eyelidlessness commented Apr 19, 2022

Async evaluation cascade

Currently under the feature flag config.experimentalOptimizations.computeAsync, this allows the evaluation cascade to be performed asynchronously. This improves UI responsiveness for expensive calculations, especially when chained. Currently the feature flag can be enabled dynamically by appending &computeAsync to the URL when running the validation app.

This is achieved using requestIdleCallback1, which calls the callback on when the event loop is idle or after timeout milliseconds, whichever comes first. The default timeout of 16 is approximately the number of milliseconds which would achieve 60 frames per second. This was chosen over requestAnimationFrame because the latter is higher priority, calling the callback before the next tick of the event loop.

Risks of async

While I don't believe this specific change introduces a possibility of a race condition, it's possible I have missed a case where it would. As MDN notes:

Functions are generally called in first-in-first-out order; however, callbacks which have a timeout specified may be called out-of-order if necessary in order to run them before the timeout elapses.

In my experience, this has not been the case. I expect that's because callbacks requested with the timeout value are effectively placed in a priority queue. When timeout is always the same value, the priority should always be the same. But there's a possibility I've misunderstood this and the evaluation cascade order may be unstable with this setting.

My goal in calling this out is:

  • To have my understanding of this risk validated/scrutinized in review
  • If we determine the risk is as low as I believe, consider whether to remove the feature flag and make this the default behavior
  • Otherwise, consider ways we can have users test/validate the behavior

Validation of perceived performance with async

This change also (likely temporarily, pending #870) reintroduces an option config.forceClearNonRelevant to clear non-relevant values. This flag can also be enabled by appending &clearNonRelevant to the validation app URL. This differs from #870 in that it will not restore values which have been cleared on questions which become relevant again. This setting was introduced to simulate similar performance issues found in #870 on the large WHO form (also included in this PR).

The perceived performance difference can be observed by comparing:

  1. Open the large WHO form
  2. Click Go to End
  3. Answer "Yes" to "Did the respondent give consent?"
  4. Switch the answer to "No"

What you should see:

  • With both flags off, there's a brief delay on step 3 before the "Yes" radio appears checked. Step 4, and any repetition of steps 3-4 should feel instantaneous.
  • With only &clearNonRelevant, both steps 3 and 4 have a noticeably longer delay, even when repeated.
  • With both flags enabled, steps 3 and 4 should feel instantaneous or nearly so each time.

Other changes and details

In the course of validating these changes against a few other forms found here, I noticed that the SOAR form also included in this PR was quite slow to load.

Initially I considered applying the same async approach to Form#init, but backed out of this because:

  • It would change the method's signature. Although this could be isolated with the feature flag, that would not be possible should this ever become the default approach.
  • I couldn't find an effective way, with the SOAR form, to improve perceived initial load time (roughly analogous to First Contentful Paint) without introducing quite a lot of delayed repaints. This felt like a worse user experience than simply accepting the full loading time delay.

That said, I discovered on that form that the largest performance impact on load was recalculating for each repeat as its template was added to the form. I was able to improve the actual load time significantly (from about 8-10 seconds on my computer to about 2-3 seconds) by performing all of those calculations at once after all repeats are added.

Risks of init repeat calculation changes

I refined this portion of the change several times, with each earlier approach failing certain tests. My hope is that with those tests failing, then passing, there is enough coverage that any other behavior change I might have missed would be caught. But my confidence in that is slightly lower than my confidence in the safety of the async change, so I feel it's important to call that out. Repeat behavior is tested extensively, of course, with many edge cases considered. But I'm hoping review will validate that this change is desired and safe.

I didn't place this change behind a feature flag, mainly because I forgot to do that in haste of opening this PR. I'd be happy to make that change if desired.

Footnotes

  1. Where available. In browsers which don't currently support requestIdleCallback (currently Safari, unless enabled with a flag), it will fall back to setTimeout with no delay.

@eyelidlessness eyelidlessness force-pushed the features/perf-ui-responsiveness branch 2 times, most recently from 082430e to 3a6af5a Compare April 20, 2022 17:12
@eyelidlessness eyelidlessness marked this pull request as ready for review April 20, 2022 18:01
@eyelidlessness eyelidlessness requested a review from MartijnR April 20, 2022 18:04
Copy link
Member

@MartijnR MartijnR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very intriguing!!

I only looked at the code so far, and would love to see this in action (and comment on the risks as you asked). I aim to do so tomorrow. However, I may not manage to do that, and in that case it would take another week at least. So I just wanted to share these initial comments already.

Gruntfile.js Outdated Show resolved Hide resolved
src/js/form.js Outdated Show resolved Hide resolved
src/js/form.js Outdated Show resolved Hide resolved
@MartijnR
Copy link
Member

MartijnR commented Apr 22, 2022

the evaluation cascade order may be unstable with this setting

I am wondering if a slow XPath expression could throw off the order by completing after a fast subsequent expression. I'm guessing e.g. count(instance('d')//a) could be very slow if the instance is large. If so, that would probably be fatal for the async-cascade idea. But I don't know anything about the event loop etc, so hopefully the answer is no.

app.js Outdated Show resolved Hide resolved
src/js/form.js Outdated Show resolved Hide resolved
@MartijnR
Copy link
Member

But my confidence in that is slightly lower than my confidence in the safety of the async change

Fwiw, I cannot think of a problem with the repeat initialization change. Perhaps a subtle undiscovered (but fixable) bug in the caching by getRelatedNodes. But I think that part is ready to merge and does not need a flag.

@eyelidlessness
Copy link
Contributor Author

the evaluation cascade order may be unstable with this setting

I am wondering if a slow XPath expression could throw off the order by completing after a fast subsequent expression. I'm guessing e.g. count(instance('d')//a) could be very slow if the instance is large. If so, that would probably be fatal for the async-cascade idea. But I don't know anything about the event loop etc, so hopefully the answer is no.

I tried this with a sequence of:

  1. A fast function.
  2. A much slower function.
  3. Another fast function.

They always complete in the order queued. The same is true if there are nested calls to requestIdleCallback/setTimeout. In hindsight this makes sense: the previously requested callbacks have already been enqueued at the same priority. This is even true if the fast function in step 3 has a shorter timeout. This strongly suggests the order should actually be stable! The only way it should not be is by using timers with different priorities.

@eyelidlessness eyelidlessness force-pushed the features/perf-ui-responsiveness branch from 5e9c82f to 3a9000f Compare May 3, 2022 17:41
@lognaturel
Copy link
Contributor

It's really nice to see this isolated. Makes a lot of sense and I agree that the only real risk is something unexpected about order. With @eyelidlessness' last exploration, that seems pretty unlikely.

Let's merge this with the flags for now so it's easier to do quality assurance. With what @eyelidlessness has done I'm currently leaning towards releasing without the flag. Of course we'll know more after a QA pass. @MartijnR do you have a strong feeling one way or the other?

@jkuester this performance improvement may be another effort you're interested in. It was prompted by the fact that the relevance change we're working on does impact performance. This would maintain the same perceived performance.

@lognaturel lognaturel merged commit 827c1f3 into enketo:master May 3, 2022
@MartijnR
Copy link
Member

MartijnR commented May 4, 2022

Thanks! Without a flag seems fine with me too.

The forceClearNonRelevant config can be removed after #870 right?

@lognaturel
Copy link
Contributor

Without a flag seems fine with me too.

Fabulous news! We’ll keep pushing on it and #870 for another couple weeks and then hopefully feel ready to release.

The forceClearNonRelevant config can be removed after #870 right?

Yes. We’re aiming to review that one in isolation and merge with its flag for some testing. Then hopefully we can remove all flags.

Do you think there’s value to posting to either or both of the mailing lists to invite folks to try their forms on a staging server? Anywhere else to share?

@MartijnR
Copy link
Member

MartijnR commented May 4, 2022

great!

Do you think there’s value to posting to either or both of the mailing lists to invite folks to try their forms on a staging server? Anywhere else to share?

Yes, that's a good idea and since it includes the repeats change as well, we can entice people with slow forms to try it out and see if the form logic still works as expected. On getodk.org/xlsform?

@lognaturel
Copy link
Contributor

On getodk.org/xlsform

Definitely there. I've been thinking about a good option for forms that need external secondary instances. Maybe I'll ask folks who want to verify those to get in touch.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants