Optionally recompute the evaluation cascade asynchronously #881

eyelidlessness · 2022-04-19T17:54:10Z

Async evaluation cascade

Currently under the feature flag config.experimentalOptimizations.computeAsync, this allows the evaluation cascade to be performed asynchronously. This improves UI responsiveness for expensive calculations, especially when chained. Currently the feature flag can be enabled dynamically by appending &computeAsync to the URL when running the validation app.

This is achieved using requestIdleCallback¹, which calls the callback on when the event loop is idle or after timeout milliseconds, whichever comes first. The default timeout of 16 is approximately the number of milliseconds which would achieve 60 frames per second. This was chosen over requestAnimationFrame because the latter is higher priority, calling the callback before the next tick of the event loop.

Risks of async

While I don't believe this specific change introduces a possibility of a race condition, it's possible I have missed a case where it would. As MDN notes:

Functions are generally called in first-in-first-out order; however, callbacks which have a timeout specified may be called out-of-order if necessary in order to run them before the timeout elapses.

In my experience, this has not been the case. I expect that's because callbacks requested with the timeout value are effectively placed in a priority queue. When timeout is always the same value, the priority should always be the same. But there's a possibility I've misunderstood this and the evaluation cascade order may be unstable with this setting.

My goal in calling this out is:

To have my understanding of this risk validated/scrutinized in review
If we determine the risk is as low as I believe, consider whether to remove the feature flag and make this the default behavior
Otherwise, consider ways we can have users test/validate the behavior

Validation of perceived performance with async

This change also (likely temporarily, pending #870) reintroduces an option config.forceClearNonRelevant to clear non-relevant values. This flag can also be enabled by appending &clearNonRelevant to the validation app URL. This differs from #870 in that it will not restore values which have been cleared on questions which become relevant again. This setting was introduced to simulate similar performance issues found in #870 on the large WHO form (also included in this PR).

The perceived performance difference can be observed by comparing:

Open the large WHO form
Click Go to End
Answer "Yes" to "Did the respondent give consent?"
Switch the answer to "No"

What you should see:

With both flags off, there's a brief delay on step 3 before the "Yes" radio appears checked. Step 4, and any repetition of steps 3-4 should feel instantaneous.
With only &clearNonRelevant, both steps 3 and 4 have a noticeably longer delay, even when repeated.
With both flags enabled, steps 3 and 4 should feel instantaneous or nearly so each time.

Other changes and details

In the course of validating these changes against a few other forms found here, I noticed that the SOAR form also included in this PR was quite slow to load.

Initially I considered applying the same async approach to Form#init, but backed out of this because:

It would change the method's signature. Although this could be isolated with the feature flag, that would not be possible should this ever become the default approach.
I couldn't find an effective way, with the SOAR form, to improve perceived initial load time (roughly analogous to First Contentful Paint) without introducing quite a lot of delayed repaints. This felt like a worse user experience than simply accepting the full loading time delay.

That said, I discovered on that form that the largest performance impact on load was recalculating for each repeat as its template was added to the form. I was able to improve the actual load time significantly (from about 8-10 seconds on my computer to about 2-3 seconds) by performing all of those calculations at once after all repeats are added.

Risks of init repeat calculation changes

I refined this portion of the change several times, with each earlier approach failing certain tests. My hope is that with those tests failing, then passing, there is enough coverage that any other behavior change I might have missed would be caught. But my confidence in that is slightly lower than my confidence in the safety of the async change, so I feel it's important to call that out. Repeat behavior is tested extensively, of course, with many edge cases considered. But I'm hoping review will validate that this change is desired and safe.

I didn't place this change behind a feature flag, mainly because I forgot to do that in haste of opening this PR. I'd be happy to make that change if desired.

Where available. In browsers which don't currently support requestIdleCallback (currently Safari, unless enabled with a flag), it will fall back to setTimeout with no delay. ↩

MartijnR

This is very intriguing!!

I only looked at the code so far, and would love to see this in action (and comment on the risks as you asked). I aim to do so tomorrow. However, I may not manage to do that, and in that case it would take another week at least. So I just wanted to share these initial comments already.

Gruntfile.js

src/js/form.js

MartijnR · 2022-04-22T14:48:53Z

the evaluation cascade order may be unstable with this setting

I am wondering if a slow XPath expression could throw off the order by completing after a fast subsequent expression. I'm guessing e.g. count(instance('d')//a) could be very slow if the instance is large. If so, that would probably be fatal for the async-cascade idea. But I don't know anything about the event loop etc, so hopefully the answer is no.

app.js

src/js/form.js

MartijnR · 2022-04-22T15:08:40Z

But my confidence in that is slightly lower than my confidence in the safety of the async change

Fwiw, I cannot think of a problem with the repeat initialization change. Perhaps a subtle undiscovered (but fixable) bug in the caching by getRelatedNodes. But I think that part is ready to merge and does not need a flag.

eyelidlessness · 2022-04-25T23:38:51Z

the evaluation cascade order may be unstable with this setting

I am wondering if a slow XPath expression could throw off the order by completing after a fast subsequent expression. I'm guessing e.g. count(instance('d')//a) could be very slow if the instance is large. If so, that would probably be fatal for the async-cascade idea. But I don't know anything about the event loop etc, so hopefully the answer is no.

I tried this with a sequence of:

A fast function.
A much slower function.
Another fast function.

They always complete in the order queued. The same is true if there are nested calls to requestIdleCallback/setTimeout. In hindsight this makes sense: the previously requested callbacks have already been enqueued at the same priority. This is even true if the fast function in step 3 has a shorter timeout. This strongly suggests the order should actually be stable! The only way it should not be is by using timers with different priorities.

…ions

lognaturel · 2022-05-03T21:36:05Z

It's really nice to see this isolated. Makes a lot of sense and I agree that the only real risk is something unexpected about order. With @eyelidlessness' last exploration, that seems pretty unlikely.

Let's merge this with the flags for now so it's easier to do quality assurance. With what @eyelidlessness has done I'm currently leaning towards releasing without the flag. Of course we'll know more after a QA pass. @MartijnR do you have a strong feeling one way or the other?

@jkuester this performance improvement may be another effort you're interested in. It was prompted by the fact that the relevance change we're working on does impact performance. This would maintain the same perceived performance.

MartijnR · 2022-05-04T14:49:35Z

Thanks! Without a flag seems fine with me too.

The forceClearNonRelevant config can be removed after #870 right?

lognaturel · 2022-05-04T18:08:17Z

Without a flag seems fine with me too.

Fabulous news! We’ll keep pushing on it and #870 for another couple weeks and then hopefully feel ready to release.

The forceClearNonRelevant config can be removed after #870 right?

Yes. We’re aiming to review that one in isolation and merge with its flag for some testing. Then hopefully we can remove all flags.

Do you think there’s value to posting to either or both of the mailing lists to invite folks to try their forms on a staging server? Anywhere else to share?

MartijnR · 2022-05-04T18:16:37Z

great!

Do you think there’s value to posting to either or both of the mailing lists to invite folks to try their forms on a staging server? Anywhere else to share?

Yes, that's a good idea and since it includes the repeats change as well, we can entice people with slow forms to try it out and see if the form logic still works as expected. On getodk.org/xlsform?

lognaturel · 2022-05-04T18:31:14Z

On getodk.org/xlsform

Definitely there. I've been thinking about a good option for forms that need external secondary instances. Maybe I'll ask folks who want to verify those to get in touch.

eyelidlessness force-pushed the features/perf-ui-responsiveness branch 2 times, most recently from 082430e to 3a6af5a Compare April 20, 2022 17:12

eyelidlessness marked this pull request as ready for review April 20, 2022 18:01

eyelidlessness requested a review from MartijnR April 20, 2022 18:04

MartijnR reviewed Apr 21, 2022

View reviewed changes

Gruntfile.js Outdated Show resolved Hide resolved

src/js/form.js Outdated Show resolved Hide resolved

src/js/form.js Outdated Show resolved Hide resolved

MartijnR reviewed Apr 22, 2022

View reviewed changes

app.js Outdated Show resolved Hide resolved

MartijnR reviewed Apr 22, 2022

View reviewed changes

src/js/form.js Outdated Show resolved Hide resolved

This was referenced Apr 27, 2022

Cache local transforms #886

Merged

Form init performance: recalculate repeats once #887

Merged

eyelidlessness added 3 commits May 3, 2022 10:41

Optionally recompute the evaluation cascade asynchronously

5945e91

Increase Mocha timeout for test using large WHO form

ca9cbae

Fix build-docs

3a9000f

eyelidlessness force-pushed the features/perf-ui-responsiveness branch from 5e9c82f to 3a9000f Compare May 3, 2022 17:41

Ensure evaluation cascade additions are called async after base funct…

b0cac7b

…ions

lognaturel approved these changes May 3, 2022

View reviewed changes

lognaturel merged commit 827c1f3 into enketo:master May 3, 2022

m5r mentioned this pull request Sep 13, 2023

Update to latest enketo-core medic/cht-core#7599

Closed

jkuester mentioned this pull request Nov 30, 2023

Support computeAsync Enketo functionality medic/cht-core#8741

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optionally recompute the evaluation cascade asynchronously #881

Optionally recompute the evaluation cascade asynchronously #881

eyelidlessness commented Apr 19, 2022 •

edited

Loading

MartijnR left a comment

MartijnR commented Apr 22, 2022 •

edited

Loading

MartijnR commented Apr 22, 2022

eyelidlessness commented Apr 25, 2022

lognaturel commented May 3, 2022

MartijnR commented May 4, 2022

lognaturel commented May 4, 2022

MartijnR commented May 4, 2022

lognaturel commented May 4, 2022

Optionally recompute the evaluation cascade asynchronously #881

Optionally recompute the evaluation cascade asynchronously #881

Conversation

eyelidlessness commented Apr 19, 2022 • edited Loading

Async evaluation cascade

Risks of async

Validation of perceived performance with async

Other changes and details

Risks of init repeat calculation changes

Footnotes

MartijnR left a comment

Choose a reason for hiding this comment

MartijnR commented Apr 22, 2022 • edited Loading

MartijnR commented Apr 22, 2022

eyelidlessness commented Apr 25, 2022

lognaturel commented May 3, 2022

MartijnR commented May 4, 2022

lognaturel commented May 4, 2022

MartijnR commented May 4, 2022

lognaturel commented May 4, 2022

eyelidlessness commented Apr 19, 2022 •

edited

Loading

MartijnR commented Apr 22, 2022 •

edited

Loading