Skip to content

Strategy for dealing with Stack Overflows #696

Closed
@aearly

Description

@aearly

Stack overflow issues have come up frequently with async. Most of the time, they come from using synchronous iteration functions with large inputs, e.g. async.eachSeries(Array(1000000), function () { callback(); }, done). (#66, #75, #110, #117, #133, #296, #413, #510, #573, #588, #590, #604, #622) Some of those issues aren't caused by large arrays exactly, but most of the time they boil down to using a sync iteration function.

A common way to fix the problem is to use an async.nextTick() internally to re-set the call-stack for the next step of iteration. This masks the problem, and slows performance. In node and later IE's, you can use setImmediate or process.nexTick which is very fast, but in other browsers it is implemented as setTimeout(fn, 0), which actually ends up being about setTimeout(fn, 4). It adds unnecessary delay to your processing, if your iteration functions are actually asynchronous. This setImmediate-type guarding was added to waterfall and it appears to cause a pretty big performance hit: #40, #152, #513, #621

I feel like we need to come up with a comprehensive and consistent strategy for dealing with these competing issues.

Option 1: Don't guard against stack overflows, leave it to the user

I propose removing all async.nextTicks and setImmediates from async library functions, to remove the performance hit. This would be a pretty simple change. We can still offer the async.nextTick and async.setImmediate functions to help people fix their sync functions, when they do need to use them in a waterfall or something similar.

We would need to document what we actually mean by synchronous functions , based on the confusion in #629. A RangeError is a pretty good indicator of using a sync function.

This would be a breaking change, a 1.0-type feature. We could also provide a wrapper function that will make sync functions work, to ease migration. stackSafe from #516 is an interesting idea.

Option 2: Guard against stack overflows, but use a better setImmediate

There is a pretty comprehensive polyfill for setImmediate that handles a variety of browsers and JS execution environments: YuzuJS/setImmediate. It uses the native setImmediate in node and later IE's, postMessage tricks in modern browsers, and script.onreadystatechange tricks in older IE's. It gives sub-millisecond deferrals for 99%+ of JS execution environments.

If we were to use this for the expressed purpose of preventing stack overflows, we should use it everywhere. There should be tests for every async library function with huge arrays and sync functions. This would be a bit of work.

It would require a build-step (e.g. browserify --standalone) for the standalone distribution of async. Async would now have an npm dependency, The other option is to just copy the setImmediate code in, which is compatible with the MIT license I believe.

Option 3: Have a "dev-mode" that can detect sync iterators

Based on an environment variable or async._devMode flag, we could conditionally add some guards to iteration functions to see if they callback on the same event loop tick. Stack overflows would still happen, but you would get warnings that would make the source of the error easier to find. This would be tricky to implement, but it is a compromise between Options 1 and 2.

Implementing it without causing a performance hit will be difficult, and would probably require a build that can strip dead code. I'm reminded of how envify and Uglify can strip out if (process.env.NODE_ENV !== "production") { ... } blocks when browserifying, but I'm not sure if having both async.js and async.dev.js is desirable.

This is also a breaking, 1.0-type feature.

Option 4: Do nothing

Just leave things as-is. Some functions guard against stack overflows, others don't. We have a mis-match of philosophies. waterfall is slower than is has to be, eachSeries is fast, but will RangeError with a large array and a sync iteration function. We will continue to receive bug reports about stack overflows and poor performance.


So -- thoughts anyone? (@caolan @beaugunderson ) I am in favor of Option 1, but I am also biased because I also have experience with diagnosing the errors caused by sync iterators. 😃

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions