Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

default / "global" context is potentially very dangerous #14

Closed
Sencerd opened this issue Dec 9, 2013 · 14 comments
Closed

default / "global" context is potentially very dangerous #14

Sencerd opened this issue Dec 9, 2013 · 14 comments

Comments

@Sencerd
Copy link

Sencerd commented Dec 9, 2013

The existence of the default/global context has the potential to be incredibly dangerous.

Using the example from the readme, imagine you are setting a user's session data in CLS and then reading it somewhere else in order to establish, let's say, permissions for a resource. Here is an example of a normal request cycle:

Request arrives -> User session is set -> Other stuff happens -> User session is read -> Permissions are checked -> Response is sent

The problem is, if you were to introduce code that makes your callback chain lose it's context (for example, you use Q and don't make use of the cls-q workaround) you would end up writing to and reading from the global context. This is bad for obvious reasons, but now here is the best bit, this is probably not going to be apparent in a development environment, because you will usually not be making concurrent requests, and so long as the above flow all happens in sequence, your app will appear to be functioning correctly, but what happens when you have concurrent requests being processed? Here is that flow, with broken callback chains that are using the global context:

UserA arrives -> UserA session is set in global context -> UserB arrives -> UserB session is set in global context -> UserA reads session from global context, get's UserB's session -> your whole product disappears up it's own rear end

Is there actually a reason for the global context to exist? Given that the purpose of this module is to provide thread local storage, it seems like having a global context is only ever going to be dangerous, why would you ever want to store something in CLS globally? Why wouldn't you just use a global for that?

@paulhill
Copy link

paulhill commented Dec 9, 2013

+1

@othiym23
Copy link
Owner

othiym23 commented Dec 9, 2013

To be blunt, the global context exists mostly to simplify the code and to make it easier for me to test the framework. I'll take a look at what would be required to remove the global context. I don't actually make use of the global context myself in New Relic or my own CLS-related projects (except in tests, which would all need to be reworked were I to shoot the notion of a default context in the head).

I also agree that this is a very dangerous and not entirely obvious footgun.

PRs are welcome, but I'd prefer patches that fix all the tests as well. 👯

@Sencerd
Copy link
Author

Sencerd commented Dec 9, 2013

It worries me a bit going to prod with this knowing that if someone messes up in the future it could cause havoc, so I will take a look at fixing this myself - no promises though :)

Speaking of pull requests, you have one for cls-q

Out of interest, we use New Relic and Q, would Q be causing any issues for CLS in New Relic?

Also, +1000000 for being blunt

@othiym23
Copy link
Owner

Dang, I must not have watched cls-q, I'll get to that in a minute.

New Relic follows a slightly different path to making sure that CLS state gets propagated to where it needs to go. Because the New Relic execution tracer wraps mongodb and redis directly, it's able to wrap all of the callback chains that might be broken by the absence of the Q shim.

cls-q isn't even needed most of the time with Q. It's only when something (like redis) interrupts the execution of the continuation chain that it's necessary to wrap up thenables.

@Sencerd
Copy link
Author

Sencerd commented Dec 10, 2013

Good to know.

I haven't looked into it at length, but I'm fairly sure node-memcached is what broke the Q callback chains for us.

@othiym23
Copy link
Owner

It wouldn't surprise me. The hashring stuff in node-memcached is tricky. It's also something that node-newrelic happens to instrument, so we're covered there as well. Did you have to write another shim for CLS and node-memcached, or was cls-q enough?

@Sencerd
Copy link
Author

Sencerd commented Dec 10, 2013

cls-q seems to have done the trick for us, I'm still in the process of implementing CLS, but since using cls-q I haven't run into any issues with losing context

@othiym23
Copy link
Owner

Check out 5b7ae06, which removes the default context and makes the necessary fixes to accommodate that (and also fixes up the tests and docs). It feels like a risky change to me, but all of New Relic's tests pass against it (as well as its own, now), so apparently New Relic was doing the right thing. Be aware that attempts to call namespace.set() when outside a context now throw; this is more correct, but can crash apps. As such, I'm going to leave this change on GitHub without publishing a new version of the module (which will probably be 2.7). If this change causes problems for you, I'll probably tweak it more and wait for @trevnorris to finish his rewrite of the asyncListener API (and will also fix up the async-listener polyfill to match) and then release CLS 3.0. Either way, it should be released before the end of the year.

If you want this sooner and don't encounter any trouble with what's on master, I can be persuaded to publish 2.7 within the next couple days. I just don't want to break people's stuff unknowingly.

Leaving this issue open until this change lands on npm.

@Sencerd
Copy link
Author

Sencerd commented Dec 10, 2013

Looks good to me, I'll try running this with our code, I wrote a few tests designed to detect context being lost through an API request. Throwing is a huge improvement, if context is lost we definitely want the app to crash, the alternative is silent data corruption. (Well, silent until the CS reports start flooding in ;))

Thanks for making this change so quickly, it's very much appreciated!

@Sencerd
Copy link
Author

Sencerd commented Dec 11, 2013

Just to confirm, this is working as expected with our code, when context is lost, set throws and get returns undefined - no more global context fudging!

@othiym23
Copy link
Owner

Sweet. Once I get a chance to bang on this in the New Relic module a little bit more, I'll cut 2.7 (probably tomorrow or this weekend), since it seems that it's going to take Trevor a little while to finish his work on the native-shim version of asyncListener.

@dazld
Copy link

dazld commented Dec 13, 2013

Is there actually a reason for the global context to exist? Given that the purpose of this module is to provide thread local storage, it seems like having a global context is only ever going to be dangerous, why would you ever want to store something in CLS globally? Why wouldn't you just use a global for that?

+1 for all of the above. global objects exist for a reason, and CLS exists for a reason - pointless to mix those intentions. the doc change also helps make it clearer too!

@dazld
Copy link

dazld commented Dec 14, 2013

also, our app is stable with 5b7ae06 👍

@othiym23
Copy link
Owner

continuation-local-storage v3.0.0 is now live on npm (assuming npm isn't having a bad night). I realized it's a compatibility-breaking change, so had to bump the major version. Thanks, @dazld and @Sencerd, for confirming that your apps work properly with the new changes; be sure to file new issues if you notice anything weird with 3.0.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants