Skip to content

JS: Add routing trees library #7049

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Dec 17, 2021
Merged

JS: Add routing trees library #7049

merged 37 commits into from
Dec 17, 2021

Conversation

asgerf
Copy link
Contributor

@asgerf asgerf commented Nov 3, 2021

This introduces the Routing library, modeling the composition of middleware functions and route handlers, and helps reason about data flow between these.

Previously we were missing data flow in cases like this:

app.use((req, res) => {
  req.data = req.query.x;
});
app.get('/', (req, res) => {
  let x = req.data; // <-- missed flow from req.query.x
});

Data-flow configurations as well as API graphs can now track flow between the two functions above.

What's in this PR

  • The Routing library with a bunch of abstract classes for framework models to extend.
  • It is instantiated for Express and Fastify for now. There are more framework models, but I had to stop creeping the scope of the PR.
  • The framework models themselves haven't changed much; they just contribute to the routing tree model, which then contributes new data-flow and API graph steps.
  • The CSRF and rate limiting queries now use the routing tree model.
  • req.body is now seen as a deeply tainted object in more cases, as the logic for detecting the body parser is now based on routing trees. We get some new results because of this.

Routing tree examples

The structure of the router hierarchy is seen as a tree, with each node being a function that can dispatch the incoming request to its children, or pass it on to its next sibling. We a graph-structure used to represent the possible trees that can be constructed at runtime. (Branching control-flow can cause the graph to be non-tree shaped)

For example:

app.use(foo());
app.get(bar());
app.get('/', a, b, c);

would generate a tree of form:

  • app
    • app.use()
      • foo()
    • app.get()
      • bar()
    • app.get('/)
      • a
      • b
      • c

Now suppose foo() is defined to return another router:

function foo() {
  let router = express.Router();
  router.use(...);
  return router;
}

The inner router would appear as a child in the overall routing tree:

  • app
    • app.use()
      • foo()
        • router
          • router.use(...)
    • ...

In general, if there is flow from A to B, then A becomes a child of B. For example,

function A(req, res) {}
let B1 = A;
app.use(B1);
let B2 = A;
app.use(B2);

The hierarchy would be as follows, with A occurring in two different places, but their corresponding use-sites (B1, B2) still explicit in the tree.

  • app
    • app.use()
      • B1
        • A
    • app.use()
      • B2
        • A

(In this example, the explicit let bindings were just there to make it easier to refer to the use-sites, but separate use-site nodes would exist even if the let bindings were inlined)

If a router is returned, the data-flow rule wires things up nicely:

function makeRouter() {
  let router = express.Router();
  router.use(A);
  return router;
}
app.use(makeRouter());
  • app
    • app.use()
      • makeRouter()
        • router
          • router.use()
            • A

An interesting case is when a mutable router object escapes into another function. In this case, we need interprocedural reasoning about side effects, since ordering of middlewares reflects the order in which certain calls are made. We simplify this by creating a separate node for each function that touches the router object, and at the point where the router escapes, we add a child relationship between caller and callee. For example:

function addMiddlewares(router) {
  router.use(A);
  router.use(B);
}
app.use(cookie());
addMiddlewares(app);
app.get('/', ...)
  • app
    • app.use(cookie())
    • addMiddlewares(app) (point of escape)
      • router (callee's view of router)
        • router.use(A)
        • router.use(B)
    • app.get('/', ...)

With the above tree, we can see that A,B come after cookie() but before the get('/') handler.

Evaluations (internal links)

  • On EJS slugs we get 17 new alerts, not counting the 465 missing rate-limiting alerts which is a bit ridiculous. Mostly TPs as far as I can tell, although some of them are existing alerts with a new source detected through middleware-flow.
  • Default slugs shows good performance and some new SQL injections, and both new TPs and fixed FPs for missing CSRF middleware

@asgerf asgerf added JS JS:changes-sources-or-sinks Changes taint sources/sinks for the JS analysis labels Nov 3, 2021
@asgerf asgerf force-pushed the js/routing-trees branch 2 times, most recently from a3374f2 to 57cc611 Compare November 4, 2021 13:19
@asgerf asgerf marked this pull request as ready for review November 8, 2021 11:19
@asgerf asgerf requested a review from a team as a code owner November 8, 2021 11:19
Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great 🎉

This is just a preliminary commit-by-commit review with some small comments.
I'll take another look at the entire thing later.

Comment on lines 32 to 34
// The callback parameter is either a MongoClient or Db depending on the mongodb package version,
// but we just model it as both.
result = getAMongoDbCallback().getParameter(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a motivating example (or test) for this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The db parameter on this line is actually a client/connection, so the previous filter led to a FN there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually digging into this a bit further, it looks like mongodb@2.x had a .db method on a database, and had no separate concept for a connection/client. I'll update the model to reflect this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and merged the two concepts entirely

@@ -18,12 +18,13 @@ import javascript
import semmle.javascript.security.dataflow.MissingRateLimiting
import semmle.javascript.RestrictedLocations

// TODO: Previously we used (FirstLineOf) here, but that doesn't work anymore.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just remove this TODO.

Co-authored-by: Erik Krogh Kristensen <erik-krogh@github.com>
@erik-krogh erik-krogh self-assigned this Dec 13, 2021
Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very, very nice work 👍

It takes a bit to understand how the routing-trees are recursively created, but it's nice.

I only have some minor comments.
Should be good to merge afterwards.

@asgerf
Copy link
Contributor Author

asgerf commented Dec 16, 2021

Thanks for the thorough review @erik-krogh! All comments addressed, and I've started another round of evaluations.

Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@asgerf
Copy link
Contributor Author

asgerf commented Dec 17, 2021

Updated evaluations:

  • Default slugs still looks ok
  • Fastify slugs shows 2 new XSS-like alerts which seem plausible, though the projects are hard to setup so I haven't verified

@codeql-ci codeql-ci merged commit 39ec713 into github:main Dec 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation JS:changes-sources-or-sinks Changes taint sources/sinks for the JS analysis JS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants