JS: Initial models-as-data implementation #7171

asgerf · 2021-11-18T14:49:32Z

Initial implementation of models-as-data. Please use the backlinked issue for high-level discussion of the format itself.

The implementation is split into three files:

- Shared.qll is intended to be shared between languages (eventually; it currently only works for JS).
- Impl.qll is intended to contain the language-specific details.
- ModelsAsData.qll is the public interface for the rest of the library.

Some predicates from Shared.qll must be usable from Impl.qll, but should not be public in the library, so ModelsAsData.qll acts as a gatekeeper toward the public API.

The API exposes two modules:

- ModelInput is used for contributing models.
  - For example, subclass ModelInput::SinkModelCsv to add new sinks.
- ModelOutput is used for accessing interpreted models in terms of API nodes.
  - For example, use ModelOutput::getASinkNode("sql-injection") to get SQL injection sinks.

In this PR I have ported Sequelize and Spanner, as initial validation.

Evaluation from a slightly earlier version.

The new result is due to MaD always using API graphs, whereas the old model used local flow for credentials.
(The new taint sources were due to the inclusion of a rough Express MaD model in the earlier version. They only show up in the diff because getSourceType() returned a different string for those sources, not because they were truly new.)

asgerf · 2021-12-13T10:30:27Z

Updated with some new access path tokens.

Access path tokens in general now take a comma-separated list of arguments, like Argument[1,2] or Member[foo,bar].
Arguments and ranges:
- A range of arguments is ..-separated again, so Argument[0..3] is an argument at index 0, 1, 2, or 3.
- The upper bound of an argument range can be omitted, so Argument[1..] is everything except the first argument.
- Argument[N-1] refers to the last argument, Argument[N-2] to the second-last, and so on. This will never resolve to a negative argument index, so things like Argument[-1] aren't picked up by accident.
- Ranges can be combined with N-1, so Argument[1..N-2] is everything except the first and last argument.
Call site filters have been added, which match a subset of the invocations matched by the base access path:
- WithArity[n] only allows calls with arity n.
- NewCall only matches new calls.
- Call only matches non-new calls.
- These can be chained, for example NewCall.WithArity[2,3] matches new-calls with arity 2 or 3.
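The argument and range rules listed in this comment can be sketched outside QL as follows. This is a minimal Python illustration, not the code in Shared.qll: the function name `argument_indices` is hypothetical, and clamping the result to the arguments actually present at a call is an assumption made for the example.

```python
import re


def argument_indices(spec, arity):
    """Indices matched by the operands of Argument[spec] at a call with `arity` arguments.

    Hypothetical helper for illustration only; it mirrors the rules described
    in the comment above, not the actual QL implementation.
    """

    def bound(text):
        # "N-1" is the last argument, "N-2" the second-last, and so on.
        m = re.fullmatch(r"N-(\d+)", text)
        if m:
            return arity - int(m.group(1))
        return int(text)

    result = set()
    for part in spec.split(","):  # operands are comma-separated
        part = part.strip()
        if ".." in part:
            lo_text, hi_text = part.split("..", 1)
            lo = bound(lo_text)
            # An omitted upper bound means "up to the last argument".
            hi = bound(hi_text) if hi_text else arity - 1
        else:
            lo = hi = bound(part)
        # Assumption: clamp to [0, arity - 1], so N-k never yields a negative index.
        result.update(range(max(lo, 0), min(hi, arity - 1) + 1))
    return sorted(result)
```

Under these assumptions, `argument_indices("1..N-2", 5)` gives `[1, 2, 3]` (everything except the first and last argument), and `argument_indices("N-1", 0)` gives `[]` rather than a negative index.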

asgerf · 2021-12-13T10:32:34Z

Evaluation on default slugs seems reasonable.
Evaluation on some Sequelize-specific slugs shows one new result, due to a switch from local reasoning to API graphs.

aschackmull · 2021-12-13T12:34:41Z

javascript/ql/lib/semmle/javascript/frameworks/data/internal/Shared.qll

+
+/** Holds if `path` has the form `basePath.token` where `token` is a single token. */
+bindingset[path]
+private predicate decomposePath(string path, string basePath, string token) {


This way of decomposing is quadratic in the number of tokens in path. Might as well use a linear decomposition. I.e. for "a.b.c.d" just construct each of "a", "b", "c", and "d", instead of also constructing "a.b.c" and "a.b".

Probably doesn't matter because of the package-level pruning, but yeah we might as well clean it up.

We previously needed a value to represent each step of the access path, which we don't get by simply splitting into individual tokens. The resolution predicates now take an int n parameter and resolve the first n tokens of the given full access path, with no sharing of common prefixes.

To simplify accessing the tokens of an access path I added the AccessPath class as well.
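The difference between the two decompositions discussed in this thread can be illustrated with a small Python sketch. It is not the QL implementation: `tokens`, `prefixes`, and `resolve` are hypothetical names, and the tokenizer (splitting on dots outside square brackets, so that ranges like `0..3` stay intact) is an assumption made for the example.

```python
import re

# Assumed token shape: a name, optionally followed by one [...] operand list.
_TOKEN = re.compile(r"[^.\[\]]+(?:\[[^\]]*\])?")


def tokens(path):
    """Linear decomposition: each token of the access path is constructed once."""
    return _TOKEN.findall(path)


def prefixes(path):
    """Quadratic decomposition: materialises every prefix string of the path."""
    toks = tokens(path)
    return [".".join(toks[:n]) for n in range(1, len(toks) + 1)]


def resolve(path, n, step, root):
    """Resolve the first n tokens of `path`, starting at `root`.

    `step(node, token)` is a caller-supplied function that follows one token.
    No prefix strings are built; the position n identifies the resolution step.
    """
    node = root
    for tok in tokens(path)[:n]:
        node = step(node, tok)
    return node
```

For "a.b.c.d", `prefixes` builds "a", "a.b", "a.b.c", and "a.b.c.d", while `tokens` builds only "a", "b", "c", and "d". The `(path, n)` pair plays the role that the prefix string played before.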

asgerf · 2021-12-14T11:44:11Z

Evaluation still looks OK

erik-krogh

It's looking really good 👍
I only have a few minor comments.

javascript/ql/lib/semmle/javascript/frameworks/data/internal/Impl.qll

javascript/ql/lib/semmle/javascript/frameworks/data/internal/Shared.qll

asgerf · 2022-01-05T13:37:56Z

Thanks for the review @erik-krogh! I've addressed the comments and force-pushed to resolve the conflict in SQL.qll.

erik-krogh

Looks good 👍

There is just an auto-format issue.
And some implicit-this QL-for-QL warnings.
I've fixed all of that in the suggestions below.

javascript/ql/lib/semmle/javascript/frameworks/data/internal/Shared.qll

Co-authored-by: Erik Krogh Kristensen <erik-krogh@github.com>

asgerf added the JS label Nov 18, 2021

asgerf force-pushed the js/mad branch from b546172 to 80df398 Compare December 13, 2021 10:19

asgerf marked this pull request as ready for review December 13, 2021 10:32

asgerf requested a review from a team as a code owner December 13, 2021 10:32

asgerf added the no-change-note-required This PR does not need a change note label Dec 13, 2021

aschackmull reviewed Dec 13, 2021

erik-krogh mentioned this pull request Jan 4, 2022

Python: remove duplicated spaces in qldoc #7511

Merged

erik-krogh reviewed Jan 4, 2022

asgerf added 7 commits January 5, 2022 14:34

JS: Initial support for models as data

772681d

JS: Resolve first N tokens instead of constructing each prefix

3ced5c9

JS: Update documentation in Impl.qll

1989d51

JS: Rename padded -> inversePad

21928be

JS: Add test for WithArity

d33200e

JS: Factor out common regexp in AccessPathToken

486beda

JS: Fix double space

a7698b8

asgerf force-pushed the js/mad branch from b22d3b5 to a7698b8 Compare January 5, 2022 13:37

erik-krogh previously approved these changes Jan 5, 2022

Apply suggestions from code review

c9fcdb8

Co-authored-by: Erik Krogh Kristensen <erik-krogh@github.com>

asgerf dismissed erik-krogh’s stale review via c9fcdb8 January 6, 2022 10:51

erik-krogh approved these changes Jan 6, 2022

codeql-ci merged commit d912a98 into github:main Jan 10, 2022
