Skip to content

JS: Initial models-as-data implementation #7171

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jan 10, 2022
Merged

Conversation

asgerf
Copy link
Contributor

@asgerf asgerf commented Nov 18, 2021

Initial implementation of models-as-data. Please use the backlinked issue for high-level discussion of the format itself.

The implementation is split in three files:

  • Shared.qll is intended to be shared between languages (some day; currently only works for JS)
  • Impl.qll is intended to contain the langauge-specific details.
  • ModelsAsData.qll is the public interface from the rest of the library

Some predicates from Shared.qll must be usable from Impl.qll, but should not be public in the library, so ModelsAsData.qll acts as a gatekeeper toward the public API.

The API exposes two modules:

  • ModelInput is used for contributing models
    • For example, subclass ModelInput::SinkModelCsv to add new sinks
  • ModelOutput is used for accessing interpreted models in terms of API nodes
    • For example, use ModelOutput::getASinkNode("sql-injection") to get SQL injection sinks

In this PR I have ported Sequelize and Spanner, as initial validation.

Evaluation from a slightly earlier version.

  • The new result is due to MaD always using API graphs whereas the old model used local flow for credentials.
  • (The new taint sources were due to the inclusion of an rough Express MaD model in the earlier version. They only show up in the diff because getSourceType() returned a different string for those source, not because they were truly new)

@asgerf
Copy link
Contributor Author

asgerf commented Dec 13, 2021

Updated with some new access path tokens.

  • Access path tokens in general now take a comma-separated list of arguments, like Argument[1,2] or Member[foo,bar].
  • Arguments and ranges:
    • A range of arguments is ..-separated again, so Argument[0..3] is an argument at index 0, 1, 2, or 3.
    • The upper bound of an argument range can be omitted, so Argument[1..] is everything except the first argument.
    • Argument[N-1] refers to the last argument, N-2 to the second-last and so on. This will never resolve to the a negative argument index, so things like Argument[-1] aren't picked up by accident.
    • Ranges can be combined with N-1, so Argument[1..N-2] is everything except the first and last argument.
  • Call site filters have been added, which match a subset of the invocations matched by the base access path:
    • WithArity[n] only allows calls with arity n.
    • NewCall only matches new calls
    • Call only matches non-new calls
    • These can be chained, for example NewCall.WithArity[2,3] matches new-calls with arity 2 or 3.

@asgerf
Copy link
Contributor Author

asgerf commented Dec 13, 2021

  • Evaluation on default slugs seems reasonable.
  • Evaluation on some Sequelize-specific slugs shows one new result, due to a switch from local reasoning to API graphs.

@asgerf asgerf marked this pull request as ready for review December 13, 2021 10:32
@asgerf asgerf requested a review from a team as a code owner December 13, 2021 10:32
@asgerf asgerf added the no-change-note-required This PR does not need a change note label Dec 13, 2021

/** Holds if `path` has the form `basePath.token` where `token` is a single token. */
bindingset[path]
private predicate decomposePath(string path, string basePath, string token) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way of decomposing is quadratic in the number of tokens in path. Might as well use a linear decomposition. I.e. for "a.b.c.d" just construct each of "a", "b", "c", and "d", instead of also constructing "a.b.c" and "a.b".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably doesn't matter because of the package-level pruning, but yeah we might as well clean it up.

We previously needed a value to represent each step of the access path, which we don't get by simply splitting into individual tokens. The resolution predicates now take an int n parameter and resolve the first n tokens of the given full access path, with no sharing of common prefixes.

To simplify accessing the tokens of an access path I added the AccessPath class as well.

@asgerf
Copy link
Contributor Author

asgerf commented Dec 14, 2021

Evaluation still looks OK

Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking really good 👍
I only have a few minor comments.

@asgerf
Copy link
Contributor Author

asgerf commented Jan 5, 2022

Thanks for the review @erik-krogh! I've addressed the comments and force-pushed to resolve the conflict in SQL.qll.

erik-krogh
erik-krogh previously approved these changes Jan 5, 2022
Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

There is just an auto-format issue.
And some implicit-this QL-for-QL warnings.
I've fixed all of that in the suggestions below.

Co-authored-by: Erik Krogh Kristensen <erik-krogh@github.com>
@codeql-ci codeql-ci merged commit d912a98 into github:main Jan 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
JS no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants