Commit

initial
jpleasu committed May 21, 2021
0 parents commit a815c7c
Showing 143 changed files with 17,455 additions and 0 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
.classpath
.project
.settings/
**/target/
**/private/
65 changes: 65 additions & 0 deletions LICENSE
@@ -0,0 +1,65 @@
LDGGrep is licensed for use as follows:

Copyright(C) 2021, Institute for Defense Analyses
4850 Mark Center Drive, Alexandria, VA; 703-845-2500
This material may be reproduced by or for the US Government
pursuant to the copyright license under the clauses at DFARS
252.227-7013 and 252.227-7014.


All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDER NOR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.


The source code of dk.brics.automaton is included with modifications. Its
license is as follows:

Copyright (c) 2001-2017 Anders Moeller
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. The name of the author may not be used to endorse or promote products
derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


220 changes: 220 additions & 0 deletions README.md
@@ -0,0 +1,220 @@
# LDGGrep
A path query tool for program analysis.


<!-- vim-markdown-toc GFM -->

* [Overview](#overview)
* [Ghidra extension](#ghidra-extension)
* [Usage](#usage)
* [Building the Ghidra extension (on Linux)](#building-the-ghidra-extension-on-linux)
* [LDGGrep tools](#ldggrep-tools)
* [Building tools](#building-tools)
* [`javagrep`](#javagrep)
* [`gfgrep`](#gfgrep)
* [`restgrep`](#restgrep)
* [Introduction to LDGGrep queries](#introduction-to-ldggrep-queries)
* [from regular expressions](#from-regular-expressions)
* [predicates](#predicates)
* [node predicates](#node-predicates)
* [grammar](#grammar)

<!-- vim-markdown-toc -->



# Overview
Given a [labelled](https://en.wikipedia.org/wiki/Graph_labeling) [directed
graph](https://en.wikipedia.org/wiki/Directed_graph) and a [query
expression](#introduction-to-ldggrep-queries), LDGGrep computes the minimal graph
representing all matching paths.

With experience, LDGGrep can be used to quickly reduce large, unwieldy graphs
to minimal, problem-focused graphs for further analysis and visualization.

# Ghidra extension
[Download a release](/../../releases)

The Ghidra extension includes graph models built from program data, a query
dialog with basic documentation, and a graph viewer.

* `RefGrep` uses the graph of Ghidra references. Nodes are addresses and edges
are references like function calls and data indirections.
([example queries](ghidra/extension/ghidra_scripts/RefGrepExamples.txt))

* `RefGrepWithDataStarts` is like `RefGrep`, except that all reference sources,
not just functions, are added to the set of starting nodes. That can be a large
set, so queries should start with a selective first predicate.
([example queries](ghidra/extension/ghidra_scripts/RefGrepWithDataStartsExamples.txt))

* `RefGrepExt` demonstrates extending `RefGrep` with extra predicates.
([example queries](ghidra/extension/ghidra_scripts/RefGrepExtExamples.txt))

* `BlockGrep` extends `RefGrep` with basic blocks for control flow graph
queries.
([example queries](ghidra/extension/ghidra_scripts/BlockGrepExamples.txt))

* `BaseGhidraGrep` is the abstract `GhidraScript` that all of the above inherit
from. It provides a query dialog with history and all the wiring to the
engine; you only need to add a model.


## Usage
Install the extension and restart Ghidra. (Alternatively, add the bundle and
`ghidra_scripts` directory via the Bundle Manager.)

From the script manager, select the LDGGrep category.

Running each script displays a dialog with a query textbox and a help window.
Clicking on a query expression in the help window populates the query box.

Clicking the "graph" button or pressing enter will submit the graph query. If
parsing fails, an error message will show in Ghidra's console. On success,
either no match is found and "no match" is written to Ghidra's console, or a
graph window will open containing the minimized graph of matching paths.

In the graph window, clicking on a node or edge jumps to the corresponding
location in the open program. Selecting a set of nodes will highlight the
corresponding locations in the code browser.

Clicking the "mem" button in the query dialog will perform the same query, but
the stored nodes are presented in a table. If the `sto` predicate doesn't
appear in the query expression, the table will be empty.

## Building the Ghidra extension (on Linux)
```bash
mvn package -Dghidra.version=9.2.3
ls -l ./ghidra/extension/target/ghidra_9.2.3_*_ldggrep.zip
```
(Maven calls the Bash script [`ghidra/extension/build.sh`](ghidra/extension/build.sh), so for now building
the extension depends on Bash)


# LDGGrep tools
Tools that use LDGGrep and support classes for the development of new ones.

## Building tools
```bash
mvn package
ls -l ./tools/target/appassembler/bin
```

## `javagrep`
Disassemble jars and classes to generate a graph of references, then query it
from a REPL.

For example, to find all call paths from LDGGrep's primary match method to the
`dk.brics` API:
```bash
./tools/target/appassembler/bin/javagrep ./ldggrep/target/ldggrep-*.jar
> </LDGMatcher::match/> (callx </jpleasu/>)* callx </dk\.brics/>
```

## `gfgrep`
A graph file grep built from the jgrapht-io parsers.

```bash
./tools/target/appassembler/bin/gfgrep ./tools/src/test/resources/test.dot
> /x/ </b/>
```

## `restgrep`
`restgrep` is a web service for querying graphs and
[`restclient.py`](tools/src/main/python/restclient.py) is a sample client.
```bash
# start the server w/ "showmatch" so that every matched graph shows in a window
./tools/target/appassembler/bin/restgrep -showmatch

# install client dependencies
pip3 install pydot networkx requests

# send a graph and do some queries
./tools/src/main/python/restclient.py
```
To change the listening port, use the `-port #` option to `restgrep`, and
change the URL in `restclient.py`.

# Introduction to LDGGrep queries
If the subject of "regular expressions" provokes you, LDGGrep is not for you.

Query expressions in LDGGrep are like [regular
expressions](https://en.wikipedia.org/wiki/Regular_expression), except instead
of matching a string composed of a sequence of characters, we're matching
a [walk](https://en.wikipedia.org/wiki/Path_(graph_theory)#Directed_walk,_trail,_path)
in a directed graph composed of a sequence of nodes and edges.
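
To make "a sequence of nodes and edges" concrete, here is a small Python sketch. It is not LDGGrep's data model; the toy graph, the node and edge names, and the `is_walk` helper are all made up for illustration. A walk is written as a list that alternates nodes and edge labels.

```python
# Hypothetical toy graph as (source, label, target) triples.
EDGES = {
    ("main", "call", "parse"),
    ("main", "call", "run"),
    ("run", "call", "parse"),
    ("parse", "read", "buf"),
}

def is_walk(edges, seq):
    """Check that seq = [n0, e1, n1, e2, n2, ...] follows edges of the graph."""
    if len(seq) % 2 == 0:
        return False  # a walk starts and ends on a node, so its length is odd
    # Pair up each (node, label, next node) step; zip stops at the last edge.
    steps = zip(seq[0::2], seq[1::2], seq[2::2])
    return all(step in edges for step in steps)

print(is_walk(EDGES, ["main", "call", "run", "call", "parse"]))  # True
print(is_walk(EDGES, ["main", "read", "buf"]))                   # False
```

A string is the special case where only the labels matter and every label is a character.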

## from regular expressions
LDGGrep uses `.`, `|`, `+`, `?`, `*`, `()`, and `{#,#}` in the exact same way
as regular expressions, but to go from characters to objects we need some more
syntax.

First, recall that in regular expressions (most) characters are matched with an
identical literal - so `a` matches `a`. For more power matching a character, we
have [character
classes](https://en.wikipedia.org/wiki/Regular_expression#Character_classes),
e.g. `[:alpha:]` matches `a` but not `1`.

With just a tiny bit of imagination, we might allow any
[predicate](https://en.wikipedia.org/wiki/Predicate_(mathematical_logic)) to
take the place of `[:alpha:]`. With a stateless predicate, the semantics of
regular expressions are unchanged. We could even replace each literal, like
`b`, with a predicate, like `[:is_b:]`. It's predicates all the way down!

In LDGGrep, this idea of using predicates lets us generalize from characters to
objects. (Note: LDGGrep uses square brackets entirely differently; only the
idea is shared!)
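
The "predicates all the way down" idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how LDGGrep is implemented; `is_b`, `is_alpha`, and `seq_matches` are names invented for the example, and only plain concatenation is handled.

```python
# Each "literal" or "character class" becomes a function from item to bool.
def is_b(ch):
    return ch == "b"

def is_alpha(ch):
    return ch.isalpha()

def seq_matches(preds, items):
    """True if there is one item per predicate and each predicate accepts its item."""
    return len(preds) == len(items) and all(p(x) for p, x in zip(preds, items))

# An "alpha, then b" pattern applied to strings...
print(seq_matches([is_alpha, is_b], "ab"))  # True
print(seq_matches([is_alpha, is_b], "1b"))  # False

# ...and the very same matcher applied to objects that are not characters.
print(seq_matches([lambda n: n > 0, lambda n: n % 2 == 0], [3, 4]))  # True
```

Nothing in `seq_matches` cares that the items are characters, which is the whole point.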

## predicates
As noted above, `.` does the same thing in LDGGrep as in regular expressions -
it matches anything. E.g. the LDGGrep query `.{2,4}` matches all walks with
length from 2 to 4.
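
As a rough picture of what `.{2,4}` selects, the Python sketch below enumerates every walk with 2 to 4 edges in a tiny cyclic graph. The graph, the fixed start node, and the `walks_between` helper are assumptions made for the example (and `lo` is assumed to be at least 1); LDGGrep itself works from a set of starting nodes supplied by the model.

```python
# Hypothetical toy graph: node -> list of (edge_label, target_node).
GRAPH = {
    "a": [("x", "b")],
    "b": [("y", "c"), ("z", "a")],
    "c": [],
}

def walks_between(graph, start, lo, hi):
    """Return the edge-label sequences of all walks from start with lo..hi edges."""
    found = []
    frontier = [(start, [])]  # (current node, labels walked so far)
    for length in range(1, hi + 1):
        # Extend every walk in the frontier by one outgoing edge.
        frontier = [
            (target, labels + [label])
            for node, labels in frontier
            for label, target in graph[node]
        ]
        if length >= lo:
            found.extend(labels for _, labels in frontier)
    return found

for labels in walks_between(GRAPH, "a", 2, 4):
    print(" ".join(labels))
```

Because the graph has a cycle (`a` to `b` and back), the same edges show up more than once in the longer walks.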

We often want to match objects by their name, so there are two kinds of
predicates to match strings. The _model_ of a graph provides the conversion
function to make strings of objects.

String literal predicates are enclosed in double quotes and regex predicates
are enclosed in forward slashes. E.g. the LDGGrep query `"a" /b/` matches
walks of length 2 whose first edge is named `a` and second edge has a name that
contains `b`.
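
The difference between the two string predicates can be sketched in Python as follows. The helper names `literal`, `regex`, and `match_edges` are invented for this example, and the exact-equality and match-anywhere semantics are assumptions read off the description above rather than taken from the engine.

```python
import re

def literal(s):
    """Predicate for a quoted string: the name must equal s exactly."""
    return lambda name: name == s

def regex(pattern):
    """Predicate for a slash-delimited regex: the pattern may match anywhere in the name."""
    compiled = re.compile(pattern)
    return lambda name: compiled.search(name) is not None

def match_edges(preds, edge_names):
    """True if the walk has one edge per predicate and each edge name is accepted."""
    return len(preds) == len(edge_names) and all(
        p(n) for p, n in zip(preds, edge_names)
    )

query = [literal("a"), regex("b")]  # analogue of the query `"a" /b/`

print(match_edges(query, ["a", "abc"]))  # True: second name contains "b"
print(match_edges(query, ["a", "xyz"]))  # False: no "b" in second name
print(match_edges(query, ["ab", "b"]))   # False: first name is not exactly "a"
```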

For direct access to objects in a predicate, there are Iverson
bracketed JavaScript expressions. The object to be tested is named `x` in the
expression, and the expression is executed in the `with(x)` scope for more
succinct access to its members.

Iverson bracket predicates are enclosed in square brackets. E.g. the LDGGrep
query `[x.fieldA]`, or equivalently `[fieldA]`, matches edges whose "fieldA"
member is (convertible to) true.
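
A loose Python analogue of the bracket predicate is sketched below. LDGGrep evaluates JavaScript, not Python, so truthiness rules differ in places; the `bracket` helper and the sample `edge` object are hypothetical and only mimic the scoping described above.

```python
from types import SimpleNamespace

def bracket(expr):
    """Build a predicate from an expression string, loosely mimicking `[expr]`."""
    code = compile(expr, "<predicate>", "eval")

    def pred(obj):
        scope = dict(vars(obj))  # members visible unqualified, like `with(x)`
        scope["x"] = obj         # the object itself is also available as `x`
        return bool(eval(code, {"__builtins__": {}}, scope))

    return pred

edge = SimpleNamespace(fieldA=1, kind="call")

print(bracket("x.fieldA")(edge))       # True
print(bracket("fieldA")(edge))         # True, same member without the `x.`
print(bracket("kind == 'ret'")(edge))  # False
```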

Finally, the model of a graph can provide "bareword" predicate _methods_,
annotated methods of the model that take an object and return true or false.
They're referred to as "bareword" because they're distinguished from the other
predicates by _not_ being enclosed in anything in particular.
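
The shape of a bareword predicate can be imitated in Python with a marker decorator. Everything here is hypothetical: the `predicate` decorator stands in for whatever annotation the real (Java) model uses, and `ToyModel` and `barewords` exist only for this sketch.

```python
def predicate(fn):
    """Hypothetical marker, standing in for the model's real annotation."""
    fn.is_predicate = True
    return fn

class ToyModel:
    @predicate
    def call(self, edge):
        """Bareword `call`: true for edges whose type is "call"."""
        return edge["type"] == "call"

    def helper(self, edge):
        """Not marked, so it is not exposed to queries."""
        return True

def barewords(model):
    """Map each bareword name to the marked method of the model."""
    return {
        name: getattr(model, name)
        for name in dir(model)
        if getattr(getattr(model, name), "is_predicate", False)
    }

preds = barewords(ToyModel())
print(sorted(preds))                    # ['call']
print(preds["call"]({"type": "call"}))  # True
```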

## node predicates

In LDGGrep, node predicates are different from edge predicates because they _do
not advance_ in the graph during matching, they only filter.

Node predicates are enclosed in angle brackets. What's inside the angle
brackets is a predicate as in the previous section. For example, the LDGGrep
query `<"x"> "b"` matches all walks of length one that start on a node with
name "x" and continue to an edge with name "b".
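
The "filter, don't advance" behaviour can be sketched in Python like this. It matches a query against one given walk, with no repetition or alternation, and is not the engine's algorithm; `walk_matches` and the tuple-based query format are assumptions made for the example.

```python
def walk_matches(query, walk):
    """Match a query against walk = [n0, e1, n1, ...].

    query is a list of ("node", predicate) or ("edge", predicate) steps.
    """
    pos = 0  # index of the current node in `walk`
    for kind, pred in query:
        if kind == "node":
            if not pred(walk[pos]):
                return False  # node predicates only filter; pos is unchanged
        else:
            if pos + 2 >= len(walk):
                return False  # no edge left to consume
            if not pred(walk[pos + 1]):
                return False
            pos += 2          # edge predicates advance to the next node
    return pos == len(walk) - 1  # the whole walk must be consumed

# Analogue of the query `<"x"> "b"`.
query = [("node", lambda n: n == "x"), ("edge", lambda e: e == "b")]

print(walk_matches(query, ["x", "b", "y"]))            # True
print(walk_matches(query, ["z", "b", "y"]))            # False: wrong start node
print(walk_matches(query, ["x", "b", "y", "b", "w"]))  # False: walk has two edges
```

Two steps in the query, but only one edge consumed, which is why the match has length one.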


## grammar
```
expr ::= alt (";" alt)*
alt ::= cat ("|" cat)*
cat ::= atom atom*
atom ::= "(" expr ")" | rep
rep ::= pred ("{" NUMBER? "," NUMBER? "}" | "*" | "+" )?
pred ::= node_pred | edge_pred
edge_pred ::= pred_expr
node_pred ::= "<" pred_expr ">"
pred_expr ::= "!" pred_expr | DOUBLE_QUOTE STRING DOUBLE_QUOTE | "/" REGEX "/" | IDENTIFIER | "[" JAVASCRIPT "]"
```

See [ldgpat.jj](ldggrep/src/main/javacc/ldgpat.jj) for more detail.

80 changes: 80 additions & 0 deletions ghidra/extension/build.sh
@@ -0,0 +1,80 @@
#!/bin/bash

# config
EXTENSION_NAME='LDGGrep'
EXTENSION_DESCRIPTION="LDGGrep is a path query tool for program analysis."


# resolve a path to its canonical form, falling back to the input
rp() {
  x=$(readlink -m "$1")
  if [ -z "$x" ]; then x=$1; fi
  echo "$x"
}

# directory containing this script ($SOURCE was never set, so the second cd was a no-op)
D="$( cd -P "$( dirname "$( rp "${BASH_SOURCE[0]}" )" )" && pwd )"




if [[ "$#" -eq 0 ]]; then
  echo "Usage : $0 <ghidra version> [ghidra distribution]"
  echo
  echo " where"
  echo " version is something like 9.2.3 and"
  echo " distribution is something like PUBLIC"
  exit 1
fi


# set cleanup on script exit
unset tmp_dir
cleanup() {
  # tmp_dir is only set once mktemp has run
  if [ -n "${tmp_dir+x}" ]; then
    echo "removing $tmp_dir"
    rm -rf "$tmp_dir"
  fi
}
trap cleanup EXIT

GHIDRA_VERSION="$1"
GHIDRA_DISTRIBUTION="${2:-PUBLIC}"

deps_dir=$(rp "$D/target/dependencies_${GHIDRA_VERSION}")
if [ ! -d "$deps_dir" ]; then
  echo "can't find dependencies dir $deps_dir"
  echo " build with mvn first. From root of ldggrep repo:"
  echo " mvn package -Dghidra.version=$GHIDRA_VERSION"
  exit 1
fi

tmp_dir=$(mktemp -d)

# tmp_extension must be absolute
tmp_extension=${tmp_dir}/${EXTENSION_NAME}


mkdir -p "${tmp_extension}/ghidra_scripts"
mkdir -p "${tmp_extension}/lib"
touch "${tmp_extension}/Module.manifest"
cat > "${tmp_extension}/extension.properties" <<EOT
name=${EXTENSION_NAME}
description=${EXTENSION_DESCRIPTION}
author=Jason P. Leasure
createdOn=$(date +%m/%d/%Y)
version=${GHIDRA_VERSION}
EOT

cp -r "$D/data" "${tmp_extension}/"
cp -r "$D/ghidra_scripts" "${tmp_extension}/"

cp -u "${deps_dir}"/* "${tmp_extension}/lib"

echo "remove module-info from jars in lib"
for x in "${tmp_extension}"/lib/*.jar; do
  # ignore failures: most jars have no module-info entry to delete
  zip -qd "$x" ./module-info.class ./module-info.java 2>/dev/null 1>&2 || true
done

ZIPNAME="ghidra_${GHIDRA_VERSION}_${GHIDRA_DISTRIBUTION}_$(date +'%Y%m%d')_${EXTENSION_NAME,,}.zip"
# remove any stale archive at the output location before zipping
rm -f "${D}/target/${ZIPNAME}"
( cd "$tmp_dir" && zip -r "${D}/target/${ZIPNAME}" "${EXTENSION_NAME}" )
