Using Geometric Encoding Based Context-Sensitive Points-to Analysis (geomPTA)

Contributed by Xiao Xiao

Introduction

geomPTA is a context-sensitive points-to analysis based on SPARK. It uses the call graph generated by SPARK to build the full context sensitivity model for subsequent analysis. Call graph cycles are handled by a special form of 1CFA for better precision, because Java programs tend to have large cycles. The algorithm details can be found in our technical report; the evaluation results there are for reference only, because we are continuing to improve the precision and performance of geomPTA.

Running geomPTA

To start geomPTA, specify the options "cg.spark enabled:true" and "cg.spark geom-pta:true". Note that the option "cg.spark simplify-offline" must be false. A useful option is "cg.spark geom-runs", which is 1 by default. This option controls how many times geomPTA is iterated; more iterations give better precision. Usually, setting "cg.spark geom-runs:2" obtains high precision.
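As a rough sketch, the options above can be assembled into a Soot command line. The `-p PHASE OPTION:VALUE` form and the `-w` flag (whole-program mode, which call graph construction needs) are standard Soot command-line syntax. The class name `GeomPtaArgs`, its `build` method, and the main class name `MyMainClass` are hypothetical and used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects the phase options named in this section.
class GeomPtaArgs {
    static List<String> build(int geomRuns, String mainClass) {
        List<String> args = new ArrayList<>();
        args.add("-w"); // whole-program mode, needed for the cg phases
        addPhaseOption(args, "enabled:true");
        addPhaseOption(args, "geom-pta:true");
        addPhaseOption(args, "simplify-offline:false"); // must stay off for geomPTA
        addPhaseOption(args, "geom-runs:" + geomRuns);
        args.add(mainClass); // class to analyze (assumed name)
        return args;
    }

    // Appends one "-p cg.spark <option>" triple to the argument list.
    private static void addPhaseOption(List<String> args, String option) {
        args.add("-p");
        args.add("cg.spark");
        args.add(option);
    }

    public static void main(String[] unused) {
        // The resulting list could be converted to an array and
        // handed to soot.Main.main(...).
        System.out.println(String.join(" ", build(2, "MyMainClass")));
    }
}
```

Keeping the options in one helper makes it easy to vary geom-runs between experiments without retyping the rest of the command line.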

By default, geomPTA does not compute context-sensitive points-to information for every variable seen by SPARK. First, geomPTA performs a local equivalence merging, which merges pointers that are local to the same function and have the same points-to information in the final result. This process is similar to the SPARK offline simplification controlled by "cg.spark simplify-offline". Second, geomPTA scans the whole program and marks the functions in packages whose names start with "java.", "sun.", etc. as library functions (for the full list, see the function isJavaLibraryClass in class SootClass). The remaining functions are called application functions, although not all of them really come from the user's code. geomPTA then collects the pointers in application code, together with the pointers in library functions that potentially affect the points-to information of pointers in application code (a.k.a. supporting pointers). These are called core pointers and are the ones updated by geomPTA. In order to soundly refine the call graph, the base pointers at virtual call sites are also marked as core pointers.
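The library/application split described above can be pictured with a small prefix check. This is only a toy stand-in, not the code of isJavaLibraryClass: it uses just the two prefixes quoted in the text, and the class name `LibraryFilter` and method `isLibraryClassName` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Toy stand-in for the library/application classification.
class LibraryFilter {
    // Only the two prefixes quoted in the text; the real list is longer.
    static final List<String> LIBRARY_PREFIXES = Arrays.asList("java.", "sun.");

    // Returns true if the fully qualified class name starts with a library prefix.
    static boolean isLibraryClassName(String className) {
        for (String prefix : LIBRARY_PREFIXES) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isLibraryClassName("java.util.HashMap"));
        System.out.println(isLibraryClassName("org.example.App"));
    }
}
```

Anything not matched by a prefix counts as application code, which is why third-party libraries outside those packages are treated as application functions.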

When geomPTA finishes, only the core pointers have context-sensitive points-to information. For non-core pointers, only the context-insensitive points-to information computed by SPARK is available. To change this default and compute points-to information for all pointers, set the option "cg.spark geom-app-only:false".

Both the classes Parameters and Constants are in the package soot.jimple.spark.geom.geomPA.

geomPTA also updates the call graph computed by SPARK, which can be obtained by Scene.v().getCallGraph(). After geomPTA, the virtual calls (including the interface calls) are refined with up-to-date points-to information. The spurious call edges are removed, and after the removal, methods that have become unreachable are also cleaned up. Generally, the refined call graph is much more precise than the one generated by SPARK.

Note: If you iterate over the call graph edges obtained by Scene.v().getCallGraph().listener(), the call edges removed from the updated call graph may still appear in the iterator. This is because every edge ever added to the call graph is kept on record.
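The effect in the note can be illustrated with a toy model. This is not Soot's CallGraph implementation: `ToyCallGraph` and all of its methods are hypothetical, and strings stand in for call edges. It shows why an append-only record of edges still reports edges that were later removed, and how filtering against the live edge set avoids that.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model, not Soot's CallGraph: a live edge set plus an append-only log.
class ToyCallGraph {
    private final Set<String> liveEdges = new LinkedHashSet<>();
    private final List<String> everAdded = new ArrayList<>();

    void addEdge(String edge) {
        if (liveEdges.add(edge)) {
            everAdded.add(edge);
        }
    }

    void removeEdge(String edge) {
        liveEdges.remove(edge); // the log is deliberately left untouched
    }

    // Plays the role of the listener: replays every edge ever added.
    List<String> listenerView() {
        return new ArrayList<>(everAdded);
    }

    // Drops log entries that are no longer in the live set.
    List<String> liveView() {
        List<String> out = new ArrayList<>();
        for (String e : everAdded) {
            if (liveEdges.contains(e)) {
                out.add(e);
            }
        }
        return out;
    }
}
```

With real Soot objects, the analogous precaution is to check each edge taken from the listener against the refined call graph before relying on it.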

Querying

Querying with SPARK interface

SPARK provides several overloads of reachingObjects for querying the points-to information of a given local variable, global variable, or object field. The result is returned as a PointsToSet, an interface that provides basic utilities for inspecting the points-to result. However, it does not let programmers iterate over the objects contained in the set. To do this, you can always safely cast a PointsToSet object returned by reachingObjects to the type PointsToSetInternal, which has a forall utility. Sample code for querying the points-to information of pointer l is as follows:

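// Uses soot.jimple.spark.sets.PointsToSetInternal, soot.jimple.spark.sets.P2SetVisitor
// and soot.jimple.spark.pag.Node
PointsToSetInternal pts = (PointsToSetInternal) geomPTA.reachingObjects(l);

pts.forall(new P2SetVisitor() {
    @Override
    public void visit(Node n) {
        // Do what you like with n, which is of type AllocNode
    }
});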

Querying with geomPTA interface

In addition to the SPARK querying interface, geomPTA provides its own interface soot.jimple.spark.geom.geomPA.GeomQueries for querying context sensitive points-to information in more sophisticated usage scenarios. In the following text, we assume:

The queried pointer is l;
The enclosing function of l is func(l).

k-CFA query

The most common way to specify a context for a pointer is to use the last k call edges leading to the enclosing method of the queried pointer. We provide two functions for this kind of query:

public boolean contextsByCallChain(Edge[] callEdgeChain, Local l, PtSensVisitor visitor)

public boolean contextByCallChain(Edge[] callEdgeChain, Local l, SparkField field, PtSensVisitor visitor)

The two functions are similar, except that the first queries the variable l and the second queries the expression l.field. The call edges in callEdgeChain are ordered so that callEdgeChain[0] is the farthest call edge in the chain and callEdgeChain[k-1] is the call edge that directly invokes the enclosing method of l. The last parameter, visitor, is a container that stores the query result. Usually, you can use the following code to create a container:

PtSensVisitor visitor = new Obj_full_extractor();
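The ordering of callEdgeChain described above is easy to get backwards when the edges are collected by walking up from func(l). The sketch below shows one way to produce the expected order. Strings stand in for Soot Edge objects, and the class `CallChainOrder` and method `toCallEdgeChain` are hypothetical names, not part of geomPTA.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Strings stand in for soot.jimple.toolkits.callgraph.Edge objects.
class CallChainOrder {
    // nearestFirst.get(0) is the edge that directly calls func(l);
    // later entries are progressively farther away from func(l).
    static String[] toCallEdgeChain(List<String> nearestFirst, int k) {
        int n = Math.min(k, nearestFirst.size());
        // Keep the k edges closest to func(l) ...
        List<String> chain = new ArrayList<>(nearestFirst.subList(0, n));
        // ... and flip them so the farthest edge lands at index 0.
        Collections.reverse(chain);
        return chain.toArray(new String[0]);
    }
}
```

For example, with the edges collected as "g->func", "f->g", "main->f" and k = 2, the resulting chain holds "f->g" at index 0 and "g->func" at index 1.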

Specifying any edge as part of the context

For this query, we specify any edge e and use all the paths from main to func(l) that pass through e as the contexts for l. The querying functions are:

public boolean contexsByAnyCallEdge( Edge sootEdge, Local l, PtSensVisitor visitor )

public boolean contextsByAnyCallEdge(Edge sootEdge, Local l, SparkField field, PtSensVisitor visitor)

One application of this function is checking whether a given library call could call back into application code. Sample code for this application is here.
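To make "all the paths from main to func(l) that pass through e" concrete, the following toy sketch counts such paths on a small acyclic call graph. It is not geomPTA's geometric encoding: the class `PathsThroughEdge` and its methods are hypothetical, method names are plain strings, and the graph is assumed to contain no cycles and to contain the queried edge.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy acyclic call graph over method names; not geomPTA's encoding.
class PathsThroughEdge {
    private final Map<String, List<String>> callees = new HashMap<>();

    void addCall(String caller, String callee) {
        callees.computeIfAbsent(caller, x -> new ArrayList<>()).add(callee);
    }

    // Number of distinct call paths from 'from' to 'to' (graph must be acyclic).
    long countPaths(String from, String to) {
        return countPaths(from, to, new HashMap<>());
    }

    private long countPaths(String from, String to, Map<String, Long> memo) {
        if (from.equals(to)) {
            return 1;
        }
        Long cached = memo.get(from);
        if (cached != null) {
            return cached;
        }
        long total = 0;
        for (String next : callees.getOrDefault(from, new ArrayList<>())) {
            total += countPaths(next, to, memo);
        }
        memo.put(from, total);
        return total;
    }

    // Paths entry -> target that use the call edge src -> tgt:
    // every path into src combined with every path out of tgt.
    long contextsThrough(String entry, String src, String tgt, String target) {
        return countPaths(entry, src) * countPaths(tgt, target);
    }
}
```

The product form shows why fixing a single edge can still select many contexts: each way of reaching the edge combines with each way of continuing from it.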

Is alias query

The most common query is deciding whether pointer p is an alias of pointer q under any context. GeomQueries provides three methods for this:

public boolean isAliasCI(Local l1, Local l2)

public boolean isAlias(IVarAbstraction pn1, IVarAbstraction pn2)

public boolean isAlias(Local l1, Local l2)
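The difference between a context-insensitive and a context-sensitive alias check can be sketched with a toy model. This is not how GeomQueries is implemented: it assumes both pointers live in the same method and therefore share the same set of context names, and it represents each pointer as a map from context name to allocation sites. The class `ToyAlias` is hypothetical; its method names mirror the ones listed above only for readability.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model: context name -> allocation sites a pointer may reference.
class ToyAlias {
    // Union of the points-to sets over all contexts.
    static Set<String> merge(Map<String, Set<String>> pts) {
        Set<String> all = new HashSet<>();
        for (Set<String> s : pts.values()) {
            all.addAll(s);
        }
        return all;
    }

    // Context-insensitive: contexts are merged before comparing.
    static boolean isAliasCI(Map<String, Set<String>> p, Map<String, Set<String>> q) {
        return !Collections.disjoint(merge(p), merge(q));
    }

    // Context-sensitive: the two sets must overlap within one shared context.
    static boolean isAlias(Map<String, Set<String>> p, Map<String, Set<String>> q) {
        for (Map.Entry<String, Set<String>> e : p.entrySet()) {
            Set<String> other = q.get(e.getKey());
            if (other != null && !Collections.disjoint(e.getValue(), other)) {
                return true;
            }
        }
        return false;
    }
}
```

If p points to o1 under context c1 and o2 under c2, while q points to o2 under c1 and o1 under c2, the merged sets overlap but no single context makes them overlap, so only the context-insensitive check reports an alias.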

TBC......

Also check out Soot's webpage.

NOTE: If you find any bugs in those tutorials (or other parts of Soot) please help us out by reporting them in our issue tracker.
