Analyzing library code for dangerous flows #7537

vlkl-sap · 2022-01-07T14:20:41Z

vlkl-sap
Jan 7, 2022

Hi,

I'm looking at a Java library here and I would like to find out where tainted data passed as parameters by clients is flowing into sensitive sinks within the library (say, the java.net.URL() constructor).

I include my current query below. I have succeeded in tainting the parameters, but the query does not distinguish the case where the taint source is called from inside the library. For example, in class E below, the method pt(...) is detected as the source of a flow even though clients of E have no way to inject tainted data into the sink hidden in m(...).

For the moment, I do not know how to proceed. Any hints would be appreciated. Hope the question makes sense.

Thanks!

public class E {

    public static void mmm() throws java.net.MalformedURLException {
        m(pt("untainted"));
    }

    private static void m(String s) throws java.net.MalformedURLException {
        String s1 = s;
        new java.net.URL(s1);
    }

    public static String pt(String str) {
        return str;
    }

}

/**
@id my-query
@kind path-problem
@problem.severity recommendation
*/

import semmle.code.java.dataflow.TaintTracking
import DataFlow::PathGraph
import java

class ParamToURLConfiguration extends TaintTracking::Configuration {
  ParamToURLConfiguration() {
    this = "ParamToURLConfiguration"
  }

  override predicate isSource(DataFlow::Node source) {
    source.getEnclosingCallable().isPublic() and source instanceof DataFlow::ParameterNode
  }

  override predicate isSink(DataFlow::Node sink) {
    exists(Call call |
      sink.asExpr() = call.getArgument(0) and
      call.getCallee().(Constructor).getDeclaringType().hasQualifiedName("java.net", "URL")
    )
  }
}

from DataFlow::PathNode src, DataFlow::PathNode sink, ParamToURLConfiguration config
where config.hasFlowPath(src, sink)
select src.getNode().asParameter().getCallable(), src, sink, "xxx"

Answered by smowton

smowton · 2022-01-10T12:02:44Z

smowton
Jan 10, 2022
Maintainer

You have a couple of choices here:

If pt is never user controlled, or if it is but its argument never reaches a sink without going back into user code (e.g. in the example you gave, the user would have to write mmm(pt(userControlled))), classify the argument to mmm as a source, not pt. You could manually enumerate these functions, or use polyCalls*(someSinkMethod) to approximate methods that can reach a sink.
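To make the reachability idea concrete, here is a minimal plain-Java sketch of what restricting sources with polyCalls* amounts to for class E from the question. It is a toy model, not CodeQL: the call graph is written out by hand as strings, and the names CallGraphReach, CALLS, PUBLIC, reaches and candidateSourceMethods are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy call graph for class E; only a model of the idea, not CodeQL. */
final class CallGraphReach {

    /** Direct call edges of E, written out by hand (caller -> callees). */
    static final Map<String, List<String>> CALLS = Map.of(
        "E.mmm", List.of("E.m", "E.pt"),
        "E.m", List.of("java.net.URL.<init>"),
        "E.pt", List.of());

    /** Public methods of E; m is private and therefore not listed. */
    static final Set<String> PUBLIC = Set.of("E.mmm", "E.pt");

    /** True if sink is reachable from start in zero or more call steps. */
    static boolean reaches(String start, String sink) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String cur = work.pop();
            if (cur.equals(sink)) return true;
            if (!seen.add(cur)) continue; // already explored
            for (String callee : CALLS.getOrDefault(cur, List.of())) {
                work.push(callee);
            }
        }
        return false;
    }

    /** Public methods that can transitively call the sink, sorted for stable output. */
    static Set<String> candidateSourceMethods(String sink) {
        Set<String> result = new TreeSet<>();
        for (String method : PUBLIC) {
            if (reaches(method, sink)) result.add(method);
        }
        return result;
    }
}
```

In this model, candidateSourceMethods("java.net.URL.<init>") contains E.mmm but not E.pt, so the parameter of pt would no longer be treated as a source. Since mmm takes no parameters, the example from the question would then have no sources at all. Note that the check only looks at call edges, not at whether a parameter's value actually travels along them.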

The trickiest case is when pt isn't as simple as you depict, but can be both a source and a propagator of taint. For example, if the real function looks like

public static String pt(String str) {
  if(someCond)
    maybeSink(str);
  return str;
}

Then we really do need to consider str a source, but also notice that it only works as an argument -> return value propagator with pre-existing taint. Probably the best we could do here would be to define

override predicate isSanitizer(DataFlow::Node node) {
  node = any(DataFlow::CallNode cn | /* cn calls pt */ and /* cn's argument is obviously not tainted */)
}

This will hack around the parameter being considered always tainted by sanitizing the return value at callsites within library code. It's up to you how ambitiously you define the sanitizer -- the simplest version would be just excluding compile-time constants, but perhaps you could define callers that are definitely handling untainted input more ambitiously. As always with CodeQL, start with a simple definition, check whether the approximation produces satisfactory results, then get incrementally more ambitious if you need to.
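As a small Java fixture for trying out such a sanitizer, the sketch below puts the two kinds of call site side by side. Everything except pt, maybeSink and someCond is hypothetical: the class name PtCallSites, the sinkLog list standing in for a real sink such as new java.net.URL(...), and the two caller methods.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixture; sinkLog stands in for a real sink. */
final class PtCallSites {
    static final List<String> sinkLog = new ArrayList<>();
    static boolean someCond = true;

    /** Stand-in sink: records whatever reaches it. */
    static void maybeSink(String s) {
        sinkLog.add(s);
    }

    /** Both a source (str may reach maybeSink) and a propagator (str is returned). */
    static String pt(String str) {
        if (someCond) maybeSink(str);
        return str;
    }

    /** Library-internal call with a compile-time constant: the simplest sanitizer would cut this return value. */
    static void internalConstantCall() {
        maybeSink(pt("untainted"));
    }

    /** Forwards client data: this return value is genuinely tainted and must not be sanitized. */
    static void forwardingCall(String fromClient) {
        maybeSink(pt(fromClient));
    }
}
```

A sanitizer limited to compile-time constants should remove the flow through internalConstantCall while keeping the one through forwardingCall. The flow from str to maybeSink inside pt itself is unaffected either way, because the sanitizer only targets the return value at call sites.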

0 replies

vlkl-sap · 2022-01-13T12:40:38Z

vlkl-sap
Jan 13, 2022
Author

Many thanks for the ideas, Chris.

I have added the polyCalls* clause, and the query has now been running on LGTM for a day; before that, it took a couple of hours. I also think the polyCalls* solution can introduce false positives, since the predicate can be satisfied by control-flow paths that carry no relevant data flow.

Overall, I am surprised that there is no proper solution for something like that in CodeQL. The problem seems quite relevant to me. Maybe your team could think about building a solution.

I appreciate your help in any case.

Vladimir

1 reply

smowton Jan 13, 2022
Maintainer

Using polyCalls* to restrict the sources should not be that time-consuming. Maybe paste an example of what you've implemented?

vlkl-sap · 2022-01-21T11:43:31Z

vlkl-sap
Jan 21, 2022
Author

It was kindly pointed out to me that the increased runtime was due to me switching over to a path-problem query to better understand the flows (and a very high number of results). Reverting to the non-path-problem query resulted in good performance again. The results look reasonable as well. I still wish there was a cleaner way to formulate the query, but the polyCalls* clause gives a workable approximation for the moment.

Thanks for your help, Chris.

0 replies
