-
Hi, I'm looking at a Java library and would like to find out where tainted data passed as parameters by clients flows into sensitive sinks within the library (say, the java.net.URL constructor). I include my current query below. I have succeeded in tainting the parameters, but the query does not distinguish whether the taint source is called from inside the library. For example, in class E below, the method pt(...) is detected as the source of a flow even though there is no way for clients of E to inject tainted data into the sink hidden in m(...). For the moment, I do not know how to proceed. Any hints would be appreciated. I hope the question makes sense. Thanks!
-
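You have a couple of choices here:

If `pt` is never user-controlled, or if it is but its argument never reaches a sink without going back into user code (e.g. in the example you gave, the user would have to write `mmm(pt(userControlled))`), classify the argument to `mmm` as a source, not `pt`. You could manually enumerate these functions, or use `polyCalls*(someSinkMethod)` to approximate methods that can reach a sink.

The trickiest case is when `pt` isn't as simple as you depict, but can be both a source and a propagator of taint. For example, if the real function looks like … then we really do need to consider `str` …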
This will hack around the parameter being considered always tainted by sanitizing the return value at call sites within library code. It's up to you how ambitiously you define the sanitizer: the simplest version would just exclude compile-time constants, but you could go further and identify callers that are definitely handling untainted input. As always with CodeQL, start with a simple definition, check whether the approximation produces satisfactory results, then get incrementally more ambitious if you need to.
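To make the two cases concrete, here is a minimal Java sketch. It is not the class from the question: the class name `LibrarySketch`, the method `internalUse`, and all method bodies are assumptions, chosen only to illustrate a pass-through `pt`, a sink wrapper `mmm`, and a library-internal call that passes a compile-time constant.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical library class, for illustration only (not the original class E).
class LibrarySketch {

    // Pass-through: returns its argument unchanged and never touches a sink itself.
    public static String pt(String s) {
        return s;
    }

    // Sink wrapper: the parameter flows straight into the java.net.URL constructor,
    // so this parameter is the one a client can use to reach the sink.
    public static URL mmm(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not a valid URL: " + spec, e);
        }
    }

    // Library-internal call: pt receives a compile-time constant here,
    // so no client-controlled data can arrive at the sink via this call site.
    public static URL internalUse() {
        return mmm(pt("https://example.com/"));
    }
}
```

With a shape like this, treating the parameter of `mmm` as the source still covers the client call `mmm(pt(userControlled))`, while the call inside `internalUse` is the kind of call site that a constant-excluding sanitizer would filter out.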
-
Many thanks for the ideas, Chris. I have added the polyCalls* clause, and the query has now been running on LGTM for a day; before that, it took a couple of hours. I also think that the polyCalls* solution can introduce false positives, as the predicate can be satisfied by control-flow paths that have no relevant data flow. Overall, I am surprised that there is no proper solution for something like this in CodeQL. The problem seems quite relevant to me. Maybe your team could think about building a solution. I appreciate your help in any case. Vladimir
-
It was kindly pointed out to me that the increased runtime was due to my switching over to a path-problem query to better understand the flows, combined with a very high number of results. Reverting to the non-path-problem query resulted in good performance again. The results look reasonable as well. I still wish there was a cleaner way to formulate the query, but the polyCalls* clause gives a workable approximation for the moment. Thanks for your help, Chris.