Java: Add models for flow- and taint-preserving functions in Commons ObjectUtils#5344
Conversation
| exists(DataFlow::Node src, DataFlow::Node sink, TaintFlowConf conf | conf.hasFlow(src, sink) | | ||
| sink.getLocation() = location and | ||
| element = sink.toString() and | ||
| value = "y" |
There was a problem hiding this comment.
You could also match the empty string "" here, see #5347
| // but we don't support value-preserving varargs functions at the moment. | ||
| "org.apache.commons.lang3;ObjectUtils;false;clone;;;Argument;ReturnValue;value", | ||
| "org.apache.commons.lang3;ObjectUtils;false;cloneIfPossible;;;Argument;ReturnValue;value", | ||
| "org.apache.commons.lang3;ObjectUtils;false;CONST;;;Argument;ReturnValue;value", |
There was a problem hiding this comment.
Should this also match CONST_BYTE and CONST_SHORT?
There was a problem hiding this comment.
I believe these won't work due to a type mismatch? Not 100% sure though
There was a problem hiding this comment.
Do you mean they won't work for dataflow because CodeQL does not support dataflow with different types? Would it at least be good then to model them as taint?
There was a problem hiding this comment.
They should work just fine. The type system that's used for pruning data flow allows for implicit casts, it effectively does not distinguish between the different numeric primitive types.
Please elaborate. |
I tried defining value-preserving edges from the actual arguments to the implicit varargs array created at the callsite, and from the array to the return value, but no flow resulted, presumably due to type differences between the array and its elements? |
18af0ff to
80b0ab7
Compare
|
@aschackmull applied the above comments, believe this is ready |
There was a problem hiding this comment.
You probably had some internal discussion about this, but in case varargs arguments should be covered with value, the following review comments mention the necessary changes.
Please ignore them if flow from a varargs argument should still be treated as taint.
Though there is some ambiguity for dataflow then, e.g. consider this (contrived) example:
public static Object[] copy(Object[]... args) {
return args.clone();
}
public static Object[] get(Object[]... args) {
return args[r.nextInt(args.length)];
}Unless there is some special CSV model syntax, for one of the methods dataflow cannot be modeled correctly.
Maybe it would make sense to introduce a syntax similar to Argument[...] to indicate that the input / output is an argument of the implicit varargs array?
E.g. for the above example code:
org.example;MyClass;false;copy;;;Argument;ReturnValue;value
org.example;MyClass;false;get;;;Argument[...];ReturnValue;value
| override predicate row(string row) { | ||
| row = | ||
| [ | ||
| // Note all the functions annotated with `taint` flow really should have `value` flow, |
There was a problem hiding this comment.
This comment might now be redundant.
| "org.apache.commons.lang3;ObjectUtils;false;CONST_BYTE;;;Argument;ReturnValue;value", | ||
| "org.apache.commons.lang3;ObjectUtils;false;CONST_SHORT;;;Argument;ReturnValue;value", | ||
| "org.apache.commons.lang3;ObjectUtils;false;defaultIfNull;;;Argument;ReturnValue;value", | ||
| "org.apache.commons.lang3;ObjectUtils;false;firstNonNull;;;Argument;ReturnValue;taint", |
There was a problem hiding this comment.
| "org.apache.commons.lang3;ObjectUtils;false;firstNonNull;;;Argument;ReturnValue;taint", | |
| "org.apache.commons.lang3;ObjectUtils;false;firstNonNull;;;Argument;ReturnValue;value", |
There was a problem hiding this comment.
No, this has to be "taint" a while yet until we support proper specification of array contents. The argument is an array (possibly implicitly created), but the method does not return that exact array, so this is not value-preserving from Argument to ReturnValue. The proper value-preserving spec is from Content[Array] of Argument to ReturnValue, but that isn't supported yet.
| "org.apache.commons.lang3;ObjectUtils;false;max;;;Argument;ReturnValue;taint", | ||
| "org.apache.commons.lang3;ObjectUtils;false;median;;;Argument;ReturnValue;taint", | ||
| "org.apache.commons.lang3;ObjectUtils;false;min;;;Argument;ReturnValue;taint", | ||
| "org.apache.commons.lang3;ObjectUtils;false;mode;;;Argument;ReturnValue;taint", |
There was a problem hiding this comment.
| "org.apache.commons.lang3;ObjectUtils;false;max;;;Argument;ReturnValue;taint", | |
| "org.apache.commons.lang3;ObjectUtils;false;median;;;Argument;ReturnValue;taint", | |
| "org.apache.commons.lang3;ObjectUtils;false;min;;;Argument;ReturnValue;taint", | |
| "org.apache.commons.lang3;ObjectUtils;false;mode;;;Argument;ReturnValue;taint", | |
| "org.apache.commons.lang3;ObjectUtils;false;max;;;Argument;ReturnValue;value", | |
| "org.apache.commons.lang3;ObjectUtils;false;median;;;Argument;ReturnValue;value", | |
| "org.apache.commons.lang3;ObjectUtils;false;min;;;Argument;ReturnValue;value", | |
| "org.apache.commons.lang3;ObjectUtils;false;mode;;;Argument;ReturnValue;value", |
aschackmull
left a comment
There was a problem hiding this comment.
LGTM, but needs rebase to solve conflict.
…Utils. These should all be value-preserving, but we don't support value-preserving varargs methods yet.
Co-authored-by: Marcono1234 <Marcono1234@users.noreply.github.com>
04614d4 to
82a000b
Compare
|
@aschackmull rebased. |
| override predicate hasActualResult(Location location, string element, string tag, string value) { | ||
| tag = "hasTaintFlow" and | ||
| exists(DataFlow::Node src, DataFlow::Node sink, Conf conf | conf.hasFlow(src, sink) | | ||
| exists(DataFlow::Node src, DataFlow::Node sink, TaintFlowConf conf | conf.hasFlow(src, sink) | |
There was a problem hiding this comment.
Might as well add and not any(ValueFlowConf conf | conf.hasFlow(src, sink)) to make the expected output shorter - there's no need to report taint-flow if we have value-flow as that will always be the case.
These should all be value-preserving, but we don't support value-preserving varargs methods yet.