C++: Rewrite cpp/uncontrolled-process-operation to not use DefaultTaintTracking #14561
LGTM if you're happy with the MRVA results. I think the query as it's written now makes a lot of sense 😍.
  exists(int processOperationArg, FunctionCall call |
    isProcessOperationArgument(processOperation, processOperationArg) and
    call.getTarget().getName() = processOperation and
-   call.getArgument(processOperationArg) = arg
+   call.getArgument(processOperationArg) = [arg.asExpr(), arg.asIndirectExpr()]
I'm wondering whether, once all this DTT stuff has been merged, we should investigate what happens if we remove output.isReturnValue() from our flow sources (and simply keep the output.isReturnValueDeref() cases) in models such as this one: https://github.com/github/codeql/blob/main/cpp/ql/lib/semmle/code/cpp/models/implementations/Getenv.qll#L18
A quick grep reveals that this is only a problem for:
- https://github.com/github/codeql/blob/main/cpp/ql/lib/semmle/code/cpp/models/implementations/Getenv.qll#L18
- https://github.com/github/codeql/blob/main/cpp/ql/lib/semmle/code/cpp/models/implementations/Gets.qll#L51
- https://github.com/github/codeql/blob/main/cpp/ql/lib/semmle/code/cpp/models/implementations/Gets.qll#L108
This would mean that we wouldn't have to exclude DataFlow::ExprNode from isSource in this case.
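To make the distinction concrete, here is a minimal C++ sketch (not taken from the PR) of the kind of code a getenv model describes. The helper name buildCommand and the "echo" prefix are hypothetical. The pointer returned by getenv corresponds to the return value, while the attacker-controlled bytes it points to correspond to the return value dereference.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper, for illustration only: builds a command line from
// an environment variable.
std::string buildCommand(const char *varName) {
  assert(varName != nullptr);

  // The return value of getenv is a pointer, which may be null.
  const char *value = std::getenv(varName);
  if (value == nullptr) {
    return "echo unset";
  }

  // Reading through the pointer is where the environment-controlled bytes
  // enter the program: this is the return value dereference.
  return std::string("echo ") + value;
}
```

If the resulting string were later passed to a function such as system, it is the pointed-to bytes, not the pointer itself, that would carry the untrusted data.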
Note that this predicate is used on the sink-side, not the source-side. What you're saying does apply to the not node instanceof DataFlow::ExprNode we have in the source predicate below.
Oops, sorry. Yes, I meant to comment on the source-side. I don't know why I put the comment on this line of code 😂
Should we open an internal issue for this?
Yeah, I can do that now.
  sink = sinkNode.getNode() and
  isProcessOperationExplanation(sink, processOperation) and
  Flow::flowPath(sourceNode, sinkNode)
select sink, sourceNode, sinkNode,
  "The value of this argument may come from $@ and is being passed to " + processOperation + ".",
  source, source.toString()
Should we use the getSourceType predicate from the source node to obtain a better alert message?
We can. That's what you did elsewhere, right?
Done. I think the source types might need some further tuning, but that can be done later.
Agreed. But this in itself LGTM!
When the barrier is not disregarded, it's 165 (source, sink) pairs.
Rebased for internal PR purposes.
Note that one of the internal tests also needs updating. I'll do that once we're happy with what we have here.
I added a barrier looking at arithmetic types, as these seemed to be the source of many results that are in the end not very interesting. Note that this does mean that we lose some results where the input buffer is copied character-by-character. However, even with the barrier disabled we lose some of these. I'm not sure how worried we should be about this.
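As an illustration of the character-by-character case mentioned above, here is a minimal C++ sketch (a hypothetical example, not one of the MRVA results). Each byte passes through a local variable of type char, which is an arithmetic type, so a barrier on arithmetic types would presumably stop the flow at that point.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical example: copies a null-terminated buffer one character at
// a time instead of using a bulk copy such as strcpy.
std::string copyCharByChar(const char *input) {
  assert(input != nullptr);

  std::string out;
  for (std::size_t i = 0; input[i] != '\0'; ++i) {
    char c = input[i];  // 'c' has an arithmetic type (char)
    out.push_back(c);   // the copy is rebuilt from individual characters
  }
  return out;
}
```

The function returns an exact copy of its input, so if the input is untrusted the output is too, even though every intermediate value is a single char.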
Summary of MRVA results:
Still running an MRVA experiment to see how many (source, sink) pairs we lose when we do not disregard the barrier.