Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++: Rewrite cpp/uncontrolled-process-operation to not use DefaultTaintTracking #14561

Merged
merged 2 commits into from
Nov 15, 2023

Conversation

jketema
Copy link
Contributor

@jketema jketema commented Oct 23, 2023

Note that one of the internal tests also needs updating. I'll do that once we're happy with what we have here.

I added a barrier looking at arithmetic types, as these seemed to be the source of many results that are in the end not very interesting. Note that this does mean that we lose some results where the input buffer is copied character-by-character. However, even with the barrier disabled we lose some of these. I'm not sure how worried we should be about this.

Summary of MRVA results:

  • Total number of results reported by MRVA before on 1000s projects: 1915
  • Total number of results reported by MRVA after on 1000s projects: 1545
  • Disregarding the barrier I added, we lose 58 (source, sink)-pairs. They all seem FPs related to pointer/pointee confusion in DTT.
  • Not disregarding the barrier, we gain 82 (source, sink)-pair. I spot checked these, and they look reasonable. I'm not too worried about this number anyway, since this is a medium precision query.

Still running a MRVA experiment to see how many (source, sink)-pairs we lose when we do not disregard the barrier.

@github-actions github-actions bot added the C++ label Oct 23, 2023
@jketema jketema force-pushed the rewrite-uncontrolled-process-operation branch 3 times, most recently from c8658fe to d013b4a Compare October 26, 2023 12:19
@jketema jketema force-pushed the rewrite-uncontrolled-process-operation branch from d013b4a to ed4b7a4 Compare November 1, 2023 12:18
@jketema jketema force-pushed the rewrite-uncontrolled-process-operation branch from ed4b7a4 to 9df3252 Compare November 15, 2023 10:55
@jketema jketema marked this pull request as ready for review November 15, 2023 11:15
@jketema jketema requested a review from a team as a code owner November 15, 2023 11:15
MathiasVP
MathiasVP previously approved these changes Nov 15, 2023
Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if you're happy with the MRVA results. I think the query as it's written now makes a lot of sense 😍.

exists(int processOperationArg, FunctionCall call |
isProcessOperationArgument(processOperation, processOperationArg) and
call.getTarget().getName() = processOperation and
call.getArgument(processOperationArg) = arg
call.getArgument(processOperationArg) = [arg.asExpr(), arg.asIndirectExpr()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should, once all this DTT stuff has been merged, should investigate what happens if we remove output.isReturnValue() our flow sources (and simply keep the output.isReturnValueDeref() cases) in models such as this one: https://github.com/github/codeql/blob/main/cpp/ql/lib/semmle/code/cpp/models/implementations/Getenv.qll#L18

A quick grep only reveals that this is a problem for:

This would mean that we didn't have to exclude ataFlow::ExprNode from isSource in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this predicate is used on the sink-side, not the source-side. What you're saying does apply to the not node instanceof DataFlow::ExprNode we have in the source predicate below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry. Yes, I meant to comment on the source-side. I don't know why I put the comment on this line of code 😂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we open an internal issue for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can do that now.

sink = sinkNode.getNode() and
isProcessOperationExplanation(sink, processOperation) and
Flow::flowPath(sourceNode, sinkNode)
select sink, sourceNode, sinkNode,
"The value of this argument may come from $@ and is being passed to " + processOperation + ".",
source, source.toString()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use the getSourceType predicate from the source node to obtain a better alert message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can. That's what you did elsewhere, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I think we source types might need some further tuning, but that can be done later.

Copy link
Contributor

@MathiasVP MathiasVP Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. But this in itself LGTM!

@jketema jketema added no-change-note-required This PR does not need a change note depends on internal PR This PR should only be merged in sync with an internal Semmle PR labels Nov 15, 2023
MathiasVP
MathiasVP previously approved these changes Nov 15, 2023
@jketema jketema force-pushed the rewrite-uncontrolled-process-operation branch from 20a4f19 to 46e6e72 Compare November 15, 2023 14:00
@jketema
Copy link
Contributor Author

jketema commented Nov 15, 2023

  • Disregarding the barrier I added, we lose 58 (source, sink)-pairs. They all seem FPs related to pointer/pointee confusion in DTT.

Not disregarding the barrier it's 165 (source, sink) pairs

@jketema
Copy link
Contributor Author

jketema commented Nov 15, 2023

Rebased for internal PR purposes.

@jketema jketema merged commit f22979f into github:main Nov 15, 2023
@jketema jketema deleted the rewrite-uncontrolled-process-operation branch November 15, 2023 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ depends on internal PR This PR should only be merged in sync with an internal Semmle PR no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants