control-flow-cpp.rst

Analyzing control flow

CodeQL for C/C++

.. rst-class:: setup

Setup

For this example you need to set up CodeQL for Visual Studio Code and download the CodeQL database for ChakraCore from GitHub.

.. rst-class:: agenda

Agenda

Control flow graphs
Exercise: use after free
Recursion over the control flow graph
Basic blocks
Guard conditions

Control flow graphs

We frequently want to ask questions about the possible order of execution for a program.

Example:

if (x) {
  return 1;
} else {
  return 2;
}

Possible execution order is usually represented by a control flow graph:

.. graphviz::

      digraph {
      graph [ dpi = 1000 ]
      node [shape=polygon,sides=4,color=blue4,style="filled,rounded",fontname=consolas,fontcolor=white]
      a [label=<if<BR /><FONT POINT-SIZE="10">IfStmt</FONT>>]
      b [label=<x<BR /><FONT POINT-SIZE="10">VariableAccess</FONT>>]
      c [label=<1<BR /><FONT POINT-SIZE="10">Literal</FONT>>]
      d [label=<2<BR /><FONT POINT-SIZE="10">Literal</FONT>>]
      e [label=<return<BR /><FONT POINT-SIZE="10">ReturnStmt</FONT>>]
      f [label=<return<BR /><FONT POINT-SIZE="10">ReturnStmt</FONT>>]

      a -> b
      b -> {c, d}
      c -> e
      d -> f
   }

Note

The control flow graph is a static over-approximation of possible control flow at runtime. Its nodes are program elements such as expressions and statements. If there is an edge from one node to another, then it means that the semantic operation corresponding to the first node may be immediately followed by the operation corresponding to the second node. Some nodes (such as conditions of “if” statements or loop conditions) have more than one successor, representing conditional control flow at runtime.

Modeling control flow

The control flow is modeled with a CodeQL class, ControlFlowNode. Examples of control flow nodes include statements and expressions.

ControlFlowNode provides API for traversing the control flow graph:
- ControlFlowNode ControlFlowNode.getASuccessor()
- ControlFlowNode ControlFlowNode.getAPredecessor()
- ControlFlowNode ControlFlowNode.getATrueSuccessor()
- ControlFlowNode ControlFlowNode.getAFalseSuccessor()

The control-flow graph is intra-procedural–in other words, only models paths within a function. To find the associated function, use

Function ControlFlowNode.getControlFlowScope()

Note

The control flow graph is similar in concept to data flow graphs. In contrast to data flow, however, the AST nodes are directly control flow graph nodes.

The predecessor/successor predicates are prime examples of member predicates with results that are used in functional syntax, but that are not actually functions. This is because a control flow node may have any number of predecessors and successors (including zero or more than one).

Example: malloc/free pairs

Find calls to free that are reachable from an allocation on the same variable:

.. literalinclude:: ../query-examples/cpp/control-flow-cpp-1.ql
   :language: ql

Note

Predicates allocationCall and freeCall are defined in the standard library and model a number of standard alloc/free-like functions.

Exercise: use after free

Based on this query, write a query that finds accesses to the variable that occur after the free.

.. rst-class:: build

What do you find? What problems occur with this approach to detecting use-after-free vulnerabilities?

.. rst-class:: build

    .. literalinclude:: ../query-examples/cpp/control-flow-cpp-2.ql
       :language: ql

Utilizing recursion

The main problem we observed in the previous exercise was that the successor's relation is unaware of changes to the variable that would invalidate our results.

We can fix this by writing our own successor predicate that stops traversing the CFG if the variable is re-defined.

Utilizing recursion

ControlFlowNode reachesWithoutReassignment(FunctionCall free, LocalScopeVariable v)
{
  freeCall(free, v.getAnAccess()) and
  (
    // base case
    result = free
    or
    // recursive case
    exists(ControlFlowNode mid |
      mid = reachesWithoutReassignment(free, v) and
      result = mid.getASuccessor() and
      // stop tracking when the value may change
      not result = v.getAnAssignedValue() and
      not result.(AddressOfExpr).getOperand() = v.getAnAccess()
    )
  )
}

Exercise

Find local variables that are written to, and then never accessed again.

Hint: Use LocalVariable.getAnAssignment().

.. rst-class:: build

   .. literalinclude:: ../query-examples/cpp/control-flow-cpp-3.ql
      :language: ql

.. rst-class:: background2

More control flow

Basic blocks

BasicBlock represents basic blocks, that is, straight-line sequences of control flow nodes without branching.

ControlFlowNode BasicBlock.getNode(int)
BasicBlock BasicBlock.getASuccessor()
BasicBlock BasicBlock.getAPredecessor()
BasicBlock BasicBlock.getATrueSuccessor()
BasicBlock BasicBlock.getAFalseSuccessor()

Often, queries can be made more efficient by treating basic blocks as a unit instead of reasoning about individual control flow nodes.

Exercise: unreachable blocks

Write a query to find unreachable basic blocks.

Hint: First define a recursive predicate to identify reachable blocks. Class EntryBasicBlock may be useful.

.. rst-class:: build

.. literalinclude:: ../query-examples/cpp/control-flow-cpp-4.ql
   :language: ql

Note

This query has a good number of false positives on Chakra, many of them to do with templating and macros.

Guard conditions

A GuardCondition is a Boolean condition that controls one or more basic blocks in the sense that it is known to be true/false at the entry of those blocks.

GuardCondition.controls(BasicBlock bb, boolean outcome): the entry of bb can only be reached if the guard evaluates to outcome
GuardCondition.comparesLt, GuardCondition.ensuresLt, GuardCondition.comparesEq: auxiliary predicates to identify conditions that guarantee that one expression is less than/equal to another

Further materials

CodeQL for C/C++: https://codeql.github.com/docs/codeql-language-guides/codeql-for-cpp/
API reference: https://codeql.github.com/codeql-standard-libraries/cpp

.. rst-class:: end-slide

Extra slides

.. rst-class:: background2

Appendix: Library customizations

Call graph customizations

The default implementation of call target resolution does not handle function pointers, because they are difficult to deal with in general.

We can, however, add support for particular patterns of use by contributing a new override of Call.getTarget.

Exercise: unresolvable calls

Write a query that finds all calls for which no call target can be determined, and run it on libjpeg-turbo.

Examine the results. What do you notice?

.. rst-class:: build

   .. code-block:: ql

      import cpp

      from Call c
      where not exists(c.getTarget())
      select c

.. rst-class:: build

Many results are calls through struct fields emulating virtual dispatch.

Exercise: resolving calls through variables

Write a query that resolves the call at cjpeg.c:640.

Hint: Use classes ExprCall, PointerDereferenceExpr, and Access.

.. rst-class:: build

.. literalinclude:: ../query-examples/cpp/control-flow-cpp-5.ql
   :language: ql

Exercise: customizing the call graph

Create a subclass of ExprCall that uses your query to implement getTarget.

.. rst-class:: build

class CallThroughVariable extends ExprCall {
  Variable v;
  CallThroughVariable() {
    exists(PointerDereferenceExpr callee | callee = getExpr() |
      callee.getOperand() = v.getAnAccess()
    )
  }
  override Function getTarget() {
    result = super.getTarget() or
    exists(Access init | init = v.getAnAssignedValue() |
      result = init.getTarget()
    )
  }
}

Control-flow graph customizations

The default control-flow graph implementation recognizes a few common patterns for non-returning functions, but sometimes it fails to spot them, which can cause imprecision.

We can add support for new non-returning functions by overriding ControlFlowNode.getASuccessor().

Exercise: calls to `error_exit`

Write a query that finds all calls to a field called error_exit.

Hint: Reuse (parts of) the CallThroughVariable class from before.

.. rst-class:: build

class CallThroughVariable extends ExprCall { ... }

class ErrorExitCall extends CallThroughVariable {
  override Field v;

  ErrorExitCall() { v.getName() = "error_exit" }
}

from ErrorExitCall eec
select eec

Exercise: customizing the control-flow graph

Override ControlFlowNode to mark calls to error_exit as non-returning.

Hint: ExprCall is an indirect subclass of ControlFlowNode.

.. rst-class:: build

class CallThroughVariable extends ExprCall { ... }

class ErrorExitCall extends CallThroughVariable {
  override Field v;

  ErrorExitCall() { v.getName() = "error_exit" }

  override ControlFlowNode getASuccessor() { none() }
}

`CustomOptions` class

The Options library defines a CustomOptions class with various member predicates that can be overridden to customize aspects of the analysis.

In particular, it has an exprExits predicate that can be overridden to more easily perform the customization on the previous slide:

import Options

class MyOptions extends CustomOptions {
 override predicate exprExits(Expr e) {
   super.exprExits(e) or ...
 }
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Analyzing control flow

Setup

Agenda

Control flow graphs

Modeling control flow

Example: malloc/free pairs

Exercise: use after free

Utilizing recursion

Utilizing recursion

Exercise

More control flow

Basic blocks

Exercise: unreachable blocks

Guard conditions

Further materials

Extra slides

Appendix: Library customizations

Call graph customizations

Exercise: unresolvable calls

Exercise: resolving calls through variables

Exercise: customizing the call graph

Control-flow graph customizations

Exercise: calls to `error_exit`

Exercise: customizing the control-flow graph

`CustomOptions` class

FilesExpand file tree

control-flow-cpp.rst

Latest commit

History

control-flow-cpp.rst

File metadata and controls

Analyzing control flow

Setup

Agenda

Control flow graphs

Modeling control flow

Example: malloc/free pairs

Exercise: use after free

Utilizing recursion

Utilizing recursion

Exercise

More control flow

Basic blocks

Exercise: unreachable blocks

Guard conditions

Further materials

Extra slides

Appendix: Library customizations

Call graph customizations

Exercise: unresolvable calls

Exercise: resolving calls through variables

Exercise: customizing the call graph

Control-flow graph customizations

Exercise: calls to error_exit

Exercise: customizing the control-flow graph

CustomOptions class

Exercise: calls to `error_exit`

`CustomOptions` class