Introduction to data flow

Finding string formatting vulnerabilities in C/C++

.. rst-class:: setup

Setup

For this example you need to set up CodeQL for Visual Studio Code and download the CodeQL database for dotnet/coreclr from GitHub.

.. rst-class:: agenda

Agenda

  • Non-constant format string
  • Data flow
  • Modules and libraries
  • Local data flow
  • Local taint tracking

Motivation

Let’s write a query to identify instances of CWE-134, “Use of externally controlled format string”.

.. code-block:: cpp

   printf(userControlledString, arg1);

Goal: Find uses of printf (or similar) where the format string can be controlled by an attacker.

Note

Formatting functions allow the programmer to construct a string output using a format string and an optional set of arguments. The format string is specified using a simple template language, where the output string is constructed by processing the format string to find format specifiers, and inserting values provided as arguments. For example:

.. code-block:: cpp

   printf("Name: %s, Age: %d", "Freddie", 2);

would produce the output "Name: Freddie, Age: 2". So far, so good. However, problems arise if there is a mismatch between the number of format specifiers and the number of arguments. For example:

.. code-block:: cpp

   printf("Name: %s, Age: %d", "Freddie");

In this case, we have one more format specifier than we have arguments. In a managed language such as Java or C#, this simply leads to a runtime exception. However, in C/C++ the behavior is undefined: the formatting functions are typically implemented by reading values from the stack (or from argument registers) without any validation of the number of arguments. This means a mismatch between the number of format specifiers and the number of format arguments can lead to information disclosure.
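To make "number of format specifiers versus number of arguments" concrete, here is a minimal sketch of a function that counts how many variadic arguments a format string would consume. The name ``count_format_args`` is hypothetical and not part of any library, and the parser is deliberately simplified: it does not handle POSIX positional forms such as ``%1$s``.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an illustration, not a library function): counts the
 * variadic arguments a printf-style format string would consume.
 * Simplified: positional specifiers such as "%1$s" are not handled. */
size_t count_format_args(const char *fmt)
{
    size_t count = 0;
    for (const char *p = fmt; *p != '\0'; ++p) {
        if (*p != '%')
            continue;
        ++p;                      /* look at the character after '%' */
        if (*p == '%')
            continue;             /* "%%" prints a literal '%', no argument */
        /* Skip flags, width, precision and length modifiers.
         * A '*' width or precision consumes one int argument. */
        while (*p != '\0' && strchr("-+ #0123456789.*hlLjzt", *p) != NULL) {
            if (*p == '*')
                ++count;
            ++p;
        }
        if (*p == '\0')
            break;                /* truncated specifier at end of string */
        ++count;                  /* the conversion itself (d, s, n, ...) */
    }
    return count;
}
```

For the format string used above, ``count_format_args("Name: %s, Age: %d")`` yields 2, so a call that supplies only ``"Freddie"`` is one argument short.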

Of course, in practice this happens rarely with constant format strings. Instead, it’s most problematic when the format string can be specified by the user, allowing an attacker to provide a format string with the wrong number of format specifiers. Furthermore, if an attacker can control the format string, they may be able to provide the %n format specifier, which causes printf to write the number of characters in the output generated so far to a specified location.
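The effect of %n can be observed safely with a constant format string. The sketch below uses a hypothetical helper, ``chars_before_n``, and assumes a C runtime that supports %n; some runtimes restrict or disable it by default.

```c
#include <stdio.h>

/* Hypothetical helper: returns the value that %n stores, i.e. the number of
 * characters produced before the %n directive is reached.
 * The format string is a literal here, so this use is well-defined; the
 * danger described above arises when an attacker supplies the format. */
int chars_before_n(void)
{
    char buf[32];
    int written = -1;
    /* "Name: " is 6 characters and "Freddie" is 7, so %n stores 13. */
    snprintf(buf, sizeof buf, "Name: %s%n!", "Freddie", &written);
    return written;
}
```

With an attacker-controlled format string, the pointer that %n writes through is taken from whatever happens to be in the argument position, which is what turns a read primitive into a write primitive.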

See https://en.wikipedia.org/wiki/Uncontrolled_format_string for more background.

Exercise: Non-constant format string

Write a query that flags printf calls where the format argument is not a StringLiteral.

Hint: Import semmle.code.cpp.commons.Printf and use class FormattingFunction and getFormatParameterIndex().

.. rst-class:: build

.. literalinclude:: ../query-examples/cpp/data-flow-cpp-1.ql
  :language: ql

Note

This first query is about finding places where the format string is not a constant string. In the CodeQL libraries for C/C++, constant strings are modeled as StringLiteral nodes, so we are looking for calls to formatting functions where the format argument is not a string literal.

The C/C++ standard libraries include many different formatting functions that may be vulnerable to this particular attack, including printf, snprintf, and others. Furthermore, each of these formatting functions may take the format string at a different position in its argument list. Instead of laboriously listing all these variants, we can make use of the standard CodeQL class FormattingFunction, which provides an interface that models common formatting functions in C/C++.

Meh...

Results are unsatisfactory:

  • Query flags cases where the format string is a symbolic constant.
  • Query flags cases where the format string is itself a format argument.
  • Query doesn't recognize wrapper functions around printf-like functions.

We need something better.

Note

For example, consider the results which appear in /src/ToolBox/SOS/Strike/util.h, between lines 965 and 970:

.. code-block:: cpp

   const char *format = align == AlignLeft ? "%-*.*s" : "%*.*s";

   if (IsDMLEnabled())
       DMLOut(format, width, precision, mValue);
   else
       ExtOut(format, width, precision, mValue);

Here, DMLOut and ExtOut are macros that expand to formatting calls. The format string is not constant, in the sense that the format argument is not a string literal. However, it is clearly one of two possible constants, both of which consume the same number and types of arguments.
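A self-contained sketch of the same pattern shows why this result is a false positive. The names ``format_aligned`` and ``enum alignment`` are hypothetical stand-ins rather than the coreclr definitions, and ``snprintf`` takes the place of the DMLOut and ExtOut macros.

```c
#include <stdio.h>

/* Hypothetical stand-in for the alignment type used in the original code. */
enum alignment { ALIGN_LEFT, ALIGN_RIGHT };

/* Formats `value` into `out`, padded to `width` and truncated to `precision`.
 * The format argument is not a string literal, but it can only ever be one
 * of two literals, and both consume (int, int, const char *). */
int format_aligned(char *out, size_t size, enum alignment align,
                   int width, int precision, const char *value)
{
    const char *format = (align == ALIGN_LEFT) ? "%-*.*s" : "%*.*s";
    return snprintf(out, size, format, width, precision, value);
}
```

Calling ``format_aligned(buf, sizeof buf, ALIGN_LEFT, 6, 3, "abcdef")`` writes ``abc`` followed by three spaces, and ``ALIGN_RIGHT`` writes three spaces followed by ``abc``. Neither branch lets a caller influence the format string itself.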

What we need is a way to determine whether the format argument is ever set to something that is not constant.

Exercise: source nodes

Define a subclass of DataFlow::Node representing “source” nodes, that is, nodes without a (local) data flow predecessor.

Hint: use not exists().

.. rst-class:: build

.. code-block:: ql

   class SourceNode extends DataFlow::Node {
     SourceNode() {
       not DataFlow::localFlowStep(_, this)
     }
   }

Note

Note the scoping of the don’t-care variable “_” in this example: the body of the characteristic predicate is equivalent to:

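.. code-block:: ql

   not exists(DataFlow::Node pred | DataFlow::localFlowStep(pred, this))

which is not the same as:

.. code-block:: ql

   exists(DataFlow::Node pred | not DataFlow::localFlowStep(pred, this))

The first form holds only when this node has no local predecessor at all. The second holds as soon as there is some node that is not a predecessor of this node, which is true for almost every node in the program.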

Revisiting non-constant format strings

Refine the query to find calls to printf-like functions where the format argument derives from a local source that is not a constant string.

.. rst-class:: build

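.. code-block:: ql

   import cpp
   import semmle.code.cpp.dataflow.DataFlow
   import semmle.code.cpp.commons.Printf

   // SourceNode: a node without a local data flow predecessor (see the previous exercise).
   class SourceNode extends DataFlow::Node { ... }

   from FormattingFunction f, Call c, SourceNode src, DataFlow::Node arg
   where
     // c is a call to a printf-like function f ...
     c.getTarget() = f and
     // ... arg is the data flow node for its format argument ...
     arg.asExpr() = c.getArgument(f.getFormatParameterIndex()) and
     // ... some local source node reaches that argument ...
     DataFlow::localFlow(src, arg) and
     // ... and that source is not a string literal.
     not src.asExpr() instanceof StringLiteral
   select arg, "Non-constant format string."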

Refinements (take home exercise)

Audit the results and apply any refinements you deem necessary.

Suggestions:

  • Replace DataFlow::localFlowStep with a custom predicate that includes steps through global variable definitions.

    Hint: Use class GlobalVariable and its member predicates getAnAssignedValue() and getAnAccess().

  • Exclude calls in wrapper functions that just forward their format argument to another printf-like function; instead, flag calls to those functions.
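For the second suggestion, it helps to have a picture of what such a wrapper looks like in C. The sketch below is illustrative only: ``log_message`` and ``log_user`` are hypothetical names, and the wrapper writes into a caller-supplied buffer to keep the example self-contained.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: forwards its own format parameter, unchanged, to
 * vsnprintf. The inner call has a non-literal format argument, so a purely
 * local query flags it, although the wrapper itself is not at fault. */
int log_message(char *out, size_t size, const char *format, ...)
{
    va_list args;
    va_start(args, format);
    int n = vsnprintf(out, size, format, args);
    va_end(args);
    return n;
}

/* Hypothetical caller: this is the call worth inspecting. Here the format
 * is a string literal, so the call is fine. */
int log_user(char *out, size_t size, const char *user)
{
    return log_message(out, size, "user=%s", user);
}
```

The interesting question moves from the ``vsnprintf`` call inside the wrapper to the calls of ``log_message``: a call such as ``log_message(buf, size, user)`` would be the one to flag.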

Beyond local data flow

  • Results are still underwhelming.
  • Dealing with parameter passing becomes cumbersome.
  • Instead, let’s turn the problem around and find user-controlled data that flows into a printf format argument, potentially through calls.
  • This needs :doc:`global data flow <global-data-flow-cpp>`.