Finding SPARQL injection vulnerabilities in Java
.. rst-class:: setup
For this example you need to set up CodeQL for Visual Studio Code and download the CodeQL database for VIVO Vitro from GitHub.
.. rst-class:: agenda
- SPARQL injection
- Data flow
- Modules and libraries
- Local data flow
- Local taint tracking
SPARQL is a language for querying data stored in RDF format (subject-predicate-object triples), and it can suffer from vulnerabilities similar to SQL injection:
sparqlAskQuery("ASK { <" + individualURI + "> ?p ?o }")
individualURI is provided by a user, allowing an attacker to close the <...> delimiter prematurely with a > and append additional query content.
Goal: Find query strings that are created by concatenation.
Note
If you have completed the “Example: Query injection” slide deck which was part of the previous course, this example will look familiar to you.
To understand the scope of this vulnerability, consider what would happen if a malicious user could provide the following as the content of the individualURI variable:
“http://vivoweb.org/ontology/core#FacultyMember> ?p ?o . FILTER regex("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!", "(.*a){50}") } #
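To make the effect concrete, here is a minimal Java sketch. It is not code from Vitro: the class ``SparqlInjectionDemo`` and the helper ``buildAskQuery`` are hypothetical names, and the payload is a shortened version of the one above. The sketch only builds and prints the query string; it does not contact a SPARQL endpoint.

```java
final class SparqlInjectionDemo {
    // Hypothetical helper that mirrors the concatenation in the vulnerable call.
    // No validation or escaping is applied to individualURI.
    static String buildAskQuery(String individualURI) {
        return "ASK { <" + individualURI + "> ?p ?o }";
    }

    public static void main(String[] args) {
        String benign = "http://vivoweb.org/ontology/core#FacultyMember";
        // Shortened, illustrative payload: closes the IRI early, adds a FILTER
        // clause, closes the ASK block, and ends with '#' to start a comment.
        String malicious = benign
            + "> ?p ?o . FILTER regex(\"aaaa!\", \"(.*a){50}\") } #";

        System.out.println(buildAskQuery(benign));
        System.out.println(buildAskQuery(malicious));
    }
}
```

With the malicious input, the printed query contains the attacker-supplied FILTER clause, and the trailing # turns the remainder of the original template into a SPARQL comment.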
We can write a simple query that finds string concatenations that occur in calls to SPARQL query APIs.
.. rst-class:: build
.. literalinclude:: ../query-examples/java/data-flow-java-1.ql
   :language: ql
Note
This is similar, but not identical, to the formulation we had in the previous training deck. It has been rewritten to make it easier for the next step.
The query finds a CVE reported by Semmle (CVE-2019-6986), plus one other result, but misses other cases where:
- String concatenation occurs on a different line in the same method.
- String concatenation occurs in a different method.
- String concatenation occurs through StringBuilders or similar.
- Entirety of user input is provided as the query.
We want to improve our query to catch more of these cases.
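The missed cases listed above can be sketched in Java as follows. This is an illustrative example rather than code from the analyzed project: the class name, the method names, and the stub ``sparqlAskQuery`` (which simply returns the query it would run) are all assumptions made for the sketch.

```java
final class MissedConcatenationPatterns {
    // Hypothetical stand-in for the real query API: returns the query unchanged.
    static String sparqlAskQuery(String query) {
        return query;
    }

    // Case 1: concatenation on a different line in the same method.
    static String differentLine(String individualURI) {
        String queryString = "ASK { <" + individualURI + "> ?p ?o }";
        return sparqlAskQuery(queryString);
    }

    // Case 2: concatenation in a different method from the call.
    static String buildQuery(String individualURI) {
        return "ASK { <" + individualURI + "> ?p ?o }";
    }

    static String differentMethod(String individualURI) {
        return sparqlAskQuery(buildQuery(individualURI));
    }

    // Case 3: the string is assembled with a StringBuilder instead of '+'.
    static String viaStringBuilder(String individualURI) {
        StringBuilder sb = new StringBuilder();
        sb.append("ASK { <").append(individualURI).append("> ?p ?o }");
        return sparqlAskQuery(sb.toString());
    }

    // Case 4: the whole query is user input, with no concatenation at all.
    static String wholeQuery(String userQuery) {
        return sparqlAskQuery(userQuery);
    }

    public static void main(String[] args) {
        System.out.println(differentLine("urn:x"));
        System.out.println(differentMethod("urn:x"));
        System.out.println(viaStringBuilder("urn:x"));
        System.out.println(wholeQuery("ASK { ?s ?p ?o }"));
    }
}
```

In none of these methods is a string concatenation expression the direct argument of the ``sparqlAskQuery`` call, which is why a purely syntactic query does not report them.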
Note
For more details of the CVE, see: https://github.com/Semmle/SecurityExploits/tree/master/vivo-project/CVE-2019-6986
As an example, consider this SPARQL query call:
String queryString = "ASK { <" + individualURI + "> ?p ?o }";
sparqlAskQuery(queryString);
Here the concatenation occurs before the call, so the existing query would miss it: the string concatenation is not itself the first argument of the call.
Refine the query to find string concatenation that occurs in the same method, but a different line.
Hint: Use DataFlow::localFlow to assert that the result flows to the SPARQL call argument, using DataFlow::exprNode to get the data flow nodes for the relevant expression nodes.
.. rst-class:: build
.. literalinclude:: ../query-examples/java/data-flow-java-2.ql
   :language: ql
In Java, strings are often created using StringBuilder and StringBuffer classes. For example:
StringBuilder queryBuilder = new StringBuilder();
queryBuilder.append("ASK { <");
queryBuilder.append(individualURI);
queryBuilder.append("> ?p ?o }");
sparqlAskQuery(queryBuilder.toString());
Exercise: Refine the query to consider strings created from StringBuilder and StringBuffer classes as sources of concatenation.
- We are still missing possible results.
- Concatenation that occurs outside the enclosing method.
- Instead, let’s turn the problem around and find user-controlled data that flows into a SPARQL query argument, potentially through calls.
- This needs :doc:`global data flow <global-data-flow-java>`.