/**
* Provides classes for working with a CFG-based program representation.
*
* ## Overview
*
* Each `StmtContainer` (that is, function or toplevel) has an intra-procedural
* CFG associated with it, which is composed of `ControlFlowNode`s under a successor
* relation exposed by predicates `ControlFlowNode.getASuccessor()` and
* `ControlFlowNode.getAPredecessor()`.
*
* Each CFG has designated entry and exit nodes with types
* `ControlFlowEntryNode` and `ControlFlowExitNode`, respectively. Both are subtypes of
* `SyntheticControlFlowNode`, as are guard nodes (`GuardControlFlowNode`). All
* `ControlFlowNode`s that are _not_ `SyntheticControlFlowNode`s belong to class
* `ConcreteControlFlowNode`.
*
* The predicate `AstNode.getFirstControlFlowNode()` relates AST nodes
* to the first (concrete) CFG node in the sub-graph of the CFG
* corresponding to the node.
*
* Most statement containers also have a _start node_, obtained by
* `StmtContainer.getStart()`, which is the unique CFG node at which execution
* of the toplevel or function begins. Unlike the entry node, which is a synthetic
* construct, the start node corresponds to an AST node: for instance, for
* toplevels, it is the first CFG node of the first statement, and for functions
* with parameters it is the CFG node corresponding to the first parameter.
*
* Empty toplevels do not have a start node, since all their CFG nodes are
* synthetic.
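*
* As a rough sketch rather than part of this module's API, the following query
* (assuming the standard `import javascript` library) lists statement containers
* that have no start node, which should include empty toplevels:
*
* ```ql
* import javascript
*
* // Containers for which `getStart()` has no result.
* from StmtContainer sc
* where not exists(sc.getStart())
* select sc, "This container has no start node."
* ```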
*
* ## CFG Nodes
*
* Non-synthetic CFG nodes exist for six kinds of AST nodes, representing various
* aspects of the program's runtime semantics:
*
* - `Expr`: the CFG node represents the evaluation of the expression,
*   including any side effects this may have;
* - `Stmt`: the CFG node represents the execution of the statement;
* - `Property`: the CFG node represents the assignment of the property;
* - `PropertyPattern`: the CFG node represents the matching of the property;
* - `MemberDefinition`: the CFG node represents the definition of the member
*   method or field;
* - `MemberSignature`: the CFG node represents the point where the signature
*   is declared, although this has no effect at runtime.
*
* ## CFG Structure
*
* ### Expressions
*
* For most expressions, the successor relation visits sub-expressions first,
* and then the expression itself, representing the order of evaluation at
* runtime. For example, the CFG for the expression `23 + 19` is
*
* <pre>
* … → [23] → [19] → [23 + 19] → …
* </pre>
*
* In particular, this means that `23` is the first CFG node of the expression
* `23 + 19`.
*
* Similarly, for assignments the left-hand side is visited first, then
* the right-hand side, then the assignment itself:
*
* <pre>
* … → [x] → [y] → [x = y] → …
* </pre>
*
* For properties, the name expression is visited first, then the value,
* then the default value, if any. The same principle applies for getter
* and setter properties: in this case, the "value" is simply the accessor
* function, and there is no default value.
*
* There are only a few exceptions, generally for cases where the value of
* the whole expression is the value of one of its sub-expressions. That
* sub-expression then comes last in the CFG:
*
* - Parenthesized expressions:
* <pre>
* … → [(x)] → [x] → …
* </pre>
* - Conditional expressions:
* <pre>
* … → [x ? y : z] → [x] ┬→ [y] → … <br>
*                       └→ [z] → …
* </pre>
* - Short-circuiting operator `&&` (same for `||`):
* <pre>
* … → [x && y] → [x] → … <br>
*                 ↓ <br>
*                [y] → …
* </pre>
* - Sequence/comma expressions:
* <pre>
* … → [x, y] → [x] → [y] → …
* </pre>
*
* Finally, array expressions and object expressions also precede their
* sub-expressions in the CFG to model the fact that the new array/object
* is created before its elements/properties are evaluated:
*
* <pre>
* … → [{ x: 42 }] → [x] → [42] → [x : 42] → …
* </pre>
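*
* To inspect the evaluation order described in this section, one option is a small
* exploratory query. This is only a sketch: it assumes the standard library class
* `BinaryExpr` and is not part of this module. For `23 + 19` it should report `23`
* as the first CFG node.
*
* ```ql
* import javascript
*
* // Pair each binary expression with the first CFG node of its sub-graph.
* from BinaryExpr e
* select e, e.getFirstControlFlowNode()
* ```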
*
* ### Statements
*
* For most statements, the successor relation visits the statement first and then
* its sub-expressions and sub-statements.
*
* For example, the CFG of a block statement first visits the block statement itself,
* then the individual statements in order.
*
* Similarly, the CFG for an `if` statement first visits the statement itself, then
* the condition. The condition, in turn, has the "then" branch as one of its successors
* and the "else" branch (if it exists) or the next statement after the "if" (if it does not)
* as the other:
*
* <pre>
* … → [if (x) s1 else s2] → [x] ┬→ [s1] → …
*                               └→ [s2] → …
* </pre>
*
* For loops, the CFG reflects the order in which the loop test and the body are
* executed.
*
* For instance, the CFG of a `while` loop starts with the statement itself, followed by
* the condition. The condition has two successors: the body, and the statement following
* the loop. The body, in turn, has the condition as its successor. This reflects the fact
* that `while` loops first test their condition before executing their body:
*
* <pre>
* … → [while (x) s] → [x] → …
*                      ⇅
*                     [s]
* </pre>
*
* On the other hand, `do`-`while` loops first execute their body before testing their condition:
*
* <pre>
* … → [do s while (x)] → [s] ⇄ [x] → …
* </pre>
*
* The CFG of a `for` loop starts with the loop itself, followed by the initializer expression
* (if any), then the test expression (if any). The test expression has two successors: the
* body, and the statement following the loop. The body, in turn, has the update expression
* (if any) as its successor, and the update expression has the test expression as its only
* successor:
*
* <pre>
* … → [for(i;t;u) s] → [i] → [t] → …
*                          ↙   ↖
*                        [s] → [u]
* </pre>
*
* The CFG of a for-in loop `for(x in y) s` starts with the loop itself, followed by the
* iteration domain `y`. That node has two successors: the iterator `x`, and the statement
* following the loop (modeling early exit in case `y` is empty). After the iterator `x`
* comes the loop body `s`, which again has two successors: the iterator `x` (modeling the
* case where there are more elements to iterate over), and the statement following the loop
* (modeling the case where there are no more elements to iterate):
*
* <pre>
* … → [for(x in y) s] → [y] → …
*                       ↓     ↑
*                      [x] ⇄ [s]
* </pre>
*
* For-of loops are modeled in the same way.
*
* Finally, `return` and `throw` statements are different from all other statement types in
* that for them the statement itself comes _after_ the operand, reflecting the fact that
* the operand is evaluated before the return or throw is initiated:
*
* <pre>
* … → [x] → [return x;] → …
* </pre>
*
* ### Unstructured control flow
*
* Unstructured control flow is modeled in the obvious way: `break` and `continue` statements
* have as their successor the next statement that is executed after the jump; `throw`
* statements have the nearest enclosing `catch` clause as their successor, or the exit node
* of the enclosing container if there is no enclosing `catch`; `return` statements have the
* exit node of the enclosing container as their successor.
*
* In all cases, the control flow may be intercepted by an intervening `finally` block. For
* instance, consider the following code snippet:
*
* <pre>
* try {
*   if (x)
*     return;
*   s
* } finally {
*   t
* }
* u
*
* Here, the successor of `return` is not the exit node of the enclosing container, but instead
* the `finally` block. The last statement of the `finally` block (here, `t`) has two successors:
* `u` to model the case where `finally` was entered from `s`, and the exit node of the enclosing
* container to model the case where the `return` is resumed after the `finally` block.
*
* Note that `finally` blocks can lead to imprecise control flow modeling since the `finally`
* block resumes the action of _all_ statements it intercepts: in the above example, the CFG
* not only models the executions `return` → `finally` → `t` → `exit` and
* `s` → `finally` → `t` → `u`, but also allows the path `return` →
* `finally` → `t` → `u`, which does not correspond to any actual execution.
*
* The CFG also models the fact that certain kinds of expressions (calls, `new` expressions,
* property accesses and `await` expressions) can throw exceptions, but _only_ if there is
* an enclosing `try`-`catch` statement.
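*
* One way to observe `return` statements being intercepted as described above is the
* following sketch (assuming the standard library class `ReturnStmt`; not part of this
* module). It selects `return` statements with a successor other than the exit node:
*
* ```ql
* import javascript
*
* // A `return` whose successor is not an exit node has been intercepted,
* // for example by an enclosing `finally` block.
* from ReturnStmt ret, ControlFlowNode succ
* where
*   succ = ret.getASuccessor() and
*   not succ instanceof ControlFlowExitNode
* select ret, succ
* ```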
*
* ### Function preambles
*
* The CFG of a function starts with its entry node, followed by a _preamble_, which is a part of
* the CFG that models parameter passing and function hoisting. The preamble is followed by the
* function body, which in turn is followed by the exit node.
*
* For function expressions, the preamble starts with the function name, if any, to reflect the
* fact that the function object is bound to that name inside the scope of the function. Next,
* for both function expressions and function declarations, the parameters are visited in sequence
* to represent parameter passing. If a parameter has a default value, that value is visited before
* the parameter itself. Finally, the CFG nodes corresponding to the names of all hoisted functions
* inside the outer function body are visited in lexical order. This reflects the fact that hoisted
* functions are initialized before the body starts executing, but _after_ parameters have been
* initialized.
*
* For instance, consider the following function declaration:
*
* <pre>
* function outer(x, y = 42) {
*   s
*   function inner() {}
*   t
* }
* </pre>
*
* Its CFG is
*
* <pre>
* [entry] → [x] → [42] → [y] → [inner] → [s] → [function inner() {}] → [t] → [exit]
* </pre>
*
* Note that the function declaration `[function inner() {}]` as a whole is part of the CFG of the
* body of `outer`, while its function identifier `inner` is part of the preamble.
*
* ### Toplevel preambles
*
* Similar to functions, toplevels (that is, modules, scripts or event handlers) also have a
* preamble. For ECMAScript 2015 modules, all import specifiers are traversed first, in lexical
* order, reflecting the fact that imports are resolved before execution of the module itself
* begins; next, for all toplevels, the names of hoisted functions are traversed in lexical order
* (as for functions). Afterwards, the CFG continues with the body of the toplevel, and ends
* with the exit node.
*
* As an example, consider the following module:
*
* ```
* s
* import { x as y } from 'foo';
* function f() {}
* t
* ```
*
* Its CFG is
*
* <pre>
* [entry] → [x as y] → [f] → [s] → [import { x as y } from 'foo';] → [function f() {}] → [t] → [exit]
* </pre>
*
* Note that the `import` statement as a whole is part of the CFG of the body, while its single
* import specifier `x as y` forms part of the preamble.
*/
overlay[local?]
module;
import javascript
private import internal.StmtContainers
/**
* A node in the control flow graph, which is an expression, a statement,
* or a synthetic node.
*/
class ControlFlowNode extends @cfg_node, Locatable, NodeInStmtContainer {
  /** Gets a node succeeding this node in the CFG. */
  ControlFlowNode getASuccessor() { successor(this, result) }

  /** Gets a node preceding this node in the CFG. */
  ControlFlowNode getAPredecessor() { this = result.getASuccessor() }

  /** Holds if this is a node with more than one successor. */
  predicate isBranch() { strictcount(this.getASuccessor()) > 1 }

  /** Holds if this is a node with more than one predecessor. */
  predicate isJoin() { strictcount(this.getAPredecessor()) > 1 }

  /**
   * Holds if this is a start node, that is, the CFG node where execution of a
   * toplevel or function begins.
   */
  predicate isStart() { this = any(StmtContainer sc).getStart() }

  /**
   * Holds if this is a final node of `container`, that is, a CFG node where execution
   * of that toplevel or function terminates.
   */
  predicate isAFinalNodeOfContainer(StmtContainer container) {
    this.getASuccessor().(SyntheticControlFlowNode).isAFinalNodeOfContainer(container)
  }

  /**
   * Holds if this is a final node, that is, a CFG node where execution of a
   * toplevel or function terminates.
   */
  final predicate isAFinalNode() { this.isAFinalNodeOfContainer(_) }

  /**
   * Holds if this node is unreachable, that is, it has no predecessors in the CFG.
   * Entry nodes are always considered reachable.
   *
   * Note that in a block of unreachable code, only the first node is unreachable
   * in this sense. For instance, in
   *
   * ```
   * function foo() { return; s1; s2; }
   * ```
   *
   * `s1` is unreachable, but `s2` is not.
   */
  predicate isUnreachable() {
    forall(ControlFlowNode pred | pred = this.getAPredecessor() |
      pred.(SyntheticControlFlowNode).isUnreachable()
    )
    // note the override in ControlFlowEntryNode below
  }

  /** Gets the basic block this node belongs to. */
  BasicBlock getBasicBlock() { this = result.getANode() }

  /**
   * For internal use.
   *
   * Gets a string representation of this control-flow node that can help
   * distinguish it from other nodes with the same `toString` value.
   */
  string describeControlFlowNode() {
    if this = any(MethodDeclaration mem).getBody()
    then result = "function in " + any(MethodDeclaration mem | mem.getBody() = this)
    else
      if this instanceof @decorator_list
      then result = "parameter decorators of " + this.(AstNode).getParent().(Function).describe()
      else result = this.toString()
  }
}
/**
* A synthetic CFG node that does not correspond to a statement or expression;
* examples include guard nodes and entry/exit nodes.
*/
class SyntheticControlFlowNode extends @synthetic_cfg_node, ControlFlowNode { }
/** A synthetic CFG node marking the entry point of a function or toplevel script. */
class ControlFlowEntryNode extends SyntheticControlFlowNode, @entry_node {
  override predicate isUnreachable() { none() }

  override string toString() {
    result = "entry node of " + pragma[only_bind_out](this.getContainer()).toString()
  }
}
/** A synthetic CFG node marking the exit of a function or toplevel script. */
class ControlFlowExitNode extends SyntheticControlFlowNode, @exit_node {
  override predicate isAFinalNodeOfContainer(StmtContainer container) {
    exit_cfg_node(this, container)
  }

  override string toString() {
    result = "exit node of " + pragma[only_bind_out](this.getContainer()).toString()
  }
}
/**
* A synthetic CFG node recording that some condition is known to hold
* at this point in the program.
*/
class GuardControlFlowNode extends SyntheticControlFlowNode, @guard_node {
  /** Gets the expression that this guard concerns. */
  Expr getTest() { guard_node(this, _, result) }

  /**
   * Holds if this guard dominates basic block `bb`, that is, the guard
   * is known to hold at `bb`.
   */
  predicate dominates(ReachableBasicBlock bb) {
    this = bb.getANode()
    or
    exists(ReachableBasicBlock prev | prev.strictlyDominates(bb) | this = prev.getANode())
  }
}
/**
* A guard node recording that some condition is known to be truthy or
* falsy at this point in the program.
*/
class ConditionGuardNode extends GuardControlFlowNode, @condition_guard {
  /** Gets the value recorded for the condition. */
  boolean getOutcome() {
    guard_node(this, 0, _) and result = false
    or
    guard_node(this, 1, _) and result = true
  }

  override string toString() { result = "guard: " + this.getTest() + " is " + this.getOutcome() }
}
/**
* A CFG node corresponding to a program element, that is, a CFG node that is
* not a `SyntheticControlFlowNode`.
*/
class ConcreteControlFlowNode extends ControlFlowNode {
  ConcreteControlFlowNode() { not this instanceof SyntheticControlFlowNode }
}