
I'm experimenting with parser generators in Haskell, using Happy here. I previously used parser combinators such as Parsec, and one thing I can't achieve now is the dynamic addition (during execution) of new externally defined operators. For example, Haskell ships with some basic operators, but we can add more, giving them a precedence and a fixity. So I would like to know how to reproduce this with Happy, following the Haskell design (see the example code below to be parsed), whether it is feasible at all, or whether it should instead be done with parser combinators.

-- Adding the new operator
infixl 5 ++

(++) :: [a] -> [a] -> [a]
[]     ++ ys = ys
(x:xs) ++ ys = x : xs ++ ys

-- Using the new operator taking into consideration fixity and precedence during parsing
example = "Hello, " ++ "world!"
4 Comments

  • Hint: you can inspect GHC's grammar: github.com/ghc/ghc/blob/master/compiler/parser/Parser.y Commented Jul 10, 2019 at 11:05
  • IIRC, GHC parses all infix operators ignoring fixities, and then later on transforms the AST according to the fixities. Essentially, precedence and associativity are fixed after parsing. I don't know if this is actually easier -- maybe it is. Commented Jul 10, 2019 at 11:43
  • @chi – So it would be a "post-parsing" element? Commented Jul 10, 2019 at 20:05
  • Well, it happens during type checking, at the "right" time. The general idea is: GHC already infers 10 :: Num a => a so, if we annotate 10 in that way, we do not tell GHC anything it does not already know -- it's a no-op. Instead, the type of f is inferred (because of the MR) to something else, so annotating that matters. The full explanation is a bit tricky, and requires digging inside the type system, GHC Core, MR, and some other gory details. Commented Jul 10, 2019 at 20:11

1 Answer


Haskell only allows a fixed, small number of precedence levels (0 through 9), so you don't strictly need a dynamic grammar: you could just write out the grammar using precedence-level token classes instead of individual operators, leaving the lexer with the problem of associating a given symbol with a given precedence level.

In effect, that moves the dynamic addition of operators into the lexer. That's a slightly uncomfortable design decision, although in some cases it may not be too difficult to implement. It's uncomfortable because it requires semantic feedback to the lexer: at a minimum, the lexer needs to consult the symbol table to figure out what kind of token it is looking at. In the case of Haskell, this is made even more uncomfortable by the fact that fixity declarations are scoped, so in order to track fixity information the lexer would also need to understand the scoping rules.
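To make that concrete, here is a minimal Haskell sketch of the kind of lookup the lexer would have to do. The names (FixityEnv, classifyOp, the Token type) are invented for illustration and are not part of Happy's or Alex's API; scope tracking is omitted entirely.

import qualified Data.Map as Map

-- Hypothetical token and fixity types; the point is only that the lexer
-- must consult a fixity table (and, for Haskell, the current scope)
-- before it knows which terminal symbol an operator belongs to.
data Assoc = LeftAssoc | RightAssoc | NonAssoc
  deriving (Eq, Show)

data Token
  = TOperator String Int Assoc   -- symbol tagged with precedence and fixity
  | TIdent String
  deriving (Eq, Show)

type FixityEnv = Map.Map String (Int, Assoc)

-- Classify an operator symbol; operators without a fixity declaration
-- default to infixl 9, as in the Haskell Report.
classifyOp :: FixityEnv -> String -> Token
classifyOp env sym =
  let (prec, assoc) = Map.findWithDefault (9, LeftAssoc) sym env
  in  TOperator sym prec assoc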

In practice, most languages which allow program text to define operators and operator precedence work in precisely the same way the Haskell compiler does: expressions are parsed by the grammar into a simple list of items (where parenthesized subexpressions count as a single item), and in a later semantic analysis the list is rearranged into an actual tree taking into account precedence and associativity rules, using a simple version of the shunting yard algorithm. (It's a simple version because it doesn't need to deal with parenthesized subconstructs.)

There are several reasons for this design decision:

  1. As mentioned above, for the lexer to figure out what the precedence of a symbol is (or even if the symbol is an operator with precedence) requires a close collaboration between the lexer and the parser, which many would say violates separation of concerns. Worse, it makes it difficult or impossible to use parsing technologies without a small fixed lookahead, such as GLR parsers.

  2. Many languages have more precedence levels than Haskell. In some cases, even the number of precedence levels is not defined by the grammar. In Swift, for example, you can declare your own precedence levels, and you define a level not with a number but with a comparison to another previously defined level, leading to a partial order between precedence levels.

    IMHO, that's actually a better design decision than Haskell's, in part because it avoids the ambiguity of a precedence level containing both left- and right-associative operators, but more importantly because the relative precedence declarations both avoid magic numbers and allow the parser to flag ambiguous uses of operators from different modules. In other words, a precedence declaration is not forced to apply mechanically to every pair of totally unrelated operators; in that sense it makes operator declarations easier to compose.

  3. The grammar is much simpler, and arguably easier to understand, since most people rely on precedence tables anyway rather than analysing grammar productions to figure out how operators interact with each other. In that sense, having precedence set by the grammar is more of a distraction than documentation. See the C++ grammar for a good example of why precedence tables are easier to read than grammars.

    On the other hand, as the C++ grammar also illustrates, a grammar is a lot more general than simple precedence declarations because it can express asymmetric precedences. (The grammar doesn't always express these gracefully, but they can be expressed.) A classic example of an asymmetric precedence is a lambda construct (λ ID expr) which binds very loosely to the right and very tightly to the left: the expected parse of a ∘ λ b b ∘ a does not ever consult the associativity of ∘ because the λ comes between them.

In practice, there is very little cost to building the tree later. The algorithm to build the tree is well-known, simple and cheap.
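As a rough illustration of that rebuilding pass, here is a self-contained Haskell sketch. The types and the resolve function are invented for this example (this is not GHC's implementation), error reporting is omitted, and chained non-associative operators are not rejected as they would be in a real compiler. The input is assumed to be the flat operand/operator list that the grammar's semantic action produced.

import qualified Data.Map as Map

data Assoc = LeftAssoc | RightAssoc | NonAssoc
  deriving (Eq, Show)

-- A flat expression as produced by the grammar: a first operand followed
-- by alternating (operator, operand) pairs.
data Flat = Flat Expr [(String, Expr)]
  deriving Show

data Expr
  = Var String
  | App String Expr Expr   -- infix operator application
  deriving Show

type Fixities = Map.Map String (Int, Assoc)

-- Rebuild the tree, honouring precedence and associativity. This is the
-- "simple shunting yard" step: parentheses are already gone, so only
-- operator-versus-operator comparisons remain.
resolve :: Fixities -> Flat -> Expr
resolve fixs (Flat e0 ops0) = fst (go e0 ops0 0)
  where
    fixity op = Map.findWithDefault (9, LeftAssoc) op fixs

    -- Precedence climbing: consume operators binding at least as tightly
    -- as minPrec, recursing with a higher bound for tighter binders.
    go lhs [] _ = (lhs, [])
    go lhs ops@((op, rhs0) : more) minPrec
      | prec < minPrec = (lhs, ops)
      | otherwise =
          let nextMin = case assoc of
                          RightAssoc -> prec       -- allow chaining to the right
                          _          -> prec + 1   -- left/non-assoc: stop at same level
              (rhs, rest) = go rhs0 more nextMin
          in  go (App op lhs rhs) rest minPrec
      where
        (prec, assoc) = fixity op

-- Example:
--   resolve (Map.fromList [("+", (6, LeftAssoc)), ("*", (7, LeftAssoc))])
--           (Flat (Var "a") [("+", Var "b"), ("*", Var "c")])
-- evaluates to App "+" (Var "a") (App "*" (Var "b") (Var "c")).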


2 Comments

I really appreciate the way Swift solves this problem, thank you! If I want to try to implement it, do you think I should also have the parser and lexer communicate, as you describe in the first part of your answer?
@foxy: no, as I suggest in the answer, the best route is to build the expression tree after the parse, perhaps in the semantic action for a parenthesised list of items.
