LR(1) parsing table with epsilon productions

Question

I'm having trouble building the collection of sets of items for an LR(1) parser when the grammar contains epsilon productions. For example, given the following grammar (where eps stands for epsilon):

S -> a S U
U -> b
  |  eps

State0 would be

S' -> .S, $
S -> .a S U, $

Moving with 'a' from State0 would give the following state, let's call it State2

S -> a .S U, $
S -> .a S U, $/???

To get the lookahead for the second item of State2 I need to compute FIRST(U$). I know that FIRST(U) = {'b', eps}. My first question is: are the lookaheads of the second item of State2 both $ and 'b'? Since U can be eps, my intuition is that $ is a lookahead as well, not just 'b'. It would be just 'b' if FIRST(U) were just {'b'}. Is that correct?

Second question: at some point I will have a state like the following one:

S -> a S .U, $
U -> .b, $
U -> .eps, $

What do I do here? Do I need to move on eps and get a set with the item U -> eps., $? What if there is another terminal among the lookaheads, e.g. X -> .eps, a/$? And if I do move, ending up with a set of the form X -> eps., $, do I reduce?

And more: do I need to insert eps in the parse table as a symbol?

Thanks

What are you using as a textbook? It surprises me that there are no examples with nullable non-terminals.
– rici, Commented May 17, 2018 at 16:24
The Dragon Book. I don't really know if I missed some parts, but I didn't read anything about how to handle this kind of situation specifically...
– Astinog, Commented May 17, 2018 at 19:57
Yup. I dug out my copy of the Dragon Book and indeed there is no worked example of an LR automaton with an ε-production. Still, the book is clear that ε represents the empty sequence of symbols, not a special symbol. (Indeed, the book always uses Greek letters for sequences of symbols and Roman letters for individual symbols, which has become a standard convention.)...
– rici, Commented May 19, 2018 at 2:05
Also, its definition of FIRST(α) is, in effect, "the set of possible first characters in a derivation of α, plus ε if α could derive the empty sequence", which means that it is a subset of Σ ∪ {ε}. But in the case of FIRST(β$) it is clear that β$ cannot derive the empty sequence, so the FIRST set must be a subset of Σ.
– rici, Commented May 19, 2018 at 2:05

rici · Accepted Answer · 2019-09-20 02:48:09Z


FIRST(U$) means "the set of symbols which could be first in a derivation of U$". Clearly, if U can derive the empty string, $ must be part of this set. The end-of-input marker $ ensures that we never have to worry about epsilons in the FIRST sets. (If we were doing LR(k) instead of LR(1), we would use k end markers so that all the strings in FIRST_k had length k.)
The item associated with U → (or with U → ε if you insist) is U → • . In other words, it is reducible and should trigger a reduce action on matching lookahead.
ε is not a symbol; we only use it (sometimes) to make the empty string visible. But the empty string is empty.
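
To make the three points above concrete, here is a minimal Python sketch. It is not taken from the Dragon Book; the names GRAMMAR, first_of_seq, first_sets and closure are my own, chosen only for illustration, and the item representation (head, body, dot, lookahead) is an assumption. The epsilon production is written as an empty body, so no "eps" symbol ever appears in an item or in the table.

```python
EPS = ""   # marks "can derive the empty string" in FIRST sets; not a grammar symbol
END = "$"  # end-of-input marker

# Grammar from the question. An empty tuple is the production U -> (epsilon).
GRAMMAR = {
    "S'": [("S",)],
    "S": [("a", "S", "U")],
    "U": [("b",), ()],
}

def first_of_seq(seq, first, grammar):
    """FIRST of a sequence of symbols; contains EPS only if all of seq is nullable."""
    result = set()
    for sym in seq:
        if sym not in grammar:          # terminal, including END
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)                     # every symbol was nullable (or seq is empty)
    return result

def first_sets(grammar):
    """FIRST of every nonterminal, by iterating until nothing changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of_seq(body, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def closure(items, grammar, first):
    """LR(1) closure. An item is (head, body, dot, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        head, body, dot, la = work.pop()
        if dot == len(body) or body[dot] not in grammar:
            continue                    # dot at the end, or before a terminal
        # Lookaheads come from FIRST(beta la); la is a terminal, so EPS never appears.
        for t in first_of_seq(body[dot + 1:] + (la,), first, grammar):
            for prod in grammar[body[dot]]:
                item = (body[dot], prod, 0, t)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result

FIRST = first_sets(GRAMMAR)

# State2 from the question: kernel item S -> a .S U, $
state2 = closure({("S", ("a", "S", "U"), 1, END)}, GRAMMAR, FIRST)
print(sorted(la for h, b, d, la in state2 if h == "S" and d == 0))  # ['$', 'b']

# The later state: kernel item S -> a S .U, $
state_u = closure({("S", ("a", "S", "U"), 2, END)}, GRAMMAR, FIRST)
# Items with the dot at the end are reducible; U -> (empty) is one immediately.
reduces = {(h, b, la) for h, b, d, la in state_u if d == len(b)}
print(reduces)
```

In this sketch the item for the empty production has dot == len(body) == 0 as soon as it is created, so it produces a reduce action on its lookahead without any transition on "eps". If an item such as X -> • had lookaheads a and $, it would simply appear once per lookahead and yield a reduce entry in both columns.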

edited Sep 20, 2019 at 2:48

answered May 17, 2018 at 16:24

rici
