6

I'm having trouble building the collection of sets of items for LR(1) parsers with a grammar containing epsilon productions. For example, given the following grammar (where eps stands for epsilon)

S -> a S U
U -> b
  |  eps

State0 would be

S' -> .S, $
S -> .a S U, $

Moving with 'a' from State0 would give the following state, let's call it State2

S -> a .S U, $
S -> .a S U, $/???

In order to have the lookahead for the second item of State2 I need to calculate FIRST(U$). I know that FIRST(U) = {'b', eps}. My first question is: the lookaheads of the second item of State2 are $ and 'b'? Since U can be eps, my brain tells me that I can have $ as a lookahead as well, not just 'b'. It would have been just 'b' if FIRST(U) would have been just {'b'}. Is that correct?

Second question: at some point I will have a state as the following one

S -> a S .U, $
U -> .b, $
U -> .eps, $

What do I do here? Do I need to move with eps and have a set with the item U -> eps., $? What if I have another terminal as lookahead, i.e. X -> .eps, a/$? And if I move, ending up having a set of the form X -> eps., $, do I reduce?

And more: do I need to insert eps in the parse table as a symbol?

Thanks

4
  • 1
    What are you using as a textbook? It surprises me that there are no examples with nullable non-terminals. Commented May 17, 2018 at 16:24
  • 1
    The Dragon Book. I don't really know if I missed some parts, but I didn't read about how to handle these kind of situations specifically... Commented May 17, 2018 at 19:57
  • 1
    Yup. I dug out my copy of the Dragon Book and indeed there is no worked example of an LR automaton with an ε-production. Still, the book is clear that ε represents the empty sequence of symbols, not a special symbol. (Indeed, the book always uses Greek letters for sequences of symbols and Roman letters for individual symbols, which has become a standard convention.)... Commented May 19, 2018 at 2:05
  • Also, its definition of FIRST(α) is, in effect, "the set of possible first characters in a derivation of α plus ε if α could derive the empty set, which means that it is a subset of Σ ∪ ε. But in the case of FIRST(β$) it is clear that Β$ cannot derive the empty sequence, so the FIRST set must be a subset of Σ. Commented May 19, 2018 at 2:05

1 Answer 1

7
  1. FIRST(U$) means "the set of symbols which could be first in a derivation of U$". Clearly, if U can derive the empty string, $ must be part of this set. The end-of-input marker $ ensures that we never have to worry about epsilons in the FIRST sets. (If we were doing LR(k) instead of LR(1), we would use k end markers so that all the strings in FIRSTk had length k.

  2. The item associated with U → (or with U → ε if you insist) is U → • . In other words, it is reducible and should trigger a reduce action on matching lookahead.

  3. ε is not a symbol; we only use it (sometimes) to make the empty string visible. But the empty string is empty.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.