0

I'm trying to write a simple grammar to parse text with multi-line sections.. I'm not able to wrap my head around how to do it. Here's the grammar that I've written so far - would appreciate any help here.

ps: I realize that lark is overkill for this problem but this is just a very simplified version of what I'm trying to parse.

from unittest import TestCase
from lark import Lark

text = '''
[section 1]
line 1.1
line 1.2

[section 2]
line 2.1
'''

class TestLexer(TestCase):

    def test_basic(self):
        p = Lark(r"""

            _LB: "["
            _RB: "]"
            _NL: /\n/+
            name: /[^]]+/
            content: /.+/s

            section: _NL* _LB name _RB _NL* content
            doc: section*

        """, parser='lalr', start='doc')


        parsed = p.parse(text)

1 Answer 1

2

The problem is that your content regex can match anywhere with any length, meaning that the rest of the grammar can't work correctly. Instead you should have a terminal restricted to a single line and give it a lower priority then the rest.

p = Lark(r"""

    _NL: /\n/+
    name: /[^]]+/
    content: (ANY_LINE _NL)+
    ANY_LINE.-1: /.+/

    section: _NL* "[" name "]" _NL* content
    doc: section*

""", parser='lalr', start='doc')

You may need some extra work now to convert the content rule into exactly what you want, but since you claim that this isn't actually your exact problem I wont bother with that here.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.