Recognizing multi-line sections with a Lark grammar

Question

I'm trying to write a simple grammar to parse text that contains multi-line sections, but I can't wrap my head around how to do it. Here's the grammar I've written so far; I'd appreciate any help.

P.S. I realize that Lark is overkill for this problem, but this is just a heavily simplified version of what I'm actually trying to parse.

from unittest import TestCase
from lark import Lark

text = '''
[section 1]
line 1.1
line 1.2

[section 2]
line 2.1
'''

class TestLexer(TestCase):

    def test_basic(self):
        p = Lark(r"""

            _LB: "["
            _RB: "]"
            _NL: /\n/+
            name: /[^]]+/
            content: /.+/s

            section: _NL* _LB name _RB _NL* content
            doc: section*

        """, parser='lalr', start='doc')


        parsed = p.parse(text)

MegaIng · Accepted Answer · 2022-01-17 14:56:52Z

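The problem is that your content regex, /.+/s, can match text of any length, including newlines. With the s flag, . also matches \n, so the lexer can produce a single content token that swallows the rest of the input, including the following [section 2] header, and the rest of the grammar never gets a chance to work. Instead, you should use a terminal that is restricted to a single line and give it a lower priority than the other terminals, so that a line starting with "[" is lexed as a section header rather than as content.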
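The effect of the s flag can be checked outside of Lark with Python's re module. This small sketch only illustrates the regex behaviour on the question's sample text; it does not involve Lark's lexer.

```python
import re

text = '''
[section 1]
line 1.1
line 1.2

[section 2]
line 2.1
'''

# Position where the first section's content starts (just after "[section 1]\n").
start = text.index("line 1.1")

# With the s (DOTALL) flag, . also matches "\n", so .+ runs to the end of the input,
# swallowing the "[section 2]" header along the way.
greedy = re.compile(r".+", re.S).match(text, start)

# Without the flag, .+ stops at the first newline: exactly one line per match.
one_line = re.compile(r".+").match(text, start)

print(repr(greedy.group()))
print(repr(one_line.group()))  # 'line 1.1'
```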

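from lark import Lark

p = Lark(r"""

    _NL: /\n/+
    name: /[^]]+/
    content: (ANY_LINE _NL)+
    // One line of text (. does not match newlines); priority -1 so that
    // "[" is preferred over ANY_LINE when both could match.
    ANY_LINE.-1: /.+/

    section: _NL* "[" name "]" _NL* content
    doc: section*

""", parser='lalr', start='doc')

# text is the sample input defined in the question
print(p.parse(text).pretty())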

You may need some extra work to turn the content rule into exactly the structure you want, but since you say this isn't your exact problem, I won't go into that here.
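What that extra work looks like depends on the output you want, which the question doesn't specify. Assuming (hypothetically) the goal is a mapping from each section name to its list of lines, here is a plain-Python sketch of that target for the simplified input, with no Lark involved. It can serve as a reference to check a Lark result against; in Lark itself, the same shape would typically be built by a lark.Transformer over the doc/section/name/content tree. The names HEADER and parse_sections are illustrative only.

```python
import re

# Hypothetical target shape (an assumption, not from the question):
# {section name: [content lines]}
HEADER = re.compile(r"\[([^\]]+)\]")


def parse_sections(text):
    """Group non-empty lines under the most recent [header] line."""
    sections = {}
    current = None
    for line in text.splitlines():
        header = HEADER.fullmatch(line)
        if header:
            current = sections.setdefault(header.group(1), [])
        elif line and current is not None:
            current.append(line)
        # Blank lines, and any lines before the first header, are skipped.
    return sections


text = '''
[section 1]
line 1.1
line 1.2

[section 2]
line 2.1
'''

print(parse_sections(text))
# {'section 1': ['line 1.1', 'line 1.2'], 'section 2': ['line 2.1']}
```

Unlike the grammar above, which requires every content line (including the last one) to end with a newline because of (ANY_LINE _NL)+, this sketch uses str.splitlines and so also accepts input without a trailing newline.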

answered Jan 17, 2022 at 7:32
