I'm using lark, but I can't figure out how to match every player's name with lark rules, since the names can be complex.

The line pattern is "Seat {number}: {player} ({chips} in chips)", and I want to extract all of the values from each line as well.

from lark import Lark, lexer
gram = r"""
start: seats+

seats: "Seat " seat_position player table_chips is_sitting_out? SPACE? NL
seat_position: NUMBER
is_sitting_out: " is sitting out"
table_chips: "(" chips " in chips"  (", " chips " bounty")? ")"
player: STRING_INNER
chips: "$"? (DECIMAL | NUMBERS)

NUMBERS: /[0-9]/+
NUMBER: /[0-9]/
DECIMAL: NUMBERS "." NUMBERS
SPACE: " "+
STRING_INNER: ": " /.*?/ " " 

CR : /\r/
LF : /\n/
NL: (CR? LF)+
"""
data = r"""Seat 1: Ruzzka(Rus) (1200 in chips)
Seat 1: Dladik Rzs38 (1200 in chips)
Seat 1: slum ^o_o^ (1200 in chips)
Seat 1: é=mc² (1200 in chips)
Seat 1: {’O_0`}/(nh) (1200 in chips)
Seat 1: °ÆND0c42Z4y° (1200 in chips)
Seat 1: $ salesovish (1200 in chips)
"""
parser = Lark(gram)
tree = parser.parse(data)
print(tree.pretty())
Comment (Nov 23, 2021 at 15:04): I guess the solution should be: player: / .*(?=\ \(\$?[0-9]+\.?[0-9]* in chips\))/

1 Answer

The problem is that there is apparently no real rule for where a name ends, which makes the names very hard to parse, since lark is mostly non-backtracking for the sake of speed.
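To illustrate why the end of a name is ambiguous, here is a minimal sketch using only Python's `re` module. The two patterns and the helper `extract_names` are made up for illustration and are not part of the question's grammar. A pattern that assumes names never contain "(" fails on "Ruzzka(Rus)", while a lazy match anchored on the fixed " (N in chips)" suffix recovers the name.

```python
import re

# Naive pattern (illustrative): assumes a player name never contains "(".
naive = re.compile(r"Seat (\d+): ([^(]+) \((\d+) in chips\)")

# Anchored pattern (illustrative): a lazy name followed by the fixed
# " (N in chips)" suffix at the end of the line.
anchored = re.compile(r"Seat (\d+): (.+?) \((\d+) in chips\)$")

lines = [
    "Seat 1: Dladik Rzs38 (1200 in chips)",
    "Seat 1: Ruzzka(Rus) (1200 in chips)",
]


def extract_names(pattern, lines):
    """Return the captured player name per line, or None when there is no match."""
    names = []
    for line in lines:
        match = pattern.match(line)
        names.append(match.group(2) if match else None)
    return names


naive_names = extract_names(naive, lines)
anchored_names = extract_names(anchored, lines)
print(naive_names)
print(anchored_names)
```

The anchored version relies on backtracking inside the regex engine, which is exactly what a fast, mostly non-backtracking parser avoids.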

Unless you also need to parse a more complicated structure than what you showed here, I would guess that applying a regex to each line directly will be easier. Lark is able to deal with this kind of arbitrary content (example here), but at a large performance penalty.

Here is a solution without lark:

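import re

# The player name is matched lazily and ends at the last whitespace before "(<chips> in chips)".
regex = re.compile(r"Seat\s*(?P<number>\d+)\s*:\s*(?P<player>[^\n]+?)\s+\((?P<chips>\d+) in chips\)")

# `data` is the multi-line string from the question.
seats = []
for line in data.splitlines():
    match = regex.match(line)
    if match is not None:
        # groupdict() maps each named group to the text it captured.
        values = match.groupdict()
        seats.append((values["number"], values["player"], values["chips"]))

print(seats)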

It appears from your grammar that you actually need to extract a bit more information (e.g. is_sitting_out and bounty). For that you can slightly change the regex to this:

Seat\s*(?P<number>\d+)\s*:\s*(?P<player>[^\n]+?)\s+\((?P<chips>\d+) in chips\s*(?:,\s*(?P<bounty>\d+)\s*bounty)?\)(?P<is_sitting_out> is sitting out)?

You can check if a player is sitting out via values['is_sitting_out'] is not None, and values['bounty'] will be None if there is no bounty.
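As a rough sketch of how the extended regex could be used, the snippet below wraps it in a helper that converts the captured strings into typed values. The function name `parse_seat`, the constant `SEAT_RE`, and the sample lines with a bounty and with " is sitting out" are assumptions made for illustration; their format is inferred from the grammar in the question, not taken from real hand histories.

```python
import re

# Extended pattern from above, split across lines for readability.
SEAT_RE = re.compile(
    r"Seat\s*(?P<number>\d+)\s*:\s*(?P<player>[^\n]+?)\s+"
    r"\((?P<chips>\d+) in chips\s*(?:,\s*(?P<bounty>\d+)\s*bounty)?\)"
    r"(?P<is_sitting_out> is sitting out)?"
)


def parse_seat(line):
    """Parse one seat line into a dict, or return None if it does not match."""
    match = SEAT_RE.match(line)
    if match is None:
        return None
    values = match.groupdict()
    return {
        "number": int(values["number"]),
        "player": values["player"],
        "chips": int(values["chips"]),
        # bounty is None when the optional ", N bounty" part is absent
        "bounty": int(values["bounty"]) if values["bounty"] is not None else None,
        "is_sitting_out": values["is_sitting_out"] is not None,
    }


# Hypothetical sample lines covering the optional parts.
sample = [
    "Seat 1: slum ^o_o^ (1200 in chips)",
    "Seat 2: $ salesovish (1500 in chips, 25 bounty)",
    "Seat 3: Ruzzka(Rus) (900 in chips) is sitting out",
]
seats = [parse_seat(line) for line in sample]
for seat in seats:
    print(seat)
```

Note that this pattern only accepts whole-number chip counts. The grammar in the question also allows a leading "$" and decimals, so the `chips` and `bounty` groups would need to be widened if such values occur.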
