I would like to use the very good pyparsing package to parse strings of the following kind:

atomname * and atomindex 1,2,3

atomname xxx,yyy or atomtype rrr,sss

thiol

not atomindex 1,2,3

not (atomindex 4,5,6) or atomname *

Based on this parsing, I will link the matches to specific function calls that will perform a selection of atoms.

All the selection keywords (atomname, atomindex, thiol, ...) are stored in a list named selkwds.

I tried this but it failed:

keyword = oneOf(selkwds,caseless=True).setParseAction(self.__parse_keyword)

func_call = Forward()

func_call << (keyword + commaSeparatedList).setParseAction(self.__parse_expression)

func_call = operatorPrecedence(func_call, [(NOT, 1, opAssoc.RIGHT, self.__not),
                                           (AND, 2, opAssoc.LEFT , self.__and),
                                           (OR , 2, opAssoc.LEFT , self.__or)])

where self.__and, self.__or, self.__not, self.__parse_keyword and self.__parse_expression are methods that modify the tokens for a later eval of the transformed string.

Do you have any idea how to solve this?

Thanks a lot,

Eric

1 Answer

See embedded comments in this modified version of your parser:

from pyparsing import *

selkwds = "atomname atomindex atomtype thiol".split()
func_name = MatchFirst(map(CaselessKeyword, selkwds))
NOT,AND,OR = map(CaselessKeyword,"NOT AND OR".split())
keyword = func_name | NOT | AND | OR

func_call = Forward()

integer = Word(nums).setParseAction(lambda t: int(t[0]))
alphaword = Word(alphas,alphanums)

# You have to be specific about what kinds of things can be an arg;
# otherwise, an argless function call might consume the next keyword
# or boolean operator as an argument.
# This kind of lookahead is commonly overlooked by those who assume
# that the parser will backtrack on its own when a token it has
# already taken as an argument was really meant to be something else.
# pyparsing works purely left-to-right, and only does lookahead if
# you explicitly tell it to.
# I assume that a func_call could be a function argument; otherwise
# there is no point in defining it as a Forward.
func_arg = ~keyword + (integer | func_call | alphaword)

# add Groups to give structure to your parsed data - otherwise everything
# just runs together - now every function call parses as exactly two elements:
# the keyword and a list of arguments (which may be an empty list, but will
# still be a list)
func_call << Group(func_name + Group(Optional(delimitedList(func_arg) | '*')))

# don't name this func_call, it's confusing with what you've
# already defined above
# (infixNotation is the current name for operatorPrecedence)
func_call_expr = infixNotation(func_call, [(NOT, 1, opAssoc.RIGHT),
                                           (AND, 2, opAssoc.LEFT),
                                           (OR , 2, opAssoc.LEFT)])
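
To see why the ~keyword lookahead matters, here is a minimal sketch that compares an argument list with and without it. The names greedy_call, guarded_arg and guarded_call are made up for this illustration, and the keyword set is cut down to keep it short; it assumes pyparsing is installed and still provides the classic camelCase names.

```python
from pyparsing import (CaselessKeyword, Group, MatchFirst, Optional, Word,
                       alphanums, alphas, delimitedList)

# Reduced keyword set, for illustration only
func_name = MatchFirst([CaselessKeyword(k) for k in ("atomname", "thiol")])
AND = CaselessKeyword("and")
alphaword = Word(alphas, alphanums)

# No lookahead: any alphabetic word, including "and", is a valid argument
greedy_call = Group(func_name + Group(Optional(delimitedList(alphaword))))

# Negative lookahead: an argument must not be a keyword or operator
guarded_arg = ~(func_name | AND) + alphaword
guarded_call = Group(func_name + Group(Optional(delimitedList(guarded_arg))))

text = "thiol and atomname xxx"
greedy = greedy_call.parseString(text).asList()
guarded = guarded_call.parseString(text).asList()
print(greedy)   # [['thiol', ['and']]]  ("and" was swallowed as an argument)
print(guarded)  # [['thiol', []]]       ("and" is left for the operator level)
```

Without the lookahead, the argless thiol call takes the operator as its argument, and the boolean expression around it can no longer be parsed.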

Let's test it out:

tests = """\
    atomname * and atomindex 1,2,3
    atomname xxx,yyy or atomtype rrr,sss
    thiol
    not atomindex 1,2,3
    not (atomindex 4,5,6) or atomname *""".splitlines()

for test in tests:
    print(test.strip())
    print(func_call_expr.parseString(test).asList())
    print()

prints:

atomname * and atomindex 1,2,3
[[['atomname', ['*']], 'AND', ['atomindex', [1, 2, 3]]]]

atomname xxx,yyy or atomtype rrr,sss
[[['atomname', ['xxx', 'yyy']], 'OR', ['atomtype', ['rrr', 'sss']]]]

thiol
[['thiol', []]]

not atomindex 1,2,3
[['NOT', ['atomindex', [1, 2, 3]]]]

not (atomindex 4,5,6) or atomname *
[[['NOT', ['atomindex', [4, 5, 6]]], 'OR', ['atomname', ['*']]]]
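
Since you want to link the matches to selection functions, one possible way (only a sketch, not necessarily how you would want to structure your own code) is to walk the nested list from asList() and combine Python sets. Everything here is hypothetical: the ATOMS table, the select and evaluate helpers, and the rule that a thiol is an atom named "SG" are assumptions made up for illustration.

```python
# Hypothetical atom table: index -> (name, type). Illustrative data only.
ATOMS = {1: ("CA", "C"), 2: ("SG", "S"), 3: ("N", "N"), 4: ("SG", "S")}

def select(keyword, args):
    """Return the set of atom indices chosen by one function call."""
    if keyword == "atomindex":
        return {i for i in ATOMS if i in args}
    if keyword == "atomname":
        return {i for i, (name, _) in ATOMS.items()
                if args == ["*"] or name in args}
    if keyword == "atomtype":
        return {i for i, (_, kind) in ATOMS.items()
                if args == ["*"] or kind in args}
    if keyword == "thiol":
        # Assumption for this sketch: a thiol is any atom named "SG"
        return {i for i, (name, _) in ATOMS.items() if name == "SG"}
    raise ValueError("unknown selection keyword: %r" % keyword)

def evaluate(node):
    """Evaluate one nested-list node produced by the parser."""
    head = node[0]
    if head == "NOT":
        # ['NOT', operand] -> complement within all atoms
        return set(ATOMS) - evaluate(node[1])
    if isinstance(head, str):
        # [keyword, [args]] -> a single function call
        return select(head, node[1])
    # [operand, op, operand, ...] -> fold left over the binary operators
    result = evaluate(head)
    for op, operand in zip(node[1::2], node[2::2]):
        other = evaluate(operand)
        result = (result & other) if op == "AND" else (result | other)
    return result

# Parsed form of "not (atomindex 4,5,6) or atomname *", copied from above
parsed = [[["NOT", ["atomindex", [4, 5, 6]]], "OR", ["atomname", ["*"]]]]
print(sorted(evaluate(parsed[0])))  # [1, 2, 3, 4]
```

The same thing could be done with parse actions attached to each level of the expression, as in your original attempt; evaluating the grouped list afterwards just keeps the grammar and the selection logic separate.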