I'm trying to write a parser for boolean expressions that evaluates whether the words in an expression appear in the content of a document.

After many hours of research (I knew nothing about parsers or the theory behind them before this), I still can't come up with a valid grammar that avoids left recursion, or simply one that works correctly. I'm getting quite frustrated, as none of this theory was explained to us in class.

Basically, I need to parse expressions like ({w1 w2 w3} & !w4) | (w5 & "mark likes food"), where {} encloses a set of words that must all be in the document, and "" encloses a string literal that must be in the document.

I came up with the tokens [AND, OR, NOT, LPAREN, RPAREN, LSET, RSET, LSEQ, RSEQ, WRD], so that the expression (w1 & {w2 w2}) becomes [LPAREN WRD AND LSET WRD WRD RSET RPAREN].

But I'm having trouble writing a grammar that allows this to be parsed. On my very first try I came up with:

S -> E

E -> T AND T | T OR T | T

T -> LSET W RSET | LSEQ W RSEQ | LPAREN E RPAREN | NOT E | WRD

W -> WRD* [I don't really know how to write this formally, but it should only accept a sequence of WRD tokens until RSET or RSEQ, depending on which delimiter it started with.]

This obviously doesn't work: the expression isn't evaluated in full (evaluation stops after the first return), parentheses aren't handled correctly, and there are other problems. I've been stuck for a couple of days and can't come up with anything useful, so any help is appreciated.

For the code I took inspiration from this, but I don't think it really fits my problem.

Code (I've tested the tokenizer and it works correctly):

public class BoolExprParser {

    private final String expression;
    private final BoolExprTokenizer tokenizer;
    private BoolExprTokenizer.Token currentToken;

    private void advance() {
        currentToken = tokenizer.getNext();
    }

    private boolean currentEquals(BoolExprTokenizer.Token t) {
        return currentToken == t;
    }

    private boolean parse(Document doc) {
        advance();
        boolean val = expr(doc);
        if (!currentEquals(BoolExprTokenizer.Token.END)) {
            // error
        }

        return val;
    }

    private boolean expr(Document doc) {
        boolean leftExpr = subExpr(doc);
        switch (currentToken) {
            case AND:
                advance();
                boolean rightExpr = subExpr(doc);
                return leftExpr && rightExpr;
            case OR:
                advance();
                rightExpr = subExpr(doc);
                return leftExpr || rightExpr;
            case END:
                return leftExpr;
            default:
                //error
        }

        return false;
    }

    private boolean subExpr(Document doc) {
        switch (currentToken) {
            case NOT:
                advance();
                boolean result = expr(doc);
                return !result;
            case WRD:
                advance();
                return doc.isWord(tokenizer.getWord());
            case LSet:
                advance();
                boolean wordsInsideSet = wordsSet(doc);
                if (!currentEquals(BoolExprTokenizer.Token.RSet)) {
                    // error
                } else {
                    advance();
                }
                return wordsInsideSet;
            case LP:
                advance();
                boolean exprInside = expr(doc);
                if (!currentEquals(BoolExprTokenizer.Token.RP)) {
                    // error
                } else {
                    advance();
                }
                return exprInside;
            default:
                // error
        }

        return false;
    }

    private boolean wordsSet(Document doc) {
        boolean validToken = currentEquals(BoolExprTokenizer.Token.WRD);
        boolean isInDoc = true;

        while (validToken) {
            if (isInDoc) isInDoc = doc.isWord(tokenizer.getWord());
            advance();
            validToken = currentEquals(BoolExprTokenizer.Token.WRD);
        }

        if (!currentEquals(BoolExprTokenizer.Token.RSet)) {
            // error
        }

        return isInDoc;
    }
}
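
A common way to make a grammar like the one above work with recursive descent is to give each precedence level its own rule and to express repetition with a loop instead of recursion, which also avoids left recursion. The following is only a minimal sketch of that idea, not a drop-in fix for the class above. LayeredBoolParser, its regex-based tokenizer, and the use of a plain String as the document are assumptions made for illustration; they stand in for BoolExprTokenizer and Document, whose APIs are not shown in full. Words are assumed to be separated by whitespace, and a quoted phrase is matched as a plain substring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a layered grammar evaluated by recursive descent:
//   expr   -> term (OR term)*
//   term   -> factor (AND factor)*
//   factor -> NOT factor | LPAREN expr RPAREN | LSET WRD* RSET | LSEQ WRD* RSEQ | WRD
class LayeredBoolParser {
    private final List<String> tokens = new ArrayList<>();
    private final String text;        // document text, used for phrase lookups
    private final Set<String> words;  // document words, used for word lookups
    private int pos = 0;

    LayeredBoolParser(String expression, String documentText) {
        // A quoted phrase becomes one token; everything else is a symbol or a word.
        Matcher m = Pattern.compile("\"[^\"]*\"|[(){}&|!]|[^\\s(){}&|!\"]+").matcher(expression);
        while (m.find()) tokens.add(m.group());
        text = documentText;
        words = new HashSet<>(Arrays.asList(documentText.split("\\s+")));
    }

    static boolean matches(String expression, String documentText) {
        LayeredBoolParser p = new LayeredBoolParser(expression, documentText);
        boolean value = p.expr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: '" + p.peek() + "'");
        }
        return value;
    }

    // Returns the current token, or "" once the input is exhausted.
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    private void expect(String symbol) {
        if (!peek().equals(symbol)) {
            throw new IllegalArgumentException("Expected " + symbol + " but found '" + peek() + "'");
        }
        pos++;
    }

    private static boolean isWord(String t) {
        return !t.isEmpty() && "(){}&|!\"".indexOf(t.charAt(0)) < 0;
    }

    // expr -> term (OR term)* ; the loop replaces left recursion.
    private boolean expr() {
        boolean value = term();
        while (peek().equals("|")) {
            pos++;
            boolean right = term();  // always parse the operand, even if value is already true
            value = value || right;
        }
        return value;
    }

    // term -> factor (AND factor)* ; AND binds tighter than OR.
    private boolean term() {
        boolean value = factor();
        while (peek().equals("&")) {
            pos++;
            boolean right = factor();
            value = value && right;
        }
        return value;
    }

    // factor handles NOT, parentheses, word sets, quoted phrases, and single words.
    private boolean factor() {
        String t = peek();
        if (t.equals("!")) { pos++; return !factor(); }
        if (t.equals("(")) { pos++; boolean v = expr(); expect(")"); return v; }
        if (t.equals("{")) {
            pos++;
            boolean all = true;
            while (isWord(peek())) { all &= words.contains(peek()); pos++; }
            expect("}");
            return all;
        }
        if (t.startsWith("\"")) { pos++; return text.contains(t.substring(1, t.length() - 1)); }
        if (isWord(t)) { pos++; return words.contains(t); }
        throw new IllegalArgumentException("Unexpected token: '" + t + "'");
    }
}
```

With this layering, matches("!a | b", "a b") evaluates to true because NOT applies only to a, whereas a rule of the form NOT E would negate the whole of a | b.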

  • I suspect that a parser is more technology than your project needs. If I understand your example correctly, ({w1 w2 w3} & !w4) | (w5 & "mark likes food") translates to: get five words from some list. The document must either contain w1, w2, and w3 in any order and not w4, or contain w5 and the exact phrase "mark likes food". If my understanding is correct, you just parse the String, put the words from the list into a java.util.List and, taking one word or phrase at a time, search the document for the presence or absence of that word or phrase. Commented Nov 6, 2022 at 15:35
  • Thanks for the response. I have thought about this, but I don't see any other way. How would I be able to correctly evaluate very complex expressions in the correct order, following the right logic, nested parentheses, etc., if not with a parser? Commented Nov 6, 2022 at 16:22
  • You may not have covered this in class, but one way to handle complex expressions is with Reverse Polish notation or postfix notation. Commented Nov 7, 2022 at 0:33
  • Managed to implement it with a shunting yard, didn't know it existed before. You saved my ass buddy! Commented Nov 8, 2022 at 16:34
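
The Reverse Polish / shunting-yard approach mentioned in the comments above can be sketched roughly as follows. This is not the implementation the original poster ended up with (that code is not shown). It is a minimal illustration that assumes the expression has already been split into string tokens and that every operand is a single word, so sets in {} and quoted phrases are left out. The class and method names (ShuntingYardSketch, toPostfix, evaluate) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: shunting yard over single-word operands only.
class ShuntingYardSketch {
    // Higher number binds tighter: NOT > AND > OR.
    private static int precedence(String op) {
        switch (op) {
            case "!": return 3;
            case "&": return 2;
            case "|": return 1;
            default:  return 0;  // "(" stays on the stack until its ")" arrives
        }
    }

    // Converts infix tokens to postfix (Reverse Polish) order.
    static List<String> toPostfix(List<String> infix) {
        List<String> output = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        for (String t : infix) {
            if (t.equals("(")) {
                stack.push(t);
            } else if (t.equals(")")) {
                while (!stack.isEmpty() && !stack.peek().equals("(")) output.add(stack.pop());
                if (stack.isEmpty()) throw new IllegalArgumentException("Unbalanced ')'");
                stack.pop();  // discard the "("
            } else if (t.equals("!")) {
                stack.push(t);  // prefix operator: nothing to pop before it
            } else if (t.equals("&") || t.equals("|")) {
                // Left-associative binary operator: pop operators that bind at least as tightly.
                while (!stack.isEmpty() && precedence(stack.peek()) >= precedence(t)) {
                    output.add(stack.pop());
                }
                stack.push(t);
            } else {
                output.add(t);  // operand (a single word)
            }
        }
        while (!stack.isEmpty()) {
            String op = stack.pop();
            if (op.equals("(")) throw new IllegalArgumentException("Unbalanced '('");
            output.add(op);
        }
        return output;
    }

    // Evaluates a postfix token list against the set of words in the document.
    static boolean evaluate(List<String> postfix, Set<String> documentWords) {
        Deque<Boolean> values = new ArrayDeque<>();
        for (String t : postfix) {
            switch (t) {
                case "!": values.push(!values.pop()); break;
                case "&": { boolean r = values.pop(), l = values.pop(); values.push(l && r); break; }
                case "|": { boolean r = values.pop(), l = values.pop(); values.push(l || r); break; }
                default:  values.push(documentWords.contains(t));
            }
        }
        return values.pop();
    }
}
```

For the tokens a | b & c, toPostfix yields a b c & |, so the AND is applied before the OR when the postfix list is evaluated left to right with a stack.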
