0

I've started tackling coding problems to try and improve my skills.

I'm working on the 'Alien Dictionary' coding problem which, when given a sorted list of 'Alien Words' you need to determine the 'Alien Alphabet'. The alien alphabet is made up of Latin characters but in a different order than ours.

I've since learned there are more optimised ways of solving this which I will look into, but I want to see my instinctual approach through to completion.

My code does compile with c++20 and outputs the correct alphabet, however, I had to implement a 'hack' to cover an edge case which I explain in a code comment.

I can't quite wrap my head around why I needed the hack, or how to fix my code to not require it.

#include <iostream> // std::cout
#include <map> // std::map
#include <vector> // std::vector
#include <algorithm> // std::sort

/*
    Input: words[] = ["pbb", "bpku", "bpkb", "kbp", "kbu"];
    Expected Output: ['p', 'u', 'b', 'k'];
*/

typedef std::vector<std::string> WordList;
WordList alienWords = {"pbb", "bpku", "bpkb", "kbp", "kbu"};

typedef std::map<std::pair<char, char>, char> RuleBook;
RuleBook alienRuleBook;

typedef std::vector<char> Alphabet;
Alphabet alienAlphabet;

void populateAlphabet(Alphabet& alphabet, WordList& wordList) {
    alphabet.clear();
    for (int word = 0; word < wordList.size(); word++) {
        for (int letter = 0; letter < wordList[word].size(); letter++) {
            if(std::find(alphabet.begin(), alphabet.end(), wordList[word][letter]) == alphabet.end()) {
                alphabet.push_back(wordList[word][letter]);
            }
        }
    }
}

void generateRules(RuleBook& ruleBook, WordList& wordList){
    for (int firstWord = 0; firstWord < wordList.size(); firstWord++) {
        for (int secondWord = firstWord + 1; secondWord < wordList.size(); secondWord++) {
            if (secondWord == wordList.size()) break; 
            
            int letter = 0;

            for (; letter < wordList[firstWord].size(); letter++) {
                if (wordList[firstWord][letter] == wordList[secondWord][letter]) continue;
                
                ruleBook[{wordList[firstWord][letter], wordList[secondWord][letter]}] = '<';
                ruleBook[{wordList[secondWord][letter], wordList[firstWord][letter]}] = '>';
                break;
            }
        }
    }
}

// needs to return TRUE if 'l' should come before 'r'.
bool getRule(char l, char r) {
    switch(alienRuleBook[{l, r}]) {
        case '>': return false;
        case '<': return true;
    }
    std::cout << "ERROR! No rule found for: '" << l << "' vs '" << r << "'\n\n";
    
    // The below is a hack because I don't understand to fix the case of {'u', 'k'}
    // There's no 'discovered' rule saying 'u' comes before 'k' or 'k' comes after 'u'
    // even though we KNOW 'u' comes before 'b' and we know that 'b' comes before 'k'.
    return true;
}

void printAlphabet(Alphabet& alphabet){
    std::cout << "=== Alphabet ===" << "\n ";
    for(const auto it : alphabet)
        std::cout << it << " ";
    std::cout << "\n================\n\n";
}

void printRuleBook(RuleBook& ruleBook){
    std::cout << "=== Discovered Rules ===" << "\n";
    for(const auto it : ruleBook)
        std::cout << " " << it.first.first << " " << it.second << " " << it.first.second << '\n';
    std::cout << "================\n\n";
}

int main() {
    populateAlphabet(alienAlphabet, alienWords);
    
    generateRules(alienRuleBook, alienWords);
    
    std::sort(alienAlphabet.begin(), alienAlphabet.end(), getRule);
    
    printRuleBook(alienRuleBook);
    
    printAlphabet(alienAlphabet);
    
    return 0;
}
5
  • That's really more of a fundamental problem with how you generate your lookup - you only ever insert rules for comparisons you've already encountered, but I'd expect an "alphabet" lookup to be something like a std::unordered_map<char, int> - I input a character and get its "index" in the alphabet. Then you'd look up both characters and compare which one has the lower index Commented Aug 3, 2022 at 11:30
  • That is a fun problem, (not a real answer) but I had a go at it myself and approached it from a different angle : onlinegdb.com/LmMOGj2pP. At least it shows a different approach. Commented Aug 3, 2022 at 12:02
  • I'm not sure your solution works, as in some cases (like you mentioned) rule is defined implicitly. In this case you won't find it in the RuleBook. I would suggest to look at the problem as a directional graph (p->b->k; p->u->b; p->u) and then walk the tree starting from node that has no incoming connections + cut connections + continue from the next node that has no incoming connections Commented Aug 3, 2022 at 14:23
  • Your "corner case" is in fact a fundamental problem with your approach. I don't recommend pursuing it. Commented Aug 4, 2022 at 18:11
  • It does appear that I painted myself into a corner with this attempt. I thank you all for your guidance. Commented Aug 5, 2022 at 21:13

1 Answer 1

0

In order to implement the getRule function if there is no implicit rule for {a, b}, you should search for {a, x} = '>' where {x, b} = '>' or {a, x} = '<' where {x, b} = '<'

// in case if a > b, you search for "a > x and x > b". 
// In other words, if a is greater than x and x is greater than b,
// then a is greater than b.
a ... x ... b

// the similar is for case when a < b
b ... x ... a

In case if you cannot find {a, x} and {x, b}, you should search for {a, x} + {x, y} + {y, b}. The search depth can increase. I'm not sure that is good solution.

I would suggest to look at the problem as a directed graph:

  1. First walk through the ordered list of words and make a graph: p->b->k and p->u->b and (duplicated) p->u
  2. Then find the node that has no incoming connections (e.g. p). That will be the first char of the alphabet
  3. Then iterate its outgoing connections and find the node, that has no incoming connections except p. That will give you the second char (u).
  4. Then iterate over all connections of the second char u and find the node that has no other incoming connections except p and u. That will give you b
  5. And so on, and so on
Sign up to request clarification or add additional context in comments.

1 Comment

A directed graph sounds like a better approach and one I've seen in other people's answers and explanations of this problem. I'll do some learning and have another crack at it. Thanks for taking the time to help.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.