Sorting 'Alphabetically' (Alien Dictionary Code Problem)

Question

I've started tackling coding problems to try and improve my skills.

I'm working on the 'Alien Dictionary' coding problem: given a sorted list of 'Alien Words', you need to determine the 'Alien Alphabet'. The alien alphabet is made up of Latin characters, but in a different order from ours.

I've since learned there are more optimised ways of solving this which I will look into, but I want to see my instinctual approach through to completion.

My code compiles with C++20 and outputs the correct alphabet. However, I had to implement a 'hack' to cover an edge case, which I explain in a code comment.

I can't quite wrap my head around why I needed the hack, or how to fix my code to not require it.

#include <iostream> // std::cout
#include <map> // std::map
#include <vector> // std::vector
#include <algorithm> // std::sort, std::find
#include <string> // std::string
#include <utility> // std::pair

/*
    Input: words[] = ["pbb", "bpku", "bpkb", "kbp", "kbu"];
    Expected Output: ['p', 'u', 'b', 'k'];
*/

typedef std::vector<std::string> WordList;
WordList alienWords = {"pbb", "bpku", "bpkb", "kbp", "kbu"};

typedef std::map<std::pair<char, char>, char> RuleBook;
RuleBook alienRuleBook;

typedef std::vector<char> Alphabet;
Alphabet alienAlphabet;

void populateAlphabet(Alphabet& alphabet, WordList& wordList) {
    alphabet.clear();
    for (int word = 0; word < wordList.size(); word++) {
        for (int letter = 0; letter < wordList[word].size(); letter++) {
            if(std::find(alphabet.begin(), alphabet.end(), wordList[word][letter]) == alphabet.end()) {
                alphabet.push_back(wordList[word][letter]);
            }
        }
    }
}

void generateRules(RuleBook& ruleBook, WordList& wordList){
    for (int firstWord = 0; firstWord < wordList.size(); firstWord++) {
        for (int secondWord = firstWord + 1; secondWord < wordList.size(); secondWord++) {
            int letter = 0;

            // Stop at the end of the shorter word so neither index runs out of bounds.
            for (; letter < wordList[firstWord].size() && letter < wordList[secondWord].size(); letter++) {
                if (wordList[firstWord][letter] == wordList[secondWord][letter]) continue;
                
                ruleBook[{wordList[firstWord][letter], wordList[secondWord][letter]}] = '<';
                ruleBook[{wordList[secondWord][letter], wordList[firstWord][letter]}] = '>';
                break;
            }
        }
    }
}

// needs to return TRUE if 'l' should come before 'r'.
bool getRule(char l, char r) {
    switch(alienRuleBook[{l, r}]) {
        case '>': return false;
        case '<': return true;
    }
    std::cout << "ERROR! No rule found for: '" << l << "' vs '" << r << "'\n\n";
    
    // The below is a hack because I don't understand how to fix the case of {'u', 'k'}
    // There's no 'discovered' rule saying 'u' comes before 'k' or 'k' comes after 'u'
    // even though we KNOW 'u' comes before 'b' and we know that 'b' comes before 'k'.
    return true;
}

void printAlphabet(Alphabet& alphabet){
    std::cout << "=== Alphabet ===" << "\n ";
    for(const auto it : alphabet)
        std::cout << it << " ";
    std::cout << "\n================\n\n";
}

void printRuleBook(RuleBook& ruleBook){
    std::cout << "=== Discovered Rules ===" << "\n";
    for(const auto it : ruleBook)
        std::cout << " " << it.first.first << " " << it.second << " " << it.first.second << '\n';
    std::cout << "================\n\n";
}

int main() {
    populateAlphabet(alienAlphabet, alienWords);
    
    generateRules(alienRuleBook, alienWords);
    
    std::sort(alienAlphabet.begin(), alienAlphabet.end(), getRule);
    
    printRuleBook(alienRuleBook);
    
    printAlphabet(alienAlphabet);
    
    return 0;
}

That's really more of a fundamental problem with how you generate your lookup: you only ever insert rules for comparisons you've already encountered. I'd expect an "alphabet" lookup to be something like a std::unordered_map<char, int>, where I input a character and get its "index" in the alphabet. Then you'd look up both characters and compare which one has the lower index.
– UnholySheep, Commented Aug 3, 2022 at 11:30
That is a fun problem. This is not a real answer, but I had a go at it myself and approached it from a different angle: onlinegdb.com/LmMOGj2pP. At least it shows a different approach.
– Pepijn Kramer, Commented Aug 3, 2022 at 12:02
I'm not sure your solution works, as in some cases (like you mentioned) a rule is defined only implicitly, so you won't find it in the RuleBook. I would suggest looking at the problem as a directed graph (p->b->k; p->u->b; p->u): start from the node that has no incoming connections, cut its connections, then continue from the next node that has no incoming connections.
– Dmytro Ovdiienko, Commented Aug 3, 2022 at 14:23
Your "corner case" is in fact a fundamental problem with your approach. I don't recommend pursuing it.
– n. m. could be an AI, Commented Aug 4, 2022 at 18:11
It does appear that I painted myself into a corner with this attempt. I thank you all for your guidance.
– Steven Martin, Commented Aug 5, 2022 at 21:13
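
The index lookup suggested in the first comment could look roughly like the sketch below. It assumes the alphabet order is already known (here passed in as a string), so it illustrates the comparison step only, not how the order is discovered. The names buildRanks and rankBefore are hypothetical and do not appear in the question's code.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: maps each letter to its position in a known alphabet.
std::unordered_map<char, int> buildRanks(const std::string& alphabet) {
    std::unordered_map<char, int> rank;
    for (std::size_t i = 0; i < alphabet.size(); ++i) {
        rank[alphabet[i]] = static_cast<int>(i);
    }
    return rank;
}

// Returns true if 'l' comes strictly before 'r' in the alphabet.
// at() throws std::out_of_range for a letter that is not in the alphabet.
bool rankBefore(const std::unordered_map<char, int>& rank, char l, char r) {
    return rank.at(l) < rank.at(r);
}
```

Because every pair of letters has a rank, a comparison like {'u', 'k'} never falls through to a default, and comparing a letter with itself returns false, which is what std::sort expects from a comparator.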

Dmytro Ovdiienko · Accepted Answer · 2022-08-04 17:25:07Z

To implement the getRule function when there is no explicit rule for {a, b}, search for an intermediate letter x such that {a, x} = '>' and {x, b} = '>', or {a, x} = '<' and {x, b} = '<'.

// If a > b, search for "a > x and x > b".
// In other words, if a is greater than x and x is greater than b,
// then a is greater than b.
a ... x ... b

// The same applies to the case where a < b.
b ... x ... a

If no single x links {a, x} and {x, b}, you have to search for a longer chain such as {a, x} + {x, y} + {y, b}, and the search depth can keep increasing. I'm not sure that is a good solution.
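
As a rough illustration of that chained search, the sketch below stores only the "comes before" direction and follows it to any depth with an iterative depth-first search. The BeforeGraph type and the comesBefore function are hypothetical names, not part of the question's code, and the graph is assumed to be built from the discovered '<' rules.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical structure: before[a] holds every letter with a discovered rule "a < letter".
using BeforeGraph = std::map<char, std::set<char>>;

// Returns true if 'b' can be reached from 'a' by chaining discovered rules,
// i.e. a < x1 < x2 < ... < b with any number of intermediate letters.
bool comesBefore(const BeforeGraph& before, char a, char b) {
    std::set<char> visited;
    std::vector<char> stack{a};
    while (!stack.empty()) {
        const char current = stack.back();
        stack.pop_back();
        if (!visited.insert(current).second) continue; // already explored
        const auto it = before.find(current);
        if (it == before.end()) continue;              // no outgoing rules
        for (const char next : it->second) {
            if (next == b) return true;
            stack.push_back(next);
        }
    }
    return false;
}
```

With the rules from the question (p < b, p < k, p < u, u < b, b < k), the missing pair {'u', 'k'} resolves through b. Each lookup may walk a large part of the graph, which is one reason this approach scales poorly.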

I would suggest looking at the problem as a directed graph:

First, walk through the ordered list of words and build a graph: p->b->k and p->u->b and (duplicated) p->u.
Then find the node that has no incoming connections (e.g. p). That will be the first character of the alphabet.
Then iterate over its outgoing connections and find the node that has no incoming connections except from p. That gives you the second character (u).
Then iterate over all connections of the second character u and find the node that has no incoming connections other than from p and u. That gives you b.
And so on, until every node has been placed.
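
The steps above can be sketched as a topological sort. The version below uses Kahn's algorithm, which is one common way to do it and not necessarily the exact procedure meant above. The function name alienOrder is hypothetical. It derives one rule from each pair of adjacent words, returns an empty string if the rules contain a cycle, and does not check for invalid input such as a longer word listed before its own prefix.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical function: returns one alphabet order consistent with the sorted words.
std::string alienOrder(const std::vector<std::string>& words) {
    std::map<char, std::set<char>> successors; // edge a -> b means "a comes before b"
    std::map<char, int> incoming;              // number of incoming edges per letter

    // Register every letter so that letters without rules still appear in the result.
    for (const std::string& word : words) {
        for (const char c : word) {
            successors.try_emplace(c);
            incoming.try_emplace(c, 0);
        }
    }

    // The first differing letter of each adjacent pair yields one edge.
    for (std::size_t i = 0; i + 1 < words.size(); ++i) {
        const std::string& first = words[i];
        const std::string& second = words[i + 1];
        const std::size_t common = std::min(first.size(), second.size());
        for (std::size_t k = 0; k < common; ++k) {
            if (first[k] == second[k]) continue;
            // Count the edge only once, even if the same rule is found again.
            if (successors[first[k]].insert(second[k]).second) ++incoming[second[k]];
            break;
        }
    }

    // Start with the letters that nothing precedes.
    std::queue<char> ready;
    for (const auto& [letter, count] : incoming) {
        if (count == 0) ready.push(letter);
    }

    // Repeatedly take a ready letter and "cut" its outgoing connections.
    std::string order;
    while (!ready.empty()) {
        const char letter = ready.front();
        ready.pop();
        order += letter;
        for (const char next : successors[letter]) {
            if (--incoming[next] == 0) ready.push(next);
        }
    }

    // Letters left with incoming edges mean the rules contradict each other.
    if (order.size() != incoming.size()) return "";
    return order;
}
```

For the question's input {"pbb", "bpku", "bpkb", "kbp", "kbu"}, this yields "pubk". When the words do not fully determine the alphabet, the result is just one of several valid orders.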

edited Aug 4, 2022 at 17:25

answered Aug 4, 2022 at 14:27


1 Comment

Steven Martin Over a year ago

A directed graph sounds like a better approach and one I've seen in other people's answers and explanations of this problem. I'll do some learning and have another crack at it. Thanks for taking the time to help.
