0

I'm learning Python coming from some beginner-level experience with Java. It all makes sense for the most part, but one of the exercises kind of made me wonder what actually happens within Python.

import string

def ispangram(str1, alphabet=string.ascii_lowercase):
    str1 = set(str1.lower().replace(' ',''))
    alphabet = set(alphabet)

    print(str1)
    print(alphabet)
    
    return str1 == alphabet

The goal of this function is to determine whether or not a string passed as a parameter is a pangram (uses every letter of the alphabet). Obviously the print statements aren't necessary, but I included them to try and figure out what was going on.

Input:

ispangram("The quick brown fox jumps over the lazy dog")

Output:

{'d', 'e', 'r', 'b', 't', 'i', 'l', 'a', 'y', 'o', 'v', 'p', 'z', 'c', 'g', 'n', 'f', 'q', 'x', 'm', 'h', 'w', 'k', 's', 'u', 'j'} 

{'d', 'e', 'r', 'b', 't', 'i', 'l', 'a', 'y', 'o', 'v', 'p', 'z', 'c', 'g', 'f', 'n', 'q', 'x', 'm', 'h', 'w', 'k', 's', 'u', 'j'} 

returns True

It prints out the exact same set, which is what is confusing to me. I know that the order of sets doesn't matter when comparing them, which makes sense, but why would str1 and alphabet be the exact same? Is it because this order is the optimal way to store these 26 lowercase letters in memory?

Finally, with more complex set comparisons, does having both sets automatically sorted make the comparison more efficient? How does Python do it?

  • Don't they differ by the order of n and f? Commented Jul 28 at 20:54
  • ... and set equality does not require the same apparent order, which may be why you missed that the print statements do show differences. Commented Jul 28 at 21:55

3 Answers

0

Well, first, these two sets aren't identical in ordering. At a glance, they flip the ordering of 'n' and 'f'.

Beyond that, while Python as a language doesn't guarantee any ordering for standard sets, individual implementations may happen to produce one. Whether that's a reliable contract ultimately depends on how much you trust that specific implementation and its promise to keep the behaviour stable.

Based on CPython's set (the meat and potatoes of the insertion implementation lives here), it looks like no particular care is taken to preserve any specific ordering, nor to randomize the order beyond using object hashes. Hashes are stable for an object's lifetime. Integer hashes are also the same from run to run, whereas str hashes are randomized per process unless PYTHONHASHSEED is set.

The same can be said for the implementation of set's __repr__, (here), which makes no special effort to randomize or stabilize the order in which items are presented.

Emphatically, though, these are implementation details of CPython. You shouldn't rely on this unless you positively have to, and even then, I'd step back and reevaluate why you're in that position.
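
To see that equality doesn't depend on iteration order, you can build the same set from differently ordered inputs. This is only a minimal sketch: the names forward and backward are illustrative, and whether the two iteration orders actually differ depends on the interpreter and the run.

```python
import string

# Same 26 letters, inserted in opposite orders.
forward = set(string.ascii_lowercase)
backward = set(reversed(string.ascii_lowercase))

# Equality only asks whether both sets contain the same elements.
print(forward == backward)  # True

# Iteration order is an implementation detail; it may or may not match.
print(list(forward) == list(backward))

# If you need a predictable order, sort explicitly.
print(sorted(forward) == sorted(backward))  # True
```

If your code needs a particular order, sorted() gives you one without leaning on how the hash table happens to be laid out.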

0

It's an implementation detail. You are most likely using CPython, where sets are hash sets and iterating over one walks its hash table, so the elements come out mostly ordered by hash value modulo the table capacity. Collisions of those remainders can cause deviations like your swapped 'f' and 'n'.

Demo code (Attempt This Online!):

import string

print(*set(string.ascii_lowercase))
print(*sorted(string.ascii_lowercase, key=lambda c: hash(c) % 2**7))

Sample outputs of three runs (each run prints two lines, the set order first and the sorted-by-remainder order second):

b j w t o p g h d i v u y z m l f x c r n s a e q k
b j w t o p g h d i v u y z m l f x c r n s a e q k

p j f s h a b w y k l n m c g v z i o x u d q r t e
p j f s h a b w y k l n m c g v z i o x u d q r t e

v i s p g t f r w y u q n x m d b h k o a c l z j e
v i s p g t f r y w u q n x m d b h k o a c l z j e

The order changes from run to run because Python randomizes string hashes.
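
If you want to check that the run-to-run variation really comes from hash randomization, one option is to pin the seed with the PYTHONHASHSEED environment variable. The sketch below assumes CPython and that sys.executable points at a usable interpreter. SNIPPET and run_with_seed are hypothetical helper names, not part of any library.

```python
import os
import subprocess
import sys

# The one-liner we want to run in a fresh interpreter each time.
SNIPPET = "import string; print(*set(string.ascii_lowercase))"

def run_with_seed(seed):
    """Run SNIPPET in a new Python process with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

first = run_with_seed(0)   # seed 0 disables string hash randomization
second = run_with_seed(0)

print(first)
print(first == second)  # same seed and interpreter, so the same order
```

Pinning the seed is handy for debugging, but it doesn't turn the resulting order into something you can rely on across Python versions or builds.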

-1

The answer to the question in your title is in the set documentation, https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset :

Both set and frozenset support set to set comparisons. Two sets are equal if and only if every element of each set is contained in the other (each is a subset of the other). A set is less than another set if and only if the first set is a proper subset of the second set (is a subset, but is not equal). A set is greater than another set if and only if the first set is a proper superset of the second set (is a superset, but is not equal).
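
The comparison operators described in that passage can be tried directly. The sketch below also includes is_pangram, a hypothetical variant of the question's function (not the asker's code) that uses a subset test instead of ==, so punctuation or digits in the input don't matter.

```python
import string

def is_pangram(text, alphabet=string.ascii_lowercase):
    # A pangram must contain every letter; extra characters are allowed,
    # so test "alphabet is a subset of the text's characters".
    return set(alphabet) <= set(text.lower())

small = {"a", "b"}
big = {"a", "b", "c"}

print(small == {"b", "a"})            # True: same elements
print(small < big)                    # True: proper subset
print(big > small)                    # True: proper superset
print(small <= small, small < small)  # True False

print(is_pangram("The quick brown fox jumps over the lazy dog."))  # True
print(is_pangram("Hello, world"))                                  # False
```

None of these comparisons involves sorting. Each one comes down to hash-based membership tests, which is why the order in which a set prints has no bearing on the result.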
