Perhaps prefer a set to a dict of empty values #174

@enginoid

The section "Create a length-N list of lists" includes a discussion of the O(n) performance of looking up an item in a list versus O(1) for searching the keys of a dictionary that has an empty value for each key. Specifically, the following code is presented:

d = {'s': [], 'p': [], 'a': [], 'm': []}
l = ['s', 'p', 'a', 'm']

def lookup_dict(d):
    return 's' in d

def lookup_list(l):
    return 's' in l

The text proceeds to recommend the use of this kind of dict over a list when looking through a set of values, for performance reasons:

Even though both functions look identical, because lookup_dict is utilizing the fact that dictionaries in python are hashtables, the lookup performance between the two is very different. Python will have to go through each item in the list to find a matching case, which is time consuming. By analysing the hash of the dictionary finding keys in the dict can be done very quickly.

Here are my potential beefs with this recommendation:

  1. It may be tempting to continue this as a discussion of lists of lists, but I don't believe this belongs under "Idioms." Perhaps it rather belongs in a section about data structures or performance.
  2. If the developer intends to use only the keys (and never the values), I'd suggest recommending sets, a structure that fits this use case much better. To be honest, I don't know how the performance of the two compares, but even if sets are an inferior alternative (I doubt this when I think it over, but I have an insufficient understanding and no data to back it), a set is a lot clearer. Assuming sets aren't vastly inferior, is there really a need to introduce an idiom when we have a competent data structure for the same thing?
  3. If you wish to suggest structures other than a list for performance, you might want to qualify that discussion with indicators as to when that would be appropriate. Without a caveat, this section looks like an endorsement of using dicts for every case where you'll be searching the data structure. This is potentially justified (again, I lack an understanding of the underlying data structure), but my immediate hunch is that, for a one-off lookup, the overhead of hashing the values exceeds the time it takes to create a simpler structure and search it sequentially. If you want to endorse a different data structure for performance reasons, perhaps you'll want to say, "if you'll be looking this up multiple times, then you might want this."
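
To make points 2 and 3 a bit more concrete, here is a rough sketch of what the set version would look like, plus a small timing harness that anyone could run to compare membership tests across a list, a dict, and a set. The names lookup_set, time_membership, and compare are my own and do not come from the guide, and I'm deliberately not quoting any numbers, since they will vary by machine and Python version.

```python
import timeit

# The same four items as in the guide's example, stored as a set.
s = {'s', 'p', 'a', 'm'}


def lookup_set(s):
    # Same expression as lookup_dict / lookup_list, but on a set.
    return 's' in s


def time_membership(container, needle, number=1000):
    """Return the seconds taken by `number` membership tests."""
    return timeit.timeit(lambda: needle in container, number=number)


def compare(n, number=1000):
    """Time `needle in container` for a list, a dict, and a set of size n."""
    values = list(range(n))
    needle = n - 1  # last element: the worst case for a sequential list scan
    containers = {
        'list': values,
        'dict': dict.fromkeys(values),  # keys only; every value is None
        'set': set(values),
    }
    return {
        name: time_membership(container, needle, number)
        for name, container in containers.items()
    }


if __name__ == '__main__':
    # Small and large sizes, to see where (or whether) the gap opens up.
    for n in (4, 100, 10_000):
        print(n, compare(n))
```

Note that this only times the lookups themselves. It doesn't measure the cost of building the dict or set in the first place, which is the part I suspect matters when the structure is searched only once.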

Reading this over, it all sounds far more critical than I mean it to be – I'm sort of cramming all the thoughts I have about this section into a single, almost exhaustive, discourse of retorts that I could imagine surfacing in a back-and-forth conversation about this. To compensate for my strident opposition, I should note that I think you're doing a great job.
