Dictionary within a list using Counter

Question

I want to write a function that takes a list of dictionaries and returns a list of Counters containing only the keys that appear in at least min_df of those dictionaries.

example:

prune([{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}], min_df=2)
[Counter({'a': 1}), Counter({'a': 1})]
prune([{'a': 1, 'b': 10}, {'a': 2}, {'c': 1}], min_df=2)
[Counter({'a': 1}), Counter({'a': 2})]

Since 'a' appears in two of the dictionaries, it is kept in the output.

My approach:

from collections import Counter
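def prune(dicto, min_df=2):
    # count how many dictionaries each key appears in
    new = Counter()
    for d in dicto:
        new += Counter(d.keys())
    # keep only the keys that reach the threshold
    x = {}
    for key, value in new.items():
        if value >= min_df:
            x[key] = value
    print(Counter(x))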

Output:

Counter({'a': 2})

This prints a single combined Counter: 'a' appears in 2 dictionaries overall, so it satisfies the threshold and is listed. How can I change this to get the desired output, with one Counter per input dictionary?

In your expected output you have two counters. What does each counter signify? Why is having just the one counter not useful?
– Martijn Pieters, Commented Apr 14, 2015 at 22:21
@MartijnPieters: I think OP wants to list the key-value pairs that appear in every dictionary, such that each printed key appears in at least df many dictionaries
– inspectorG4dget, Commented Apr 14, 2015 at 22:36
These two dictionaries are like two different documents, with the word counts for that specific document
– Wolf, Commented Apr 14, 2015 at 22:36
@inspectorG4dget: I'd like the OP to make that explicit, rather than have us guess.
– Martijn Pieters, Commented Apr 14, 2015 at 22:37
@MartijnPieters: normally, I'd agree with you (and would have directed my clarification at OP), but I have a feeling that OP's first language is not English and thought this would help
– inspectorG4dget, Commented Apr 14, 2015 at 22:38

JuniorCompressor · Accepted Answer · 2015-04-15 21:14:00Z

5

I would suggest:

from collections import Counter
def prune(dicto, min_df=2):
    # Create all counters
    counters = [Counter(d.keys()) for d in dicto]

    # Sum all counters
    total = sum(counters, Counter()) 

    # Create set with keys of high frequency
    keys = set(k for k, v in total.items() if v >= min_df)

    # Reconstruct counters using high frequency keys
    counters = (Counter({k: v for k, v in d.items() if k in keys}) for d in dicto)

    # filter(None, ...) keeps only the non-empty counters; list() makes the
    # result a list on Python 3 too, where filter returns a lazy iterator.
    return list(filter(None, counters))

Result:

>>> prune([{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}], min_df=2)
[Counter({'a': 1}), Counter({'a': 1})]
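
To make the intermediate steps concrete, here is a small sketch of what the summed counter and the set of frequent keys look like for the example input. The variable names docs and frequent_keys are illustrative only and are not part of the answer's code.

```python
from collections import Counter

docs = [{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}]
min_df = 2

# Each per-dictionary Counter holds 1 for every key present, so the sum
# is the number of dictionaries in which each key appears.
total = sum((Counter(d.keys()) for d in docs), Counter())
print(total)

# Keys that appear in at least min_df dictionaries.
frequent_keys = {k for k, v in total.items() if v >= min_df}
print(frequent_keys)
```

Here 'a' is the only key found in two dictionaries, so it is the only one that survives pruning.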

edited Apr 15, 2015 at 21:14

answered Apr 14, 2015 at 22:43


1 Comment

Padraic Cunningham Over a year ago

You should make counters a generator expression if you are going to filter it.

Padraic Cunningham · Answer · 2015-04-15 09:45:05Z

1

Chain the keys of all the dicts to count them, then keep the keys from each dict that satisfy the condition.

from collections import Counter
from itertools import chain

def prune(l, min_df=0):
    # count how many dictionaries each key appears in
    count = Counter(chain.from_iterable(l))
    # build a Counter per dict from the keys that appear at least min_df times;
    # list() keeps the result a list on Python 3, where filter is lazy
    return list(filter(None, (Counter(k for k in d if count[k] >= min_df) for d in l)))

In [14]: prune([{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}], min_df=2)
Out[14]: [Counter({'a': 1}), Counter({'a': 1})]

You can avoid the filter but I am not sure it will be any more efficient:

def prune(l, min_df=0):
    count = Counter(chain.from_iterable(l))
    res = []
    for d in l:
        cn = Counter(k for k in d if count.get(k) >= min_df)
        if cn:
            res.append(cn)
    return res
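
One thing to watch: building a Counter from the keys alone counts each kept key once, so every value comes out as 1 regardless of the value stored in the input dictionary. If the original values should be carried over, as in the question's second example, a variant along these lines might work. The name prune_keep_values is only illustrative.

```python
from collections import Counter
from itertools import chain

def prune_keep_values(dicts, min_df=0):
    # Iterating a dict yields its keys, so this counts how many
    # dictionaries each key appears in.
    doc_freq = Counter(chain.from_iterable(dicts))
    result = []
    for d in dicts:
        # Keep the original value of every key that is frequent enough.
        kept = Counter({k: v for k, v in d.items() if doc_freq[k] >= min_df})
        if kept:  # skip dictionaries left empty after pruning
            result.append(kept)
    return result

print(prune_keep_values([{'a': 1, 'b': 10}, {'a': 2}, {'c': 1}], min_df=2))
# [Counter({'a': 1}), Counter({'a': 2})]
```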

The loop is pretty much on a par:

In [31]: d = [{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}]    
In [32]: d = [choice(d) for _ in range(1000)]   
In [33]: timeit chain_prune_loop(d, min_df=2)
100 loops, best of 3: 5.49 ms per loop    
In [34]: timeit prune(d, min_df=2)
100 loops, best of 3: 11.5 ms per loop
In [35]: timeit set_prune(d, min_df=2)
100 loops, best of 3: 13.5 ms per loop
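
The timings above come from an IPython session whose function definitions are not shown. A rough way to run a similar comparison with the standard timeit module is sketched below. prune_filter and prune_loop are placeholder names for the two variants in this answer, and the absolute numbers will differ by machine and Python version.

```python
from collections import Counter
from itertools import chain
from random import choice, seed
from timeit import timeit

def prune_filter(dicts, min_df=0):
    # Variant that drops empty counters with filter().
    count = Counter(chain.from_iterable(dicts))
    counters = (Counter(k for k in d if count[k] >= min_df) for d in dicts)
    return list(filter(None, counters))

def prune_loop(dicts, min_df=0):
    # Variant that drops empty counters with an explicit loop.
    count = Counter(chain.from_iterable(dicts))
    res = []
    for d in dicts:
        cn = Counter(k for k in d if count[k] >= min_df)
        if cn:
            res.append(cn)
    return res

seed(0)  # fixed seed so both variants see the same, repeatable sample
base = [{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}]
sample = [choice(base) for _ in range(1000)]

runs = 20
for fn in (prune_filter, prune_loop):
    elapsed = timeit(lambda: fn(sample, min_df=2), number=runs)
    print(fn.__name__, "%.3f ms per call" % (1000 * elapsed / runs))
```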

edited Apr 15, 2015 at 9:45

answered Apr 14, 2015 at 23:27


1 Comment

Padraic Cunningham Over a year ago

@Shashank, yes, I was originally doing something differently, forgot to remove the generator expression

inspectorG4dget · Answer · 2015-04-14 22:43:24Z

0

This will print out all the values of each key that appears in at least df dictionaries.

def prune(dicts, df):
    counts = {}
    for d in dicts:  # for each dictionary
        for k,v in d.items():  # for each key,value pair in the dictionary
            if k not in counts:  # if we haven't seen this key before
                counts[k] = []
            counts[k].append(v)  # append this value to this key

    for k,vals in counts.items():
        if len(vals) < df:
            continue  # take only the keys that have at least `df` values (that appear in at least `df` dictionaries)
        for val in vals:
            print(k, ":", val)
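
If the values are needed by the caller rather than printed, the same grouping idea can return a dictionary instead. This is a sketch under that assumption; the function name values_by_frequent_key is hypothetical, and collections.defaultdict stands in for the explicit "if k not in counts" check.

```python
from collections import defaultdict

def values_by_frequent_key(dicts, df):
    # Collect every value seen for each key, one entry per dictionary.
    values = defaultdict(list)
    for d in dicts:
        for k, v in d.items():
            values[k].append(v)
    # Keep only the keys that appear in at least `df` dictionaries.
    return {k: vals for k, vals in values.items() if len(vals) >= df}

print(values_by_frequent_key([{'a': 1, 'b': 10}, {'a': 2}, {'c': 1}], 2))
# {'a': [1, 2]}
```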

answered Apr 14, 2015 at 22:43

