I need to store a large list of numbers in memory. I will then need to check for membership. Arrays are better than lists for memory efficiency. Sets are better than lists for membership checking. I need both! So my questions are:

1) How much more memory efficient are arrays than sets? (For the converse, see my results below.)
2) Is there a data structure which strikes a better balance between sets and arrays? Something like a set with a signed integer type? Or some numpy construct?

I checked out the membership timing difference with the script below. (I know timeit is better but the variance is low enough for time to be fine):

import array
import time 

class TimerContext:
    def __enter__(self):
        self.t0 = time.time()
    def __exit__(self, *args, **kwargs):
        print(time.time()-self.t0)

SIZE = 1000000

l = list(range(SIZE))
a = array.array('I', l)
s = set(l)

print(type(l))
print(type(a))
print(type(s))

with TimerContext():
    x = 99999 in l
with TimerContext():
    x = 99999 in a
with TimerContext():
    x = 99999 in s

Results:

<class 'list'>
<class 'array.array'>
<class 'set'>
0.0012176036834716797
0.0024595260620117188
1.430511474609375e-06

So sets are a LOT faster for membership checking (please note the scientific notation). If a set's memory footprint isn't much larger than an array's, I'd prefer to use a set. But I don't know how to check the memory footprint.

I should also add that there are a lot of questions comparing sets and lists. But I didn't see any good answers comparing arrays and sets.

  • There may be many other and better solutions, if one would know your real problem. Commented Apr 19, 2019 at 18:40
  • sys.getsizeof showed that for your example the set is ~8 times larger than the array. docs.python.org/3/library/sys.html#sys.getsizeof. Commented Apr 19, 2019 at 18:43
  • list and array membership tests are both O(n), sets are O(1)... there are also approximate algorithms for this, that might help (e.g. Bloom filters). if you have any other constraints that you can put on the problem you might have more options Commented Apr 19, 2019 at 18:58
  • google suggests that pypi.org/project/intset might help... Commented Apr 19, 2019 at 19:00
  • also Python comes with a timeit module that does similar things to your TimerContext, Jupyter/IPython wraps this with a nice %timeit magic Commented Apr 19, 2019 at 19:03
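
As a rough illustration of the timeit suggestion in the last comment, the sketch below repeats each lookup several times and reports the best per-lookup time. The helper name time_membership and the repeat counts are assumptions chosen for illustration, not part of the original script, and absolute numbers will vary by machine.

```python
import array
import timeit

SIZE = 1_000_000
TARGET = 99_999

numbers = list(range(SIZE))
containers = {
    "list": numbers,
    "array": array.array("I", numbers),
    "set": set(numbers),
}


def time_membership(container, target=TARGET, repeats=5, number=20):
    """Return the best per-lookup time in seconds (hypothetical helper)."""
    timer = timeit.Timer(lambda: target in container)
    # Taking the minimum over repeats reduces noise from other processes.
    return min(timer.repeat(repeat=repeats, number=number)) / number


timings = {name: time_membership(c) for name, c in containers.items()}
for name, seconds in timings.items():
    print(f"{name:>5}: {seconds:.3e} s per lookup")
```

Each timing includes the small overhead of calling the lambda, which matters most for the set lookup because the lookup itself is so short.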

1 Answer

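If your data is sorted (or can be kept sorted), bisect performance comes close to set for membership checks, with both list and array. Note that bisect only returns an insertion index, so a real membership test must also compare the element at that position. See the results below: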

import array
from bisect import bisect
import sys
import time


class TimerContext:
    def __enter__(self):
        self.t0 = time.time()

    def __exit__(self, *args, **kwargs):
        print(time.time() - self.t0)


def get_size_in_megabytes(iterable):
    return round(sys.getsizeof(iterable) / (1024 ** 2), 2)


SIZE = 1000000

l = list(range(SIZE))
a = array.array("I", l)
s = set(l)

print(type(l), get_size_in_megabytes(l))
print(type(a), get_size_in_megabytes(a))
print(type(s), get_size_in_megabytes(s))

with TimerContext():
    x = 99999 in l
with TimerContext():
    x = 99999 in a
with TimerContext():
    x = 99999 in s

print("list bisect")
with TimerContext():
    bisect(l, 99999)

print("array bisect")
with TimerContext():
    bisect(a, 99999)

Results:

<class 'list'> 8.58
<class 'array.array'> 3.81
<class 'set'> 32.0
0.0024390220642089844
0.0053005218505859375
3.814697265625e-06
list bisect
9.298324584960938e-06
array bisect
6.198883056640625e-06

Credit for the sys.getsizeof usage goes to @CristiFati.
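
The timings above only measure the binary search itself. A minimal sketch of turning it into an actual membership test, assuming the sequence is sorted in ascending order, could look like this (sorted_contains is a hypothetical helper name, and it uses bisect_left rather than bisect so that the returned index points at the first matching element):

```python
import array
from bisect import bisect_left


def sorted_contains(sorted_seq, value):
    """Return True if value occurs in sorted_seq (must be sorted ascending)."""
    index = bisect_left(sorted_seq, value)
    # bisect_left gives the insertion point; the value is present only if
    # that position exists and already holds an equal element.
    return index < len(sorted_seq) and sorted_seq[index] == value


values = array.array("I", sorted([42, 7, 19, 3, 88]))
print(sorted_contains(values, 19))  # True
print(sorted_contains(values, 20))  # False
```

This keeps the compact memory layout of the array while making lookups O(log n), at the cost of keeping the data sorted.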

1 Comment

bisect is O(log_n)... also, getsizeof doesn't count the size of any referred elements, just the object itself... there are recipes that show how to recurse into objects. an int is 28 bytes, so a list of 1M elements should be 35MB
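
To illustrate the point about sys.getsizeof being shallow, here is a rough sketch that also adds the size of each distinct element for flat containers. The function name deep_size is hypothetical, it does not recurse into nested containers, and it slightly overcounts small integers that CPython shares between objects.

```python
import array
import sys


def deep_size(container):
    """Approximate bytes used by a flat container plus its distinct elements."""
    total = sys.getsizeof(container)
    # array.array stores raw machine values inline, so only containers that
    # hold references to Python objects need their elements counted.
    if isinstance(container, (list, set, tuple, frozenset)):
        seen = set()
        for item in container:
            if id(item) not in seen:  # count each referenced object once
                seen.add(id(item))
                total += sys.getsizeof(item)
    return total


SIZE = 100_000
numbers = list(range(SIZE))
for obj in (numbers, array.array("I", numbers), set(numbers)):
    print(type(obj).__name__, sys.getsizeof(obj), deep_size(obj))
```

Under this estimate the gap between the array and the other two containers is larger than the shallow sizes suggest, since the list and the set both pay for the int objects they reference.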
