Reserve memory for list in Python? [duplicate]

Question

When programming in Python, is it possible to reserve memory for a list that will be populated with a known number of items, so that the list will not be reallocated several times while building it? I've looked through the docs for a Python list type, and have not found anything that seems to do this. However, this type of list building shows up in a few hotspots of my code, so I want to make it as efficient as possible.

Edit: Also, does it even make sense to do something like this in a language like Python? I'm a fairly experienced programmer, but new to Python and still getting a feel for its way of doing things. Does Python internally allocate all objects in separate heap spaces, defeating the purpose of trying to minimize allocations, or are primitives like ints, floats, etc. stored directly in lists?

@ironfroggy: The point is that this showed up in hotspots. In these places, list building was causing a significant, real-world bottleneck, the kind you should optimize.
– dsimcha, Commented Jan 31, 2010 at 16:36

jfs · Accepted Answer · 2009-02-11 15:47:29Z

55

Here are four variants:

an incremental list creation
"pre-allocated" list
array.array()
numpy.zeros()

python -mtimeit -s"N=10**6" "a = []; app = a.append;"\
    "for i in xrange(N):  app(i);"
10 loops, best of 3: 390 msec per loop

python -mtimeit -s"N=10**6" "a = [None]*N; app = a.append;"\
    "for i in xrange(N):  a[i] = i"
10 loops, best of 3: 245 msec per loop

python -mtimeit -s"from array import array; N=10**6" "a = array('i', [0]*N)"\
    "for i in xrange(N):" "  a[i] = i"
10 loops, best of 3: 541 msec per loop

python -mtimeit -s"from numpy import zeros; N=10**6" "a = zeros(N,dtype='i')"\
    "for i in xrange(N):" "  a[i] = i"
10 loops, best of 3: 353 msec per loop

It shows that [None]*N is the fastest and array.array is the slowest in this case.

answered Feb 11, 2009 at 15:47

jfs


8 Comments

Mikhail Korobov Over a year ago

I think array.array is used in a suboptimal way here, see my answer.

jfs Over a year ago

@MikhailKorobov: good find. array('i', [0])*n alone is 10 times faster than array('i', [0]*n), though it is still slower than the [0]*n variant if you add the initialization loop. The point of the answer: measure first. The code examples are from other answers at the time.

Matt Krause Over a year ago

This seems a little unfair to numpy and array since you're including the import time, which would presumably be amortized over a lot of calls. @MikhailKorobov's results seem to suggest that numpy, once imported, is a lot faster.

jfs Over a year ago

@MattKrause: the import is not included, notice -s

jfs Over a year ago

@AchyutRastogi it is not how Python lists work. Only a[0] is changed. Try it.


SilentGhost · Accepted Answer · 2009-02-11 14:49:25Z

20

You can create a list of a known length like this:

>>> [None] * known_number
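A minimal sketch of how such a list might then be filled, assuming the item count is known up front. The value of `known_number` and the squared values are placeholders chosen for illustration. Index assignment overwrites each slot in place, so the list does not grow while it is populated.

```python
known_number = 5  # placeholder size for illustration

# Create every slot up front; each one initially refers to None.
items = [None] * known_number

# Fill the slots by index instead of calling append().
for i in range(known_number):
    items[i] = i * i

print(items)       # [0, 1, 4, 9, 16]
print(len(items))  # 5
```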

answered Feb 11, 2009 at 14:49

SilentGhost


Mikhail Korobov · Accepted Answer · 2012-12-13 17:52:13Z

13

Take a look at this:

In [7]: %timeit array.array('f', [0.0]*4000*1000)
1 loops, best of 3: 306 ms per loop

In [8]: %timeit array.array('f', [0.0])*4000*1000
100 loops, best of 3: 5.96 ms per loop

In [11]: %timeit np.zeros(4000*1000, dtype='f')
100 loops, best of 3: 6.04 ms per loop

In [9]: %timeit [0.0]*4000*1000
10 loops, best of 3: 32.4 ms per loop

So don't ever use array.array('f', [0.0]*N), use array.array('f', [0.0])*N or numpy.zeros.
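The comparison can also be reproduced outside IPython with the standard `timeit` module. This is only a sketch: `N` is reduced so the check runs quickly, the helper names `build_from_list` and `build_by_repeat` are made up for illustration, and the timings will differ from machine to machine.

```python
import timeit
from array import array

N = 100_000  # smaller than the 4,000,000 above so the check runs quickly

def build_from_list():
    # Builds a temporary N-element Python list, then converts it item by item.
    return array('f', [0.0] * N)

def build_by_repeat():
    # Repeats a one-element array; no intermediate Python list is created.
    return array('f', [0.0]) * N

# Both constructions produce the same contents.
assert build_from_list() == build_by_repeat()

for fn in (build_from_list, build_by_repeat):
    seconds = min(timeit.repeat(fn, number=10, repeat=3))
    print(f"{fn.__name__}: {seconds:.4f} s for 10 calls")
```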

answered Dec 13, 2012 at 17:52

Mikhail Korobov


4 Comments

Mike Over a year ago

If you will be setting the array elements rather than adding to them, you probably don't need zeros, just some reserved space for each element. In this case, the way to go is np.empty in place of np.zeros. With your test, that's three times faster on my computer.

lodo Over a year ago

Isn't your second approach wrong? Multiplying a numpy array by a number does not give you a longer array. It does element-wise multiplication of the elements that are already in the array.

Mikhail Korobov Over a year ago

@lodo the second example doesn't use numpy arrays, it uses stdlib array module.

user2357112 Over a year ago

Note, [0.0]*4000*1000 builds a 4000-element list and repeats it 1000 times, rather than repeating a 1-element list 4000000 times like [0.0]*4000000 would. [0.0]*4000000 turns out to be significantly faster in my tests.
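A small sketch of that observation: the two expressions build equal lists, and only the construction path differs. The variable names are chosen for illustration, and which form is faster (and by how much) may depend on the interpreter version and the machine.

```python
import timeit

chained = [0.0] * 4000 * 1000  # a 4000-element list, repeated 1000 times
direct = [0.0] * 4000000       # a one-element list, repeated 4,000,000 times

# The resulting lists are identical; only the construction path differs.
assert chained == direct

t_chained = min(timeit.repeat("[0.0] * 4000 * 1000", number=5, repeat=3))
t_direct = min(timeit.repeat("[0.0] * 4000000", number=5, repeat=3))
print(f"chained: {t_chained:.4f} s, direct: {t_direct:.4f} s (5 builds each)")
```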

Glorfindel · Accepted Answer · 2022-12-12 19:01:40Z

4

If you want to manipulate numbers efficiently in Python, have a look at NumPy. It lets you do things extremely fast while still getting to use Python.

To do what you're asking in NumPy you'd do something like

import numpy as np
myarray = np.zeros(4000)

which would give you an array of floating-point numbers initialized to zero. You can then do very cool things like multiplying whole arrays by a single factor or by other arrays (kind of like in Matlab, if you've ever used that), and it is very fast because most of the actual work happens in the highly optimized C part of the NumPy library.

If it's not arrays of numbers you're after, then you're probably not going to find a way to do what you want in Python. A Python list of objects is internally a list of pointers to objects (I think so anyway, I'm not an expert on Python internals), so it would still be allocating each of its members as you create them.

edited Dec 12, 2022 at 19:01

Glorfindel


answered Feb 11, 2009 at 15:35

Thomas Parslow


1 Comment

Mike Over a year ago

As I said on @Mikhail Korobov's answer, np.empty is preferable unless you really need your array to start out with zeros, giving triple the speed on my computer.

Alexander Lebedev · Accepted Answer · 2009-02-11 15:00:53Z

3

In most everyday code you won't need such an optimization.

However, when list efficiency becomes an issue, the first thing you should do is replace the generic list with a typed one from the array module, which is much more efficient.

Here's how a list of 4 million floating-point numbers could be created:

import array
lst = array.array('f', [0.0]*4000*1000)

answered Feb 11, 2009 at 15:00

Alexander Lebedev


2 Comments

jfs Over a year ago

What do you mean by "much more efficient"? array.array might require less memory but a Python list is faster in most (meaning those I've tried) cases.

Georg Schölly Over a year ago

In this case it even creates a list first and then builds the array from that list. This is not efficient.
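The memory side of this exchange can be checked roughly with `sys.getsizeof`. This is a sketch under assumptions: the numbers are specific to CPython, `N` and the variable names are illustrative, and `getsizeof` counts only the container itself, not the float objects a list refers to.

```python
import sys
from array import array

N = 1000  # illustrative size

as_list = [0.0] * N                # N references to float objects
as_array = array('f', [0.0]) * N   # N raw C floats stored inline

print("list container bytes: ", sys.getsizeof(as_list))
print("array container bytes:", sys.getsizeof(as_array))
print("bytes per array item: ", as_array.itemsize)
```

On a typical 64-bit CPython build the array container is smaller, because each 'f' item takes fewer bytes than a pointer. This says nothing about speed, which has to be measured separately.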

mechanical_meat · Accepted Answer · 2012-04-03 17:38:57Z

2

In Python, all objects are allocated on the heap.
But Python uses a special memory allocator so malloc won't be called every time you need a new object.
There are also some optimizations for small integers (and the like) which are cached; however, which types, and how, is implementation dependent.
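To see what this means for lists, the following sketch watches the size of a list object while appending to it. It assumes CPython, where `sys.getsizeof` reports the list's own storage (its array of references) and where that storage grows in occasional steps instead of on every append. The variable names are chosen for illustration.

```python
import sys

sizes = []
a = []
for i in range(1000):
    a.append(i)
    sizes.append(sys.getsizeof(a))

# The list's own size only changes when its pointer array is reallocated,
# so the number of distinct sizes approximates the number of reallocations.
reallocations = len(set(sizes))
print(f"1000 appends, about {reallocations} reallocations")

# The list stores references: both slots refer to the same object.
obj = object()
pair = [obj, obj]
print(pair[0] is pair[1])  # True
```

The element objects themselves still live on the heap; reserving list slots only avoids regrowing the array of references.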

edited Apr 3, 2012 at 17:38

mechanical_meat


answered Feb 11, 2009 at 15:19

David Cournapeau


Vitaly Fadeev · Accepted Answer · 2019-05-12 05:33:53Z

For Python 3:

import timeit
from numpy import zeros
from array import array

def func1():
    N=10**6
    a = []
    app = a.append
    for i in range(N):
        app(i)

def func2():
    N=10**6
    a = [None]*N
    app = a.append
    for i in range(N):
        a[i] = i

def func3():
    N=10**6
    a = array('i', [0]*N)
    for i in range(N):
        a[i] = i

def func4():
    N=10**6
    a = zeros(N,dtype='i')
    for i in range(N):
        a[i] = i

# Time a single run of each variant and print the elapsed seconds.
for func in (func1, func2, func3, func4):
    start_time = timeit.default_timer()
    func()
    print(timeit.default_timer() - start_time)

result:

append()             0.1655518
[None]*N             0.10920069999999998
using module array   0.1935983
using module numpy   0.15213890000000002
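A single run per variant can be noisy. As a sketch of a somewhat more robust measurement, `timeit.repeat` can run each variant several times and the minimum can be reported. The function names and the smaller `N` here are choices made for illustration, only the two pure-list variants are covered, and the absolute numbers will vary by machine.

```python
import timeit

N = 10**5  # smaller than the 10**6 used above, to keep the run short

def with_append():
    a = []
    app = a.append
    for i in range(N):
        app(i)
    return a

def with_preallocation():
    a = [None] * N
    for i in range(N):
        a[i] = i
    return a

# Both strategies must build the same list.
assert with_append() == with_preallocation()

for fn in (with_append, with_preallocation):
    # Take the minimum of several repeats to reduce noise from other processes.
    best = min(timeit.repeat(fn, number=10, repeat=5))
    print(f"{fn.__name__}: {best / 10 * 1000:.2f} ms per call")
```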
