Skip to content

gh-91146: Reduce allocation size of list from str.split()/rsplit()#95473

Merged
corona10 merged 2 commits intopython:mainfrom
corona10:gh-91146
Jul 31, 2022
Merged

gh-91146: Reduce allocation size of list from str.split()/rsplit()#95473
corona10 merged 2 commits intopython:mainfrom
corona10:gh-91146

Conversation

@corona10
Copy link
Copy Markdown
Member

@corona10 corona10 commented Jul 30, 2022

Memory size

AS-IS

>>> import sys
>>> s = "1 2".split()
>>> sys.getsizeof(s)
152
>>> s = "12345".split()
>>> sys.getsizeof(s)
152
>>> s = "1 2 3 4 5".split()
>>> sys.getsizeof(s)
152
>>> s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()
>>> sys.getsizeof(s)
280

TO-BE (Allocation is reduced & No regression)


>>> import sys
>>> s = "1 2".split()
>>> sys.getsizeof(s)
88
>>> s = "12345".split()
>>> sys.getsizeof(s)
104
>>> s = "1 2 3 4 5".split()
>>> sys.getsizeof(s)
136
>>> s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()
>>> sys.getsizeof(s)
280

Performance: No regression

Script

import pyperf

def bench_split_tiny():
    s = "1 2".split()

def bench_split_small():
    s = "1 2 3 4 5".split()

def bench_split_larger():
    s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()

def bench_split_tiny_append():
    s = "1 2".split()
    for e in range(100):
        s.append("9")

def bench_split_small_append():
    s = "1 2 3 4 5".split()
    for e in range(100):
        s.append("9")

def bench_split_larger_append():
    s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()
    for e in range(100):
        s.append("9")

runner = pyperf.Runner()
runner.bench_func('bench_split_tiny', bench_split_tiny)
runner.bench_func('bench_split_small', bench_split_small)
runner.bench_func('bench_split_larger', bench_split_larger)
runner.bench_func('bench_split_tiny_append', bench_split_tiny_append)
runner.bench_func('bench_split_small_append', bench_split_small_append)
runner.bench_func('bench_split_larger_append', bench_split_larger_append)

Result

Benchmark base opt
bench_split_small (consistently 1-2% faster) 63.5 ns 62.7 ns: 1.01x faster
bench_split_tiny_append (unstable) 1.60 us 1.57 us: 1.02x faster
bench_split_small_append (unstable) 1.61 us 1.57 us: 1.03x faster
Geometric mean (ref) 1.01x faster

Benchmark hidden because not significant (2): bench_split_tiny, bench_split_larger

@corona10 corona10 changed the title gh-91146: Reduce allocation size of list from str.split() gh-91146: Reduce allocation size of list from str.split()/rsplit() Jul 31, 2022
@corona10 corona10 added the performance Performance or resource usage label Jul 31, 2022
@methane
Copy link
Copy Markdown
Member

methane commented Jul 31, 2022

preallocation size can be smarter.

if (sub == NULL) {
    if (maxsplit < 0) {
        maxsplit = (len1+1) / 2;
    }
}
...
if (maxsplit < 0) {
    maxsplit = len1 / len2 + 1;
}

But this pull request is better than status quo and very simple.

@corona10
Copy link
Copy Markdown
Member Author

preallocation size can be smarter.

Thank you Naoki san, I like your suggestion too!
we can try the approach if we need the smarter way!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Performance or resource usage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants