@asishm
Member

@asishm asishm commented Nov 28, 2022

# setup from referenced issue
In [2]: %timeit combined = pd.read_html(StringIO(combined_html)) # <- PR
324 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit combined = pd.read_html(StringIO(combined_html)) # <- main
44.3 s ± 4.28 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
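
The setup itself lives in the referenced issue and is not reproduced in this thread. As a rough sketch only, an input of the same general shape (one HTML document holding many tables) could be built as below. The helper `build_combined_html`, the table count, and the table contents are hypothetical and are not the issue's actual data, so timings from this sketch will not match the numbers above.

```python
import time
from io import StringIO

import pandas as pd


def build_combined_html(n_tables: int = 20, n_rows: int = 5) -> str:
    """Concatenate several small tables into one HTML document.

    Hypothetical stand-in for the issue's setup, not the original data.
    """
    frame = pd.DataFrame({"a": range(n_rows), "b": range(n_rows)})
    # DataFrame.to_html renders one <table>; repeat it n_tables times.
    body = "\n".join(frame.to_html(index=False) for _ in range(n_tables))
    return f"<html><body>{body}</body></html>"


combined_html = build_combined_html()

# Time a single parse; %timeit in IPython would repeat this for stable numbers.
start = time.perf_counter()
combined = pd.read_html(StringIO(combined_html))
elapsed = time.perf_counter() - start

print(f"parsed {len(combined)} tables in {elapsed:.3f} s")
```

`pd.read_html` needs an HTML parser such as lxml (or bs4 with html5lib) installed. Raising `n_tables` is the knob most likely to expose the slowdown described in the linked issue, though that is an assumption about the issue's setup rather than something stated here.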

@asishm asishm marked this pull request as draft November 28, 2022 20:34
@asishm asishm marked this pull request as ready for review November 29, 2022 00:20
@mroeschke mroeschke added the IO HTML read_html, to_html, Styler.apply, Styler.applymap label Nov 29, 2022
Member

@mroeschke mroeschke left a comment

Nice find. Could you add a whatsnew note in 2.0.0.rst (Performance improvement section)?

@asishm
Member Author

asishm commented Nov 29, 2022

Added whatsnew!

@mroeschke mroeschke added this to the 2.0 milestone Nov 29, 2022
@mroeschke mroeschke merged commit c0bde88 into pandas-dev:main Nov 29, 2022
@mroeschke
Member

Thanks @asishm

@asishm asishm deleted the read_html_perf branch December 13, 2022 00:23
Labels

IO HTML read_html, to_html, Styler.apply, Styler.applymap

Development

Successfully merging this pull request may close these issues.

PERF: read_html: Reading one HTML file with multiple tables is much slower than loading each table separatly

2 participants