@ceberam ceberam commented Nov 4, 2025

This PR addresses the low conversion throughput of HTML documents introduced after release 2.54.0, in addition to other improvements:

  • Simplify the parsing of table cells with just text (no rich cell) and remove the addition and deletion of text items.
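
As a rough illustration of the idea in the bullet above, a cell that holds only text can be reduced to a plain string in a single pass, without creating and later deleting intermediate text items. The sketch below is not the docling implementation: `parse_cell`, `_CellScanner`, and the `RICH_TAGS` set are hypothetical names, and which tags actually make a cell "rich" is an assumption.

```python
from html.parser import HTMLParser

# Hypothetical set of tags that would make a cell "rich" (assumption,
# not taken from the docling code base).
RICH_TAGS = {"table", "ul", "ol", "img", "pre"}


class _CellScanner(HTMLParser):
    """Collects the text of a cell and notes whether a rich tag occurs."""

    def __init__(self) -> None:
        super().__init__()
        self.is_rich = False
        self.text_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in RICH_TAGS:
            self.is_rich = True

    def handle_data(self, data):
        self.text_parts.append(data)


def parse_cell(cell_html: str):
    """Return (is_rich, text); text is None when the cell is rich."""
    scanner = _CellScanner()
    scanner.feed(cell_html)
    scanner.close()
    if scanner.is_rich:
        # A rich cell would go through the slower, full parsing path.
        return True, None
    # Simple cell: join the text and collapse runs of whitespace.
    return False, " ".join("".join(scanner.text_parts).split())
```

Here `parse_cell("<td> Hello <b>world</b> </td>")` takes the fast path and returns the cell text directly, while a cell containing a list is flagged as rich.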

Resolves #2509

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

@ceberam ceberam self-assigned this Nov 4, 2025
@ceberam ceberam added the labels bug (Something isn't working) and html (issue related to html backend) Nov 4, 2025

github-actions bot commented Nov 4, 2025

DCO Check Passed

Thanks @ceberam, all your commits are properly signed off. 🎉

@ceberam ceberam changed the title from "fix(html): simplify parsing of simple table cells" to "fix(html): slow table parsing" Nov 4, 2025

mergify bot commented Nov 4, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewer for test updates

This rule is failing.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
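
For reference, the title rule above can be checked locally. This is a minimal sketch that assumes the `~=` operator behaves like a regular-expression match on the title; `is_conventional` is a hypothetical helper name, and the pattern is copied from the rule.

```python
import re

# Pattern copied from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.match(title) is not None
```

Both titles used by this pull request, "fix(html): simplify parsing of simple table cells" and "fix(html): slow table parsing", satisfy the pattern.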

codecov bot commented Nov 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
@ceberam ceberam force-pushed the fix/html-tables-2509 branch from 2b007ea to 50af4f3 Compare November 5, 2025 11:03
Labels

bug (Something isn't working), html (issue related to html backend)

Development

Successfully merging this pull request may close these issues.

[bee] docling becomes 10x slower and prints tons of info log on docx and html input files with tables since v2.55.0

2 participants