Mulgyeol commented Nov 4, 2025

Fixes TypeError when TesseractOcrOptions is initialized with an explicit PSM parameter.

Changes

  • Use integer PSM value directly instead of calling tesserocr.PSM() constructor
  • Fixed in both main_psm initialization (line 100) and script_readers initialization (line 198)
  • tesserocr.PSM is a class with integer constants, not a callable enum
  • Added regression test with TesseractOcrOptions(psm=3) to prevent future issues

Background

The bug was introduced in v2.56.0 when PSM configurability was added. The code incorrectly attempted to construct a PSM enum from an integer, but tesserocr.PSM constants are already integers and should be used directly.
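The failure mode described above can be sketched without installing tesserocr. The `PSM` class below is a hypothetical stand-in that only mimics the behaviour stated in this PR (a class holding integer constants that cannot be called with a value); the helper names `resolve_psm` and `resolve_psm_legacy` are illustrative and do not appear in the docling code base. The exact error message raised by the real library may differ.

```python
from typing import Optional


class PSM:
    """Hypothetical stand-in for tesserocr.PSM: a plain namespace of ints."""

    OSD_ONLY = 0
    AUTO_OSD = 1
    AUTO_ONLY = 2
    AUTO = 3
    SINGLE_BLOCK = 6


def resolve_psm_legacy(psm: Optional[int]) -> int:
    """Pre-fix pattern: tries to construct PSM from an integer."""
    # PSM defines no constructor that accepts a value, so this call
    # raises TypeError whenever an explicit psm is supplied.
    return PSM(psm) if psm is not None else PSM.AUTO


def resolve_psm(psm: Optional[int]) -> int:
    """Post-fix pattern: the integer is used as-is, AUTO is the fallback."""
    return psm if psm is not None else PSM.AUTO


if __name__ == "__main__":
    print(resolve_psm(None))               # falls back to PSM.AUTO
    print(resolve_psm(PSM.SINGLE_BLOCK))   # explicit value passes through
    try:
        resolve_psm_legacy(PSM.SINGLE_BLOCK)
    except TypeError as exc:
        print(f"legacy path failed: {exc}")
```

Note that the legacy path only fails when a value is supplied, which is consistent with the bug going unnoticed while `psm` was left at its default of `None`.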

Issue resolved by this Pull Request:
Resolves #2576

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

- Use integer psm value directly instead of calling tesserocr.PSM()
- Fixed in both main_psm and script_readers initialization
- tesserocr.PSM is a class with integer constants, not an enum

Fixes docling-project#2576
github-actions bot commented Nov 4, 2025

DCO Check Passed

Thanks @Mulgyeol, all your commits are properly signed off. 🎉

dosubot bot commented Nov 4, 2025

Related Documentation

Checked 3 published document(s) in 1 knowledge base(s). No updates required.

mergify bot commented Nov 4, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

I, mulgyeol <mulgyeoljung@gmail.com>, hereby add my Signed-off-by to this commit: da63a17

Signed-off-by: mulgyeol <mulgyeoljung@gmail.com>
codecov bot commented Nov 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

cau-git requested a review from Copilot on November 4, 2025 18:31
cau-git merged commit 1a5146a into docling-project:main on Nov 4, 2025
30 checks passed
Copilot AI left a comment

Pull Request Overview

This PR fixes type handling for the Tesseract Page Segmentation Mode (PSM) parameter by removing unnecessary tesserocr.PSM() enum wrapper calls.

  • Removes tesserocr.PSM() wrapper around self.options.psm when initializing Tesseract readers
  • Adds test coverage for using PSM parameter with TesseractOcrOptions

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| docling/models/tesseract_ocr_model.py | Removes tesserocr.PSM() wrapper calls, allowing integer PSM values to be passed directly to the API |
| tests/test_e2e_ocr_conversion.py | Adds test case for TesseractOcrOptions(psm=3) to verify PSM parameter handling |
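A minimal sketch of the shape such a regression test might take, written so it runs without docling or tesserocr installed. `FakeTesseractOptions` and `reader_kwargs` are hypothetical stand-ins introduced only for illustration; the actual test in this PR runs an end-to-end conversion with `TesseractOcrOptions(psm=3)` and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

AUTO = 3  # integer value of Tesseract's fully automatic segmentation mode


@dataclass
class FakeTesseractOptions:
    """Hypothetical stand-in for TesseractOcrOptions; only `psm` is modelled."""

    psm: Optional[int] = None


def reader_kwargs(options: FakeTesseractOptions) -> dict:
    """Build the keyword arguments a reader would receive (illustrative only)."""
    return {"psm": options.psm if options.psm is not None else AUTO}


def test_explicit_psm_is_passed_through() -> None:
    assert reader_kwargs(FakeTesseractOptions(psm=3)) == {"psm": 3}
    assert reader_kwargs(FakeTesseractOptions(psm=6)) == {"psm": 6}


def test_missing_psm_falls_back_to_auto() -> None:
    assert reader_kwargs(FakeTesseractOptions()) == {"psm": AUTO}


if __name__ == "__main__":
    test_explicit_psm_is_passed_through()
    test_missing_psm_falls_back_to_auto()
```

Testing against `None` explicitly, rather than relying on truthiness, matters because `0` is itself a valid PSM value and must not trigger the fallback.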

- tesserocr.PSM(self.options.psm)
- if self.options.psm is not None
- else tesserocr.PSM.AUTO
+ self.options.psm if self.options.psm is not None else tesserocr.PSM.AUTO
Copilot AI Nov 4, 2025

Type inconsistency: self.options.psm is an Optional[int] but is being mixed with tesserocr.PSM.AUTO enum value. The fallback should use an integer value (e.g., 3 for AUTO) or self.options.psm should be converted to the enum type. This creates a type mismatch where the ternary expression returns different types depending on the condition.

Suggested change
- self.options.psm if self.options.psm is not None else tesserocr.PSM.AUTO
+ tesserocr.PSM(self.options.psm) if self.options.psm is not None else tesserocr.PSM.AUTO

Comment on lines +196 to 198
psm=self.options.psm
if self.options.psm is not None
else tesserocr.PSM.AUTO,
Copilot AI Nov 4, 2025

Type inconsistency: Same issue as line 100 - self.options.psm (an Optional[int]) is being mixed with tesserocr.PSM.AUTO enum value. The ternary expression returns different types, which could cause issues depending on how the tesserocr API handles type checking.

Suggested change
- psm=self.options.psm
- if self.options.psm is not None
- else tesserocr.PSM.AUTO,
+ psm=tesserocr.PSM(self.options.psm) if self.options.psm is not None else tesserocr.PSM.AUTO,

Development

Successfully merging this pull request may close these issues.

TypeError when using custom PSM value in TesseractOcrOptions