markitdown-ocr: processing PDFs without a text layer generates an MD document containing only page numbers

When markitdown-ocr processes PDFs that have no text layer, such as PDFs converted from images or produced by scanning, it generates an MD document that contains only page numbers.
The cause is in the file "pdf_converter_with_ocr.py". First, the code appends a page heading with "markdown_content.append(f"\n## Page{page_num}\n")".
The subsequent logic then decides whether to perform full-page OCR based on whether the content is empty.

Because markdown_content already holds the page-number heading, it is never empty, so OCR is skipped. The final MD document therefore contains only page numbers.
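The control flow described above can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the actual markitdown-ocr source: `run_full_page_ocr`, `convert_buggy`, and `convert_fixed` are illustrative names, and each page's extracted text is passed in as a plain string instead of being read from a real PDF. `convert_buggy` checks `markdown_content` after the heading has been appended, so the OCR branch can never run; `convert_fixed` shows one possible fix, deciding on OCR from the page's own extracted text.

```python
from __future__ import annotations


def run_full_page_ocr(page_num: int) -> str:
    # Hypothetical stand-in for the real OCR call, so the example runs
    # without an OCR engine.
    return f"OCR text for page {page_num}"


def convert_buggy(pages_text: list[str]) -> str:
    markdown_content: list[str] = []
    for page_num, text in enumerate(pages_text, start=1):
        # The page heading is appended first...
        markdown_content.append(f"\n## Page{page_num}\n")
        if text.strip():
            markdown_content.append(text)
        # ...so this emptiness check never succeeds: the list already
        # contains the heading, and full-page OCR is skipped.
        if not markdown_content:
            markdown_content.append(run_full_page_ocr(page_num))
    return "".join(markdown_content)


def convert_fixed(pages_text: list[str]) -> str:
    markdown_content: list[str] = []
    for page_num, text in enumerate(pages_text, start=1):
        markdown_content.append(f"\n## Page{page_num}\n")
        # Decide on OCR from this page's extracted text only, not from
        # markdown_content, which always contains the heading.
        if text.strip():
            markdown_content.append(text)
        else:
            markdown_content.append(run_full_page_ocr(page_num))
    return "".join(markdown_content)
```

For a two-page scanned PDF (no extractable text on either page), `convert_buggy(["", ""])` returns only the two headings, while `convert_fixed(["", ""])` returns each heading followed by its OCR text.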


markitdown-ocr: processing PDFs without a text layer generates an MD document containing only page numbers #1863