Bug: do_formula_enrichment=True produces garbled text (e.g., /C0 apod) and generate_picture_images=True creates empty folders & `<!-- image -->` placeholders

**Environment**
Docling Version: 2.60.0

OS: Windows

Python: 3.12

**Bug Description**
I am testing Docling on complex documents with `do_formula_enrichment` and `generate_picture_images` enabled. On my test documents both features fail, and the output is corrupted.

Formula Failure: Instead of extracting LaTeX, Docling replaces the formula text with garbled output (e.g., `a ¼ Δ f rep f rep ¼ 2 : 12 /C3 10 /C0 622` and `SNRtime corr : /C0 apod : = 3112`). This is a significant regression from standard text extraction. It appears related to another reported issue (on formula spacing), but my output is completely garbled, not just spaced out.

Image Failure: Instead of exporting images and linking them, Docling only inserts `<!-- image -->` placeholders. My script creates the target directory, but Docling does not save any image files into it (the directory remains empty). This appears to be the same bug as reported in Issue #2560.
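The garbling is easy to spot mechanically, which may help when checking other documents. Below is a minimal sketch, assuming the `/C<digits>` tokens and the stray `¼` in my output are representative of the failure. The name `find_garbled_tokens` and the pattern are my own heuristic, not part of Docling.

```python
import re

# Heuristic pattern (an assumption based on the output quoted above):
# "/C" followed by digits, or a stray "¼" where an operator would be expected.
GARBLED_TOKEN = re.compile(r"/C\d+|¼")


def find_garbled_tokens(text: str) -> list[str]:
    """Return every suspicious token found in text, in order of appearance."""
    return GARBLED_TOKEN.findall(text)


sample = "a ¼ Δ f rep f rep ¼ 2 : 12 /C3 10 /C0 622"
print(find_garbled_tokens(sample))  # ['¼', '¼', '/C3', '/C0']
```

A non-empty result on exported Markdown suggests that the formula text went through the broken path. An empty result does not prove that the formulas are correct.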

**Steps to Reproduce**
1. Install `docling==2.60.0`.
2. Download a public PDF known to contain formulas and images (e.g., "Attention Is All You Need": https://arxiv.org/pdf/1706.03762v7).
3. Run the Python script below.

**Minimal Python Code**
```python
from pathlib import Path

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# --- 1. Configure Docling ---
print("ℹ️ Initializing Docling Converter...")
pipeline_options = PdfPipelineOptions()

# Bug 1: Enable Formula Enrichment
pipeline_options.do_formula_enrichment = True

# Bug 2: Enable Image Generation
pipeline_options.generate_picture_images = True

converter = DocumentConverter(format_options={
    InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
})
print("✅ Docling Converter initialized.")

# --- 2. Define Paths ---
# Using a public test document (e.g., "Attention Is All You Need")
# Download this file locally and place it in the same directory.
pdf_path = "1706.03762v7.pdf"
image_output_dir = Path("./test_images/doc_01")
image_output_dir.mkdir(parents=True, exist_ok=True)  # Create output dir

print(f"🔄 Processing {pdf_path}...")

# --- 3. Convert ---
try:
    result = converter.convert(pdf_path)
    doc = result.document

    # --- 4. Export ---
    # Per Issue #2560, export_to_markdown() does not accept image_dir or include_annotations
    markdown_text = doc.export_to_markdown()

    print("\n--- MARKDOWN RESULT (snippet) ---")
    # Look for the famous formula
    attention_formula_index = markdown_text.find("Attention(Q, K, V)")
    if attention_formula_index != -1:
        print(markdown_text[attention_formula_index : attention_formula_index + 200] + "...")
    else:
        print("Could not find 'Attention(Q, K, V)' snippet.")

    print("\n--- IMAGE PLACEHOLDERS ---")
    print(f"Found '<!-- image -->' placeholder: {'<!-- image -->' in markdown_text}")

    print("\n--- IMAGE DIRECTORY CONTENT ---")
    print(f"Checking directory: {image_output_dir.resolve()}")
    # This check will show an empty list, proving no images were saved.
    print(list(image_output_dir.glob('*')))

except Exception as e:
    print(f"❌ Error during Docling extraction: {e}")
```

**Expected Behavior**
Formulas: Formulas should be extracted as clean LaTeX (e.g., $$Attention(Q, K, V) = \text{softmax}(\frac{QK^{T}}{\sqrt{d_{k}}})V$$).

Images: generate_picture_images=True should save image files (e.g., img-0.png) to the output directory.

Markdown: The `<!-- image -->` placeholders should be replaced with valid Markdown image links (e.g., ![...](img-0.png)).

**Actual Behavior**
Formulas: Formulas are destroyed and replaced with garbled text (see the snippets in the Bug Description above).

Images: No image files are saved to disk. The directory specified (or any other directory) remains empty.

Markdown: The file only contains `<!-- image -->` placeholders.
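To make the Markdown symptom measurable, the exported text can be summarized as placeholder count versus image-link count. This is a small sketch using only the standard library; it assumes the placeholder text is exactly `<!-- image -->`, and `summarize_images` is a name I made up for illustration.

```python
import re

# Assumption: this is the exact placeholder text found in the exported Markdown.
PLACEHOLDER = "<!-- image -->"
# Markdown image syntax: ![alt](target)
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\([^)]+\)")


def summarize_images(markdown_text: str) -> dict[str, int]:
    """Count image placeholders and real Markdown image links in the text."""
    return {
        "placeholders": markdown_text.count(PLACEHOLDER),
        "links": len(IMAGE_LINK.findall(markdown_text)),
    }


example = "Intro\n<!-- image -->\n![fig](img-0.png)\n<!-- image -->\n"
print(summarize_images(example))  # {'placeholders': 2, 'links': 1}
```

For the behavior reported here, `links` stays at 0 while `placeholders` matches the number of figures in the PDF.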


Thank you for your work on this promising project!
