This project explores core challenges in information extraction from layout-rich documents, including:
- ✅ Input representation
- ✅ Chunking
- ✅ Prompting
- ✅ Selection of LLMs and multimodal models
It benchmarks the outcomes of different design choices against LayoutLMv3, ERNIE-Layout and GPT-4o Vision.
Tested with Python 3.11.6 and Conda on an Ubuntu Linux server (kernel 5.15.0-124-generic).
```bash
# Clone the repository
git clone git@github.com:gayecolakoglu/LayIE-LLM.git
cd LayIE-LLM

# Create and activate a Conda environment
conda create -n LayIE-LLM python=3.11
conda activate LayIE-LLM

# Install dependencies
pip install -r requirements.txt
pip install -e .
```
- The `vrdu2` folder contains required files for testing LLaMA, GPT-3.5, and GPT-4o.
- This project specifically tests registration-form data (filtered as explained in Appendix A.3 of the paper).
- Full dataset available at: VRDU Dataset.
Create a `keys.env` file in the same directory as `config.py` with the following format:

```env
api_key_llama="YOUR_API_KEY"
api_key_gpt="YOUR_API_KEY"
```
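As a minimal sketch of how `config.py` might consume these keys, here is one way to load them with `python-dotenv`; only the variable names come from the file above, and the loading code itself is an assumption, not the repo's actual implementation:

```python
# Illustrative sketch (assumes python-dotenv is installed: pip install python-dotenv).
import os
from dotenv import load_dotenv

load_dotenv("keys.env")  # populate the environment from keys.env

api_key_llama = os.getenv("api_key_llama")
api_key_gpt = os.getenv("api_key_gpt")
```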
- `main.ipynb` → Runs all three models (LLaMA 3, GPT-3.5, GPT-4o) with OCR input.
- `main_md.ipynb` → Runs the same models with Markdown input.
- `main-gpt4-Image.ipynb` → Runs GPT-4o Vision with image input (see the sketch below).
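For reference, here is a minimal sketch of the kind of image-input call `main-gpt4-Image.ipynb` makes conceptually, using the official `openai` Python client; the prompt text and file path are illustrative, not the repo's actual code:

```python
# Illustrative sketch: send a page image to GPT-4o via the official openai client.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("page.png", "rb") as f:  # hypothetical document page
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the registration-form fields as JSON."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```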
To change the model, edit the *Arrange working dirs* section in the main notebooks, as in the example below:

```python
MODEL_gpt_3  # Other options: MODEL_llama, MODEL_gpt_4
```
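For context, a hypothetical sketch of how such a constant could drive the working directories; all values here are assumptions based on the output folder names listed further below, not the repo's actual code:

```python
# Hypothetical illustration of an "Arrange working dirs" cell.
MODEL_llama = "llama3_70b_outputs"
MODEL_gpt_3 = "gpt3.5_outputs"
MODEL_gpt_4 = "gpt4_outputs"

MODEL = MODEL_gpt_3  # swap in MODEL_llama or MODEL_gpt_4 to change models
print(f"Writing results to {MODEL}/")
```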
| Notebook | Purpose |
|---|---|
| `main-gpt4-Markdown_batch.ipynb` | Converts documents to Markdown format |
| `llama-token-calc.ipynb` | Estimates token usage for LLaMA (see the sketch after this table) |
| `llama-postprocess.ipynb` | Tests updated post-processing methods without re-running models |
| `error_markdown_files.ipynb` | Updates files that caused errors during model interaction |
| `editing_scripts.ipynb` | Various scripts for analyzing model outputs |
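As a sketch of what `llama-token-calc.ipynb` does conceptually, token counts for a document can be estimated with a Hugging Face tokenizer; the model ID and file path here are assumptions, and the notebook's actual method may differ:

```python
# Illustrative sketch: estimate LLaMA token usage for a document.
# Note: the Llama 3 tokenizer is gated on Hugging Face and requires access approval.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B")

def count_tokens(text: str) -> int:
    # Encode without special tokens to approximate prompt length.
    return len(tokenizer.encode(text, add_special_tokens=False))

with open("example_document.md") as f:  # hypothetical input file
    print(count_tokens(f.read()))
```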
| Folder | Contents |
|---|---|
| `llama3_70b_outputs` | Output for LLaMA-3 (OCR input) |
| `gpt4_outputs` | Output for GPT-4 (OCR input) |
| `gpt3.5_outputs` | Output for GPT-3.5 (OCR input) |
| `gpt4_Markdown_Llama3_outputs` | Markdown input - LLaMA-3 |
| `gpt4_Markdown_gpt4_outputs` | Markdown input - GPT-4 |
| `gpt4_Markdown_gpt3_outputs` | Markdown input - GPT-3.5 |
| `gpt4_outputs_Image` | Image input - GPT-4o Vision |
| `gpt4_Markdown_outputs` | Markdown conversion of documents |
This project provides a comprehensive evaluation of LLM-based information extraction from layout-rich documents, comparing different input formats, models, and processing techniques.
💡 Feel free to contribute, suggest improvements, or report issues! 🚀

