This document describes the technical implementation of the PDF viewer component in the Unstract frontend application. The PdfViewer component is responsible for rendering PDF documents and overlaying coordinate-based highlights to visually indicate extracted data regions. This viewer is primarily used within Prompt Studio's DocumentManager to display documents alongside extraction results.
For information about the broader document management and indexing pipeline that prepares documents for viewing, see Document Management and Indexing. For details about the highlight data structure and coordinate system used in extraction, see PDF Viewer and Document Highlighting.
The PDF viewer is built on the @react-pdf-viewer library ecosystem, which provides a React wrapper around the PDF.js rendering engine. The system uses multiple specialized plugins to achieve its functionality.
Component Hierarchy
| Layer | Component/Library | Purpose |
|---|---|---|
| Top-level | PdfViewer | Main component coordinating all plugins and props |
| Rendering | @react-pdf-viewer/core.Viewer | Core PDF rendering using PDF.js |
| Worker | pdfjs-dist/build/pdf.worker.min.js | PDF.js worker for document parsing |
| UI Plugin | defaultLayoutPlugin | Provides toolbar, sidebar, and layout controls |
| Navigation Plugin | pageNavigationPlugin | Programmatic page navigation API |
| Highlight Plugin | highlightPlugin | Renders overlays on PDF pages |
| Custom Plugin | RenderHighlights | Optional plugin for coordinate-based highlights |
Sources: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx1-123 frontend/package.json9-12
The PDF viewer uses a Web Worker to offload PDF parsing and rendering operations from the main thread. This prevents UI blocking during document loading.
The worker URL is configured using Vite's asset import syntax: frontend/src/helpers/pdfWorkerConfig.js1-5
The ?url suffix instructs Vite to return the asset's URL rather than importing it directly. This URL is then passed to the Worker component: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx102-112
Worker Lifecycle:
1. PDF_WORKER_URL is imported from pdfWorkerConfig.js
2. <Worker workerUrl={PDF_WORKER_URL}> creates the Web Worker instance
3. <Viewer> communicates with the worker for PDF operations
Sources: frontend/src/helpers/pdfWorkerConfig.js1-5 frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx102-112
The PdfViewer component receives highlight data as an array of coordinate arrays. Each coordinate array represents a bounding box on a specific page using the format: [pageNumber, x1, y1, x2, y2, confidence?].
The processing function filters and normalizes highlight data: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx26-44
Processing Steps:
- Reduce each entry to a [pageNumber, x1, y1, x2, y2] array, dropping the optional confidence value
- Return [[0, 0, 0, 0]] as fallback
Coordinate Format:
| Index | Value | Description |
|---|---|---|
| 0 | pageNumber | Zero-indexed page number |
| 1 | x1 | Left X coordinate (normalized 0-1) |
| 2 | y1 | Top Y coordinate (normalized 0-1) |
| 3 | x2 | Right X coordinate (normalized 0-1) |
| 4 | y2 | Bottom Y coordinate (normalized 0-1) |
| 5 | confidence | Optional confidence score (stripped during processing) |
Sources: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx26-48
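The processing described above can be sketched as a small pure function. This is an illustration only, not the code in PdfViewer.jsx: the function name processHighlightData and the exact validity check (five finite numbers) are assumptions, and the fallback is assumed to apply when no usable entry remains.

```javascript
// Hypothetical helper: name and validity rules are assumptions,
// not taken from PdfViewer.jsx.
const FALLBACK_HIGHLIGHT = [[0, 0, 0, 0]];

function processHighlightData(highlightData) {
  // Treat missing or non-array input as "no highlights".
  if (!Array.isArray(highlightData)) {
    return FALLBACK_HIGHLIGHT;
  }

  const processed = highlightData
    // Keep only entries that carry at least page, x1, y1, x2, y2
    // as finite numbers.
    .filter(
      (entry) =>
        Array.isArray(entry) &&
        entry.length >= 5 &&
        entry.slice(0, 5).every((value) => Number.isFinite(value))
    )
    // Strip the optional confidence score at index 5.
    .map((entry) => entry.slice(0, 5));

  // Fall back to the placeholder when nothing usable is left.
  return processed.length > 0 ? processed : FALLBACK_HIGHLIGHT;
}
```

For example, an input of two entries where the second is too short yields a single five-element array with the confidence value removed, while undefined or an empty array yields the [[0, 0, 0, 0]] placeholder.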
The PDF viewer uses a plugin architecture where multiple plugins extend the base Viewer functionality. The highlight plugin conditionally uses custom rendering logic when highlight data is available.
Plugin Creation Strategy:
The component creates both highlight plugin variants unconditionally to maintain React hook order: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx64-78
Dynamic Plugin Import:
The RenderHighlights plugin is loaded dynamically at module initialization: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx12-18
This top-level async import allows the component to function even when the plugin file doesn't exist, enabling graceful degradation for installations without custom highlight rendering.
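A minimal sketch of this kind of optional loading, assuming a generic helper rather than the actual code in PdfViewer.jsx. The name loadOptionalModule and the choice to resolve to null on failure are hypothetical; the real module may handle the missing file differently.

```javascript
// Hypothetical helper: resolves to the module's default export
// (or the module itself) and to null if loading fails.
async function loadOptionalModule(loader) {
  try {
    const mod = await loader();
    return mod.default ?? mod;
  } catch {
    // The optional file is absent or failed to load; callers can
    // fall back to the built-in behavior.
    return null;
  }
}

// Usage sketch (the path is illustrative, left commented out):
// const RenderHighlights = await loadOptionalModule(
//   () => import("./RenderHighlights")
// );
```

Wrapping the import in a loader callback keeps the failure inside the try block, so a missing file results in a null value rather than an unhandled rejection at module initialization.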
Current Highlight Selection:
When currentHighlightIndex is provided, only that specific highlight is rendered: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx50-62
| Condition | Behavior |
|---|---|
| currentHighlightIndex === null | Render all highlights |
| currentHighlightIndex in range | Render only processedHighlightData[currentHighlightIndex] |
| currentHighlightIndex out of range | Render all highlights |
Sources: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx12-18 frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx50-78
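The selection rules in the table above can be expressed as one function. This is a sketch under assumptions: the name selectHighlights is hypothetical, undefined is treated like null, and the single selected highlight is assumed to be wrapped in an array so downstream rendering always receives a list.

```javascript
// Hypothetical helper mirroring the condition/behavior table.
function selectHighlights(processedHighlightData, currentHighlightIndex) {
  const inRange =
    Number.isInteger(currentHighlightIndex) &&
    currentHighlightIndex >= 0 &&
    currentHighlightIndex < processedHighlightData.length;

  // In range: render only the selected highlight.
  // null, undefined, or out of range: render all highlights.
  return inRange
    ? [processedHighlightData[currentHighlightIndex]]
    : processedHighlightData;
}
```

Returning the full list for an out-of-range index means a stale index from a previous document still produces visible highlights instead of an empty overlay.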
The viewer automatically navigates to the page containing the current highlight whenever the highlight data or index changes. This ensures the relevant document section is visible when extraction results are updated.
Navigation Logic: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx80-98
The useEffect hook triggers on three dependencies:
- highlightData: Raw highlight coordinate arrays
- jumpToPage: Navigation function from pageNavigationPlugin
- currentHighlightIndex: Selected highlight index (for multi-highlight navigation)
Navigation Steps:
1. Clean the highlight data with removeZerosAndDeleteIfAllZero() to get valid coordinates
2. Use currentHighlightIndex if provided and in range, otherwise default to 0
3. Read the page number from cleanedHighlightData[index][0]
4. Wait with setTimeout(100ms) before calling jumpToPage() to ensure PDF rendering completes
5. Call jumpToPage(pageNumber) with the zero-indexed page number
Timing Considerations:
The 100ms delay gives PDF rendering time to complete before jumpToPage() is called.
Multi-Highlight Navigation:
When navigating between multiple highlights on the same document, a change to the currentHighlightIndex prop re-runs the navigation effect, which jumps to the page of the newly selected highlight.
Sources: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx80-98 frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx22-23
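The navigation logic in this section can be sketched outside React as two plain functions. All names here are hypothetical. In particular, dropAllZeroRows stands in for removeZerosAndDeleteIfAllZero(), whose actual behavior is not shown in this document, and returning a cancel function is an assumption about how a useEffect cleanup might be wired.

```javascript
// Stand-in for removeZerosAndDeleteIfAllZero(); the real helper's
// behavior is an assumption. This version drops rows that are not
// arrays or that contain only zeros.
function dropAllZeroRows(highlightData) {
  return (highlightData || []).filter(
    (row) => Array.isArray(row) && row.some((value) => value !== 0)
  );
}

// Resolve the zero-indexed page to jump to, or null if there is
// no valid highlight.
function resolveTargetPage(highlightData, currentHighlightIndex) {
  const cleaned = dropAllZeroRows(highlightData);
  if (cleaned.length === 0) {
    return null;
  }
  const inRange =
    Number.isInteger(currentHighlightIndex) &&
    currentHighlightIndex >= 0 &&
    currentHighlightIndex < cleaned.length;
  const index = inRange ? currentHighlightIndex : 0;
  return cleaned[index][0];
}

// Delay the jump so the viewer has time to render, and return a
// function that cancels the pending jump.
function scheduleJump(jumpToPage, pageNumber, delayMs = 100) {
  if (pageNumber === null) {
    return () => {};
  }
  const timer = setTimeout(() => jumpToPage(pageNumber), delayMs);
  return () => clearTimeout(timer);
}
```

Inside an effect, the result of resolveTargetPage would be passed to scheduleJump, and the returned cancel function could serve as the effect cleanup so that a rapid change of currentHighlightIndex does not trigger a stale jump.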
The PdfViewer component accepts three props that control document rendering and highlight behavior:
Prop Definitions:
| Prop | Type | Required | Description |
|---|---|---|---|
| fileUrl | any | Yes | URL or data URI for the PDF document to display |
| highlightData | array | No | Array of highlight coordinates in format [[page, x1, y1, x2, y2, confidence?], ...] |
| currentHighlightIndex | number | No | Zero-indexed position in highlightData array to render as active highlight |
Prop Validation: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx116-120
Usage Patterns:
Document-only viewing: Provide only fileUrl
Document with highlights: Provide fileUrl and highlightData
Navigation between highlights: Add currentHighlightIndex
Sources: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx20 frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx116-120
The component applies custom styling through two CSS files:
1. Base highlight styles: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx8
   - @react-pdf-viewer/highlight/lib/styles/index.css
2. Custom highlight styles: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx9
   - ./Highlight.css
   - .doc-manager-body container class
The component wraps the viewer in a div with ref={parentRef} and class doc-manager-body: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx101-112
This container structure enables:
Sources: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx8-9 frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx101-112
The PdfViewer component is primarily used within Prompt Studio's DocumentManager to display documents alongside extraction results. The integration flow connects extraction outputs to visual highlights on the source document.
Integration Points:
- fileUrl is sourced from uploaded or indexed documents
- Extraction outputs from usePromptOutputStore are transformed into coordinate arrays
- currentHighlightIndex is set to focus on specific extractions
For more details on the DocumentManager component and its role in Prompt Studio, see Frontend IDE Components. For information about how extraction coordinates are generated by the backend, see Structure Tool and Extraction Pipeline.
Sources: frontend/src/components/custom-tools/pdf-viewer/PdfViewer.jsx1-123