This project uses Puppeteer to convert a web page into a PDF file with selectable text rather than a flattened image.
When you use a browser's "Print to PDF" feature, the browser relies on a print driver to interpret the rendered page. Sometimes, especially with complex layouts, fonts, or CSS, the driver may "flatten" the page into an image so that the output exactly matches what you see on screen. This process is called rasterization, and the resulting text cannot be selected.
Puppeteer, however, doesn't simulate the print dialog. Instead, it calls Chromium's internal PDF rendering engine directly through the Chrome DevTools Protocol. This engine is designed to translate the web page's structure (the DOM) into a structured PDF document, defining text objects, vector shapes, and images directly. This preserves the underlying text information, so the text remains selectable, searchable, and accessible in the final PDF file.
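As a minimal sketch of the approach described above (not the actual contents of convert_to_pdf.js), the conversion comes down to a few Puppeteer calls: launch a browser, open a page, navigate to the URL, and call page.pdf(). The function name convertToPdf and the default options shown are assumptions for illustration.

```javascript
// Hypothetical sketch: convertToPdf and its defaults are illustrative,
// not taken from convert_to_pdf.js.
async function convertToPdf(url, outputPath, pdfOptions = {}) {
  // Required inside the function so this file can be loaded without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so late-loading fonts and images are included.
    await page.goto(url, { waitUntil: 'networkidle0' });
    // page.pdf() uses Chromium's PDF backend, which emits text as text objects, not pixels.
    await page.pdf({ path: outputPath, format: 'A4', printBackground: true, ...pdfOptions });
  } finally {
    // Always release the browser process, even if navigation or rendering fails.
    await browser.close();
  }
}
```

Calling `await convertToPdf('https://example.com', 'example.pdf')` would write the file; any key passed in the third argument overrides the assumed defaults.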
- Ensure you have Node.js installed.
- Install dependencies:
npm install
Run the script from your command line, providing the URL to convert and the desired output file name.
npm start -- --url <your-url> --output <output-filename.pdf>

- --url, -u: The URL to convert to PDF. (Required)
- --output, -o: The output PDF file name. (Required)
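One way the two required flags and their short aliases could be read is with Node's built-in util.parseArgs (available in Node.js 18.3 and later). This is a sketch under that assumption; the project itself may use a different argument parser, and parseCliArgs is a hypothetical helper name.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical helper: reads --url/-u and --output/-o from the command line.
// Assumes Node.js 18.3+ for util.parseArgs; the real script may use another parser.
function parseCliArgs(argv = process.argv.slice(2)) {
  const { values } = parseArgs({
    args: argv,
    options: {
      url: { type: 'string', short: 'u' },
      output: { type: 'string', short: 'o' },
    },
  });
  // Both flags are required, so fail early with a usage message.
  if (!values.url || !values.output) {
    throw new Error('Usage: npm start -- --url <your-url> --output <output-filename.pdf>');
  }
  return { url: values.url, output: values.output };
}
```

The `--` after `npm start` tells npm to forward the remaining arguments to the script, which is why the flags show up in `process.argv`.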
For example:
npm start -- --url "https://example.com" --output "example.pdf"

You can adjust the PDF output settings in convert_to_pdf.js, such as:
- format: The paper size ('A4', 'Letter', etc.).
- margin: Margins for the PDF.
- printBackground: Whether to print background graphics.
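A minimal sketch of how these three settings might be grouped into the options object passed to Puppeteer's page.pdf(). The default values and the buildPdfOptions helper are hypothetical; check convert_to_pdf.js for the values the project actually uses. Note that Puppeteer omits background graphics unless printBackground is set to true.

```javascript
// Hypothetical defaults: the values in convert_to_pdf.js may differ.
const DEFAULT_PDF_OPTIONS = {
  format: 'A4',          // paper size; 'Letter' is another common choice
  printBackground: true, // include CSS background colors and images
  margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
};

// Hypothetical helper: merges caller overrides over the defaults,
// merging margin per side so overriding one side keeps the others.
function buildPdfOptions(outputPath, overrides = {}) {
  return {
    ...DEFAULT_PDF_OPTIONS,
    ...overrides,
    margin: { ...DEFAULT_PDF_OPTIONS.margin, ...(overrides.margin || {}) },
    path: outputPath,
  };
}
```

The result can be handed straight to Puppeteer, for example `await page.pdf(buildPdfOptions('example.pdf', { format: 'Letter' }))`.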