diff CHANGES.txt @ 8491:520075b29474

feat: support justhtml parsing library to convert email to plain text

justhtml is a fast, pure-Python, HTML5-compliant parser. It is now an option for converting HTML-only emails to plain text. Its output format differs slightly from that of dehtml or beautifulsoup, mostly by removing extra blank lines.

dehtml.py: Uses the stream parser of justhtml, because I was unable to get the full document parser to strip script and style blocks. If I can fix this and use the standard parser, I can in theory generate markdown from the DOM tree that justhtml builds. Updated the test case to include inline elements that should not cause a line break when they are encountered. Running dehtml as `python roundup/dehtml.py foo.html` will load foo.html and parse it using all available parsers.

configuration.py: justhtml is available as an option.

docs: updated CHANGES.txt and doc/tracker_config.txt; added beautifulsoup and justhtml to the optional software section of doc/installation.txt.

test_mailgw.py, .github/workflows/ci-test: updated tests and install justhtml as part of CI.
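
The two behaviours the message describes (script and style blocks are dropped, and inline elements do not start a new line) can be illustrated with a minimal sketch. It uses Python's standard-library `html.parser`, not justhtml, and it is not the code in roundup/dehtml.py. The names `TextExtractor`, `html_to_text`, `SKIP_TAGS` and `BLOCK_TAGS` are hypothetical, and dropping every blank line is a simplification made for this example only.

```python
from html.parser import HTMLParser

# Hypothetical tag sets chosen for this sketch; not taken from roundup/dehtml.py.
SKIP_TAGS = {"script", "style"}
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "blockquote", "pre"}


class TextExtractor(HTMLParser):
    """Collect text, skipping script/style content and breaking only on block tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a script or style element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")
        # Inline tags such as <b> or <em> add nothing, so no line break occurs.

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return plain text with whitespace collapsed and blank lines removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


sample = (
    "<html><head><style>p {color: red}</style>"
    "<script>alert(1)</script></head>"
    "<body><p>Hello <b>bold</b> and <em>emphasis</em>.</p>"
    "<p>Second</p></body></html>"
)
print(html_to_text(sample))
```

Here the inline `<b>` and `<em>` elements stay on the same output line as the surrounding text, while each `<p>` starts a new one.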
author John Rouillard <rouilj@ieee.org>
date Sun, 14 Dec 2025 22:40:46 -0500
parents 918792e35e0c
children 1976dedb3319
--- a/CHANGES.txt	Sat Dec 13 23:02:53 2025 -0500
+++ b/CHANGES.txt	Sun Dec 14 22:40:46 2025 -0500
@@ -64,6 +64,10 @@
   config.ini. (John Rouillard)
 - issue2551152 - added basic PGP setup/use info to admin_guide. (John
   Rouillard)
+- add support for the 'justhtml' html 5 parser library. It is written
+  in pure Python. Used to convert html emails into plain text. Faster
+  than beautifulsoup4 and it passes the html 5 standard browser test
+  suite. Beautifulsoup is still supported. (John Rouillard)
 
 2025-07-13 2.5.0
 

Roundup Issue Tracker: http://roundup-tracker.org/