comparison .github/workflows/ci-test.yml @ 8491:520075b29474

feat: support justhtml parsing library to convert email to plain text

justhtml is a pure Python, fast, HTML5-compliant parser. It is now an option for converting HTML-only emails to plain text. Its output format differs slightly from that of dehtml or beautifulsoup, mostly by removing extra blank lines.

dehtml.py: Uses the stream parser of justhtml. I was unable to get the full document parser to strip script and style blocks. If I can fix this and use the standard parser, I can in theory generate markdown from the DOM tree that justhtml produces.

Updated the test case to include inline elements that should not cause a line break when they are encountered.

Running dehtml as `python roundup/dehtml.py foo.html` will load foo.html and parse it using all available parsers.

configuration.py: justhtml is available as an option.

docs: updated CHANGES.txt and doc/tracker_config.txt; added beautifulsoup and justhtml to the optional software section of doc/installation.txt.

test_mailgw.py, .github/workflows/ci-test: Updated tests and install justhtml as part of CI.
author John Rouillard <rouilj@ieee.org>
date Sun, 14 Dec 2025 22:40:46 -0500
parents 4e0944649af7
children 2741b3de4432
--- .github/workflows/ci-test.yml	8490:918792e35e0c
+++ .github/workflows/ci-test.yml	8491:520075b29474
@@ -238,11 +238,11 @@
 run: |
   sudo apt-get install swig gpg gpgsm libgpgme-dev
   # pygments for markdown2 to highlight code blocks
   pip install markdown2 pygments
   # docutils for ReStructuredText
-  pip install beautifulsoup4 brotli docutils jinja2 \
+  pip install beautifulsoup4 justhtml brotli docutils jinja2 \
     mistune==0.8.4 pyjwt pytz whoosh
   # gpg on PyPi is currently broken with newer OS platform
   # ubuntu 24.04
   # used for newer Python versions. Temporarily use the
   # testing index, which contains a newer version of the
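
The change above adds justhtml to the optional packages that CI installs next to beautifulsoup4. As a rough illustration of how a test setup might check which of these optional HTML parsers are actually importable, here is a minimal sketch. It is not code from Roundup: the helper name `available_parsers` and the `OPTIONAL_PARSERS` mapping are hypothetical, and the import names (`bs4` for beautifulsoup4, `justhtml` for justhtml) are assumptions about how the packages are imported.

```python
import importlib.util

# Hypothetical mapping of PyPI package name -> import name.
# The import names are assumptions, not taken from the Roundup source.
OPTIONAL_PARSERS = {
    "beautifulsoup4": "bs4",
    "justhtml": "justhtml",
}


def available_parsers(candidates=OPTIONAL_PARSERS):
    """Return the sorted package names whose import module can be located.

    find_spec() only locates a top-level module; it does not import it,
    so a missing optional package is reported without raising ImportError.
    """
    found = []
    for package, module in candidates.items():
        if importlib.util.find_spec(module) is not None:
            found.append(package)
    return sorted(found)


if __name__ == "__main__":
    print("optional HTML parsers found:", available_parsers())
```

Run after the `pip install` step, a check like this would be expected to list both packages; on a machine without them it simply returns an empty list, which matches the idea of treating the parsers as optional software.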

Roundup Issue Tracker: http://roundup-tracker.org/