comparison .github/workflows/ci-test.yml @ 8491:520075b29474
feat: support justhtml parsing library to convert email to plain text
justhtml is a fast, pure-Python, HTML5-compliant parser. It is now an
option for converting HTML-only emails to plain text. Its output
format differs slightly from that of dehtml or beautifulsoup, mostly by
removing extra blank lines.
dehtml.py:
Uses the stream parser of justhtml. I was unable to get the full
document parser to strip script and style blocks successfully.
If I can fix this and use the standard parser, I can in theory
generate markdown from the DOM tree that justhtml builds.
Updated the test case to include inline elements that should not cause a
line break when they are encountered. Running dehtml as `python
roundup/dehtml.py foo.html` loads foo.html and parses it using
all available parsers.
configuration.py: justhtml is available as an option.
docs: updated CHANGES.txt and doc/tracker_config.txt; added beautifulsoup
and justhtml to the optional software section of doc/installation.txt.
test_mailgw.py, .github/workflows/ci-test.yml: updated tests and install
justhtml as part of CI.
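
The behavior described for dehtml.py (drop script and style blocks, break lines only at block-level elements, keep inline elements on the same line, and remove extra blank lines) can be illustrated with a minimal sketch. This sketch deliberately uses the standard library's `html.parser` rather than justhtml, so it says nothing about justhtml's API or about the actual code in roundup/dehtml.py. The names `SKIP_TAGS`, `BLOCK_TAGS`, `TextExtractor`, and `html_to_text` are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser

# Illustrative tag sets; these are assumptions, not taken from roundup/dehtml.py.
SKIP_TAGS = {"script", "style"}
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect text, skipping script/style content (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a script or style block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            # Only block-level elements start a new line;
            # inline elements such as <b> or <i> add nothing here.
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return plain text with whitespace collapsed and blank lines removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

For example, `html_to_text("<p>Hello <b>bold</b> text.</p><script>x()</script><p>Second</p>")` yields two lines, `Hello bold text.` and `Second`, with the script content dropped.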
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Sun, 14 Dec 2025 22:40:46 -0500 |
| parents | 4e0944649af7 |
| children | 2741b3de4432 |
| 8490:918792e35e0c | 8491:520075b29474 |
|---|---|
| 238 run: \| | 238 run: \| |
| 239 sudo apt-get install swig gpg gpgsm libgpgme-dev | 239 sudo apt-get install swig gpg gpgsm libgpgme-dev |
| 240 # pygments for markdown2 to highlight code blocks | 240 # pygments for markdown2 to highlight code blocks |
| 241 pip install markdown2 pygments | 241 pip install markdown2 pygments |
| 242 # docutils for ReStructuredText | 242 # docutils for ReStructuredText |
| 243 pip install beautifulsoup4 brotli docutils jinja2 \ | 243 pip install beautifulsoup4 justhtml brotli docutils jinja2 \ |
| 244 mistune==0.8.4 pyjwt pytz whoosh | 244 mistune==0.8.4 pyjwt pytz whoosh |
| 245 # gpg on PyPi is currently broken with newer OS platform | 245 # gpg on PyPi is currently broken with newer OS platform |
| 246 # ubuntu 24.04 | 246 # ubuntu 24.04 |
| 247 # used for newer Python versions. Temporarily use the | 247 # used for newer Python versions. Temporarily use the |
| 248 # testing index, which contains a newer version of the | 248 # testing index, which contains a newer version of the |
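
Since the CI step above installs beautifulsoup4 and justhtml as optional packages, a quick check that they are importable can be sketched as follows. This is an illustration, not part of the workflow or of Roundup. The import name `bs4` is the one beautifulsoup4 provides; the import name `justhtml` is assumed to match the package name. `OPTIONAL_MODULES` and `available_parsers` are hypothetical names.

```python
from importlib.util import find_spec

# Import names of the optional HTML parsers.
# "justhtml" is assumed to match the name of the installed package.
OPTIONAL_MODULES = ("bs4", "justhtml")


def available_parsers(modules=OPTIONAL_MODULES):
    """Map each top-level module name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, present in available_parsers().items():
        print(f"{name}: {'available' if present else 'missing'}")
```

Using `find_spec` avoids importing the modules just to test for their presence, which keeps the check cheap and free of side effects.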
