| changeset | 9c3ec0a5c7fc |
|---|---|
| branch | |
| bookmark | |
| tag | |
| user | John Rouillard <rouilj@ieee.org> |
| description | chore: remove __future print_funcion from code. Not needed as of Python 3. |
| files |
| changeset | 520075b29474 |
|---|---|
| branch | |
| bookmark | |
| tag | |
| user | John Rouillard <rouilj@ieee.org> |
| description | feat: support justhtml parsing library to convert email to plain text justhtml is an pure python, fast, HTML5 compliant parser. It is now an option for converting html only emails to plain text. Its output format differs slightly from dehtml or beautifulsoup. Mostly by removing extra blank lines. dehtml.py: Using the stream parser of justhtml. Unable to get the full document parser to successfully strip script and style blocks. If I can fix this and use the standard parser, I can in theory generate markdown from the DOM tree generated by justhtml. Updated test case to include inline elements that should not cause a line break when they are encountered. Running dehtml as: `python roundup/dehtml.py foo.html` will load foo.html and parse it using all available parsers. configuration.py: justhtml is available as an option. docs: updated CHANGES.txt, doc/tracker_config.txt added beautifulsoup and justhtml to the optional software section of doc/installtion.txt. test_mailgw.py, .github/workflows/ci-test Updated tests and install justhtml as part of CI. |
| files |
| changeset | b68a1d8fd5d9 |
|---|---|
| branch | |
| bookmark | |
| tag | |
| user | John Rouillard <rouilj@ieee.org> |
| description | chore(lint): use ternary, ignore unused param |
| files |
| changeset | 6079440ac023 |
|---|---|
| branch | |
| bookmark | |
| tag | |
| user | John Rouillard <rouilj@ieee.org> |
| description | chore(lint): doublequote strings, no yoda conitionals, sort imports... |
| files |
| changeset | 07ce4e4110f5 |
|---|---|
| branch | |
| bookmark | |
| tag | |
| user | John Rouillard <rouilj@ieee.org> |
| description | flake8 fixes: whitespace, remove unused imports |
| files |
| changeset | ef0975b4291b |
|---|---|
| branch | |
| bookmark | |
| tag | |
| user | John Rouillard <rouilj@ieee.org> |
| description | Explicitly set parser when calling beautiful soup. Quiets warning in to be committed tests. |
| files |
| changeset | af81e7a4302f |
|---|---|
| branch | |
| bookmark | |
| tag | |
| user | Christof Meerwald <cmeerw@cmeerw.org> |
| description | don't get confused by python-future making Python 3 package names available under Python 2 (but only with Python 2 functionality) |
| files |
| changeset | 1700542408f3 |
|---|---|
| branch | |
| bookmark | |
| tag | |
| user | John Rouillard <rouilj@ieee.org> |
| description | flake8 cleanups dehtml.py Note you need to disable long lines as there is a test example that requires really long lines of htmlized output. |
| files |
| changeset | b74f0b50bef1 |
|---|---|
| branch | |
| bookmark | |
| tag | |
| user | John Rouillard <rouilj@ieee.org> |
| description | Fix CI deprication warning for HTMLParser convert_charrefs under py3. /home/travis/build/roundup-tracker/roundup/roundup/dehtml.py:81: DeprecationWarning: The value of convert_charrefs will become True in 3.5. You are encouraged to set the value explicitly. parser = DumbHTMLParser() |
| files |
| changeset | c749d6795bc2 |
|---|---|
| branch | |
| bookmark | |
| tag | |
| user | Joseph Myers <jsm@polyomino.org.uk> |
| description | Python 3 preparation: unichr. |
| files |
| changeset | 56c9bcdea47f |
|---|---|
| branch | |
| bookmark | |
| tag | |
| user | Joseph Myers <jsm@polyomino.org.uk> |
| description | Python 3 preparation: unicode. This patch introduces roundup/anypy/strings.py, which has a comment explaining the string representations generally used and common functions to handle the required conversions. Places in the code that explicitly reference the "unicode" type / built-in function are generally changed to use the new functions (or, in a few places where those new functions don't seem to fit well, other approaches such as references to type(u'') or use of the codecs module). This patch does not generally attempt to address text conversions in any places not currently referencing the "unicode" type (although scripts/import_sf.py is made to use binary I/O in places as fixing the "unicode" reference didn't seem coherent otherwise). |
| files |
| changeset | 9c6d98bf79db |
|---|---|
| branch | |
| bookmark | |
| tag | |
| user | Joseph Myers <jsm@polyomino.org.uk> |
| description | Python 3 preparation: update HTMLParser / htmlentitydefs imports. Manual patch. |
| files |
| changeset | 64b05e24dbd8 |
|---|---|
| branch | |
| bookmark | |
| tag | |
| user | Joseph Myers <jsm@polyomino.org.uk> |
| description | Python 3 preparation: convert print to a function. Tool-assisted patch. It is possible that some "from __future__ import print_function" are not in fact needed, if a file only uses print() with a single string as an argument and so would work fine in Python 2 without that import. |
| files |
| changeset | e20f472fde7d |
|---|---|
| branch | |
| bookmark | |
| tag | |
| user | John Rouillard <rouilj@ieee.org> |
| description | issue2550799: provide basic support for handling html only emails Initial implementation and testing with the dehtml html converter done. The use of beautifulsoup 4 is not tested. My test system breaks when running dehtml.py using beautiful soup. I don't get the failures when running under the test harness, but the text output is significantly different (different line breaks, number of newlines etc.) The tests for dehtml need to be generated for beautiful soup and the expected output changed. Since I have a wonky install of beautiful soup, I don't trust my output as the standard to test against. Also since beautiful soup is optional, the test harness needs to skip the beautifulsoup tests if import bs4 fails. Again something outside of my expertise. I deleted the work I had done to implement that. I could not get it working and wanted to get this feature in in some form. |
| files |