Mercurial > p > roundup > code
annotate roundup/dehtml.py @ 7752:b2dbab2b34bc
fix(refactor): multiple fixups using ruff linter; more testing.
Converting to using the ruff linter and its rulesets. Fixed a number
of issues.
admin.py:
sort imports
use immutable tuples as default value markers for parameters where a
None value is valid.
reduced some loops to list comprehensions for performance
used ternary to simplify some if statements
named some variables to make them less magic
(e.g. _default_savepoint_setting = 1000)
fixed some tests for argument counts < 2 becomes != 2 so 3 is an
error.
moved exception handlers outside of loops for performance where
exception handler will abort loop anyway.
renamed variables called 'id' or 'dir' as they shadow builtin
commands.
fix translations of form _("string %s" % value) -> _("string %s") %
value so translation will be looked up with the key before
substitution.
end dicts, tuples with a trailing comma to reduce missing comma
errors if modified
simplified sorted(list(self.setting.keys())) to
sorted(self.setting.keys()) as sorted consumes whole list.
in if conditions put compared variable on left and threshold condition
on right. (no yoda conditions)
multiple noqa: suppression
removed unneeded noqa as lint rulesets are a bit different
do_get - refactor output printing logic: Use fast return if not
special formatting is requested; use isinstance with a tuple
rather than two isinstance calls; cleaned up flow and removed
comments on algorithm as it can be easily read from the code.
do_filter, do_find - refactor output printing logic. Reduce
duplicate code.
do_find - renamed variable 'value' that was set inside a loop. The
loop index variable was also named 'value'.
do_pragma - added hint to use list subcommand if setting was not
found. Replaced condition 'type(x) is bool' with 'isinstance(x,
bool)' for various types.
test_admin.py
added testing for do_list
better test coverage for do_get includes: -S and -d for multilinks,
error case for -d with non-link.
better testing for do_find including all output modes
better testing for do_filter including all output modes
fixed expected output for do_pragma that now includes hint to use
pragma list if setting not found.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Fri, 01 Mar 2024 14:53:18 -0500 |
| parents | 07ce4e4110f5 |
| children | 6079440ac023 |
| rev | line source |
|---|---|
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
1 |
|
5376
64b05e24dbd8
Python 3 preparation: convert print to a function.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5305
diff
changeset
|
2 from __future__ import print_function |
|
5417
c749d6795bc2
Python 3 preparation: unichr.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5416
diff
changeset
|
3 from roundup.anypy.strings import u2s, uchr |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
4 |
|
6110
af81e7a4302f
don't get confused by python-future making Python 3 package names available under Python 2 (but only with Python 2 functionality)
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5997
diff
changeset
|
5 import sys |
|
af81e7a4302f
don't get confused by python-future making Python 3 package names available under Python 2 (but only with Python 2 functionality)
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5997
diff
changeset
|
6 _pyver = sys.version_info[0] |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
7 |
|
7228
07ce4e4110f5
flake8 fixes: whitespace, remove unused imports
John Rouillard <rouilj@ieee.org>
parents:
6669
diff
changeset
|
8 |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
9 class dehtml: |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
10 def __init__(self, converter): |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
11 if converter == "none": |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
12 self.html2text = None |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
13 return |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
14 |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
15 try: |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
16 if converter == "beautifulsoup": |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
17 # Not as well tested as dehtml. |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
18 from bs4 import BeautifulSoup |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
19 |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
20 def html2text(html): |
|
6669
ef0975b4291b
Explicitly set parser when calling beautiful soup.
John Rouillard <rouilj@ieee.org>
parents:
6110
diff
changeset
|
21 soup = BeautifulSoup(html, "html.parser") |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
22 |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
23 # kill all script and style elements |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
24 for script in soup(["script", "style"]): |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
25 script.extract() |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
26 |
|
5416
56c9bcdea47f
Python 3 preparation: unicode.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5411
diff
changeset
|
27 return u2s(soup.get_text('\n', strip=True)) |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
28 |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
29 self.html2text = html2text |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
30 else: |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
31 raise ImportError |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
32 except ImportError: |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
33 # use the fallback below if beautiful soup is not installed. |
|
5411
9c6d98bf79db
Python 3 preparation: update HTMLParser / htmlentitydefs imports.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5376
diff
changeset
|
34 try: |
|
9c6d98bf79db
Python 3 preparation: update HTMLParser / htmlentitydefs imports.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5376
diff
changeset
|
35 # Python 3+. |
|
9c6d98bf79db
Python 3 preparation: update HTMLParser / htmlentitydefs imports.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5376
diff
changeset
|
36 from html.parser import HTMLParser |
|
9c6d98bf79db
Python 3 preparation: update HTMLParser / htmlentitydefs imports.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5376
diff
changeset
|
37 from html.entities import name2codepoint |
|
9c6d98bf79db
Python 3 preparation: update HTMLParser / htmlentitydefs imports.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5376
diff
changeset
|
38 except ImportError: |
|
9c6d98bf79db
Python 3 preparation: update HTMLParser / htmlentitydefs imports.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5376
diff
changeset
|
39 # Python 2. |
|
9c6d98bf79db
Python 3 preparation: update HTMLParser / htmlentitydefs imports.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5376
diff
changeset
|
40 from HTMLParser import HTMLParser |
|
9c6d98bf79db
Python 3 preparation: update HTMLParser / htmlentitydefs imports.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5376
diff
changeset
|
41 from htmlentitydefs import name2codepoint |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
42 |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
43 class DumbHTMLParser(HTMLParser): |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
44 # class attribute |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
45 text = "" |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
46 |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
47 # internal state variable |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
48 _skip_data = False |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
49 _last_empty = False |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
50 |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
51 def handle_data(self, data): |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
52 if self._skip_data: # skip data in script or style block |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
53 return |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
54 |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
55 if (data.strip() == ""): |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
56 # reduce multiple blank lines to 1 |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
57 if (self._last_empty): |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
58 return |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
59 else: |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
60 self._last_empty = True |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
61 else: |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
62 self._last_empty = False |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
63 |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
64 self.text = self.text + data |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
65 |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
66 def handle_starttag(self, tag, attrs): |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
67 if (tag == "p"): |
|
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
68 self.text = self.text + "\n" |
|
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
69 if (tag in ("style", "script")): |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
70 self._skip_data = True |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
71 |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
72 def handle_endtag(self, tag): |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
73 if (tag in ("style", "script")): |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
74 self._skip_data = False |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
75 |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
76 def handle_entityref(self, name): |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
77 if self._skip_data: |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
78 return |
|
5417
c749d6795bc2
Python 3 preparation: unichr.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5416
diff
changeset
|
79 c = uchr(name2codepoint[name]) |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
80 try: |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
81 self.text = self.text + c |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
82 except UnicodeEncodeError: |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
83 # print a space as a placeholder |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
84 self.text = self.text + ' ' |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
85 |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
86 def html2text(html): |
|
6110
af81e7a4302f
don't get confused by python-future making Python 3 package names available under Python 2 (but only with Python 2 functionality)
Christof Meerwald <cmeerw@cmeerw.org>
parents:
5997
diff
changeset
|
87 if _pyver == 3: |
|
5838
b74f0b50bef1
Fix CI deprication warning for HTMLParser convert_charrefs under py3.
John Rouillard <rouilj@ieee.org>
parents:
5417
diff
changeset
|
88 parser = DumbHTMLParser(convert_charrefs=True) |
|
b74f0b50bef1
Fix CI deprication warning for HTMLParser convert_charrefs under py3.
John Rouillard <rouilj@ieee.org>
parents:
5417
diff
changeset
|
89 else: |
|
b74f0b50bef1
Fix CI deprication warning for HTMLParser convert_charrefs under py3.
John Rouillard <rouilj@ieee.org>
parents:
5417
diff
changeset
|
90 parser = DumbHTMLParser() |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
91 parser.feed(html) |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
92 parser.close() |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
93 return parser.text |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
94 |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
95 self.html2text = html2text |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
96 |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
97 |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
98 if "__main__" == __name__: |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
99 html = ''' |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
100 <body> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
101 <script> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
102 this must not be in output |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
103 </script> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
104 <style> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
105 p {display:block} |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
106 </style> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
107 <div class="header"><h1>Roundup</h1> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
108 <div id="searchbox" style="display: none"> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
109 <form class="search" action="../search.html" method="get"> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
110 <input type="text" name="q" size="18" /> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
111 <input type="submit" value="Search" /> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
112 <input type="hidden" name="check_keywords" value="yes" /> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
113 <input type="hidden" name="area" value="default" /> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
114 </form> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
115 </div> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
116 <script type="text/javascript">$('#searchbox').show(0);</script> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
117 </div> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
118 <ul class="current"> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
119 <li class="toctree-l1"><a class="reference internal" href="../index.html">Home</a></li> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
120 <li class="toctree-l1"><a class="reference external" href="http://pypi.python.org/pypi/roundup">Download</a></li> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
121 <li class="toctree-l1 current"><a class="reference internal" href="../docs.html">Docs</a><ul class="current"> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
122 <li class="toctree-l2"><a class="reference internal" href="features.html">Roundup Features</a></li> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
123 <li class="toctree-l2 current"><a class="current reference internal" href="">Installing Roundup</a></li> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
124 <li class="toctree-l2"><a class="reference internal" href="upgrading.html">Upgrading to newer versions of Roundup</a></li> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
125 <li class="toctree-l2"><a class="reference internal" href="FAQ.html">Roundup FAQ</a></li> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
126 <li class="toctree-l2"><a class="reference internal" href="user_guide.html">User Guide</a></li> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
127 <li class="toctree-l2"><a class="reference internal" href="customizing.html">Customising Roundup</a></li> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
128 <li class="toctree-l2"><a class="reference internal" href="admin_guide.html">Administration Guide</a></li> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
129 </ul> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
130 <div class="section" id="prerequisites"> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
131 <h2><a class="toc-backref" href="#id5">Prerequisites</a></h2> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
132 <p>Roundup requires Python 2.5 or newer (but not Python 3) with a functioning |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
133 anydbm module. Download the latest version from <a class="reference external" href="http://www.python.org/">http://www.python.org/</a>. |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
134 It is highly recommended that users install the latest patch version |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
135 of python as these contain many fixes to serious bugs.</p> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
136 <p>Some variants of Linux will need an additional “python dev” package |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
137 installed for Roundup installation to work. Debian and derivatives, are |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
138 known to require this.</p> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
139 <p>If you’re on windows, you will either need to be using the ActiveState python |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
140 distribution (at <a class="reference external" href="http://www.activestate.com/Products/ActivePython/">http://www.activestate.com/Products/ActivePython/</a>), or you’ll |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
141 have to install the win32all package separately (get it from |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
142 <a class="reference external" href="http://starship.python.net/crew/mhammond/win32/">http://starship.python.net/crew/mhammond/win32/</a>).</p> |
|
5838
b74f0b50bef1
Fix CI deprication warning for HTMLParser convert_charrefs under py3.
John Rouillard <rouilj@ieee.org>
parents:
5417
diff
changeset
|
143 <script> |
|
b74f0b50bef1
Fix CI deprication warning for HTMLParser convert_charrefs under py3.
John Rouillard <rouilj@ieee.org>
parents:
5417
diff
changeset
|
144 < HELP > |
|
b74f0b50bef1
Fix CI deprication warning for HTMLParser convert_charrefs under py3.
John Rouillard <rouilj@ieee.org>
parents:
5417
diff
changeset
|
145 </script> |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
146 </div> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
147 </body> |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
148 ''' |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
149 |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
150 html2text = dehtml("dehtml").html2text |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
151 if html2text: |
|
5376
64b05e24dbd8
Python 3 preparation: convert print to a function.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5305
diff
changeset
|
152 print(html2text(html)) |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
153 |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
154 try: |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
155 # trap error seen if N_TOKENS not defined when run. |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
156 html2text = dehtml("beautifulsoup").html2text |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
157 if html2text: |
|
5376
64b05e24dbd8
Python 3 preparation: convert print to a function.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5305
diff
changeset
|
158 print(html2text(html)) |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
159 except NameError as e: |
|
5997
1700542408f3
flake8 cleanups dehtml.py
John Rouillard <rouilj@ieee.org>
parents:
5838
diff
changeset
|
160 print("captured error %s" % e) |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
161 |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
162 html2text = dehtml("none").html2text |
|
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
163 if html2text: |
|
5376
64b05e24dbd8
Python 3 preparation: convert print to a function.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5305
diff
changeset
|
164 print("FAIL: Error, dehtml(none) is returning a function") |
|
5305
e20f472fde7d
issue2550799: provide basic support for handling html only emails
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
165 else: |
|
5376
64b05e24dbd8
Python 3 preparation: convert print to a function.
Joseph Myers <jsm@polyomino.org.uk>
parents:
5305
diff
changeset
|
166 print("PASS: dehtml(none) is returning None") |
