view roundup/dehtml.py @ 5710:0b79bfcb3312
Add support for making an idempotent POST. This allows retrying a POST
that was interrupted. It involves creating a "post once only" (POE) URL,
/rest/data/<class>/@poe/<random_token>. A POST to this URL acts the
same as a POST to /rest/data/<class>. However, once the @poe URL has
been used, it can't be used for a second POST.
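The single-use URL is what makes retries safe. Below is a minimal client-side sketch of that idea. It assumes a caller-supplied `send(url, payload)` function that returns a `(status, body)` pair or raises `ConnectionError` when a request is interrupted. That function is hypothetical and stands in for whatever HTTP library the client uses. The helper `post_once` is also illustrative only and is not part of roundup.

```python
import time


def post_once(send, poe_url, payload, attempts=3, delay=1.0):
    """Re-send the same POST to one POE URL until a response arrives.

    `send(url, payload) -> (status, body)` is a hypothetical transport
    assumed to raise ConnectionError if the request is interrupted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            # Every attempt reuses the same URL, so at most one of them
            # can create an item on the server.
            return send(poe_url, payload)
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the transport error
            time.sleep(delay)
```

An earlier attempt may have reached the server even though its reply was lost. In that case the retry hits an already-used URL and is refused. This sketch simply returns whatever status the server sends, because the exact response for a reused URL isn't described here.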
To make these changes:
1) Move the body of post_collection into a new post_collection_inner
function, and have post_collection call post_collection_inner.
2) Add a handler for POST to rest/data/<class>/@poe. This returns a
unique POE URL, which by default expires after 30 minutes. The POE
random token is only good for a specific user and is stored in the
session db.
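Step 2 can be sketched as follows. The sketch assumes a plain dict in place of roundup's session db and uses the standard `secrets` module for the random token. The function name `issue_poe_url` and the record layout (`user`, `class`, `expires`) are hypothetical. Roundup's real handler and storage format may differ.

```python
import secrets
import time

POE_LIFETIME = 30 * 60  # seconds: the 30-minute default from the commit message


def issue_poe_url(store, user_id, class_name, lifetime=POE_LIFETIME, now=None):
    """Create a single-use token for one user and return its @poe URL.

    `store` is a dict standing in for roundup's session db (assumption).
    """
    if now is None:
        now = time.time()
    token = secrets.token_urlsafe(32)  # random, URL-safe, hard to guess
    store[token] = {
        "user": user_id,       # only this user may redeem the token
        "class": class_name,   # class the token was issued for
        "expires": now + lifetime,
    }
    return "/rest/data/%s/@poe/%s" % (class_name, token)
```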
3) Add a handler for POST to rest/data/<class>/@poe/<random token>.
The random token generated in step 2 is validated: it must belong to
the right user, match the class (unless the token is generic), and not
have expired. If everything is valid, call post_collection_inner to
process the input and create the new entry.
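Here is a matching sketch of step 3, again with a dict standing in for the session db. `PoeError`, `use_poe_token` and `post_poe` are hypothetical names. The `post_collection_inner` below is a toy stand-in that appends to an in-memory list; it is not roundup's real function. Representing a generic token by a class of `None` is also an assumption.

```python
import time


class PoeError(Exception):
    """Hypothetical error raised when a POE token may not be used."""


def use_poe_token(store, token, user_id, class_name, now=None):
    """Validate a POE token, then delete it so it cannot be reused."""
    if now is None:
        now = time.time()
    record = store.get(token)
    if record is None:
        raise PoeError("unknown or already used token")
    if now > record["expires"]:
        del store[token]  # an expired token is useless; drop it
        raise PoeError("token expired")
    if record["user"] != user_id:
        raise PoeError("token belongs to a different user")
    # A class of None stands for a generic token (assumption).
    if record["class"] is not None and record["class"] != class_name:
        raise PoeError("token was issued for another class")
    del store[token]  # single use: a second POST fails the lookup above


def post_collection_inner(db, class_name, fields):
    """Toy stand-in for the shared creation code; returns the new id."""
    items = db.setdefault(class_name, [])
    items.append(fields)
    return len(items)


def post_poe(store, db, token, user_id, class_name, fields, now=None):
    use_poe_token(store, token, user_id, class_name, now)
    return post_collection_inner(db, class_name, fields)
```

The token is deleted only after all checks pass, so a request from the wrong user or for the wrong class does not burn a valid token. A real server would also need the lookup and delete to be atomic, so that two concurrent retries cannot both succeed.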
To make recognition of the step 2 URL stable (so it's not confused with
rest/data/<:class_name>/<:item_id>), I removed @ from the characters
matched by Routing::url_to_regex.
The current Routing.execute method stops at the first regular
expression that matches the URL. Since the item_id route doesn't accept
POST, I was sometimes getting a 405 (Method Not Allowed). My guess is
that the order of the regular expressions is not stable, so sometimes I
got the right regexp for /data/<class>/@poe and sometimes the one for
/data/<class>/<item_id>. With @ removed from url_to_regex, there is no
way for the item_id case to match @poe.
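The effect of dropping @ can be illustrated with a simplified, hypothetical `url_to_regex`. It turns each `<:name>` placeholder into a named group using one shared component pattern. The two character classes below are assumptions chosen for the demonstration, not roundup's exact patterns.

```python
import re

PLACEHOLDER = re.compile(r"<:(\w+)>")

# Hypothetical component patterns for the demonstration.
WITH_AT = r"[\w@.\-]+"    # '@' allowed: <:item_id> can swallow '@poe'
WITHOUT_AT = r"[\w.\-]+"  # '@' excluded: '@poe' only hits the literal route


def url_to_regex(url, component):
    """Compile '/data/<:class_name>/<:item_id>' into an anchored regex."""
    parts, pos = [], 0
    for m in PLACEHOLDER.finditer(url):
        parts.append(re.escape(url[pos:m.start()]))  # literal URL text
        parts.append("(?P<%s>%s)" % (m.group(1), component))
        pos = m.end()
    parts.append(re.escape(url[pos:]))
    return re.compile("^" + "".join(parts) + "$")


item_route = "/data/<:class_name>/<:item_id>"
poe_route = "/data/<:class_name>/@poe"
```

With `WITH_AT`, both routes match /data/issue/@poe, so whichever is tried first wins. With `WITHOUT_AT`, only the @poe route does.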
There are alternate fixes we may need to look at. One: if a regexp
matches but the method does not, go back to the regexp matching loop in
execute() and look for another match. Only once every possible match
has failed should the code return a 405 method failure.
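The first alternate fix could look roughly like the toy router below. It is hypothetical and is not roundup's Routing class. A route whose path matches but whose method doesn't is skipped. A 405 is returned only if at least one path matched and no route accepted the method. Returning 404 when nothing matched is an added assumption.

```python
import re


class Router:
    """Toy router that keeps searching when only the HTTP method differs."""

    def __init__(self):
        self.routes = []  # list of (compiled regex, METHOD, handler)

    def add(self, pattern, method, handler):
        self.routes.append((re.compile(pattern), method.upper(), handler))

    def execute(self, path, method):
        path_matched = False
        for regex, route_method, handler in self.routes:
            match = regex.match(path)
            if match is None:
                continue
            path_matched = True
            if route_method != method.upper():
                continue  # wrong method: keep looking instead of failing
            return 200, handler(**match.groupdict())
        # Only after every candidate has been tried is an error chosen.
        if path_matched:
            return 405, "Method Not Allowed"
        return 404, "Not Found"
```

With this loop, route order no longer matters for /data/issue/@poe. The item route's pattern also matches that path, but it is skipped for POST and the @poe route is found.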
Another fix is to implement a more sophisticated mechanism so that
@Routing.route("/data/<:class_name>/<:item_id>/<:attr_name>", 'PATCH')
uses different regexps to match <:class_name>, <:item_id> and
<:attr_name>. Currently the regexp produced by url_to_regex is used
for every component.
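The second alternate fix, per-component regexps, might be sketched as a lookup table keyed by placeholder name. The patterns in `COMPONENT_PATTERNS` (for example, numeric item ids) and the fallback `DEFAULT` are assumptions for illustration. Roundup may allow other id formats.

```python
import re

# Hypothetical per-placeholder patterns; unknown names fall back to DEFAULT.
COMPONENT_PATTERNS = {
    "class_name": r"[A-Za-z_][A-Za-z0-9_]*",
    "item_id": r"[0-9]+",
    "attr_name": r"[A-Za-z_][A-Za-z0-9_]*",
}
DEFAULT = r"[^/@]+"


def compile_route(url):
    """Compile a route so each <:name> uses its own regexp."""
    # With one capture group, re.split alternates literal text and names:
    # ['/data/', 'class_name', '/', 'item_id', '/', 'attr_name', '']
    pieces = re.split(r"<:(\w+)>", url)
    out = []
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            out.append(re.escape(piece))  # literal URL text
        else:
            pattern = COMPONENT_PATTERNS.get(piece, DEFAULT)
            out.append("(?P<%s>%s)" % (piece, pattern))
    return re.compile("^" + "".join(out) + "$")
```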
Other fixes:
Make props_from_args return an empty dict when it finds no props,
rather than throwing an unhandled error.
Make __init__ for SimulateFieldStorageFromJson handle an empty JSON
document. This is useful for POSTing to rest/data/<class>/@poe with an
empty document.
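The empty-document fix amounts to treating a blank body as an empty set of fields before any JSON parsing happens. `parse_json_body` below is a hypothetical helper that shows this idea. It is not the actual code of SimulateFieldStorageFromJson, and rejecting a non-object top level is an assumption.

```python
import json


def parse_json_body(body):
    """Return a dict of fields; treat a missing or blank body as {}."""
    if body is None or not body.strip():
        return {}  # e.g. an empty POST to .../@poe
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data
```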
Testing:
Add testPostPOE to test/rest_common.py; I think it covers all the
code that was added.
Documentation:
Add a section titled "Safely Re-sending POST" to the "Client API"
section of rest.txt. Move the existing "Adding new rest endpoints"
section out of "Client API" into a new second-level section called
"Programming the REST API". Also make a minor change to the simple rest
client, moving the header settings onto continuation lines rather than
showing one long line.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Sun, 14 Apr 2019 21:07:11 -0400 |
| parents | c749d6795bc2 |
| children | b74f0b50bef1 |
```python
from __future__ import print_function
from roundup.anypy.strings import u2s, uchr


class dehtml:
    def __init__(self, converter):
        if converter == "none":
            self.html2text = None
            return

        try:
            if converter == "beautifulsoup":
                # Not as well tested as dehtml.
                from bs4 import BeautifulSoup

                def html2text(html):
                    soup = BeautifulSoup(html)

                    # kill all script and style elements
                    for script in soup(["script", "style"]):
                        script.extract()

                    return u2s(soup.get_text('\n', strip=True))

                self.html2text = html2text
            else:
                raise ImportError  # use
        except ImportError:
            # use the fallback below if beautiful soup is not installed.
            try:
                # Python 3+.
                from html.parser import HTMLParser
                from html.entities import name2codepoint
            except ImportError:
                # Python 2.
                from HTMLParser import HTMLParser
                from htmlentitydefs import name2codepoint

            class DumbHTMLParser(HTMLParser):
                # class attribute
                text = ""

                # internal state variable
                _skip_data = False
                _last_empty = False

                def handle_data(self, data):
                    if self._skip_data:  # skip data if in script or style block
                        return

                    if (data.strip() == ""):
                        # reduce multiple blank lines to 1
                        if (self._last_empty):
                            return
                        else:
                            self._last_empty = True
                    else:
                        self._last_empty = False

                    self.text = self.text + data

                def handle_starttag(self, tag, attrs):
                    if (tag == "p"):
                        self.text = self.text + "\n"
                    if (tag in ("style", "script")):
                        self._skip_data = True

                def handle_endtag(self, tag):
                    if (tag in ("style", "script")):
                        self._skip_data = False

                def handle_entityref(self, name):
                    if self._skip_data:
                        return
                    c = uchr(name2codepoint[name])
                    try:
                        self.text = self.text + c
                    except UnicodeEncodeError:
                        # print a space as a placeholder
                        pass

            def html2text(html):
                parser = DumbHTMLParser()
                parser.feed(html)
                parser.close()
                return parser.text

            self.html2text = html2text


if "__main__" == __name__:
    html = '''
<body>
<script>
this must not be in output
</script>

<style>
p {display:block}
</style>
    <div class="header"><h1>Roundup</h1>
        <div id="searchbox" style="display: none">
          <form class="search" action="../search.html" method="get">
            <input type="text" name="q" size="18" />
            <input type="submit" value="Search" />
            <input type="hidden" name="check_keywords" value="yes" />
            <input type="hidden" name="area" value="default" />
          </form>
        </div>
        <script type="text/javascript">$('#searchbox').show(0);</script>
    </div>
       <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../index.html">Home</a></li>
<li class="toctree-l1"><a class="reference external" href="http://pypi.python.org/pypi/roundup">Download</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="../docs.html">Docs</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="features.html">Roundup Features</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="">Installing Roundup</a></li>
<li class="toctree-l2"><a class="reference internal" href="upgrading.html">Upgrading to newer versions of Roundup</a></li>
<li class="toctree-l2"><a class="reference internal" href="FAQ.html">Roundup FAQ</a></li>
<li class="toctree-l2"><a class="reference internal" href="user_guide.html">User Guide</a></li>
<li class="toctree-l2"><a class="reference internal" href="customizing.html">Customising Roundup</a></li>
<li class="toctree-l2"><a class="reference internal" href="admin_guide.html">Administration Guide</a></li>
</ul>
<div class="section" id="prerequisites">
<h2><a class="toc-backref" href="#id5">Prerequisites</a></h2>
<p>Roundup requires Python 2.5 or newer (but not Python 3) with a functioning
anydbm module. Download the latest version from <a class="reference external" href="http://www.python.org/">http://www.python.org/</a>.
It is highly recommended that users install the latest patch version
of python as these contain many fixes to serious bugs.</p>
<p>Some variants of Linux will need an additional “python dev” package
installed for Roundup installation to work. Debian and derivatives, are
known to require this.</p>
<p>If you’re on windows, you will either need to be using the ActiveState python
distribution (at <a class="reference external" href="http://www.activestate.com/Products/ActivePython/">http://www.activestate.com/Products/ActivePython/</a>), or you’ll
have to install the win32all package separately (get it from
<a class="reference external" href="http://starship.python.net/crew/mhammond/win32/">http://starship.python.net/crew/mhammond/win32/</a>).</p>
</div>
</body>
'''

    html2text = dehtml("dehtml").html2text
    if html2text:
        print(html2text(html))

    try:
        # trap error seen if N_TOKENS not defined when run.
        html2text = dehtml("beautifulsoup").html2text
        if html2text:
            print(html2text(html))
    except NameError as e:
        print("captured error %s" % e)

    html2text = dehtml("none").html2text
    if html2text:
        print("FAIL: Error, dehtml(none) is returning a function")
    else:
        print("PASS: dehtml(none) is returning None")
```
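For comparison, here is a stripped-down, Python 3 only sketch of the same fallback technique used by DumbHTMLParser. It skips text inside script and style, adds a newline at each `<p>`, and collapses runs of blank text. `TextExtractor` and `html_to_text` are illustrative names and are not part of roundup. The sketch relies on `HTMLParser`'s default `convert_charrefs=True`. With that setting, entity references such as `&amp;` are decoded before `handle_data` is called, and `handle_entityref` is never invoked.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text; skip <script>/<style>; collapse blank runs."""

    def __init__(self):
        # convert_charrefs=True is the default: entities such as &amp;
        # are decoded before handle_data is called.
        super().__init__(convert_charrefs=True)
        self.parts = []           # text fragments, joined once at the end
        self._skip = False        # True while inside <script> or <style>
        self._last_empty = False  # True if the previous fragment was blank

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.parts.append("\n")
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if self._skip:
            return
        if not data.strip():
            if self._last_empty:
                return  # drop repeated blank fragments
            self._last_empty = True
        else:
            self._last_empty = False
        self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Collecting fragments in a list and joining them once avoids the repeated string concatenation in the original. Because `convert_charrefs` defaults to True on Python 3.5 and later, DumbHTMLParser's own `handle_entityref` is effectively unused there.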
