annotate detectors/emailauditor.py @ 5096:e74c3611b138
- issue2550636, issue2550909: Added support for Whoosh indexer.
Also adds new config.ini setting called indexer to select
indexer. See ``doc/upgrading.txt`` for details. Initial patch
done by David Wolever. Patch modified (see ticket or below for
changes), docs updated and committed.
I have an outstanding issue with test/test_indexer.py. I have to
comment out all imports and tests for indexers I don't have (i.e.
mysql, postgres) otherwise no tests run.
With that change made, dbm, sqlite (rdbms), xapian and whoosh indexes
are all passing the indexer tests.
Changes summary:
1) support native back ends dbm and rdbms. (original patch only fell
through to dbm)
2) Developed whoosh stopfilter to not index stopwords or words outside
the maxlength and minlength limits defined in index_common.py.
Required to pass the extremewords test_indexer test. Also I
removed a call to .lower on the input text as the tokenizer I chose
automatically does the lowercase.
3) Added support for max/min length to find. This was needed to pass
extremewords test.
4) Added back a call to save_index in add_text. This allowed all but
two tests to pass.
5) Fixed a call to:
results = searcher.search(query.Term("identifier", identifier))
which had an extra parameter that is an error under current whoosh.
6) Set limit=None in the search call for find(); otherwise it only returns
10 items. This allowed it to pass the manyresults test.
Also due to changes in the roundup code removed the call in
indexer_whoosh to
from roundup.anypy.sets_ import set
since we use the python builtin set.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Sat, 25 Jun 2016 20:10:03 -0400 |
| parents | 6b32e9dac625 |
| children | 0942fe89e82e |
Every line of the file is annotated with changeset 4627:6b32e9dac625, "Restore sample detectors removed by 07c5d833dcb2 (issue2550574)", by Thomas Arendsen Hein <thomas@intevation.de>.

    def eml_to_mht(db, cl, nodeid, newvalues):
        '''This auditor fires whenever a new file entity is created.

        If the file is of type message/rfc822, we tack on the extension .mht
        (replacing a trailing .eml).

        The reason for this is that Microsoft Internet Explorer will not open
        things with a .eml attachment, as they deem it 'unsafe'. Worse yet,
        they'll just give you an incomprehensible error message. For more
        information, please see:

        http://support.microsoft.com/default.aspx?scid=kb;EN-US;825803

        Their suggested workaround is (excerpt):

        WORKAROUND

        To work around this behavior, rename the .EML file that the URL
        links to so that it has a .MHT file name extension, and then update
        the URL to reflect the change to the file name. To do this:

        1. In Windows Explorer, locate and then select the .EML file that
           the URL links.
        2. Right-click the .EML file, and then click Rename.
        3. Change the file name so that the .EML file uses a .MHT file name
           extension, and then press ENTER.
        4. Update the URL that links to the file to reflect the new file
           name extension.

        So... we do that. :)'''
        if newvalues.get('type', '').lower() == "message/rfc822":
            # No name supplied: fall back to a default .mht name.
            if 'name' not in newvalues:
                newvalues['name'] = 'email.mht'
                return
            name = newvalues['name']
            # Drop a trailing .eml before appending .mht.
            if name.endswith('.eml'):
                name = name[:-4]
            newvalues['name'] = name + '.mht'

    def init(db):
        # Run the auditor whenever a file item is created.
        db.file.audit('create', eml_to_mht)
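To make the auditor's effect concrete, here is a minimal sketch that runs it against plain dictionaries standing in for `newvalues`. The function is restated so the block runs on its own, using the `in` operator for the key test. The `audit` helper and the sample file names are hypothetical and exist only for this illustration. The `db`, `cl` and `nodeid` arguments are passed as `None` because the auditor never reads them.

```python
def eml_to_mht(db, cl, nodeid, newvalues):
    """Restatement of the auditor above so this block is self-contained."""
    if newvalues.get('type', '').lower() == "message/rfc822":
        if 'name' not in newvalues:
            newvalues['name'] = 'email.mht'
            return
        name = newvalues['name']
        if name.endswith('.eml'):
            name = name[:-4]
        newvalues['name'] = name + '.mht'


def audit(newvalues):
    """Hypothetical helper: run the auditor on a copy and return the copy."""
    result = dict(newvalues)
    # db, cl and nodeid are unused by this auditor, so None is enough here.
    eml_to_mht(None, None, None, result)
    return result


# An rfc822 file with no name gets the default name.
unnamed = audit({'type': 'message/rfc822'})
# The type comparison is case-insensitive; a trailing .eml is replaced.
renamed = audit({'type': 'Message/RFC822', 'name': 'report.eml'})
# Any other name simply gets .mht appended.
suffixed = audit({'type': 'message/rfc822', 'name': 'notes.txt'})
# Files of other types are left alone, even if they end in .eml.
untouched = audit({'type': 'text/plain', 'name': 'report.eml'})

for values in (unnamed, renamed, suffixed, untouched):
    print(values['name'])
```

Note that the suffix test is case-sensitive: a name such as `REPORT.EML` would keep its extension and have `.mht` appended after it.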
