annotate detectors/emailauditor.py @ 5305:e20f472fde7d
issue2550799: provide basic support for handling HTML-only emails

Initial implementation and testing with the dehtml HTML converter
done.

The use of Beautiful Soup 4 is not tested. My test system breaks when
running dehtml.py using Beautiful Soup. I don't get the failures when
running under the test harness, but the text output is significantly
different (different line breaks, number of newlines, etc.).

The tests for dehtml need to be generated for Beautiful Soup and the
expected output changed. Since I have a wonky install of Beautiful
Soup, I don't trust my output as the standard to test against. Also,
since Beautiful Soup is optional, the test harness needs to skip the
Beautiful Soup tests if import bs4 fails. Again, something outside of my
expertise. I deleted the work I had done to implement that. I could
not get it working and wanted to get this feature in, in some form.
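
The commit message says the Beautiful Soup tests should be skipped when `import bs4` fails. A minimal sketch of one common way to do that, assuming a `unittest`-style harness, follows. The `HAVE_BS4` flag, the test class, and the `strip_tags_with_bs4` helper are hypothetical names for illustration; they are not taken from Roundup's test suite or its dehtml module.

```python
import unittest

try:
    import bs4  # optional dependency; may not be installed
    HAVE_BS4 = True
except ImportError:
    HAVE_BS4 = False


def strip_tags_with_bs4(html):
    """Hypothetical helper (not Roundup's dehtml): extract text with bs4."""
    return bs4.BeautifulSoup(html, "html.parser").get_text()


class BeautifulSoupConversionTest(unittest.TestCase):
    # The test is reported as skipped, not failed, when bs4 is missing.
    @unittest.skipUnless(HAVE_BS4, "bs4 (Beautiful Soup 4) is not installed")
    def test_plain_text_extracted(self):
        self.assertIn("hello", strip_tags_with_bs4("<p>hello</p>"))
```

With this pattern the same test module can run on systems with and without Beautiful Soup; the expected output for the Beautiful Soup path would still need to be generated separately, as the message notes.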
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Fri, 13 Oct 2017 21:46:59 -0400 |
| parents | 6b32e9dac625 |
| children | 0942fe89e82e |
Every line of the file is attributed to revision 4627:6b32e9dac625, "Restore sample detectors removed by 07c5d833dcb2 (issue2550574)", by Thomas Arendsen Hein <thomas@intevation.de>.

def eml_to_mht(db, cl, nodeid, newvalues):
    '''This auditor fires whenever a new file entity is created.

    If the file is of type message/rfc822, we tack on the extension .mht
    (replacing a trailing .eml, if there is one).

    The reason for this is that Microsoft Internet Explorer will not open
    things with a .eml attachment, as they deem it 'unsafe'. Worse yet,
    they'll just give you an incomprehensible error message. For more
    information, please see:

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;825803

    Their suggested workaround is (excerpt):

        WORKAROUND

        To work around this behavior, rename the .EML file that the URL
        links to so that it has a .MHT file name extension, and then update
        the URL to reflect the change to the file name. To do this:

        1. In Windows Explorer, locate and then select the .EML file that
           the URL links.
        2. Right-click the .EML file, and then click Rename.
        3. Change the file name so that the .EML file uses a .MHT file name
           extension, and then press ENTER.
        4. Updated the URL that links to the file to reflect the new file
           name extension.

    So... we do that. :)'''
    if newvalues.get('type', '').lower() == "message/rfc822":
        if 'name' not in newvalues:
            # No file name was supplied, so use a default .mht name.
            newvalues['name'] = 'email.mht'
            return
        name = newvalues['name']
        # Drop a trailing .eml before appending .mht.
        if name.endswith('.eml'):
            name = name[:-4]
        newvalues['name'] = name + '.mht'

def init(db):
    # Run eml_to_mht whenever a new file item is created.
    db.file.audit('create', eml_to_mht)

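
To see what the auditor does without setting up a tracker, it can be called on plain dictionaries. The sketch below restates the function so the block runs on its own under Python 3. The `audit` wrapper is a hypothetical helper, and passing `None` for `db`, `cl`, and `nodeid` is an assumption that works only because the function never uses them.

```python
def eml_to_mht(db, cl, nodeid, newvalues):
    """Restatement of the auditor for illustration only."""
    if newvalues.get('type', '').lower() == "message/rfc822":
        if 'name' not in newvalues:
            newvalues['name'] = 'email.mht'
            return
        name = newvalues['name']
        if name.endswith('.eml'):
            name = name[:-4]
        newvalues['name'] = name + '.mht'


def audit(newvalues):
    """Hypothetical helper: run the auditor with placeholder arguments."""
    eml_to_mht(None, None, None, newvalues)
    return newvalues


unnamed = audit({'type': 'message/rfc822'})
renamed = audit({'type': 'Message/RFC822', 'name': 'report.eml'})
suffixed = audit({'type': 'message/rfc822', 'name': 'forwarded'})
untouched = audit({'type': 'text/plain', 'name': 'notes.eml'})

print(unnamed['name'])    # email.mht
print(renamed['name'])    # report.mht
print(suffixed['name'])   # forwarded.mht
print(untouched['name'])  # notes.eml
```

The MIME type comparison is case-insensitive, but the `.eml` check is not, so a file named `REPORT.EML` would become `REPORT.EML.mht`.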
