| 1 | +# Spam Assassin |
| 2 | + |
| 3 | +> The [Spam Assassin][spam-assassin] public mail corpus. |
| 4 | + |
| 5 | + |
| 6 | +<section class="intro"> |
| 7 | + |
| 8 | +</section> |
| 9 | + |
| 10 | +<!-- /.intro --> |
| 11 | + |
| 12 | + |
| 13 | +<section class="usage"> |
| 14 | + |
| 15 | +## Usage |
| 16 | + |
| 17 | +``` javascript |
| 18 | +var corpus = require( '@stdlib/datasets/spam-assassin' ); |
| 19 | +``` |
| 20 | + |
| 21 | +#### corpus() |
| 22 | + |
| 23 | +Returns the [Spam Assassin][spam-assassin] public mail corpus. |
| 24 | + |
| 25 | +``` javascript |
| 26 | +var data = corpus(); |
| 27 | +// returns [{...},{...},...] |
| 28 | +``` |
| 29 | + |
| 30 | +Each `array` element has the following fields: |
| 31 | + |
| 32 | +* `id`: message id (relative to message `group`) |
| 33 | +* `group`: message group |
| 34 | +* `checksum`: object containing checksum info |
| 35 | +* `text`: message text (including headers) |
| 36 | + |
| 37 | +The message `group` may be one of the following: |
| 38 | + |
| 39 | +* `easy-ham-1`: non-spam e-mails which are easier to detect (2500 messages) |
| 40 | +* `easy-ham-2`: non-spam e-mails which are easier to detect and were collected at a later date (1400 messages) |
| 41 | +* `hard-ham-1`: non-spam e-mails which are harder to detect (250 messages) |
| 42 | +* `spam-1`: spam e-mails (500 messages) |
| 43 | +* `spam-2`: spam e-mails collected at a later date (1396 messages) |
| 44 | + |
| 45 | +The `checksum` object contains the following fields: |
| 46 | + |
| 47 | +* `type`: checksum type (e.g., MD5) |
| 48 | +* `value`: checksum value |
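As a rough illustration of how the fields above fit together, the following sketch tallies messages per `group`. The `sample` array and the `countByGroup` helper are hypothetical and not part of the package; the record values (ids, checksums, text) are placeholders, and in practice the array would presumably come from `corpus()`.

```javascript
// Hypothetical records shaped like the documented elements (placeholder values).
var sample = [
    { 'id': 1, 'group': 'easy-ham-1', 'checksum': { 'type': 'MD5', 'value': 'placeholder-a' }, 'text': 'Subject: hi\n\nHello.' },
    { 'id': 2, 'group': 'easy-ham-1', 'checksum': { 'type': 'MD5', 'value': 'placeholder-b' }, 'text': 'Subject: re\n\nThanks.' },
    { 'id': 1, 'group': 'spam-1', 'checksum': { 'type': 'MD5', 'value': 'placeholder-c' }, 'text': 'Subject: offer\n\nBuy now.' }
];

// Hypothetical helper: counts how many messages belong to each group.
function countByGroup( data ) {
    var counts = {};
    var g;
    var i;
    for ( i = 0; i < data.length; i++ ) {
        g = data[ i ].group;
        counts[ g ] = ( counts[ g ] || 0 ) + 1;
    }
    return counts;
}

var counts = countByGroup( sample );
console.log( counts );
```

Because `id` is only unique relative to its `group`, code that needs a unique key for a message would likely have to combine both fields.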
| 49 | + |
| 50 | +</section> |
| 51 | + |
| 52 | +<!-- /.usage --> |
| 53 | + |
| 54 | + |
| 55 | +<section class="examples"> |
| 56 | + |
| 57 | +## Examples |
| 58 | + |
| 59 | +<!-- TODO: better example. Possibly a spam classifier. --> |
| 60 | + |
| 61 | +``` javascript |
| 62 | +var corpus = require( '@stdlib/datasets/spam-assassin' ); |
| 63 | + |
| 64 | +var data; |
| 65 | +var i; |
| 66 | + |
| 67 | +data = corpus(); |
| 68 | +for ( i = 0; i < data.length; i++ ) { |
| 69 | + console.log( 'Character Count: %d', data[ i ].text.length ); |
| 70 | +} |
| 71 | +``` |
| 72 | + |
| 73 | +</section> |
| 74 | + |
| 75 | +<!-- /.examples --> |
| 76 | + |
| 77 | + |
| 78 | +--- |
| 79 | + |
| 80 | +<section class="cli"> |
| 81 | + |
| 82 | +## CLI |
| 83 | + |
| 84 | +<section class="usage"> |
| 85 | + |
| 86 | +### Usage |
| 87 | + |
| 88 | +``` bash |
| 89 | +Usage: spam-assassin [options] |
| 90 | + |
| 91 | +Options: |
| 92 | + |
| 93 | + -h, --help Print this message. |
| 94 | + -V, --version Print the package version. |
| 95 | + --format fmt Output format: 'txt' or 'ndjson'. |
| 96 | +``` |
| 97 | + |
| 98 | +</section> |
| 99 | + |
| 100 | +<!-- /.usage --> |
| 101 | + |
| 102 | + |
| 103 | +<section class="notes"> |
| 104 | + |
| 105 | +### Notes |
| 106 | + |
| 107 | +* The CLI supports two output formats: plain text (`txt`) and newline-delimited JSON ([NDJSON][ndjson]). The default output format is `txt`. |
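A minimal sketch of consuming NDJSON output in JavaScript is shown below. The `parseNDJSON` helper and the sample input are hypothetical; the sketch assumes only that each non-empty line is a standalone JSON value, as the NDJSON format prescribes, and the field names in the sample are assumed to mirror those returned by the API.

```javascript
// Hypothetical helper: parses newline-delimited JSON into an array of values.
function parseNDJSON( str ) {
    var lines = str.split( /\r?\n/ );
    var out = [];
    var i;
    for ( i = 0; i < lines.length; i++ ) {
        // Skip blank lines (e.g., the empty string after a trailing newline):
        if ( lines[ i ].trim() === '' ) {
            continue;
        }
        out.push( JSON.parse( lines[ i ] ) );
    }
    return out;
}

// Placeholder input standing in for captured CLI output (values are made up):
var ndjson = '{"id":1,"group":"spam-1"}\n{"id":2,"group":"easy-ham-1"}\n';

var records = parseNDJSON( ndjson );
console.log( records.length );
```

Newlines inside a message's text are escaped within the JSON string, so splitting on literal line breaks does not cut a record in half.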
| 108 | + |
| 109 | +</section> |
| 110 | + |
| 111 | +<!-- /.notes --> |
| 112 | + |
| 113 | + |
| 114 | +<section class="examples"> |
| 115 | + |
| 116 | +### Examples |
| 117 | + |
| 118 | +``` bash |
| 119 | +$ spam-assassin |
| 120 | +``` |
| 121 | + |
| 122 | +</section> |
| 123 | + |
| 124 | +<!-- /.examples --> |
| 125 | + |
| 126 | +</section> |
| 127 | + |
| 128 | +<!-- /.cli --> |
| 129 | + |
| 130 | + |
| 131 | +<!-- <license> --> |
| 132 | + |
| 133 | +--- |
| 134 | + |
| 135 | +## License |
| 136 | + |
| 137 | +The data files (databases) are licensed under an [Open Data Commons Public Domain Dedication & License 1.0][pddl-1.0] and their contents are licensed under [Creative Commons Zero v1.0 Universal][cc0]. The software is licensed under [Apache License, Version 2.0][apache-license]. |
| 138 | + |
| 139 | +<!-- </license> --> |
| 140 | + |
| 141 | + |
| 142 | +<section class="links"> |
| 143 | + |
| 144 | +[pddl-1.0]: http://opendatacommons.org/licenses/pddl/1.0/ |
| 145 | +[cc0]: https://creativecommons.org/publicdomain/zero/1.0 |
| 146 | +[apache-license]: https://www.apache.org/licenses/LICENSE-2.0 |
| 147 | + |
| 148 | +[ndjson]: http://specs.frictionlessdata.io/ndjson/ |
| 149 | + |
| 150 | +[spam-assassin]: http://spamassassin.apache.org/old/publiccorpus/readme.html |
| 151 | + |
| 152 | +</section> |
| 153 | + |
| 154 | +<!-- /.links --> |