
Commit 84baf2c

Add README
1 parent cd5c671

1 file changed: +154 -0 lines changed

• lib/node_modules/@stdlib/datasets/spam-assassin

# Spam Assassin

> The [SpamAssassin][spam-assassin] public mail corpus.


<section class="intro">

</section>

<!-- /.intro -->


<section class="usage">

## Usage

``` javascript
var corpus = require( '@stdlib/datasets/spam-assassin' );
```

#### corpus()

Returns the [SpamAssassin][spam-assassin] public mail corpus.

``` javascript
var data = corpus();
// returns [{...},{...},...]
```

Each `array` element is an `object` with the following fields:

* `id`: message id (relative to the message `group`)
* `group`: message group
* `checksum`: `object` containing checksum information
* `text`: message text (including headers)
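
Because `text` includes the message headers, a consumer often needs to separate the header block from the body. The following is a minimal sketch, assuming each `text` value is a raw RFC 5322 style message whose header block ends at the first empty line (with either `\n` or `\r\n` line endings). `splitMessage` and `getHeader` are hypothetical helpers, not part of this package.

```javascript
/**
* Splits a raw e-mail message into its header block and body.
*
* Hypothetical helper (not part of `@stdlib/datasets/spam-assassin`). ASSUMES
* the header block ends at the first empty line, as in RFC 5322 messages.
*
* @param {string} text - raw message text (including headers)
* @returns {Object} object with `headers` and `body` string properties
*/
function splitMessage( text ) {
    // Find the first empty line, allowing for LF or CRLF line endings:
    var match = /\r?\n\r?\n/.exec( text );
    if ( match === null ) {
        // No empty line, so treat the entire message as headers:
        return {
            'headers': text,
            'body': ''
        };
    }
    return {
        'headers': text.slice( 0, match.index ),
        'body': text.slice( match.index + match[ 0 ].length )
    };
}

/**
* Returns the value of the first header with a given name (case-insensitive).
*
* Simplification: folded (multi-line) header values are NOT joined.
*
* @param {string} headers - header block
* @param {string} name - header name (e.g., 'Subject')
* @returns {(string|null)} header value or null if not found
*/
function getHeader( headers, name ) {
    var prefix = name.toLowerCase() + ':';
    var lines = headers.split( /\r?\n/ );
    var i;
    for ( i = 0; i < lines.length; i++ ) {
        if ( lines[ i ].toLowerCase().indexOf( prefix ) === 0 ) {
            return lines[ i ].slice( prefix.length ).trim();
        }
    }
    return null;
}
```

With the package installed, `getHeader( splitMessage( data[ 0 ].text ).headers, 'Subject' )` would return the first message's `Subject` header, if present.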

The message `group` may be one of the following:

* `easy-ham-1`: non-spam e-mails which are relatively easy to distinguish from spam (2500 messages)
* `easy-ham-2`: non-spam e-mails which are relatively easy to distinguish from spam, collected at a later date (1400 messages)
* `hard-ham-1`: non-spam e-mails which more closely resemble spam and are thus harder to distinguish from it (250 messages)
* `spam-1`: spam e-mails (500 messages)
* `spam-2`: spam e-mails collected at a later date (1396 messages)
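
Since the group name encodes whether a message is spam, a binary label can be derived from it. The following is a minimal sketch, assuming only the five groups listed above and that a group name starting with `spam` denotes spam. `isSpam` and `countByGroup` are hypothetical helpers, and the sample records are made-up placeholders rather than corpus data.

```javascript
/**
* Returns a boolean indicating whether a message group contains spam.
*
* Hypothetical helper: ASSUMES spam groups are named `spam-1` and `spam-2`
* and all other groups (`easy-ham-*`, `hard-ham-*`) contain non-spam.
*
* @param {string} group - message group
* @returns {boolean} boolean indicating whether the group contains spam
*/
function isSpam( group ) {
    return group.indexOf( 'spam' ) === 0;
}

/**
* Counts the number of messages in each group.
*
* @param {Array<Object>} data - records having a `group` field
* @returns {Object} mapping from group name to message count
*/
function countByGroup( data ) {
    var counts = {};
    var g;
    var i;
    for ( i = 0; i < data.length; i++ ) {
        g = data[ i ].group;
        counts[ g ] = ( counts[ g ] || 0 ) + 1;
    }
    return counts;
}

// Made-up placeholder records (NOT corpus data). Note that an `id` may
// repeat across groups, since ids are relative to the message `group`:
var sample = [
    { 'id': 1, 'group': 'easy-ham-1', 'text': '...' },
    { 'id': 2, 'group': 'hard-ham-1', 'text': '...' },
    { 'id': 1, 'group': 'spam-1', 'text': '...' }
];

var counts = countByGroup( sample );
// returns { 'easy-ham-1': 1, 'hard-ham-1': 1, 'spam-1': 1 }

var labels = sample.map( function toLabel( d ) {
    return isSpam( d.group );
});
// returns [ false, false, true ]
```

Applied to the full output of `corpus()`, `countByGroup` would be expected to reproduce the per-group message counts listed above.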

The `checksum` object contains the following fields:

* `type`: checksum type (e.g., MD5)
* `value`: checksum value
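
The checksum fields can be used to detect corrupted or altered records. The following is a minimal sketch using the built-in Node.js `crypto` module. It assumes, without confirmation from this README, that `type` names an algorithm supported by Node.js (such as MD5) and that `value` is the hex-encoded digest of the UTF-8 encoded `text`; if the digest was instead computed over the original message files, the comparison will not match. `verifyChecksum` is a hypothetical helper, and the record below is made up.

```javascript
var crypto = require( 'crypto' );

/**
* Tests whether a record's `text` matches its checksum.
*
* Hypothetical helper. ASSUMES `checksum.value` is a hex-encoded digest of the
* UTF-8 encoded `text`, computed with the algorithm named by `checksum.type`.
* Throws if Node.js does not support the named algorithm.
*
* @param {Object} record - record having `text` and `checksum` fields
* @returns {boolean} boolean indicating whether the digest matches
*/
function verifyChecksum( record ) {
    // Normalize the algorithm name (e.g., 'MD5' => 'md5'):
    var algorithm = record.checksum.type.toLowerCase();
    var digest = crypto.createHash( algorithm )
        .update( record.text, 'utf8' )
        .digest( 'hex' );
    return digest === record.checksum.value.toLowerCase();
}

// Made-up record (NOT corpus data) holding the MD5 digest of 'hello':
var record = {
    'text': 'hello',
    'checksum': {
        'type': 'MD5',
        'value': '5d41402abc4b2a76b9719d911017c592'
    }
};

var ok = verifyChecksum( record );
// returns true
```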

</section>

<!-- /.usage -->


<section class="examples">

## Examples

<!-- TODO: better example. Possibly a spam classifier. -->

``` javascript
var corpus = require( '@stdlib/datasets/spam-assassin' );

var data;
var i;

data = corpus();
for ( i = 0; i < data.length; i++ ) {
    console.log( 'Character Count: %d', data[ i ].text.length );
}
```
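
The TODO comment above suggests a spam classifier. One possibility is a multinomial naive Bayes classifier with add-one (Laplace) smoothing, a standard textbook technique that is not part of this package. The following is a minimal sketch under stated assumptions: the tokenizer (lowercased runs of ASCII letters and digits), the helper names `tokenize`, `train`, and `classify`, and the tiny training set are all hypothetical. With the package installed, training documents would instead come from `corpus()`, with labels derived from each record's `group`.

```javascript
/**
* Splits text into lowercase alphanumeric tokens (a deliberately simple tokenizer).
*
* @param {string} text - input text
* @returns {Array<string>} tokens
*/
function tokenize( text ) {
    return text.toLowerCase().match( /[a-z0-9]+/g ) || [];
}

/**
* Trains a multinomial naive Bayes model on labeled documents.
*
* @param {Array<Object>} docs - objects having `text` (string) and `spam` (boolean) fields
* @returns {Object} model
*/
function train( docs ) {
    var model = {
        'docs': { 'spam': 0, 'ham': 0 },                   // documents per class
        'tokens': { 'spam': 0, 'ham': 0 },                 // total tokens per class
        'counts': { 'spam': new Map(), 'ham': new Map() }, // token counts per class
        'vocab': new Set()                                 // all tokens seen in training
    };
    var tokens;
    var c;
    var i;
    var j;
    for ( i = 0; i < docs.length; i++ ) {
        c = ( docs[ i ].spam ) ? 'spam' : 'ham';
        tokens = tokenize( docs[ i ].text );
        model.docs[ c ] += 1;
        model.tokens[ c ] += tokens.length;
        for ( j = 0; j < tokens.length; j++ ) {
            model.counts[ c ].set( tokens[ j ], ( model.counts[ c ].get( tokens[ j ] ) || 0 ) + 1 );
            model.vocab.add( tokens[ j ] );
        }
    }
    return model;
}

/**
* Returns `true` if the model scores a text as more likely spam than non-spam.
*
* Scores are summed log-probabilities with add-one smoothing; tokens not seen
* during training are ignored. ASSUMES both classes have training documents.
*
* @param {Object} model - trained model
* @param {string} text - text to classify
* @returns {boolean} boolean indicating whether the text is classified as spam
*/
function classify( model, text ) {
    var classes = [ 'spam', 'ham' ];
    var tokens = tokenize( text );
    var total = model.docs.spam + model.docs.ham;
    var V = model.vocab.size;
    var scores = {};
    var c;
    var k;
    var j;
    for ( k = 0; k < classes.length; k++ ) {
        c = classes[ k ];
        scores[ c ] = Math.log( model.docs[ c ] / total ); // log prior
        for ( j = 0; j < tokens.length; j++ ) {
            if ( model.vocab.has( tokens[ j ] ) ) {
                // Smoothed log likelihood of the token given the class:
                scores[ c ] += Math.log( ( ( model.counts[ c ].get( tokens[ j ] ) || 0 ) + 1 ) / ( model.tokens[ c ] + V ) );
            }
        }
    }
    return scores.spam > scores.ham;
}

// Tiny made-up training set (NOT corpus data):
var model = train([
    { 'text': 'Buy cheap pills now', 'spam': true },
    { 'text': 'Cheap money, act now', 'spam': true },
    { 'text': 'Meeting tomorrow at noon', 'spam': false },
    { 'text': 'Lunch tomorrow at the cafe', 'spam': false }
]);

var cheapPills = classify( model, 'cheap pills' );
// returns true

var meeting = classify( model, 'meeting at noon' );
// returns false
```

Log-probabilities are summed rather than multiplying raw probabilities to avoid floating-point underflow on long messages. A realistic evaluation would also hold out messages from each group as a test set instead of classifying training text.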

</section>

<!-- /.examples -->


---

<section class="cli">

## CLI

<section class="usage">

### Usage

``` text
Usage: spam-assassin [options]

Options:

  -h,    --help                Print this message.
  -V,    --version             Print the package version.
         --format fmt          Output format: 'txt' or 'ndjson'.
```

</section>

<!-- /.usage -->


<section class="notes">

### Notes

* The CLI supports two output formats: plain text (`txt`) and newline-delimited JSON ([NDJSON][ndjson]). The default output format is `txt`.
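
Because every line of NDJSON output is a standalone JSON document, the `ndjson` format can be processed line by line. The following is a minimal Node.js sketch using the built-in `readline` module, assuming each output line serializes one corpus record with the fields described in the usage section. `parseLine` and `tally` are hypothetical helpers.

```javascript
var readline = require( 'readline' );

/**
* Parses one line of NDJSON, returning `null` for blank lines.
*
* @param {string} line - line of NDJSON text
* @returns {(Object|null)} parsed record or null
*/
function parseLine( line ) {
    if ( line.trim() === '' ) {
        return null;
    }
    return JSON.parse( line );
}

/**
* Reads NDJSON records from a readable stream and counts records per `group`.
*
* ASSUMES each record has a `group` field, as described in the usage section.
*
* @param {Readable} stream - input stream (e.g., `process.stdin`)
* @param {Function} done - callback invoked with an object of per-group counts
*/
function tally( stream, done ) {
    var counts = {};
    var rl = readline.createInterface({
        'input': stream,
        'crlfDelay': Infinity // treat '\r\n' as a single line break
    });
    rl.on( 'line', function onLine( line ) {
        var rec = parseLine( line );
        if ( rec !== null ) {
            counts[ rec.group ] = ( counts[ rec.group ] || 0 ) + 1;
        }
    });
    rl.on( 'close', function onClose() {
        done( counts );
    });
}
```

Saved in a hypothetical `consume.js` ending with `tally( process.stdin, console.log );`, the sketch could be run as `spam-assassin --format ndjson | node consume.js`.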

</section>

<!-- /.notes -->


<section class="examples">

### Examples

``` bash
$ spam-assassin
```

</section>

<!-- /.examples -->

</section>

<!-- /.cli -->


<!-- <license> -->

---

## License

The data files (databases) are licensed under an [Open Data Commons Public Domain Dedication & License 1.0][pddl-1.0] and their contents are licensed under [Creative Commons Zero v1.0 Universal][cc0]. The software is licensed under [Apache License, Version 2.0][apache-license].

<!-- </license> -->


<section class="links">

[pddl-1.0]: http://opendatacommons.org/licenses/pddl/1.0/
[cc0]: https://creativecommons.org/publicdomain/zero/1.0
[apache-license]: https://www.apache.org/licenses/LICENSE-2.0

[ndjson]: http://specs.frictionlessdata.io/ndjson/

[spam-assassin]: http://spamassassin.apache.org/old/publiccorpus/readme.html

</section>

<!-- /.links -->

0 commit comments

Comments
 (0)