Skip to content

Commit 64ddd04

Browse files
committed
Initial import
0 parents  commit 64ddd04

File tree

6 files changed

+139
-0
lines changed

6 files changed

+139
-0
lines changed

README.md

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
2+
# Reducer
3+
4+
A simple tool developed in Haskell to reduce a sequence of elements, represented by strings,
5+
to a simple expression by finding repeating patterns.
6+
7+
## Example
8+
9+
### Using ghci
10+
11+
Using ghci, to reduce the sequence ["a","a","b","c","a","b","c"]:
12+
13+
*Reducer> let (expr,patterns) = reduce ["a","a","b","c","a","b","c"]
14+
15+
The final reduced expression is:
16+
17+
*Reducer> expr
18+
["a","P0","P0"]
19+
20+
And the patterns found are:
21+
22+
*Reducer> patterns
23+
[("P0",["a","b","c"])]
24+
25+
### Using the command line
26+
27+
You can use the main.hs program to build a GraphViz representation of the reduced
28+
expression. The input expression shoud be stored in a JSON file:
29+
30+
$ cat ex/input.json
31+
["a","a","b","c","a","b","c"]
32+
33+
run the program using for example runhaskell, giving as argument the input file,
34+
and saving the output to a new file:
35+
36+
$ runhaskell main.hs ex/input.json > ex/expr.dot
37+
38+
Now use the dot tool from GrpahViz to create a png file for example:
39+
40+
$ dot -Tpng ex/expr.dot > ex/expr.png
41+
42+
You should see something similar to the image below:
43+
44+
![Example](ex/expr.png)

Reducer.hs

Lines changed: 75 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,75 @@
1+
2+
module Reducer (reduce,toDot) where
3+
4+
import Data.List
5+
6+
-- n-grams
7+
8+
ngramsSet 0 _ = []
9+
ngramsSet _ [] = []
10+
ngramsSet n xs
11+
| length ngram == n = ngram : ngramsSet n (tail xs)
12+
| otherwise = []
13+
where
14+
ngram = take n xs
15+
16+
ngramsCount l = [ (x,length(filter (==x) l)) | x <- nub l ]
17+
18+
ngrams n l = ngramsCount $ ngramsSet n l
19+
20+
allGrams l = [ (n, ngrams n l) | n <- reverse [2 .. div (length l) 2] ]
21+
22+
filtGrams l = filter ( (>1).snd ) l -- remove n-grams with occurences == 1
23+
24+
-- Replace sub lists of tokens in a list
25+
26+
replace _ _ [] = []
27+
replace pat sub l = if (pat == take (length pat) l) then
28+
sub : replace pat sub (drop (length pat) l)
29+
else
30+
(head l) : replace pat sub (tail l)
31+
32+
-- Count the number of replace sub lists of tokens in a list
33+
34+
replaceCount _ [] = 0
35+
replaceCount pat l
36+
| pat == take (length pat) l = 1 + replaceCount pat (drop (length pat) l)
37+
| otherwise = 0 + replaceCount pat (tail l)
38+
39+
-- main reduce function
40+
41+
reduce l = reduceAux l []
42+
43+
reduceAux l ps = let a = allGrams l
44+
f = [ (fst i, filtGrams (snd i)) | i <- a ]
45+
f2 = filter ( (/=[]).snd ) f
46+
flat = map fst (concat (map snd f2))
47+
(x, y) = replaceAll l flat 0 []
48+
in
49+
if y == [] then (x, ps)
50+
else reduceAux x (ps++y)
51+
52+
replaceAll l [] _ ps = (l, ps)
53+
replaceAll l (x:xs) c ps =
54+
let cnt = replaceCount x l
55+
pid = nextId c
56+
in
57+
if cnt>1 then replaceAll (replace x pid l) xs (c+1) (ps++[(pid, x)])
58+
else replaceAll l xs c ps
59+
60+
nextId c = "P" ++ show c -- create next pattern id
61+
62+
-- debugging
63+
64+
getP p [] = []
65+
getP p (x:xs)
66+
| p == (fst x) = snd x
67+
| otherwise = getP p xs
68+
69+
toDot (expr,_) =
70+
let header = ["digraph auto {","rankdir=LR;", "size=\"7,5\";", "node [shape = circle];"]
71+
edges = [ toEdge e | e <- zip [1..] (ngramsSet 2 expr) ]
72+
in unlines $ concat [header,edges,["}"]]
73+
74+
toEdge (n,[f,t]) = concat ["\"",f,"\" -> \"",t,"\" [ label =\"",show n,"\" ];"]
75+

ex/expr.dot

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
digraph auto {
2+
rankdir=LR;
3+
size="7,5";
4+
node [shape = circle];
5+
"a" -> "P0" [ label ="1" ];
6+
"P0" -> "P0" [ label ="2" ];
7+
}

ex/expr.png

5.13 KB
Loading

ex/input.json

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
["a","a","b","c","a","b","c"]

main.hs

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
2+
import Reducer
3+
import System.Environment
4+
import qualified Data.ByteString.Lazy as BS
5+
import qualified Data.Aeson as A
6+
7+
main = do
8+
[file] <- getArgs
9+
content <- BS.readFile file
10+
let Just input = A.decode content :: Maybe [String]
11+
putStr $ toDot $ reduce input
12+

0 commit comments

Comments
 (0)