
I am stuck on a parsing problem in Python 2.7. Let me explain:

I am parsing events from the Incapsula API. The goal is to make them readable in an Excel table, for building statistics and graphs.

The signature field contains the type of event/attack followed by a number, which is the number of attacks of that type. I therefore decided to duplicate each line as many times as the sum of the attack counts that follow the 'signature=' field.

As in this capture:

 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3}
 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3}
 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3}

So far everything works as expected: I get the right count of attacks.

BUT

On some rare events, there are multiple values in the signature field, as in this capture:

 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
 visit_id=86001060468746692, src_country=Netherlands, event_timestamp=1483867285054, src_ip=178.22.232.53, dest_name=www.yyy.com, dest_id=1551642, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
 visit_id=86001060468746692, src_country=Netherlands, event_timestamp=1483867285054, src_ip=178.22.232.53, dest_name=www.yyy.com, dest_id=1551642, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
 visit_id=86001060468746692, src_country=Netherlands, event_timestamp=1483867285054, src_ip=178.22.232.53, dest_name=www.yyy.com, dest_id=1551642, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
 visit_id=86001060468746692, src_country=Netherlands, event_timestamp=1483867285054, src_ip=178.22.232.53, dest_name=www.yyy.com, dest_id=1551642, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}

I still get the right count of attacks on those rare lines too, but I want to rearrange the signature field from this:

signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}

To this:

signature={api.threats.sql_injection}
signature={api.threats.sql_injection}
signature={api.threats.sql_injection}
signature={api.threats.bot_access_control}
signature={api.threats.illegal_resource_access}
signature={api.threats.cross_site_scripting}
signature={api.threats.bot_access_control}
signature={api.threats.illegal_resource_access}
signature={api.threats.illegal_resource_access}
signature={api.threats.illegal_resource_access}

(The first six lines are the first event duplicated 6 times (3+1+1+1 = 6); the last four are the second event duplicated 4 times (1+3 = 4).)

My current source code:

import re
from itertools import izip

# count the number of attacks per line
f = open('monthlyLogShort.txt','r')
g = open("count.txt", 'w')
kensu = f.readlines()
f.close()
for line in kensu:
        st = line.find('signature=')
        end = line.find('}')
        unprecise = line[st:end+1]
        #count = int(re.search(r'\d+', unprecise).group())
        count = sum(map(int,re.findall(r'[0-9]+', unprecise)))
        print >> g, count

g.close()

#replicate lines according to the number of attack            
h = open('flog.txt','w')

with open("monthlyLogShort.txt") as textfile1, open("count.txt") as textfile2:
    for x, y in izip(textfile1, textfile2):
        x = x.strip()
        y = y.strip()
        print >> h, x * int(y)
h.close()
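
One caveat with `re.findall(r'[0-9]+', ...)` is that it would also pick up any digit that happened to appear inside a threat name. A minimal sketch of a stricter count follows, assuming each count always comes right after an `=` inside the braces. The helper name `count_attacks` is hypothetical, not part of the original script.

```python
import re

# Matches the integer that follows '=' for each entry inside the braces.
COUNT_RE = re.compile(r'=\s*(\d+)')

def count_attacks(line):
    """Return the total number of attacks listed in the signature field."""
    start = line.find('signature={')
    if start == -1:
        return 0  # no signature field on this line
    start += len('signature=')  # skip the field name and its own '='
    end = line.find('}', start)
    if end == -1:
        end = len(line)  # tolerate a missing closing brace
    return sum(int(n) for n in COUNT_RE.findall(line[start:end]))
```

Because the search starts at the signature field, the digits in `visit_id`, `src_ip` and the other fields earlier on the line are never counted.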

1 Answer


If I read your requirements correctly, you are trying to emit one line for each threat occurrence while preserving the rest of the record. This solution does not output the counts directly; instead, it transforms the data so that every line carries exactly one threat.

Code:

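sig_str = 'signature={'
for line in kensu:
    # split the record from its signature field
    record, signature = line.split(sig_str)
    # keep only the text inside the braces
    threats = signature.split('}')[0]
    for counts in threats.split(','):
        # a trailing comma leaves an empty entry, which has no '='
        if '=' in counts:
            threat, count = counts.split('=')
            # emit the record once per occurrence of this threat
            for i in range(int(count)):
                print '%s%s%s}' % (record, sig_str, threat.strip())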

Sample data:

kensu = [x.strip() for x in """
    record=0, signature={api.threats.sql_injection=1}
    record=1, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
    record=2, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
""".split('\n')[1:-1]]

Output:

record=0, signature={api.threats.sql_injection}
record=1, signature={api.threats.sql_injection}
record=1, signature={api.threats.sql_injection}
record=1, signature={api.threats.sql_injection}
record=1, signature={api.threats.bot_access_control}
record=1, signature={api.threats.illegal_resource_access}
record=1, signature={api.threats.cross_site_scripting}
record=2, signature={api.threats.bot_access_control}
record=2, signature={api.threats.illegal_resource_access}
record=2, signature={api.threats.illegal_resource_access}
record=2, signature={api.threats.illegal_resource_access}
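
The same idea can be wrapped in a generator, so the result can be written to a file instead of printed. This is a sketch rather than a drop-in replacement: the function name `expand_threats` is hypothetical, and it assumes each line of interest contains exactly one `signature={...}` field. It should run on both Python 2.7 and Python 3.

```python
def expand_threats(lines, sig_str='signature={'):
    """Yield one output line per threat occurrence, keeping the rest of the record."""
    for line in lines:
        line = line.strip()
        if sig_str not in line:
            continue  # skip blank lines and lines without a signature field
        record, signature = line.split(sig_str, 1)
        threats = signature.split('}')[0]  # text inside the braces
        for entry in threats.split(','):
            if '=' not in entry:
                continue  # empty piece left by a trailing comma
            threat, count = entry.rsplit('=', 1)
            for _ in range(int(count)):
                yield '%s%s%s}' % (record, sig_str, threat.strip())
```

To produce the `flog.txt` file from the question, the generator could be consumed with something like `out.writelines(l + '\n' for l in expand_threats(kensu))`, where `out` is a file opened for writing.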

5 Comments

Thank you Stephen, it works like a charm. Some free beers await you in Tokyo.
I definitely will after having 15 reputation.
Can't on my mobile, will try on a desktop.
"Thanks for the feedback! Votes cast by those with less than 15 reputation are recorded, but do not change the publicly displayed post score"
Thanks again, and good luck. Sadly I won't be in Tokyo soon.
