Comments on "On Clouds, Poems, Python and more...: Celebrating the first year of IPython logging" (blog by Gökhan Sever)

Anonymous (2010-06-15):
GÖKHAN, you still updating your blog?
- Ed from over the pond

Anonymous (2010-06-02):
...forgot to say, but to process all the individual files at once into a single output file, try:

    $ cat *.log | perl -ne 'print unless $s{$_}++' > filename

Anonymous (2010-06-02):
Output unique lines without reordering:
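For readers who would rather stay in Python, the Perl idiom above (a hash recording lines already seen) can be sketched as a small generator; the name `dedupe_lines` is mine, not from the thread:

```python
def dedupe_lines(lines):
    """Yield each line the first time it appears, preserving input order.

    The 'seen' set plays the same role as the %s hash in
    perl -ne 'print unless $s{$_}++'.
    """
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line
```

Fed the concatenated `*.log` files (e.g. via `sys.stdin` or the `fileinput` module), this reproduces the `cat | perl` pipeline above.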
    $ perl -ni.orig -e 'print unless $s{$_}++' filename

I like Python, but Perl is handy to have around...

Thanks for the info!

Gökhan Sever (2010-04-21):
Thanks RL. This is the solution I have been looking for. My history file has shrunk from ~38K lines down to ~14K, with the duplicates removed and the original order preserved.

Ronan Lamy (2010-04-21):
It's actually easy: iterate over the file, keep a set of the lines you've seen so far, and write only those you haven't:

    seen = set()
    with open('input.txt') as infile:
        with open('output.txt', 'w') as out:
            for line in infile:
                if line not in seen:
                    seen.add(line)
                    out.write(line)

Gökhan Sever (2010-04-18):

    a = open("history", "r").readlines()
    b = set(a)

does the same as yours. However, with neither your solution nor the set could I write the result preserving the order I see on screen in IPython:

    h = open("new", "w")
    h.writelines(D)  # or h.writelines(b)

They show the same lines and write the same lines, but in a different order than what is listed.
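The ordering problem raised here stems from sets (and dicts in the Python versions of 2010) not preserving insertion order. On Python 3.7 and later, where plain dicts are guaranteed to keep insertion order, the whole order-preserving de-duplication collapses to `dict.fromkeys`; a sketch, assuming the file fits in memory (the function name `unique_in_order` is mine):

```python
def unique_in_order(path_in, path_out):
    """Copy a file, keeping only the first occurrence of each line.

    dict.fromkeys() de-duplicates while preserving first-seen order,
    which is guaranteed for dicts since Python 3.7.
    """
    with open(path_in) as src:
        unique = dict.fromkeys(src)  # keys are the lines, in first-seen order
    with open(path_out, "w") as dst:
        dst.writelines(unique)
```

Unlike writing out a bare `set(a)`, iterating the dict yields the lines in exactly the order they first appeared in the history file.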
It would be nice to preserve the order of the original file.

Benjamin (2010-04-18):
Duplicate line numbers, quick & dirty, considering 'older' lines as duplicates:

    D = dict()
    dupes = list()

    for n, line in enumerate(open('pattern.txt', 'r')):
        if line in D:
            dupes.append(D[line])
        D[line] = n
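The dict-of-last-indices trick above can be wrapped as a self-contained function; membership is tested with `in` rather than `D.get(line)` so that a duplicate of line 0 is also caught (a stored index of 0 is falsy). The name `find_duplicate_indices` is mine:

```python
def find_duplicate_indices(lines):
    """Return, for each repeated line, the index of its previous
    occurrence; 'older' lines count as the duplicates."""
    last_seen = {}
    dupes = []
    for n, line in enumerate(lines):
        if line in last_seen:  # 'in' also handles a first occurrence at index 0
            dupes.append(last_seen[line])
        last_seen[line] = n
    return dupes

# "a" appears at indices 0, 2, 4 and "b" at 1, 3: earlier indices are reported
print(find_duplicate_indices(["a", "b", "a", "b", "a"]))  # → [0, 1, 2]
```

Removing the lines at the returned indices keeps only the newest copy of each duplicated line, the complement of the first-occurrence approaches earlier in the thread.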