Comments on "Celebrating the first year of IPython logging" — On Clouds, Poems, Python and more... (blog by Gökhan Sever)

Anonymous — 2010-06-15:
GÖKHAN, you still updating your blog?

- Ed from over the pond

Anonymous — 2010-06-02:
...forgot to say, but to process all the individual files at once into a single output file, try:

    $ cat *.log | perl -ne 'print unless $s{$_}++' > filename

Anonymous — 2010-06-02:
Output unique lines sans reorder:

    $ perl -ni.orig -e 'print unless $s{$_}++' filename

I like Python, but Perl is handy to have around...

Thanks for the info!

Gökhan Sever — 2010-04-21:
Thanks RL. This is the solution that I have been looking for. My history file has shrunk from ~38K lines down to ~14K, with replicates removed and without losing its original order.

Ronan Lamy — 2010-04-21:
It's actually easy: iterate over the file, keep a set of the lines you've seen so far, and write only those you haven't:

    seen = set()
    with open('input.txt') as input:
        with open('output.txt', 'w') as out:
            for line in input:
                if line not in seen:
                    seen.add(line)
                    out.write(line)

Gökhan Sever — 2010-04-18:

    a = open("history", "r").readlines()
    b = set(a)

does the same as yours. However, neither with your solution nor with the set could I write the result preserving the order I see on screen in IPython:

    h = open("new", "w")
    h.writelines(D)  # or h.writelines(b)

They show the same and write the same, but in a different order than what is listed. It would be nice to preserve the order in the original file.

Benjamin — 2010-04-18:
Duplicate line numbers, quick & dirty, considering 'older' lines as duplicates:

    D = dict()
    dupes = list()

    for n, line in enumerate(open('pattern.txt', 'r').readlines()):
        if line in D:  # 'line in D' rather than D.get(line), which is falsy for index 0
            dupes.append(D[line])
        D[line] = n
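On modern Python (3.7+), where plain dicts are guaranteed to preserve insertion order, the order-preserving de-duplication discussed in the comments above can be condensed into `dict.fromkeys` — a minimal sketch on hypothetical sample lines, not code from the thread:

```python
# Order-preserving de-duplication via dict.fromkeys: keys keep their
# first-insertion order (guaranteed since Python 3.7), and duplicate
# lines are collapsed to their first occurrence.
lines = ["x = 1\n", "print(x)\n", "x = 1\n", "y = 2\n", "print(x)\n"]  # hypothetical sample
unique_lines = list(dict.fromkeys(lines))
# unique_lines == ["x = 1\n", "print(x)\n", "y = 2\n"]
```

The same idea applies to a history file, e.g. writing `dict.fromkeys(open('history'))` out with `writelines`; filenames here are placeholders.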