
In bash, I would like to extract part of many filenames and save the results to another file.

The files are named coffee_{SOME NUMBERS I WANT}.freqdist.

#!/bin/bash
for f in $(find . -name 'coffee*.freqdist'); do
    echo "$f"    # placeholder: this is where the number should be extracted from "$f"
done

That code will find all the coffee_{SOME NUMBERS I WANT}.freqdist files. Now, how do I make an array containing just the {SOME NUMBERS I WANT} parts and write it to a file?

I know that to write the output to a file, I would end the line with:

  > log.txt

What I'm missing is the middle part: how to filter the list of filenames.

3 Comments
  • You might want to take a look at the 'sed' command. Commented Sep 25, 2012 at 11:38
  • Actually no. I was querying Twitter for a clinical research project that compares tweets from different locations. Twitter hung about 5% of the way through searching 40k zip codes. Because I had loaded the zip codes into a Python dictionary (and so unordered), the only record of which zip codes I have already searched is the set of output files, which are named by zip code. I figured this was a good reason to learn something about shell scripting rather than doing it in Python. Commented Sep 25, 2012 at 11:40
  • "Actually no" was in response to Piort's HW comment. Commented Sep 25, 2012 at 11:40

4 Answers

Answer 1

You can do it natively in bash as follows:

filename=coffee_1234.freqdist
tmp=${filename#*_}
num=${tmp%.*}
echo "$num"

This is a pure bash solution. No external commands (such as sed) are run, so no extra process is started for each filename, which makes it faster than calling sed inside a loop.

Append these numbers to a file using:

echo "$num" >> file

(You will need to delete or empty the file before you start your loop; otherwise numbers from earlier runs remain in it.)
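Putting the question's find loop together with the parameter expansions above, a minimal sketch might look like this. It assumes no filename contains a newline, narrows the pattern to coffee_*.freqdist so every match has an underscore, and uses a hypothetical function name, collect_numbers, that fills a global bash array called nums (the array the question asks for). It reads find's output line by line instead of using for f in $(find ...), because the latter splits names that contain spaces.

```shell
#!/bin/bash
# Hypothetical helper: collect_numbers DIR
# Fills the global bash array "nums" with the {SOME NUMBERS} part of every
# coffee_*.freqdist file found under DIR (subdirectories included).
# Assumes no filename contains a newline.
collect_numbers() {
    local f base tmp
    nums=()
    while IFS= read -r f; do
        base=${f##*/}        # drop leading directories: ./a/coffee_12.freqdist -> coffee_12.freqdist
        tmp=${base#*_}       # drop up to the first "_": coffee_12.freqdist -> 12.freqdist
        nums+=("${tmp%.*}")  # drop the last extension: 12.freqdist -> 12
    done < <(find "$1" -name 'coffee_*.freqdist')
}
```

For example, collect_numbers . followed by printf '%s\n' "${nums[@]}" > log.txt writes one number per line to log.txt, replacing any earlier contents. Stripping the directories first means underscores in directory names cannot confuse the #*_ step.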


Answer 2

If the intention is just to write the numbers to a file, and all the files are in the current directory, you do not need the find command:

$ ls coffee*.freqdist
coffee112.freqdist  coffee12.freqdist  coffee234.freqdist

The following should do it, and its output can then be redirected to a file:

$ ls coffee*.freqdist | sed 's/coffee\(.*\)\.freqdist/\1/'
112
12
234

Guru.

1 Comment

I meant to strip the leading underscore too, so: 's/coffee_\(.*\)\.freqdist/\1/'.
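The sed approach above parses the output of ls. A minimal sketch of a variant, assuming the files sit directly in one directory and no name contains a newline, lets the shell's glob list the files and uses sed only to strip the prefix (underscore included, as the comment points out) and the suffix. The function name numbers_via_sed is hypothetical, and nullglob is a bash option, so this needs bash rather than plain sh.

```shell
#!/bin/bash
# Hypothetical helper: numbers_via_sed DIR
# Prints the {SOME NUMBERS} part of each coffee_*.freqdist file directly
# inside DIR (subdirectories are not searched), one per line.
numbers_via_sed() {
    local -a files
    shopt -s nullglob                  # an unmatched glob expands to nothing, not the literal pattern
    files=("$1"/coffee_*.freqdist)
    shopt -u nullglob                  # turn nullglob back off (assumes it was off before)
    (( ${#files[@]} )) || return 0     # nothing matched: print nothing
    # Strip everything up to "coffee_" and the trailing ".freqdist".
    printf '%s\n' "${files[@]}" | sed 's|^.*/coffee_\(.*\)\.freqdist$|\1|'
}
```

For example, numbers_via_sed . > log.txt writes the numbers to log.txt, and in bash 4 or later mapfile -t nums < <(numbers_via_sed .) loads them into an array.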
Answer 3

The previous answers show the necessary techniques; this answer arranges them into a simple pipeline that may carry over to other jobs as well. (If your sed doesn't support ‘;’ as a command separator, give each expression its own -e option instead: sed -e 's/.*coffee_//' -e 's/[.].*//'.)

$ ls */c*; ls c*
fee/coffee_2343.freqdist
coffee_18z8.x.freqdist  coffee_512.freqdist  coffee_707.freqdist
$ find . -name 'coffee*.freqdist' | sed 's/.*coffee_//; s/[.].*//' > outfile
$ cat outfile
512
18z8
2343
707
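The pipeline above keeps whatever sits between coffee_ and the first dot, so coffee_18z8.x.freqdist yields 18z8. If only purely numeric parts are wanted, one option is sed's -n flag combined with the p flag of the s command, so that a line is printed only when the digits-only pattern matches. This is a sketch in plain POSIX sh; digit_ids is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: digit_ids DIR
# Prints only the all-digit middle parts of coffee_*.freqdist names under DIR;
# names such as coffee_18z8.x.freqdist are skipped.
digit_ids() {
    # -n: print nothing by default; the trailing p prints a line only when
    # the substitution succeeded, i.e. the middle part is all digits.
    find "$1" -name 'coffee_*.freqdist' |
        sed -n 's/.*coffee_\([0-9][0-9]*\)\.freqdist$/\1/p'
}
```

With the files from the example above, digit_ids . > outfile would list 512, 2343 and 707 (in whatever order find returns them) and skip 18z8.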


Answer 4

Expanding on this topic, let's say the filename has this format:

first_second_third_requiredText_fourth_fifth_sixth

To get only requiredText, use the following:

filename=first_second_third_requiredText_fourth_fifth_sixth
removed_first_part=${filename#*_*_*_}
finalText=${removed_first_part%_*_*_*}

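'#' removes the shortest match of the pattern from the beginning of the string, '%' removes the shortest match from the end, and '*' matches any number of characters. This also works for full paths, as long as no directory name in the path contains an underscore.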
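Counting *_ groups by hand gets error-prone as the number of fields grows. When the wanted text is always the Nth underscore-separated field, a bash-only alternative is to split the name into an array with read -a. This is a sketch; nth_field is a hypothetical helper, and it counts fields from the start only.

```shell
#!/bin/bash
# Hypothetical helper: nth_field STRING N
# Prints the Nth (1-based) underscore-separated field of STRING.
nth_field() {
    local -a parts
    IFS=_ read -r -a parts <<< "$1"   # split on "_" into the array "parts"
    printf '%s\n' "${parts[$2-1]}"    # array subscripts are arithmetic, so $2-1 works
}
```

For example, nth_field first_second_third_requiredText_fourth_fifth_sixth 4 prints requiredText. For a full path whose directories contain underscores, reduce it to its basename with ${filename##*/} first.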

