
I am trying to extract paths from a file that meet certain criteria. For example, I have a small file with contents like this:

contentsaasdf /net/super/file-1.txt othercontents...
data is in /sample/random/folder/folder2/file-2.txt  otherdata...
filename  /otherfile/other-3.txt somewording

I want to extract the paths whose file name matches file-*.txt.

For the example above, I need the following paths as output:

/net/super/file-1.txt
/sample/random/folder/folder2/file-2.txt

Any suggestions for Python code? I am trying a regex, but I am running into problems with multiple folder levels, etc. Something like:

 FileRegEx = re.compile('.*(file-\\d.txt).*', re.IGNORECASE|re.DOTALL)

3 Answers


You don't need .*; just use character classes properly:

r'[\/\w]+file-[^.]+\.txt'

[\/\w]+ matches any combination of word characters and / (the backslash before / is harmless but not required in Python), and [^.]+ matches any run of characters other than a dot.

Demo:

https://regex101.com/r/ytsZ0D/1

Note that this regex is fairly general. If you need to exclude some cases, use a negated character class ([^...]) or another more specific pattern, depending on your needs.
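As a quick check, here is a minimal sketch that runs this pattern over the sample text from the question. The variable names (text, pattern, paths) are only illustrative, and the pattern is written without the optional backslash before /.

```python
import re

# Sample text copied from the question.
text = """contentsaasdf /net/super/file-1.txt othercontents...
data is in /sample/random/folder/folder2/file-2.txt  otherdata...
filename  /otherfile/other-3.txt somewording"""

# [/\w]+  : one or more word characters or slashes (the directory part)
# file-   : the literal prefix of the wanted file names
# [^.]+   : anything up to the next dot
# \.txt   : the literal extension
pattern = re.compile(r'[/\w]+file-[^.]+\.txt')

paths = pattern.findall(text)
print(paths)
# ['/net/super/file-1.txt', '/sample/random/folder/folder2/file-2.txt']
```

Because [/\w] excludes spaces, $ and -, directory names containing those characters would be cut short, so the class would need widening for such inputs.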


2 Comments

But the path can contain spaces, digits, etc. Since it is a file path, it may be absolute or relative, or may contain variables, as in $HOME/path/file-1.txt.
@KarthikJeganathan Indeed, I'm just trying to show you the door, but you're the one that has to walk through it ;-)

Assuming your filenames are whitespace-separated:

\\s(\\S+/file-\\d+\\.txt)\\s
  • \\s - match a white-space character
  • \\S+ - matches one or more non-whitespace characters
  • \\d+ - matches one or more digits
  • \\. - matches a literal period; an unescaped . would match any character

You can avoid the double backslashes using r'' strings:

r'\s(\S+/file-\d+\.txt)\s'
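A small sketch of how this pattern behaves with re.findall on the question's sample text. One caveat is that the pattern requires a whitespace character on both sides, so a path at the very start or end of the input is skipped. The second pattern below is not part of this answer; it is an assumed variant that uses lookarounds instead, and all variable names are illustrative.

```python
import re

# Sample text copied from the question.
text = """contentsaasdf /net/super/file-1.txt othercontents...
data is in /sample/random/folder/folder2/file-2.txt  otherdata...
filename  /otherfile/other-3.txt somewording"""

# Pattern from this answer: whitespace is required before and after the path.
strict = re.compile(r'\s(\S+/file-\d+\.txt)\s')
strict_hits = strict.findall(text)

# Assumed variant: "(?<!\S)" and "(?!\S)" only assert that no non-whitespace
# character is adjacent, so the start and end of the input also qualify.
relaxed = re.compile(r'(?<!\S)(\S*/file-\d+\.txt)(?!\S)')
relaxed_hits = relaxed.findall(text)

print(strict_hits)   # ['/net/super/file-1.txt', '/sample/random/folder/folder2/file-2.txt']
print(relaxed_hits)  # same two paths

# Hypothetical input where the path is the entire string.
lone = "$HOME/path/file-1.txt"
print(strict.findall(lone))   # []
print(relaxed.findall(lone))  # ['$HOME/path/file-1.txt']
```

Neither pattern handles paths that contain spaces, since both treat whitespace as the delimiter.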



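Try this (note that it returns every path ending in .txt, not only the file-*.txt ones):

import re

# Raw string, so that \. reaches the regex engine without an invalid-escape warning
re.findall(r'/.+\.txt', s)
# Output: ['/net/super/file-1.txt', '/sample/random/folder/folder2/file-2.txt', '/otherfile/other-3.txt']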

Output:

>>> import re
>>> 
>>> s = """contentsaasdf /net/super/file-1.txt othercontents...
... data is in /sample/random/folder/folder2/file-2.txt  otherdata...
... filename  /otherfile/other-3.txt somewording"""
>>> 
>>> re.findall(r'/.+\.txt', s)
['/net/super/file-1.txt', '/sample/random/folder/folder2/file-2.txt', '/otherfile/other-3.txt']
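Since the result above still includes /otherfile/other-3.txt, one possible follow-up is to filter the candidates by file name. The sketch below does this with the standard library's fnmatch and the file-*.txt glob from the question. This filtering step is an addition rather than part of the answer, and the names candidates and wanted are illustrative.

```python
import re
from fnmatch import fnmatch
from posixpath import basename

# Sample text copied from the question.
s = """contentsaasdf /net/super/file-1.txt othercontents...
data is in /sample/random/folder/folder2/file-2.txt  otherdata...
filename  /otherfile/other-3.txt somewording"""

# Step 1: collect everything that looks like a path ending in .txt.
candidates = re.findall(r'/.+\.txt', s)

# Step 2: keep only the paths whose last component matches the glob.
wanted = [p for p in candidates if fnmatch(basename(p), 'file-*.txt')]

print(wanted)
# ['/net/super/file-1.txt', '/sample/random/folder/folder2/file-2.txt']
```

Keep in mind that .+ is greedy and matches spaces, so a line holding two .txt paths would come back as one long string. Replacing . with \S would avoid that, at the cost of not supporting paths that contain spaces.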

