1

I am using a regex to extract specific information from a string. The string looks like this:

\subpath1\subpath2\subpathn\4xxxx_2xxxx\filename.extension
//there can be many subpaths and x is always a digit; the last directory is always number_number
//and it starts with 4; the last part is always a file with an extension
//so I want to exclude paths such as 4xxxx_xxxx/path/file.extension

So far I have come up with this construction: (?<=\)(4[0-9])_([0-9]).?." but:

  • The last part matches any string as it is, no matter whether it is "sasas" or "sasas.sas"
  • I do not know whether it meets all my requirements

Any suggestions on this one?

6
  • 1
    Try (?<=[\\/])(4[0-9]*)_([0-9]*)/[^/]+\.\w+, add $ at the end of the pattern if the match is always at the end of string. Commented Jun 13, 2022 at 10:14
  • "//so I want to exclude path for example 4xxxx_xxxx/path/file.extension" how does the directory path intrude in there? you want to smuggle it inside? Commented Jun 13, 2022 at 10:25
  • 1
    nope result should be 4xxxx_xxxx/file.extension Commented Jun 13, 2022 at 10:34
  • 1
    One small change to Wiktor's answer: (?<=[\\/])([^4][0-9]*)_([0-9]*)/[^/]+\.\w+ Commented Jun 13, 2022 at 10:56
  • I made a mistake in the direction of the slashes. Commented Jun 13, 2022 at 11:10

4 Answers

2

You can use

(?<=\\)(4[0-9]*)_([0-9]*)\\[^\\]+\.\w+

See the regex demo.

Details:

  • (?<=\\) - a positive lookbehind that requires a \ char to appear immediately to the left of the current location
  • (4[0-9]*) - Group 1: 4 and then zero or more ASCII digits
  • _ - an underscore
  • ([0-9]*) - Group 2: any zero or more ASCII digits
  • \\ - a \ char
  • [^\\]+ - one or more chars other than \
  • \. - a dot
  • \w+ - one or more word chars.
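As a minimal sketch of how this pattern could be applied in C#: the class name PathMatcher, the method TryExtract, and the sample paths are hypothetical, and $ is appended on the assumption that the match always sits at the end of the string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathMatcher
{
    // The pattern from this answer, with $ appended (assumption: the match ends the string).
    private static readonly Regex Pattern =
        new Regex(@"(?<=\\)(4[0-9]*)_([0-9]*)\\[^\\]+\.\w+$");

    // Returns the "4xxxx_2xxxx\filename.extension" tail, or null when the path does not qualify.
    public static string TryExtract(string path)
    {
        Match m = Pattern.Match(path);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // The file sits directly under the number_number directory, so this matches.
        Console.WriteLine(TryExtract(@"\subpath1\subpath2\41234_23456\filename.extension"));
        // An extra directory after number_number prevents a match.
        Console.WriteLine(TryExtract(@"\subpath1\41234_23456\extra\filename.extension") ?? "(no match)");
    }
}
```

Under these assumptions the first call yields 41234_23456\filename.extension and the second yields no match, because [^\\]+ cannot cross the extra backslash.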

1

Here is an alternative approach:

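// Requires: using System; using System.IO; using System.Linq; using System.Text.RegularExpressions;
string path = "subpath1/subpath2/subpathn/41234_23456/excludePath/filename.extension";
// Anchor the pattern so that only a whole segment of the form 4<digits>_<digits> matches
string importantDirectory = path.Split('/').First(x => Regex.IsMatch(x, @"^4\d+_\d+$"));
string fileName = Path.GetFileName(path);
// Path.Combine uses the platform's directory separator (\ on Windows)
string result = Path.Combine(importantDirectory, fileName);
Console.WriteLine(result);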

41234_23456\filename.extension
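A variation on the same idea, as a rough sketch only: it accepts both slash directions and returns null instead of throwing when no number_number directory is present. The class name SplitApproach, the method Shorten, and the choice of "/" as the output separator are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitApproach
{
    // Returns "<4digits_digits>/<file name>", or null if no such directory exists in the path.
    public static string Shorten(string path)
    {
        // Split on both separators, since the question mixes / and \.
        string[] segments = path.Split('/', '\\');
        // FirstOrDefault returns null instead of throwing when nothing matches.
        string dir = segments.FirstOrDefault(s => Regex.IsMatch(s, @"^4\d+_\d+$"));
        if (dir == null) return null;
        // The last segment is taken to be the file name.
        return dir + "/" + segments[segments.Length - 1];
    }

    public static void Main()
    {
        Console.WriteLine(Shorten("subpath1/subpath2/subpathn/41234_23456/excludePath/filename.extension"));
    }
}
```

With the sample path this prints 41234_23456/filename.extension regardless of the operating system, since the separator is written explicitly rather than taken from Path.Combine.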


0

A. Four digits can be written as [0-9]{4}, \d{4}, or \d\d\d\d. If the number of digits can vary, use + for "one or more": \d+_\d+

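B. The path delimiter in the example is a backslash, and in the commented example it is a forward slash. The backslash must be escaped with a preceding backslash (escaping the forward slash is harmless but not required in .NET); use [\/\\] to match either delimiter.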

C. If the file name must have an extension, the expression needs one or more valid file-name characters, a dot, and again one or more valid file-name characters, such as \w+\.\w+. A trailing \b asserts a word boundary after the extension; use $ instead if the match must sit at the very end of the string.

Note that the set of valid file-name characters varies from system to system (macOS or Windows, for example) and is in any case wider than \w, which matches only word characters (letters, digits, and the underscore).
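To illustrate that note, here is a small sketch comparing a \w-based pattern of the kind described in A to C with a more permissive one that accepts any character except a path separator in the file name. The permissive pattern, the class name FileNameDemo, and the sample file name are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameDemo
{
    // Strict: the file name may contain only word characters around a single dot.
    public const string Strict = @"\d+_\d+[\/\\]\w+\.\w+\b";
    // Permissive (assumption): anything except path separators, then a dot and an extension.
    public const string Permissive = @"\d+_\d+[\/\\][^\/\\]+\.[^\/\\.]+$";

    public static void Main()
    {
        // A file name with a space, a hyphen, and two dots.
        string path = @"\subpath\4123_21253\my report-v2.final.txt";
        Console.WriteLine(Regex.IsMatch(path, Strict));      // False
        Console.WriteLine(Regex.IsMatch(path, Permissive));  // True
    }
}
```

Which of the two is appropriate depends on how much you trust the input; the permissive form is only one possible widening.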

My suggestion:

\d+_\d+[\/\\]\w+\.\w+\b

https://regex101.com/r/Ed2H0u/1

C# code:

    var textInput = @"
\subpath1\subpath2\subpathn\4123_21253\filename.extension
\subpath2\subpathn\4123_21253\subpathn\filename.extension
";

    var matches = Regex.Matches(textInput, @"\b[\w\/\\]+[\/\\](\d+_\d+)[\/\\](\w+\.\w+)\b");
    foreach (Match element in matches)
    {
        Console.WriteLine("Path: " + element.Value);
        Console.WriteLine("Number: " + element.Groups[1].Value);
        Console.WriteLine("FileName: " + element.Groups[2].Value);
    }

https://dotnetfiddle.net/V87CKc


0

Because the pattern 4XXX_2.... is unique, just search on that. All we have to do is look for a "\4", then just ignore the "\" in the final output. Here is one way:

 \\(?<PostUrl>4[^_]+_2.+)

will get what you need into a match. We are using "Named Captures" (?<{Name Here}> ) so the match structure has this information:

Match #0
                [0]:  \4xxxx_2xxxx\Extra\filename.extension
  ["PostUrl"] → [1]:  4xxxx_2xxxx\Extra\filename.extension
        →1 Captures:  4xxxx_2xxxx\Extra\filename.extension

So we can get the match "4xxxx_2xxxx\Extra\filename.extension" by either

myMatch.Groups["PostUrl"].Value or myMatch.Groups[1].Value
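Put together, a minimal sketch of reading the named capture might look like this. The class name NamedCaptureDemo, the method Extract, and the sample path are hypothetical; the pattern is the one from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedCaptureDemo
{
    // Returns the text captured by the PostUrl group, or null when the pattern does not match.
    public static string Extract(string path)
    {
        Match myMatch = Regex.Match(path, @"\\(?<PostUrl>4[^_]+_2.+)");
        // Groups["PostUrl"] and Groups[1] refer to the same capture here.
        return myMatch.Success ? myMatch.Groups["PostUrl"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(Extract(@"\subpath1\subpath2\41234_23456\Extra\filename.extension"));
    }
}
```

For the sample path this prints 41234_23456\Extra\filename.extension, that is, everything after the backslash that precedes the 4.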


If a purist objects that there could be a preceding "\4..." pattern, then specify the regex option RightToLeft so that matching starts from the end of the string, with the aim of finding the last "4X" pattern.
