
I have the following string

[1] weight | width | depth | 5.0 cm | 6.0 mm^2 | 10.12 cm^3

From it I need to extract the name, value, and unit of each item, like this:

name = weight
value = 5.0
unit = cm

name = width
value = 6.0
unit = mm^2

name = depth
value = 10.12
unit = cm^3

I have the following regexes, one for each part. Each works as expected on its own, but I need a single combined regex that returns all the expected matches. I tried simply concatenating them and also joining them with |, but neither worked. Here are the working regexes for the individual matches:

For Name : (?<name>\b\w+(?:[\w]\w+)+\b)
For Value : (?<![\^])(?<value>[+-]?[0-9]+(?:\.[0-9]+)?)(?!\S)
For Unit : \b[0-9]+(?:\.[0-9]+)?[^\S\r\n]+(?<unit>[^0-9\s]\S*)(?:[^\S\r\n]+\||$)

Can anyone help me with this? Thanks.

3
  • Will your data always be in this format and sequence? Commented Jun 23, 2022 at 6:33
  • Yes. It will always contain name, value, and unit fields separated by spaces and pipes (|). Commented Jun 23, 2022 at 6:36
  • Another sample text: [2] height | weight | 162 cm | 60 kg Commented Jun 23, 2022 at 6:37

3 Answers

4

If the number of pipes between each name and its value is fixed (three in this string), you can use a capture group for the name, and capture the value and unit inside a lookahead:

(?<!\S)(?<name>\w+)(?=(?:[^|]*\|){3}\s*\b(?<value>[0-9]+(?:\.[0-9]+)?)\s+(?<unit>[^0-9\s]\S*))



9 Comments

To make this work for the OP, I think the {3} should be treated as {n}, where the OP first counts the occurrences of '|' in the string and divides by 2 (rounding up to the next integer, e.g. to 1 when there is just a single pipe).
@JvdV Yes, if each name is separated from its value by 2 pipes, then n will be 2. If this is dynamic, you can use logic like you described, or use split and do post-processing.
Yeah, looking at the samples OP provided (in question and comment) the latter (split) may also be a good alternative.
Thanks for the answers. The format is always the same, but the number of data groups may differ. The string in the question has 3 groups, but it should support any group count, for example: [1] ABC | XYZ | 6.9 mm | 194 mm^3
Enjoy wherever you are! Not too much SO during holidays OK =)
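
The split-based alternative mentioned in these comments could look roughly like the sketch below. It is only an illustration, not code from the thread: the class name SplitParser and the method Parse are hypothetical, and it assumes the input always has a leading "[n]" index followed by all names and then the same number of "<number> <unit>" fields.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the thread): pairs the i-th name with the i-th measurement.
public static class SplitParser
{
    public static List<(string Name, double Value, string Unit)> Parse(string input)
    {
        // Drop the leading "[n]" index, then split on pipes and trim each field.
        string body = Regex.Replace(input, @"^\s*\[\d+\]\s*", "");
        string[] parts = body.Split('|').Select(p => p.Trim()).ToArray();

        // Assumption: k names are followed by exactly k measurements.
        if (parts.Length % 2 != 0)
            throw new FormatException("Expected as many measurements as names.");

        int n = parts.Length / 2;
        var result = new List<(string Name, double Value, string Unit)>();
        for (int i = 0; i < n; i++)
        {
            // Each measurement looks like "<number> <unit>", e.g. "6.0 mm^2".
            string[] m = parts[n + i].Split(new[] { ' ' }, 2, StringSplitOptions.RemoveEmptyEntries);
            if (m.Length != 2)
                throw new FormatException("Expected '<number> <unit>' but got: " + parts[n + i]);

            // Invariant culture so "10.12" parses the same on every machine.
            double value = double.Parse(m[0], CultureInfo.InvariantCulture);
            result.Add((parts[i], value, m[1].Trim()));
        }
        return result;
    }
}
```

Because the pair count is derived from the number of fields, the same call handles both the three-group string from the question and the two-group sample from the comments.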
2

Just for reference, here is how you could use the pattern provided by @TheFourthBird:

using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string s = "[1] weight | width | depth | 5.0 cm | 6.0 mm^2 | 10.12 cm^3";
        // k name/value pairs produce 2k pipe-separated fields, so n is the pair count.
        int n = s.Split('|').Length / 2;
        // After each name, skip exactly n pipes to reach its value and unit.
        string pat = @"(?<!\S)(?<name>\w+)(?=(?:[^|]*\|){" + n + @"}\s*\b(?<value>[0-9]+(?:\.[0-9]+)?)\s+(?<unit>[^0-9\s]\S*))";

        var itemRegex = new Regex(pat, RegexOptions.Compiled);
        var orderList = itemRegex.Matches(s)
                            .Cast<Match>()
                            .Select(m => new
                            {
                                Name = m.Groups["name"].Value,
                                // Invariant culture so "10.12" parses regardless of the machine's locale.
                                Value = Convert.ToDouble(m.Groups["value"].Value, CultureInfo.InvariantCulture),
                                Unit = m.Groups["unit"].Value,
                            })
                            .ToList();
        Console.WriteLine(String.Join("; ", orderList));
    }
}

Prints:

{ Name = weight, Value = 5, Unit = cm }; { Name = width, Value = 6, Unit = mm^2 }; { Name = depth, Value = 10.12, Unit = cm^3 }



Note: By no means am I a C# developer. I just happened to adapt code found here on SO to showcase how the answer given by TheFourthBird could work.

3 Comments

Thanks for sharing the code. I already have logic for parsing the match groups (it is fixed and cannot change at the moment), and it reads the regex from a JSON file, so I can easily change the regex to support this text string. Is it possible to modify the regex so that the value of 'n' does not have to be specified explicitly?
@AneeshNarayanan, not possible through regex afaik.
Hm. Thanks for the update. I was thinking that, since each match group works individually without specifying 'n', combining them in some way would work?
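
One .NET-specific option that avoids computing 'n' may be worth noting here: .NET keeps every capture of a repeated group in a CaptureCollection, so a single match over the whole line can collect all names, values, and units, which are then paired by index. The sketch below is an assumption-laden illustration, not something proposed in the thread: the names CaptureParser and Parse are hypothetical, and it changes how the match groups are read, so it may not fit the fixed parsing logic described above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the thread): one match, many captures per named group.
public static class CaptureParser
{
    // First a run of "name |" fields, then a run of "<value> <unit>" fields
    // separated by pipes, anchored to the whole line.
    private static readonly Regex Line = new Regex(
        @"^\[\d+\]\s*(?:(?<name>\w+)\s*\|\s*)+" +
        @"(?:(?<value>[0-9]+(?:\.[0-9]+)?)\s+(?<unit>[^0-9\s|]\S*)\s*(?:\||$)\s*)+$");

    public static List<(string Name, string Value, string Unit)> Parse(string input)
    {
        var result = new List<(string Name, string Value, string Unit)>();
        Match m = Line.Match(input);
        if (!m.Success) return result;

        CaptureCollection names = m.Groups["name"].Captures;
        CaptureCollection values = m.Groups["value"].Captures;
        CaptureCollection units = m.Groups["unit"].Captures;

        // Assumption: the i-th name belongs to the i-th value/unit pair.
        if (names.Count != values.Count) return result;

        for (int i = 0; i < names.Count; i++)
            result.Add((names[i].Value, values[i].Value, units[i].Value));
        return result;
    }
}
```

The pattern does not encode the group count anywhere, so the same regex text could sit in a JSON file unchanged; the trade-off is that the consumer has to read Captures rather than one group value per match.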
1

Use this regex to capture the corresponding groups (it assumes exactly three name/value pairs):

\[\d+\]\s(\w+)\s\|\s(\w+)\s\|\s(\w+)\s\|\s(\S+)\s(\S+)\s\|\s(\S+)\s(\S+)\s\|\s(\S+)\s(\S+)

Then, using substitution, replace with:

name = $1\nvalue = $4\nunit = $5\n\nname = $2\nvalue = $6\nunit = $7\n\nname = $3\nvalue = $8\nunit = $9
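
As a rough illustration of how this pattern and replacement could be applied in C#, here is a minimal sketch using Regex.Replace. The class name FixedThreeGroups and the method Format are hypothetical, and, like the pattern itself, it only handles input with exactly three groups.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper (not from the thread) around the pattern and replacement above.
public static class FixedThreeGroups
{
    private const string Pattern =
        @"\[\d+\]\s(\w+)\s\|\s(\w+)\s\|\s(\w+)\s\|\s(\S+)\s(\S+)\s\|\s(\S+)\s(\S+)\s\|\s(\S+)\s(\S+)";

    // In a regular C# string literal, \n is a real newline; $1..$9 refer to the numbered groups.
    private const string Replacement =
        "name = $1\nvalue = $4\nunit = $5\n\nname = $2\nvalue = $6\nunit = $7\n\nname = $3\nvalue = $8\nunit = $9";

    public static string Format(string input)
    {
        // Input that does not match the three-group shape is returned unchanged.
        return Regex.Replace(input, Pattern, Replacement);
    }
}
```

Calling FixedThreeGroups.Format on the string from the question should produce the three name/value/unit blocks, while the two-group sample from the comments would be returned unchanged because the pattern does not match it.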

