REG_MATCH function in Informatica - Identify a pattern

Question

Can you please help me identify a regex pattern for the set of strings below?

1921 abc abc abc 1k

4320 abcs abc Apt 201b

1250 abcd Ave Apt 3c

61a abcd Ave

1b abcd Ave

39r abcd Rd

16w750 abcd Ave

abc 12a

The goal is to identify whether a string contains a token with a single letter before, after, or between digits.

\d[A-Za-z]\d|\d[A-Za-z]|^[A-Za-z]\d

This is what I tried, but it didn't work :)

Only a single letter should be present. There can be any number of digits.
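One reason the attempt accepts too much can be seen in a minimal Python sketch. It uses Python's re module as a stand-in for Informatica's regex engine (an assumption; Informatica's behavior may differ), and the helper name attempt_matches is hypothetical. None of the alternatives is delimited by word boundaries, so a digit followed by a letter anywhere in the string is enough.

```python
import re

# Pattern from the question, applied as a partial (search-style) match.
ATTEMPT = re.compile(r"\d[A-Za-z]\d|\d[A-Za-z]|^[A-Za-z]\d")


def attempt_matches(text: str) -> bool:
    """Return True if the question's pattern matches anywhere in text."""
    return ATTEMPT.search(text) is not None


print(attempt_matches("4320 abc Blvd Apt 201b"))   # True
print(attempt_matches("4320 abc Blvd Apt 201bb"))  # True: "1b" inside "201bb" matches
```

The second call shows the problem: the two-letter suffix is accepted because nothing requires the token to end after the single letter.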

Try ([a-zA-Z]+[0-9]|[0-9]+[a-zA-Z])[a-zA-Z0-9]* or (?:[a-zA-Z]+[0-9]|[0-9]+[a-zA-Z])[a-zA-Z0-9]*
– Wiktor Stribiżew, Commented Jan 29, 2024 at 13:19
"4320 abc Blvd Apt 201b" should match; "4320 abc Blvd Apt 201bb" should not.
– Boni, Commented Jan 29, 2024 at 13:26
I just read that you are probably using string literals in Informatica, so you will need to double the backslashes, i.e. \\b\\d+[a-z]\\d*\\b.
– Wiktor Stribiżew, Commented Jan 29, 2024 at 13:35
Keep in mind how regular expressions work in Informatica; it is not the common way. You need to write a pattern that always matches the full input, divide it into subpatterns, and extract the desired one. Read the docs: docs.informatica.com/data-integration/powercenter/10-5/… The examples provided by @WiktorStribiżew work fine, just not the Informatica way: they match only part of the input.
– Maciejg, Commented Jan 30, 2024 at 8:56

Luis Colorado · Accepted Answer · 2024-02-02 20:28:33Z

1

Just try

\b(\d*[a-zA-Z]\d+|\d+[a-zA-Z]\d*)\b

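You will need to match it and then extract the capturing group that holds the token. As written, the pattern has only one group; if you add leading and trailing groups so that the pattern covers the whole input, the token becomes the second (middle) group. You can check it here.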
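As a rough check of this pattern against the strings in the question, here is a small Python sketch. Python's re module is only a stand-in for Informatica's engine (an assumption), and find_token is a hypothetical helper name.

```python
import re
from typing import Optional

# Pattern from this answer: one letter with digits before, after, or on both sides.
TOKEN = re.compile(r"\b(\d*[a-zA-Z]\d+|\d+[a-zA-Z]\d*)\b")


def find_token(text: str) -> Optional[str]:
    """Return the first token captured by the pattern, or None if there is none."""
    m = TOKEN.search(text)
    return m.group(1) if m else None


samples = [
    "1921 abc abc abc 1k",
    "4320 abcs abc Apt 201b",
    "16w750 abcd Ave",
    "abc 12a",
    "4320 abc Blvd Apt 201bb",
]
for s in samples:
    print(repr(s), "->", find_token(s))
```

With Python's engine, the first four samples yield 1k, 201b, 16w750, and 12a, while the 201bb sample yields None because the trailing word boundary rejects a second letter.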

edited Feb 2, 2024 at 20:28

answered Feb 2, 2024 at 19:54

Luis Colorado



dawg · Accepted Answer · 2024-01-29 13:36:20Z

0

You didn't really supply a precise definition, but this does match your examples:

\b(?=\d+[a-zA-Z][0-9]*)([0-9a-zA-Z]*)

Demo

If a token can start with a non-digit (which is not in your examples), then use an alternation:

\b(?=(?:[a-zA-Z]\d+[a-zA-Z]*)|(?:\d+[a-zA-Z]\d*))[a-zA-Z0-9]*

Demo

edited Jan 29, 2024 at 13:36

answered Jan 29, 2024 at 13:21

dawg


6 Comments

Boni Over a year ago

There can be only one character and n numbers, like a1, 1a1, 11a, 123a2312, 112a

dawg Over a year ago

Remove the + then.

Boni Over a year ago

There can be non digits starting.

dawg Over a year ago

I think the edit captured your comments. Try it!

Boni Over a year ago

\b(?=(?:[a-zA-Z]\d+[a-zA-Z]*)|(?:\d+[a-zA-Z]\d*))[a-zA-Z0-9]* still picks up "a123def" and "16w750RST", even after removing the +.
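The behavior reported in the last comment can be reproduced with a short Python sketch. Python's re module stands in for the engine actually used (an assumption), and first_token is a hypothetical helper name. The lookahead only checks how the token starts; the final [a-zA-Z0-9]* then consumes any remaining letters and digits.

```python
import re
from typing import Optional

# Lookahead pattern from the answer above, unchanged.
LOOKAHEAD = re.compile(
    r"\b(?=(?:[a-zA-Z]\d+[a-zA-Z]*)|(?:\d+[a-zA-Z]\d*))[a-zA-Z0-9]*"
)


def first_token(text: str) -> Optional[str]:
    """Return the first substring matched by the lookahead pattern, or None."""
    m = LOOKAHEAD.search(text)
    return m.group(0) if m else None


print(first_token("a123def"))    # a123def
print(first_token("16w750RST"))  # 16w750RST
print(first_token("abc 12a"))    # 12a
```

Nothing in the pattern requires a word boundary right after the single letter and its digits, which is why tokens with extra letters are still returned.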


Wiktor Stribiżew · Accepted Answer · 2024-01-30 09:21:43Z

0

The REG_MATCH function needs the whole string to be matched, so you can use

REG_MATCH( subject, '.*\b\d+[a-z]\d*\b.*' )

Details:

.* - any zero or more characters other than line break chars as many as possible
\b - a word boundary
\d+ - one or more digits
[a-z] - a lowercase ASCII letter
\d* - zero or more digits
\b - a word boundary
.* - any zero or more characters other than line break chars as many as possible

See the regex demo.
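The whole-string requirement can be imitated in Python with re.fullmatch. This is only a sketch: Python's re module is a stand-in for Informatica's engine, and reg_match_like is a hypothetical helper, not an Informatica function.

```python
import re

TOKEN = r"\b\d+[a-z]\d*\b"         # matches only the token itself
WRAPPED = r".*" + TOKEN + r".*"    # padded so it can cover the whole input


def reg_match_like(subject: str, pattern: str) -> bool:
    """Imitate a whole-string match: True only if pattern covers all of subject."""
    return re.fullmatch(pattern, subject) is not None


print(reg_match_like("61a abcd Ave", TOKEN))               # False
print(reg_match_like("61a abcd Ave", WRAPPED))             # True
print(reg_match_like("4320 abc Blvd Apt 201bb", WRAPPED))  # False
```

The bare token pattern fails on a full address because the rest of the string is left unmatched; the surrounding .* absorbs it.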

answered Jan 30, 2024 at 9:21

Wiktor Stribiżew


1 Comment

Wiktor Stribiżew Over a year ago

When using REG_EXTRACT to actually get the value, you will need a capturing group, i.e. REG_EXTRACT( subject, '.*\b(\d+[a-z]\d*)\b.*' ) (by default, the first group value will be extracted; otherwise, provide the group number as the third argument).
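A rough Python imitation of the extraction step, assuming Python's re module behaves like Informatica's engine for this pattern (it may not). The helper reg_extract_like is hypothetical; its group parameter mirrors the optional group number mentioned above.

```python
import re
from typing import Optional

# Whole-string pattern with one capturing group around the token.
EXTRACT = re.compile(r".*\b(\d+[a-z]\d*)\b.*")


def reg_extract_like(subject: str, group: int = 1) -> Optional[str]:
    """Return the requested group if the pattern covers all of subject, else None."""
    m = EXTRACT.fullmatch(subject)
    return m.group(group) if m else None


print(reg_extract_like("16w750 abcd Ave"))          # 16w750
print(reg_extract_like("1250 abcd Ave Apt 3c"))     # 3c
print(reg_extract_like("61a abcd 3c"))              # 3c
print(reg_extract_like("4320 abc Blvd Apt 201bb"))  # None
```

Because the leading .* is greedy, the last qualifying token in the string is the one captured, as the third call shows.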
