
Can you please help me find a regex pattern for the following set of strings?

1921 abc abc abc 1k

4320 abcs abc Apt 201b

1250 abcd Ave Apt 3c

61a abcd Ave

1b abcd Ave

39r abcd Rd

16w750 abcd Ave

abc 12a

The task is to identify whether a string contains a single letter before, after, or between digits.

\d[A-Za-z]\d|\d[A-Za-z]|^[A-Za-z]\d

This is what I tried, but it didn't work :)

Only a single letter should be present. There can be many digits.
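Running the attempt through a regex engine shows why it fails. The sketch below uses Python's re module purely as a stand-in (an assumption: Informatica's engine may differ in details), and "first_hit" is a hypothetical helper name. Because the pattern has no word boundaries, "\d[A-Za-z]" finds "1b" inside "201bb" (a string the comments below say must be rejected), and it returns fragments such as "6w7" rather than whole tokens.

```python
import re

# The pattern from the question, unchanged. It has no word boundaries (\b).
attempt = re.compile(r"\d[A-Za-z]\d|\d[A-Za-z]|^[A-Za-z]\d")

def first_hit(s):
    """Hypothetical helper: text of the leftmost match, or None."""
    m = attempt.search(s)
    return m.group() if m else None

for s in ["16w750 abcd Ave", "abc 12a", "4320 abc Blvd Apt 201bb"]:
    # Prints "6w7", "2a" and "1b": fragments, and a false positive on "201bb".
    print(f"{s!r:28} -> {first_hit(s)}")
```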

  • Try ([a-zA-Z]+[0-9]|[0-9]+[a-zA-Z])[a-zA-Z0-9]* or (?:[a-zA-Z]+[0-9]|[0-9]+[a-zA-Z])[a-zA-Z0-9]* Commented Jan 29, 2024 at 13:19
  • "4320 abc Blvd Apt 201b" should match; "4320 abc Blvd Apt 201bb" should not. Commented Jan 29, 2024 at 13:26
  • Then maybe \b\d+[a-z]\d*\b - regex101.com/r/cTUJ4y/1 Commented Jan 29, 2024 at 13:30
  • I just read that you are probably using string literals in Informatica, so you will need to double the backslashes, i.e. \\b\\d+[a-z]\\d*\\b. Commented Jan 29, 2024 at 13:35
  • You need to keep in mind how regular expressions work in Informatica; it is not the general, common way. You need to write a pattern that always matches the full input, divide it into sequences, and extract the desired sequence. Read the docs: docs.informatica.com/data-integration/powercenter/10-5/… The examples provided by @WiktorStribiżew work fine, just not the Informatica way. The difference is that they match only part of the input. Commented Jan 30, 2024 at 8:56
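The two points raised in the comments (doubling backslashes in ordinary string literals, and partial versus whole-input matching) can be illustrated with Python's re module. This is only an analogy under that assumption: re.search stands in for the "common" find-anywhere behaviour, and re.fullmatch stands in for the whole-input requirement described for Informatica.

```python
import re

# The pattern from the comments, spelled two ways.
raw = r"\b\d+[a-z]\d*\b"          # raw string: backslashes kept as typed
doubled = "\\b\\d+[a-z]\\d*\\b"    # ordinary literal: every backslash doubled
assert raw == doubled              # both produce the same regex text

s = "4320 abc Blvd Apt 201b"

# Find-anywhere semantics: the pattern matches the token "201b".
print(re.search(raw, s))

# Whole-input semantics: the bare pattern covers only part of s, so it fails...
print(re.fullmatch(raw, s))
# ...until it is wrapped so that it consumes the rest of the input too.
print(re.fullmatch(r".*" + raw + r".*", s))
```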

3 Answers


Just try

\b(\d*[a-zA-Z]\d+|\d+[a-zA-Z]\d*)\b

You will need to match it and then extract the second matching group (the middle one). You can check it here.
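Here is a sketch that checks this pattern against the strings from the question, using Python's re module as a stand-in engine (an assumption; "extract" is a hypothetical helper name). With the pattern exactly as written there is a single capturing group, so in Python the token is group 1. The two word boundaries are what reject "201bb", which the comments say must not match.

```python
import re

# Digits (optional), one letter, digits; or digits, one letter, digits (optional),
# with a word boundary on each side so the token cannot contain a second letter.
pattern = re.compile(r"\b(\d*[a-zA-Z]\d+|\d+[a-zA-Z]\d*)\b")

def extract(s):
    """Hypothetical helper: the first qualifying token, or None."""
    m = pattern.search(s)
    return m.group(1) if m else None  # the pattern has one capturing group

samples = [
    "1921 abc abc abc 1k",
    "4320 abcs abc Apt 201b",
    "1250 abcd Ave Apt 3c",
    "61a abcd Ave",
    "1b abcd Ave",
    "39r abcd Rd",
    "16w750 abcd Ave",
    "abc 12a",
    "4320 abc Blvd Apt 201bb",  # from the comments: should not match
]

for s in samples:
    print(f"{s!r:28} -> {extract(s)}")
```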



You didn't really supply a precise definition, but this does match your examples:

\b(?=\d+[a-zA-Z][0-9]*)([0-9a-zA-Z]*)

Demo
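A sketch of how this first pattern behaves, using Python's re module as a stand-in engine (an assumption; "first_token" is a hypothetical helper name). The lookahead only checks that the token starts with digits followed by a letter. Nothing inside it pins where the token ends, and the consuming part then takes the whole alphanumeric run. As a result, "201bb", which the question's comments say should be rejected, is still matched.

```python
import re

# Lookahead: the token must start with digits then a letter (its end is unchecked).
# [0-9a-zA-Z]* then consumes the entire alphanumeric run.
lookahead = re.compile(r"\b(?=\d+[a-zA-Z][0-9]*)([0-9a-zA-Z]*)")

def first_token(s):
    """Hypothetical helper: the first matched run, or None."""
    m = lookahead.search(s)
    return m.group(1) if m else None

for s in ["1921 abc abc abc 1k", "16w750 abcd Ave", "abc 12a",
          "4320 abc Blvd Apt 201bb"]:
    print(f"{s!r:28} -> {first_token(s)}")
```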

If non-digits can appear first (which is not the case in your examples), then use an alternation:

\b(?=(?:[a-zA-Z]\d+[a-zA-Z]*)|(?:\d+[a-zA-Z]\d*))[a-zA-Z0-9]*

Demo
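The sketch below tests the alternation pattern with Python's re module as a stand-in engine (an assumption; "first_token" is a hypothetical helper name). It accepts letter-first tokens such as "a1", but because the first branch allows trailing letters ([a-zA-Z]*) and neither branch ends at a word boundary, it also accepts "a123def" and "16w750RST", which contain more than one letter.

```python
import re

# Alternation inside the lookahead; [a-zA-Z0-9]* then consumes the whole run.
alternation = re.compile(
    r"\b(?=(?:[a-zA-Z]\d+[a-zA-Z]*)|(?:\d+[a-zA-Z]\d*))[a-zA-Z0-9]*"
)

def first_token(s):
    """Hypothetical helper: the first matched run, or None."""
    m = alternation.search(s)
    return m.group(0) if m else None

for s in ["a1", "1a1", "123a2312", "a123def", "16w750RST"]:
    print(f"{s!r:12} -> {first_token(s)}")
```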

6 Comments

There can be only one letter and any number of digits, like a1, 1a1, 11a, 123a2312, 112a.
Remove the + then.
There can be non digits starting.
I think the edit captures your comments. Try it!
\b(?=(?:[a-zA-Z]\d+[a-zA-Z]*)|(?:\d+[a-zA-Z]\d*))[a-zA-Z0-9]* still picks up "a123def" and "16w750RST", even after removing the +.
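One possible adjustment, not taken from the answer and offered only as a hypothetical sketch: end each lookahead branch with \b so that nothing may follow the digits inside the same token, and drop the trailing letters from the letter-first branch. It is checked here with Python's re module as a stand-in engine (an assumption; "bounded" and "first_token" are hypothetical names) against the examples from this comment thread.

```python
import re

# Hypothetical variant: each lookahead branch must stop at a word boundary,
# so a token may hold exactly one letter plus digits and nothing else.
bounded = re.compile(r"\b(?=[a-zA-Z]\d+\b|\d+[a-zA-Z]\d*\b)[a-zA-Z0-9]+")

def first_token(s):
    """Hypothetical helper: the first qualifying token, or None."""
    m = bounded.search(s)
    return m.group(0) if m else None

for s in ["a1", "1a1", "11a", "123a2312", "112a", "a123def", "16w750RST"]:
    print(f"{s!r:12} -> {first_token(s)}")
```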

The REG_MATCH function requires the pattern to match the entire string, so you can use

REG_MATCH( subject, '.*\b\d+[a-z]\d*\b.*' )

Details:

  • .* - any zero or more characters other than line break chars as many as possible
  • \b - a word boundary
  • \d+ - one or more digits
  • [a-z] - a lowercase ASCII letter
  • \d* - zero or more digits
  • \b - a word boundary
  • .* - any zero or more characters other than line break chars as many as possible

See the regex demo.
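To check the whole-string behaviour outside Informatica, here is a sketch that emulates it with Python's re.fullmatch. This is an assumption: "reg_match" is a hypothetical helper that mimics the idea, not Informatica's implementation. As in the breakdown above, Python's "." does not match line breaks by default.

```python
import re

# Wrapped pattern from this answer: leading and trailing .* let the whole subject
# match as long as a qualifying token appears somewhere in it.
PATTERN = r".*\b\d+[a-z]\d*\b.*"

def reg_match(subject):
    """Hypothetical stand-in: True only if PATTERN matches the entire subject."""
    return re.fullmatch(PATTERN, subject) is not None

samples = [
    "1921 abc abc abc 1k",
    "4320 abcs abc Apt 201b",
    "1250 abcd Ave Apt 3c",
    "61a abcd Ave",
    "1b abcd Ave",
    "39r abcd Rd",
    "16w750 abcd Ave",
    "abc 12a",
]

for s in samples + ["4320 abc Blvd Apt 201bb"]:
    print(f"{s!r:28} -> {reg_match(s)}")
```

Note that \d+ requires at least one leading digit, so a letter-first token such as "a1" is not accepted by this pattern.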

1 Comment

When using REG_EXTRACT to actually get the value, you will need a capturing group, e.g. REG_EXTRACT( subject, '.*\b(\d+[a-z]\d*)\b.*' ) (by default, the first group's value is extracted; otherwise, provide the group number as the third argument).
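The extraction step can be sketched in Python by matching the whole subject and returning group 1. This is an assumption-based emulation: "reg_extract", "GREEDY", "LAZY" and the two-token string "61a abcd Apt 3c" are hypothetical. Because the leading .* is greedy, a subject holding two qualifying tokens yields the last one. A lazy .*? yields the first one instead, at least in Python's engine.

```python
import re

GREEDY = r".*\b(\d+[a-z]\d*)\b.*"   # pattern from this comment
LAZY = r".*?\b(\d+[a-z]\d*)\b.*"    # hypothetical variant preferring the first token

def reg_extract(subject, pattern=GREEDY):
    """Hypothetical stand-in: match the whole subject, return group 1 or None."""
    m = re.fullmatch(pattern, subject)
    return m.group(1) if m else None

print(reg_extract("16w750 abcd Ave"))          # 16w750
print(reg_extract("4320 abc Blvd Apt 201bb"))  # None
# Hypothetical subject with two candidate tokens:
print(reg_extract("61a abcd Apt 3c"))          # 3c  (greedy .* keeps the last)
print(reg_extract("61a abcd Apt 3c", LAZY))    # 61a (lazy .*? keeps the first)
```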
