I need to find US state abbreviations in a column named my_column. The column can contain values like these:

John Smith of AZ --> Match

John Smith of AZ(Tucson) --> Match

AZ John Smith --> Match

John Smith AZ for Tucson --> Match

Utah Jazz --> Don't Match

Azyme --> Don't Match

Hazy --> Don't Match

I tried using CASE expressions with CHARINDEX and LIKE to do all this matching, but it is getting super-ugly. I wonder if there is a better way.

I asked our DBA to install Full-Text Index to see if I can do something better with CONTAINS, but I'm not sure it really helps.

5 Comments

  • You could use a regular expression like \b(AL|AK|AZ|AR|...|WI|WY)\b. \b matches word boundaries, so it won't match Jazz or Azyme. Commented Feb 6 at 23:49
  • SQL Server doesn't support Regex, @Barmar. (Unless you're from the future, as SQL Server 2025 apparently will.) Commented Feb 7 at 9:28
  • Something like ' ' + column + ' ' like '%[ (),]AZ[ (),]%'. For multiple abbreviations you can just join a table containing those. Commented Feb 7 at 11:00
  • 2016 is a pain here. Later versions do have functionality that would make this less cumbersome even in the absence of Regex (such as TRANSLATE, which could be used to replace all the word boundary characters with a consistent character prior to splitting). Commented Feb 7 at 11:30
  • You'll likely find a solution that will work, but when you do, it won't perform well. This isn't the kind of work database servers were meant to do. Your best option is to capture this data at insert/update time, and use the client language to do the parsing first, along with handling the parsing for the conversion to the fixed schema. Commented Feb 7 at 14:40
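
The TRANSLATE idea from the comments could look roughly like this on SQL Server 2017 or later (it will not run on 2016). This is only a sketch: the variable @boundary and its list of characters are illustrative assumptions, not an exhaustive set of word boundaries, and '|' is simply the "consistent character" chosen here.

```sql
-- Assumption: these are the characters that may sit next to a state code.
declare @boundary varchar(20) = ' (),.!?;:';

declare @testcases table (testval varchar(50));

insert into @testcases (testval)
values
('John Smith of AZ'),
('John Smith of AZ(Tucson)'),
('AZ John Smith'),
('John Smith AZ for Tucson'),
('Utah Jazz'),
('Azyme'),
('Hazy');

-- Turn every boundary character into '|', wrap the value in '|',
-- then look for the code as a token delimited by '|' on both sides.
select testval
from @testcases
where '|' + translate(testval, @boundary, replicate('|', len(@boundary))) + '|'
      like '%|AZ|%';
```

TRANSLATE requires its second and third arguments to be the same length, which is why the replacement string is built with REPLICATE from the length of @boundary.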

2 Answers

How about something like this:

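-- Sample values from the question
declare @testcases table (testval varchar(50));

insert into @testcases (testval)
values 
('John Smith of AZ'),
('John Smith of AZ(Tucson)'),
('AZ John Smith'),
('John Smith AZ for Tucson'),
('Utah Jazz'),
('Azyme'),
('Hazy');

-- A result greater than 0 means AZ was found as a standalone token
select testval,
       PATINDEX('%[^A-Z]AZ[^A-Z]%',testval)   -- AZ in the middle
     + PATINDEX('AZ[^A-Z]%',testval)          -- AZ at the start
     + PATINDEX('%[^A-Z]AZ',testval)          -- AZ at the end
       as match_score
from @testcases;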

It'll match any value containing AZ that isn't bordered by a letter; a result greater than 0 means a match. Note that [^A-Z] excludes only letters, so a digit next to AZ still counts as a boundary. Mind you, this isn't very performant, but it will work.

Also, it only works for Arizona. To find the other 49 states you'd have to scan again for each abbreviation. I would probably try to extract the state in your application before inserting into the database.
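
One way to avoid writing a separate scan per state is the join suggested in the comments. The sketch below assumes a lookup table of abbreviations (the table variable @states and its column abbrev are made up for illustration, and only a few rows are shown). It also assumes you want case-sensitive matching, so it forces a binary collation; without that, a typical case-insensitive database would treat ordinary words like "in" or "or" as IN and OR.

```sql
-- Hypothetical lookup table; a real one would hold all 50 codes.
declare @states table (abbrev char(2) primary key);
insert into @states (abbrev) values ('AZ'), ('IN'), ('UT');

declare @testcases table (testval varchar(50));
insert into @testcases (testval)
values
('John Smith of AZ'),
('John Smith of AZ(Tucson)'),
('AZ John Smith'),
('John Smith lives in AZ.'),
('Utah Jazz'),
('Azyme'),
('Hazy');

-- Pad the value with spaces so a code at the start or end still has a
-- non-letter on both sides, then require a non-letter around each code.
-- Latin1_General_BIN2 makes both the code and the [A-Za-z] ranges
-- compare by exact character, so 'in' does not match 'IN'.
select t.testval, s.abbrev
from @testcases as t
join @states as s
  on (' ' + t.testval + ' ') collate Latin1_General_BIN2
     like '%[^A-Za-z]' + s.abbrev + '[^A-Za-z]%';
```

This is still a scan of every row against every code, so the performance caveat above applies here too.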

1 Comment

Hey mikkel, thank you for your answer! I ended up incorporating another idea from @john-joseph and added a space before and after my column, so instead of the three PATINDEX calls I just kept the first one from your code:

declare @testcases table (testval varchar(50));
insert into @testcases values ('John Smith of AZ'), ('John Smith of AZ(Tucson)'), ('AZ John Smith'), ('John Smith AZ for Tucson'), ('Utah Jazz'), ('Azyme'), ('Hazy');
select PATINDEX('%[^A-Z]AZ[^A-Z]%',' ' + testval + ' ') from @testcases;

Thank you!

Very generally, replace every non-letter character with a space, then add a space before and after the whole string, then search for your two-letter state code by padding it with a space at the beginning and end. Your examples would turn into:

' John Smith of AZ ' --> Matches with '% AZ %'

' John Smith of AZ Tucson ' --> Matches with '% AZ %'

' AZ John Smith ' --> Matches with '% AZ %'

' John Smith AZ for Tucson ' --> Matches with '% AZ %'

' Utah Jazz ' --> Don't Match

' Azyme ' --> Don't Match

' Hazy ' --> Don't Match

As you can see, the state code will always be isolated in its own space.

Why go to the trouble of replacing all non-letter characters with spaces? So that the following are also handled:

'John Smith lives in AZ.'

'John Smith loves AZ!'

'AZ? John Smith lives there.'

'John Smith does not live in California. (He lives in AZ)'

'John Smith has to drive through ID, WA, OR, and CA to get to AZ.'

For help on removing the non-letter characters, see here: How to strip all non-alphabetic characters from string in SQL Server?
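
As a rough sketch of those steps in T-SQL that should also work on SQL Server 2016: the loop below replaces one non-letter character per row per pass until none are left, then searches for the padded code. The table variable, the sample rows, and the choice of Latin1_General_BIN2 (so that letter ranges and the code compare by exact character) are assumptions made for illustration; the linked question covers other ways to do the stripping.

```sql
declare @testcases table (testval varchar(100));

insert into @testcases (testval)
values
('John Smith of AZ(Tucson)'),
('John Smith lives in AZ.'),
('AZ? John Smith lives there.'),
('John Smith has to drive through ID, WA, OR, and CA to get to AZ.'),
('Utah Jazz'),
('Hazy');

-- Matches any character that is neither a letter nor a space.
declare @pattern varchar(20) = '%[^A-Za-z ]%';

-- Each pass overwrites the first offending character in every row with a
-- space, so the loop ends once all rows contain only letters and spaces.
while exists (select 1
              from @testcases
              where patindex(@pattern, testval collate Latin1_General_BIN2) > 0)
begin
    update @testcases
    set testval = stuff(testval,
                        patindex(@pattern, testval collate Latin1_General_BIN2),
                        1, ' ')
    where patindex(@pattern, testval collate Latin1_General_BIN2) > 0;
end;

-- Pad with a space on each side and look for the isolated code.
select testval
from @testcases
where (' ' + testval + ' ') collate Latin1_General_BIN2 like '% AZ %';
```

The loop modifies the sample rows in place for brevity; against a real table you would more likely wrap the stripping in a function or computed column so the original text is kept.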

1 Comment

If you showed the actual SQL to accomplish that, I would consider it a complete answer. As it stands, I consider it helpful suggestions (which I assume is why the other answer was accepted).
