-2

I have a table with around 100,000 articles.

Each article has a description.

short example:

ARTIKELNR DESCRIPTION
104009400031900 S235JRG2C+C EN 10277/10 80 x 8 mm Blanker Flachstahl
104009800002950 Werksattest nach EN 10204
105009400092360 EN 10060 Inox Rund Ø 28 mm WNr1.4034 warmgefertigt
105009400068571 90 mm WNr1.4404 warmgefertigt, EN 10060, geschmiedet
105009400004420 WNr1.4301 Inox Vierkantstahl 6 x 6 mm EN 10278, blankgezogen
105009400008800 WNr1.4301 Inox Band geschnitten 25 x 4 mm, Breitentol. 25 m
105009400068600 WNr1.4112 Inox Rund 8.00 mm nach EN 10088-3 + A+C+SL, gesch
105009400068700 WNr1.4301 Inox Rund 36.00 mm nach EN 10088-3, geschliffen, T
105009400068800 WNr1.4104 Inox Rund 25.00 mm nach EN 10088-3, geschliffen, T
105009400010035 WNr1.4404 Inox Vierkantstahl 30 x 30 mm, gezogen EN 10278

None have the same format.

What I would like to do is extract the "NORM" from the description.

So, I would like to have the following result

ARTIKELNR NORM
104009400031900 EN 10277/10
104009800002950 EN 10204
105009400092360 EN 10060
105009400068571 EN 10060
105009400004420 EN 10278
105009400008800
105009400068600 EN 10088-3
105009400068700 EN 10088-3
105009400068800 EN 10088-3
105009400010035 EN 10278
13
  • what sql system and what version do you use? Commented Mar 14, 2024 at 7:29
  • Microsoft SQL Server Standard (64-bit) v11.0.7507.2 I just do not have adminstrator rights to change the COMPATIBILITY_LEVEL Commented Mar 14, 2024 at 7:31
  • @ouboma your not altering table schema. I guess, just select doesn't require to change COMPATIBILITY_LEVEL permission. Correct me i'm wrong. Commented Mar 14, 2024 at 7:44
  • @502_Geek when i use string_split() i get this error: 'STRING_SPLIT' is not a recognized built-in function name which can be solved by changing the compatibility level. which i can't really do! Commented Mar 14, 2024 at 7:48
  • FYI, version 11 (2012) ran completely out of support almost 2 years ago, @ouboma . You really need to get on a recent, and supported, version sooner rather than later. Commented Mar 14, 2024 at 8:41

3 Answers 3

1

Please try the following solution leveraging SQL Server XML and XQuery functionality.

It will work starting from SQL Server 2012 onwards.

Notable points:

  • CROSS APPLY is tokenizing DESCRIPTION column as XML.
  • XQuery .query() method is looking for the "EN" token via FLWOR expression and the following token via XPath predicate ...r[position()=($pos, $pos + 1)].
  • XQuery .value() method gives us a desired output string.
  • REPLACE() function removes optional trailing comma.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, ARTIKELNR CHAR(15), DESCRIPTION NVARCHAR(1024));
INSERT INTO @tbl (ARTIKELNR, DESCRIPTION) VALUES
(104009400031770,N'725528 MOTORKONSOLE KM35 EN SENKKLAPPFENSTER WICSKY 3'), -- Exception case, no digits after EN
(104009400031900,N'S235JRG2C+C EN10088-3 80 x 8 mm Blanker Flachstahl'),    -- Exception case, no space after EN
(104009400031900,N'S235JRG2C+C EN 10277/10 80 x 8 mm Blanker Flachstahl'),
(104009800002950,N'Werksattest nach EN 10204'),
(105009400092360,N'EN 10060 Inox Rund Ø 28 mm WNr1.4034 warmgefertigt'),
(105009400068571,N'90 mm WNr1.4404 warmgefertigt, EN 10060, geschmiedet'),
(105009400004420,N'WNr1.4301 Inox Vierkantstahl 6 x 6 mm EN 10278, blankgezogen'),
(105009400008800,N'WNr1.4301 Inox Band geschnitten 25 x 4 mm, Breitentol. 25 m'),
(105009400068600,N'WNr1.4112 Inox Rund 8.00 mm nach EN 10088-3 + A+C+SL, gesch'),
(105009400068700,N'WNr1.4301 Inox Rund 36.00 mm nach EN 10088-3, geschliffen, T'),
(105009400068800,N'WNr1.4104 Inox Rund 25.00 mm nach EN 10088-3, geschliffen, T'),
(105009400010035,N'WNr1.4404 Inox Vierkantstahl 30 x 30 mm, gezogen EN 10278');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = SPACE(1);

SELECT *
    , REPLACE(c.query('
          for $x in /root/r[text()="EN"]
          let $pos := count(root/r[. << $x]) + 1
          return if (xs:int(substring((/root/r[$pos + 1]/text())[1],1,5)) instance of xs:int) then
                data(/root/r[position()=($pos, $pos + 1)])
            else data(/root/r[$pos])
        ').value('text()[1]', 'NVARCHAR(30)')
        ,',', '') AS NORM
FROM @tbl AS t
CROSS APPLY (SELECT PATINDEX('%EN[0-9][0-9][0-9][0-9][0-9]%', DESCRIPTION)) AS t2(pos)  -- to handle Exception cases
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
    REPLACE(IIF(pos > 0, STUFF(DESCRIPTION,pos + 2,0, SPACE(1)), DESCRIPTION), @separator, ']]></r><r><![CDATA[') + 
    --REPLACE(DESCRIPTION, @separator, ']]></r><r><![CDATA[') + 
    ']]></r></root>' AS XML)) AS t1(c)
ORDER BY ID;

Output

ID ARTIKELNR DESCRIPTION NORM
1 104009400031770 725528 MOTORKONSOLE KM35 EN SENKKLAPPFENSTER WICSKY 3 EN
2 104009400031900 S235JRG2C+C EN10088-3 80 x 8 mm Blanker Flachstahl EN 10088-3
3 104009400031900 S235JRG2C+C EN 10277/10 80 x 8 mm Blanker Flachstahl EN 10277/10
4 104009800002950 Werksattest nach EN 10204 EN 10204
5 105009400092360 EN 10060 Inox Rund Ø 28 mm WNr1.4034 warmgefertigt EN 10060
6 105009400068571 90 mm WNr1.4404 warmgefertigt, EN 10060, geschmiedet EN 10060
7 105009400004420 WNr1.4301 Inox Vierkantstahl 6 x 6 mm EN 10278, blankgezogen EN 10278
8 105009400008800 WNr1.4301 Inox Band geschnitten 25 x 4 mm, Breitentol. 25 m NULL
9 105009400068600 WNr1.4112 Inox Rund 8.00 mm nach EN 10088-3 + A+C+SL, gesch EN 10088-3
10 105009400068700 WNr1.4301 Inox Rund 36.00 mm nach EN 10088-3, geschliffen, T EN 10088-3
11 105009400068800 WNr1.4104 Inox Rund 25.00 mm nach EN 10088-3, geschliffen, T EN 10088-3
12 105009400010035 WNr1.4404 Inox Vierkantstahl 30 x 30 mm, gezogen EN 10278 EN 10278
Sign up to request clarification or add additional context in comments.

8 Comments

hello again. if i have a description like: (725528 MOTORKONSOLE KM35 EN SENKKLAPPFENSTER WICSKY 3) the result will be (EN SENKKLAPPFENSTER) which is wrong, any workaround for that? appreciate it!
What would be a correct result for such scenario?
it should be EN only.. I am only interested in EN followed by some digits.
I adjusted the answer. Check it out.
It is checking for the first 5 chars as digits in the following token.
|
-1

This can be a hint:

SELECT 
    ARTIKELNR,
    CASE 
        WHEN CHARINDEX('EN ', DESCRIPTION) > 0 THEN
            SUBSTRING(DESCRIPTION, CHARINDEX('EN ', DESCRIPTION), PATINDEX('%[0-9/.]%', SUBSTRING(DESCRIPTION, CHARINDEX('EN ', DESCRIPTION), LEN(DESCRIPTION))))
        ELSE
            ''
    END AS NORM
FROM 
    YourTableName

This query assumes that "EN " followed by digits, dots, or slashes uniquely identifies the start of the NORM.

3 Comments

yeah but for the first artikelnr: 104009400031900 it returned only EN 1 as norm. however, I want EN 10277/10. any idea how to achieve that? thanks
I guess this could solve your issue %[0-9]+[-/]?[0-9]*%
@502_Geek not really. both scripts give a blank for artikelnr: 105009400068800
-1

In the most recent versions of SQL Server, this is actually somewhat simple with builtin functionality. YOu can use STRING_SPLIT to split the value, and ensure you return the ordinal positions so that you can use LEAD or LAG get the next/last delimited value and filter to your EN values. Finally you can use TRIM to remove those trailing commas:

WITH CTE AS(
    SELECT V.ARTIKELNR,
           V.DESCRIPTION,
           SS.value,
           LEAD(SS.value) OVER (PARTITION BY V.ARTIKELNR ORDER BY SS.ordinal) AS NextValue
    FROM (VALUES(104009400031900,N'S235JRG2C+C EN 10277/10 80 x 8 mm Blanker Flachstahl'),
                (104009800002950,N'Werksattest nach EN 10204'),
                (105009400092360,N'EN 10060 Inox Rund Ø 28 mm WNr1.4034 warmgefertigt'),
                (105009400068571,N'90 mm WNr1.4404 warmgefertigt, EN 10060, geschmiedet'),
                (105009400004420,N'WNr1.4301 Inox Vierkantstahl 6 x 6 mm EN 10278, blankgezogen'),
                (105009400008800,N'WNr1.4301 Inox Band geschnitten 25 x 4 mm, Breitentol. 25 m'),
                (105009400068600,N'WNr1.4112 Inox Rund 8.00 mm nach EN 10088-3 + A+C+SL, gesch'),
                (105009400068700,N'WNr1.4301 Inox Rund 36.00 mm nach EN 10088-3, geschliffen, T'),
                (105009400068800,N'WNr1.4104 Inox Rund 25.00 mm nach EN 10088-3, geschliffen, T'),
                (105009400010035,N'WNr1.4404 Inox Vierkantstahl 30 x 30 mm, gezogen EN 10278'))V(ARTIKELNR,DESCRIPTION)
         CROSS APPLY STRING_SPLIT(V.DESCRIPTION,' ',1) SS)
SELECT C.ARTIKELNR,
       TRIM(',' FROM CONCAT_WS(' ',C.value,C.NextValue)) AS Norm
FROM CTE C
WHERE C.value = 'EN';

db<>fiddle

In 2012 (reminder, this became completely unsupported in 2022) it's a little harder. You can replace STRING_SPLIT with a different non-builtin splitter that returns the ordinal position, such as DelimitedN4KSplit_LEAD (N4K as your strings appear to be an nvarchar). You still have access to LEAD, but not TRIM, so you'll need to use REPLACE and RTRIM. This gives you something like this:

WITH CTE AS(
    SELECT V.ARTIKELNR,
           V.DESCRIPTION,
           DS.item,
           LEAD(DS.Item) OVER (PARTITION BY V.ARTIKELNR ORDER BY DS.ItemNumber) AS nextitem
    FROM (VALUES(104009400031900,N'S235JRG2C+C EN 10277/10 80 x 8 mm Blanker Flachstahl'),
                (104009800002950,N'Werksattest nach EN 10204'),
                (105009400092360,N'EN 10060 Inox Rund Ø 28 mm WNr1.4034 warmgefertigt'),
                (105009400068571,N'90 mm WNr1.4404 warmgefertigt, EN 10060, geschmiedet'),
                (105009400004420,N'WNr1.4301 Inox Vierkantstahl 6 x 6 mm EN 10278, blankgezogen'),
                (105009400008800,N'WNr1.4301 Inox Band geschnitten 25 x 4 mm, Breitentol. 25 m'),
                (105009400068600,N'WNr1.4112 Inox Rund 8.00 mm nach EN 10088-3 + A+C+SL, gesch'),
                (105009400068700,N'WNr1.4301 Inox Rund 36.00 mm nach EN 10088-3, geschliffen, T'),
                (105009400068800,N'WNr1.4104 Inox Rund 25.00 mm nach EN 10088-3, geschliffen, T'),
                (105009400010035,N'WNr1.4404 Inox Vierkantstahl 30 x 30 mm, gezogen EN 10278'))V(ARTIKELNR,DESCRIPTION)
         CROSS APPLY fn.DelimitedSplitN4K_LEAD(V.DESCRIPTION,' ') DS)
SELECT C.ARTIKELNR,
       RTRIM(REPLACE(CONCAT(C.item,' ',C.nextitem),',',' ')) AS Norm
FROM CTE C
WHERE C.item = 'EN';

7 Comments

thanks but DelimitedN4KSplit_LEAD is not supported
What do you mean it's not supported, @ouboma ? YOu said you're on SQL Server v11.0.7507.2, which is 2012; DelimitedN4KSplit_LEAD does work on 2012. Is your database in a lower compatibility than 110?
If you are using a lower compatibility level, @ouboma , you can use DelimitedSpitN4k if you're in level 100. If it's 90 or below, you really need to talk to your DBA; that's SQL Server 2005 compatibility levels! YOu also should have mentioned in the question that we were limited to SQL Server 2005 compatibility; some people don't have access to such environments anymore.
the compatibility of the database is 110, but when i run your code, the first error i get is 'CONCAT_WS' is not recognized as the name of an integrated function.
Huh, I thought CONCAT_WS was added in 2012 for some reason, it was 2017. Well, just switch to CONCAT then, @ouboma ; that's not a difficult change (I would have hoped you could work that one out).
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.