Get unique values using STRING_AGG in SQL Server

Question

The following query returns the results shown below:

SELECT 
    ProjectID, newID.value
FROM 
    [dbo].[Data] WITH(NOLOCK)  
CROSS APPLY 
    STRING_SPLIT([bID],';') AS newID  
WHERE 
    newID.value IN ('O95833', 'Q96NY7-2')

Results:

ProjectID   value
---------------------
2           Q96NY7-2
2           O95833
2           O95833
2           Q96NY7-2
2           O95833
2           Q96NY7-2
4           Q96NY7-2
4           Q96NY7-2

Using the STRING_AGG function (added in SQL Server 2017), as shown in the following query, I get the result set below.

SELECT 
    ProjectID,
    STRING_AGG(newID.value, ',') WITHIN GROUP (ORDER BY newID.value) AS NewField
FROM
    [dbo].[Data] WITH(NOLOCK)  
CROSS APPLY 
    STRING_SPLIT([bID],';') AS newID  
WHERE 
    newID.value IN ('O95833', 'Q96NY7-2')  
GROUP BY 
    ProjectID
ORDER BY 
    ProjectID

Results:

ProjectID   NewField
-------------------------------------------------------------
2           O95833,O95833,O95833,Q96NY7-2,Q96NY7-2,Q96NY7-2
4           Q96NY7-2,Q96NY7-2

I would like my final output to have only unique elements as below:

ProjectID   NewField
-------------------------------
2           O95833, Q96NY7-2
4           Q96NY7-2

Any suggestions on how to get this result? Feel free to refine my query or redesign it from scratch if needed.

So you have data stored as delimited values and now you want to split them, find distinct values and finally cram them all back into a delimited string? YUCK!!! Delimited data violates 1NF. That is why you are struggling so much here. You will have to use STUFF and FOR XML with DISTINCT thrown in to do this after you first split it.
– Sean Lange, Commented May 29, 2018 at 16:38
Any simple example on how to use the STUFF and FOR XML with DISTINCT in my dataset? I can't avoid STRING_SPLIT as unfortunately the raw data is stored as delimited values as you realised.
– gkoul, Commented May 29, 2018 at 16:44
And be careful with that NOLOCK hint. blogs.sentryone.com/aaronbertrand/bad-habits-nolock-everywhere
– Sean Lange, Commented May 29, 2018 at 16:45
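
For reference, the STUFF and FOR XML approach mentioned in the comments above might look roughly like the sketch below. It reuses the table and column names from the question and assumes ProjectID identifies the group. Treat it as an illustration for versions that have STRING_SPLIT but not STRING_AGG, not as a tested drop-in.

```sql
-- Sketch only: table and column names are taken from the question.
SELECT p.ProjectID,
       STUFF((
           -- DISTINCT removes repeated values before they are concatenated
           SELECT DISTINCT ',' + s.value
           FROM [dbo].[Data] AS d
           CROSS APPLY STRING_SPLIT(d.[bID], ';') AS s
           WHERE d.ProjectID = p.ProjectID
             AND s.value IN ('O95833', 'Q96NY7-2')
           FOR XML PATH(''), TYPE
       ).value('.', 'NVARCHAR(MAX)'), 1, 1, '') AS NewField  -- STUFF strips the leading comma
FROM (SELECT DISTINCT ProjectID FROM [dbo].[Data]) AS p
ORDER BY p.ProjectID;
```

Unlike the GROUP BY queries in this thread, this version also returns projects with no matching values, with NewField set to NULL, unless they are filtered out.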

e-Fungus · Accepted Answer · 2024-03-14 21:47:30Z

73

Use the DISTINCT keyword in a subquery to remove duplicates before combining the results: SQL Fiddle

SELECT 
 ProjectID
,STRING_AGG(value, ',') WITHIN GROUP (ORDER BY value) AS NewField
FROM (
    SELECT DISTINCT 
      ProjectID
    , newId.value 
    FROM [dbo].[Data] WITH (NOLOCK)  
    CROSS APPLY STRING_SPLIT([bID],';') AS newId  
    WHERE newId.value IN (   'O95833' , 'Q96NY7-2'  )  
) x
GROUP BY ProjectID
ORDER BY ProjectID

edited Mar 14, 2024 at 21:47

e-Fungus

answered May 29, 2018 at 16:43

JohnLBevan

2 Comments

ms10 Feb 20 at 8:47

The problem with this solution is that the DISTINCT in the sub-select is across the rows rather than for each column. I have this issue with the rows (Col1Val1 | Col2Val1), (Col1Val1 | Col2Val1), (Col1Val1 | Col2Val2): doing a DISTINCT on this dataset and then STRING_AGG still means that Col1 outputs Val1 twice.

JohnLBevan Feb 20 at 9:26

@ms10 that sounds like a different problem; for your case you want to first select Col1 then union that with selecting Col2 so you have all values in different rows, then you can use the above. If you use union rather than union all you don't need the distinct, since union removes duplicates at the same time.
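
A rough sketch of the UNION idea from the comment above, assuming a hypothetical table dbo.SomeTable with a grouping column key_id and two columns Col1 and Col2 of compatible types (none of these names come from the original posts):

```sql
-- Hypothetical names: dbo.SomeTable, key_id, Col1, Col2.
SELECT u.key_id,
       STRING_AGG(u.val, ',') WITHIN GROUP (ORDER BY u.val) AS AllValues
FROM (
    SELECT key_id, Col1 AS val FROM dbo.SomeTable
    UNION   -- UNION (not UNION ALL) removes duplicate (key_id, val) pairs
    SELECT key_id, Col2 AS val FROM dbo.SomeTable
) AS u
GROUP BY u.key_id;
```

This produces one combined list per key_id. If each column needs its own de-duplicated list, the DISTINCT subquery has to be applied to each column separately.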

ttugates · Answer · 2021-07-07 19:14:44Z

20

This is a function I wrote that answers the question in the OP's title. Improvements welcome!

CREATE OR ALTER FUNCTION [dbo].[fn_DistinctWords]
(
  @String NVARCHAR(MAX)  
)
RETURNS NVARCHAR(MAX)
WITH SCHEMABINDING
AS
BEGIN
  DECLARE @Result NVARCHAR(MAX);
  -- Split on single spaces, drop duplicates, then re-join with a space.
  WITH MY_CTE AS ( SELECT DISTINCT value FROM STRING_SPLIT(@String, ' ') )
  SELECT @Result = STRING_AGG(value, ' ') FROM MY_CTE;
  RETURN @Result;
END
GO

Use like:

SELECT dbo.fn_DistinctWords('One Two      Three Two One');
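
One thing to note about the call above: STRING_SPLIT returns an empty string for each gap between consecutive separators, so the run of spaces yields an empty token that survives DISTINCT. A minimal sketch that filters those out, written as a standalone query rather than a function (the variable name @s is only for illustration):

```sql
DECLARE @s NVARCHAR(MAX) = N'One Two      Three Two One';

SELECT STRING_AGG(t.value, ' ') AS DistinctWords
FROM (
    SELECT DISTINCT value
    FROM STRING_SPLIT(@s, ' ')
    WHERE value <> ''   -- drop the empty tokens produced by repeated spaces
) AS t;
```

The order of the words in the result is not guaranteed unless a WITHIN GROUP (ORDER BY ...) clause is added.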

answered Jul 7, 2021 at 19:14

ttugates

Gordon Linoff · Answer · 2018-05-29 16:44:32Z

7

You can use distinct in the subquery used for the apply:

SELECT d.ProjectID,
       STRING_AGG(newID.value, ',') WITHIN GROUP (ORDER BY newID.value) AS NewField
FROM [dbo].[Data] d CROSS APPLY
     (select distinct value
      from STRING_SPLIT(d.[bID], ';') AS newID 
     ) newID
WHERE newID.value IN (   'O95833' , 'Q96NY7-2'  ) 
group by projectid;

answered May 29, 2018 at 16:44

Gordon Linoff

1 Comment

Sander de Jong Over a year ago

This is especially useful if you have more than one other column besides the one that needs to be split and aggregated.

Domagoj Peharda · Answer · 2022-04-12 08:17:33Z

7

Here is my improvement on @ttugates's answer to make it more generic:

CREATE OR ALTER FUNCTION [dbo].[fn_DistinctList]
(
  @String NVARCHAR(MAX),
  @Delimiter char(1)
)
RETURNS NVARCHAR(MAX)
WITH SCHEMABINDING
AS
BEGIN
  DECLARE @Result NVARCHAR(MAX);
  WITH MY_CTE AS ( SELECT DISTINCT value FROM STRING_SPLIT(@String, @Delimiter) )
  SELECT @Result = STRING_AGG(value, @Delimiter) FROM MY_CTE
  RETURN @Result
END

answered Apr 12, 2022 at 8:17

Domagoj Peharda

1 Comment

Jay13 May 21 at 14:12

I think this is the most straightforward and flexible solution in the bunch. I made one small tweak by changing @Delimiter to @inDelimiter and adding an @outDelimiter of nvarchar(5) so I could make the output string a little more readable with a comma and a space.
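
The tweak described in the comment above might look like the sketch below. The function name fn_DistinctList2 and the type of @inDelimiter are assumptions. Only the @inDelimiter/@outDelimiter split and the nvarchar(5) type come from the comment.

```sql
-- Sketch only: fn_DistinctList2 is a hypothetical name.
CREATE OR ALTER FUNCTION [dbo].[fn_DistinctList2]
(
  @String       NVARCHAR(MAX),
  @inDelimiter  NCHAR(1),      -- STRING_SPLIT accepts only a single-character separator
  @outDelimiter NVARCHAR(5)    -- e.g. N', ' for more readable output
)
RETURNS NVARCHAR(MAX)
WITH SCHEMABINDING
AS
BEGIN
  DECLARE @Result NVARCHAR(MAX);
  WITH MY_CTE AS ( SELECT DISTINCT value FROM STRING_SPLIT(@String, @inDelimiter) )
  SELECT @Result = STRING_AGG(value, @outDelimiter) FROM MY_CTE;
  RETURN @Result;
END
GO

-- Example call: split on ';' and re-join with ', '
SELECT dbo.fn_DistinctList2(N'a;b;a;c', N';', N', ');
```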

gil.fernandes · Answer · 2021-02-05 11:20:16Z

4

Another way to get unique strings from STRING_AGG is to perform these three steps after fetching the comma-separated string:

1. Split the string (STRING_SPLIT)
2. Select DISTINCT from the splits
3. Apply STRING_AGG again to a select with a group on a single key

Example:

(select STRING_AGG(CAST(value as VARCHAR(MAX)), ',') 
        from (SELECT distinct 1 single_key, value 
            FROM STRING_SPLIT(STRING_AGG(CAST(customer_division as VARCHAR(MAX)), ','), ',')) 
                q group by single_key) as customer_division

answered Feb 5, 2021 at 11:20

gil.fernandes

Martin Smith · Answer · 2023-12-16 19:01:52Z

4

For your particular case, instead of exploding the values from the rows out, intermingling them, and then needing GROUP BY to reassemble them, you can just do the following (Fiddle).

SELECT ProjectId, 
      NewField = (SELECT  STRING_AGG( value, ',') WITHIN GROUP (ORDER BY value) FROM  (SELECT DISTINCT value FROM STRING_SPLIT(bID,';') WHERE value IN ('O95833', 'Q96NY7-2') )X)
FROM [data]

In the more general case (e.g. with the starting point in Darryl's answer), you could use:

WITH T AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ProjectID, value ORDER BY ProjectID, value) AS RN
    FROM #data d
)
SELECT ProjectID,
       SUM(Cost),
       -- STRING_AGG ignores NULLs, so only the first row of each (ProjectID, value) pair contributes
       STRING_AGG(CASE WHEN RN = 1 THEN value END, ',') WITHIN GROUP (ORDER BY value)
FROM T
GROUP BY ProjectID
ORDER BY ProjectID

This can use a single sort on (ProjectID, value) both to apply the row numbering and to serve the subsequent GROUP BY ProjectID and WITHIN GROUP (ORDER BY value).

Fiddle

edited Dec 16, 2023 at 19:01

answered Dec 16, 2023 at 18:50

Martin Smith

1 Comment

DJDave Apr 2 at 13:57

Wow, what a one-liner, Martin! Without really understanding what I'm doing, I was able to replace what I wanted (but MS haven't implemented) STRING_AGG(DISTINCT strOwner, ',') with (SELECT STRING_AGG( value, ',') WITHIN GROUP (ORDER BY value) FROM (SELECT DISTINCT value FROM STRING_SPLIT(STRING_AGG(strOwner, ','),',') ) X)

John Bustos · Answer · 2018-05-29 16:44:40Z

3

As @SeanLange pointed out in the comments, this is a terrible way to pull out the data, but if you had to, just make it 2 separate queries as follows:

SELECT 
    ProjectID
    ,STRING_AGG( val, ',') WITHIN GROUP (ORDER BY val) AS NewField
FROM
(
    SELECT DISTINCT 
        ProjectID
        ,newID.value AS val
    FROM 
        [dbo].[Data] WITH(NOLOCK)  
        CROSS APPLY STRING_SPLIT([bID],';') AS newID  
    WHERE 
        newID.value IN ('O95833' , 'Q96NY7-2') 
) t
GROUP BY
    ProjectID

That should do it.

answered May 29, 2018 at 16:44

John Bustos

Krzysztof Krysztofczyk · Answer · 2024-03-14 20:06:12Z

2

You can use the following function to remove duplicates:

CREATE FUNCTION fn_DistinctSeparatedList (@InputString VARCHAR(MAX), @separator nvarchar(10))
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @Items TABLE (Item VARCHAR(MAX));

    INSERT INTO @Items
    SELECT value 
    FROM STRING_SPLIT(replace(@InputString, @separator,'~'),'~')
    WHERE value IS NOT NULL AND value != '';

    WITH DistinctItems AS (
        SELECT DISTINCT Item AS Item
        FROM @Items
    )

    SELECT @InputString = STRING_AGG(Item, '~') 
    FROM DistinctItems;
    
    RETURN Replace(@InputString,'~',@separator);
END

You can use it like this. First, create a table with some data:

drop table if exists #PetsOwner
Select 'Olivier' as Person, 'Cat' as Pet, 'Charlie' as PetName
into #PetsOwner
union
Select 'Olivier' as Person, 'cat' as Pet, 'Luna' as PetName
union
Select 'Olivier' as Person, 'Cat' as Pet, 'Cooper '  as PetName
union 
Select 'Leo' as Person, 'Cat' as Pet, 'Daisy'  as PetName
union 
Select 'Leo' as Person, 'Dog' as Pet, 'Milo'  as PetName
union
Select 'Michael' as Person, 'Fish' as Pet, 'Max'  as PetName

And now we can aggregate with duplicates:

select Person, STRING_AGG(Pet, ', ')
from #PetsOwner
group by Person

Or without duplicates, using the function:

select Person, dbo.fn_DistinctSeparatedList(STRING_AGG(Pet, ', '),', ')  -- scalar UDFs need a schema-qualified name; dbo is assumed here
from #PetsOwner
group by Person

answered Mar 14, 2024 at 20:06

Krzysztof Krysztofczyk

3 Comments

Jo G Over a year ago

Not sure how this would affect performance for large datasets, but it is certainly the simplest solution for my needs. Makes the SQL much easier to read and the function can be reused.

Krzysztof Krysztofczyk Over a year ago

@JoG I'm almost certain that this is not the best approach in terms of performance. As you mentioned, I chose this method because it's simple to use. However, for large datasets, it would be better to use a different method.

Kelly Over a year ago

This worked great! I agree with Jo G that it might hurt performance for larger data sets, but I am luckily only working with about 10,000 rows. I have multiple string_agg columns all based off the same table, so doing the distinct options mentioned above just didn't seem to work unless I really added a lot of sql (maybe a CTE to pre-calculate one of them)

Thomas Riedel · Answer · 2020-12-15 16:50:34Z

0

You can create a distinct view of the table that holds the values to aggregate, which is even simpler:

Create Table Test (field1 varchar(1), field2 varchar(1));

go

Create View DistinctTest as (Select distinct field1, field2 from test group by field1,field2);

go

insert into Test Select 'A', '1';
insert into Test Select 'A', '2';
insert into Test Select 'A', '2';
insert into Test Select 'A', '2';
insert into Test Select 'D', '1';
insert into Test Select 'D', '1';

select string_agg(field1, ',')  from Test where field2 = '1';  /* duplicates: A,D,D */;

select string_agg(field1, ',')  from DistinctTest where field2 = '1';  /* no duplicates: A,D  */;

answered Dec 15, 2020 at 16:50

Thomas Riedel

ms10 · Answer · 2025-02-20 09:17:18Z

The favoured solution didn't work for me. The dataset I'm having to work with looks like this:

Col1      Col2
---------------
a         b
a         b
a         c

Running a DISTINCT on these rows, then STRING_AGG outputs:

Col1    Col2
------------
a,a     b,c

The only solution I could arrive at is below (and I favour CTEs over sub-selects for readability). Admittedly it's not elegant. Every column to be string aggregated needs a pair of CTEs.

;WITH Col1Cte1 AS
(  
   -- distinct values of Col1 per key ([table] is a placeholder name)
   SELECT DISTINCT Col1, key_id FROM [table]
)
,Col2Cte1 AS 
(  
   SELECT DISTINCT Col2, key_id FROM [table]
)
,Col1Cte2 AS
(  
    -- STRING_AGG requires a separator argument
    SELECT key_id, col1 = STRING_AGG(Col1, ',')
    FROM Col1Cte1 
    GROUP BY key_id
)
,Col2Cte2 AS
(  
    SELECT key_id, col2 = STRING_AGG(Col2, ',')
    FROM Col2Cte1 
    GROUP BY key_id
)  
SELECT key_id = COALESCE(col1.key_id, col2.key_id)
      ,col1.col1
      ,col2.col2
FROM Col1Cte2 col1
FULL OUTER JOIN Col2Cte2 col2 on col1.key_id = col2.key_id

Darryl McKenna · Answer · 2023-12-16 17:51:36Z

-1

In case you want to include other aggregates with your query, you can do:

DROP TABLE IF EXISTS #data
CREATE TABLE #data (row_id INT IDENTITY(1,1), projectID INT, value NVARCHAR(40), cost FLOAT)
INSERT INTO #data(projectID, value, cost )
VALUES 
 (2,'Q96NY7-2',100) 
,(2,'O95833'  ,100) 
,(2,'O95833'  ,100) 
,(2,'Q96NY7-2',100) 
,(2,'O95833'  ,100) 
,(2,'Q96NY7-2',100) 
,(4,'Q96NY7-2',100) 
,(4,'Q96NY7-2',100) 
 
SELECT projectID  = d.projectID
     , value      = REPLACE(STRING_AGG(IIF(x.row_id = d.row_id, x.value, '(x)'),',')   WITHIN GROUP (ORDER BY IIF(x.row_id = d.row_id, x.value, '(x)')), '(x),','')
     , Cost       = SUM(d.COST)
FROM #data d
JOIN (  SELECT DISTINCT projectid, value, row_id = MIN(row_id) 
        FROM #data 
        GROUP BY projectid, value 
     ) x ON x.projectid = d.projectid AND x.value = d.value
GROUP BY d.projectID

projectID   value             Cost
-----------------------------------
2           O95833,Q96NY7-2   600
4           Q96NY7-2          200

answered Dec 16, 2023 at 17:51

Darryl McKenna

1 Comment

Jeremy Caney Over a year ago

Thank you for contributing to the Stack Overflow community. This may be a correct answer, but it’d be really useful to provide additional explanation of your code so developers can understand your reasoning. This is especially useful for new developers who aren’t as familiar with the syntax or struggling to understand the concepts. Would you kindly edit your answer to include additional details for the benefit of the community?

sasynkamil · Answer · 2021-07-28 07:35:07Z

-7

Oracle (since version 19c) supports LISTAGG(DISTINCT ...), but Microsoft SQL Server probably does not.

answered Jul 28, 2021 at 7:35

sasynkamil

1 Comment

Daniel L. VanDenBosch Over a year ago

OP was not asking about Oracle
