
I have a table with 122 columns and ~200K rows. There are duplicate rows in this table. What query can I use to return these duplicate rows? Normally I would GROUP BY all columns and COUNT, but with 122 columns, this becomes unwieldy.

Basically, I'm looking for a query that does this:

SELECT *, COUNT(*) AS NoOfOccurrences
FROM TableName
GROUP BY *
HAVING COUNT(*) > 1
  • Big, unwieldy tables yield big, unwieldy SQL. If you're looking for duplicates across all columns, you're looking at GROUP BY and COUNT. There are tools (Redgate SQL Prompt, dbForge SQL Complete, etc.) that will auto-expand things like SELECT *; other than that, you have to type the SQL. I suppose you could use dynamic SQL to generate a query from sys.tables and sys.columns. Commented May 3, 2022 at 14:15
  • @philipxy, I didn't necessarily want to remove duplicate rows; I want to see them. I had a typo in my initial question: I meant "query" instead of "something". By my example, I meant how to group by all columns without writing them all out (all 122 of them) twice, once in the SELECT and once in the GROUP BY. I don't think this question requires an example since I think what I'm asking is clear, but I can include one if you would like. Commented May 5, 2022 at 14:27
  • 2
    That doesn't answer what RDBMS you are using though, @PythonDeveloper . Why should I "tag my RDBMS"? Commented May 5, 2022 at 14:41

1 Answer


If you are using SSMS, you can right-click on the table and pick "Select Top 1000 Rows" - SSMS will generate the SELECT query, with every column written out, for you. Add a GROUP BY, then copy the column list and paste it after that. Finally, add the HAVING COUNT(*) > 1 and the COUNT(*) AS NoOfOccurrences, and run.

I suggest that you include an ORDER BY clause as well so that the duplicated rows are displayed together in your results.
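For a hypothetical three-column table, the finished query would look like this (with 122 columns the lists are just longer):

```sql
SELECT Col1, Col2, Col3, COUNT(*) AS NoOfOccurrences
FROM TableName
GROUP BY Col1, Col2, Col3
HAVING COUNT(*) > 1
ORDER BY Col1, Col2, Col3;  -- keeps each set of duplicates together
```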

If you are not using SSMS, then you can run this dynamic SQL (SQL Server syntax):

-- Get the list of columns for the target table
-- There are loads of ways of doing this
declare @colList nvarchar(MAX) = '';

select @colList = @colList + QUOTENAME(c.[name]) + ','
from sys.columns c
join sys.tables t on c.object_id = t.object_id
where t.[name] = 'TableName';

-- remove the trailing comma
set @colList = LEFT(@colList, LEN(@colList) - 1);

-- generate dynamic SQL
declare @sql1 nvarchar(max) = 'SELECT *, COUNT(*) AS NoOfOccurrences FROM TableName GROUP BY '
declare @sql2 nvarchar(max) = ' HAVING COUNT(*) > 1'
declare @sql nvarchar(max) = CONCAT(@sql1, @colList, @sql2)
--print @sql

-- run the SQL
exec sp_executesql @sql

For other ways of generating comma separated lists see Converting row values in a table to a single concatenated string
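The build-the-column-list-from-the-catalog-then-GROUP-BY-everything technique isn't specific to SQL Server. As a self-contained illustration (using Python's sqlite3 with a made-up two-column table, where PRAGMA table_info plays the role of sys.columns):

```python
import sqlite3

# Hypothetical table with deliberate duplicate rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblFees (payer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO tblFees VALUES (?, ?)",
                 [("alice", 10), ("bob", 20), ("alice", 10), ("alice", 10)])

# Read the column names from the catalog instead of typing them out.
cols = [row[1] for row in conn.execute("PRAGMA table_info(tblFees)")]
col_list = ", ".join(f'"{c}"' for c in cols)  # quote names, like QUOTENAME

# Build and run the duplicate-finding query over all columns.
sql = (f"SELECT {col_list}, COUNT(*) AS NoOfOccurrences "
       f"FROM tblFees GROUP BY {col_list} HAVING COUNT(*) > 1")
dupes = conn.execute(sql).fetchall()
print(dupes)  # [('alice', 10, 3)]
```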
