
I have a table containing duplicates of records. These duplicates are grouped in duplicate groups and also have an index (recordnumber) within the corresponding group. In the relevant table I have all records, even those which are not duplicates.

I need to select only those records that have at least 2 entries in a duplicate group, so I used COUNT, GROUP BY and HAVING.

The issue is that I get strange results when doing so. The following screenshot shows all records, including those with only one entry in a duplicate group. There are about 10k groups containing 2 or more duplicates.

As soon as I uncomment the commented section, I only get 16 records instead of all records with more than 1 entry in a group, and only group IDs 2 to 8...

Does anybody see what I am missing here?

SELECT new_firstname AS firstname,
       new_lastname AS lastname,
       DubGroupID AS groupid,
       RecNumberInDupGroup AS recnr_ingroup
FROM [SOMETABLE]
WHERE BatchCheckJobID = '59aae39d7ee949fc8c9cce2a5efc2a5e'
  AND DubGroupID IN (SELECT COUNT(DubGroupID)
                     FROM [SOMETABLE]
                     GROUP BY DubGroupID
                     HAVING COUNT(DubGroupID) > 1)
ORDER BY groupid,
         recnr_ingroup ASC;

Any hint is highly appreciated.

  • Should be AND DubGroupID IN (SELECT DubGroupID FROM [SOMETABLE]... Commented May 24, 2019 at 8:48
  • aargghh, you're right, missed it completely...thank you very much! Commented May 24, 2019 at 9:03

2 Answers


This is too long for a comment (as it contains SQL), but couldn't the above be written as the below?

WITH CTE AS(
    SELECT new_firstname AS firstname,
           new_lastname AS lastname,
           DubGroupID AS groupid,
           RecNumberInDupGroup AS recnr_ingroup,
           COUNT(DubGroupID) OVER (PARTITION BY DubGroupID) AS [Count]
    FROM SOMETABLE
    WHERE BatchCheckJobID = '59aae39d7ee949fc8c9cce2a5efc2a5e')
SELECT *
FROM CTE
WHERE [Count] > 1;

That would return all rows where there is more than 1 row with the same value for DubGroupID, where BatchCheckJobID has a value of '59aae39d7ee949fc8c9cce2a5efc2a5e'.

Unlike your query using an IN, this won't cause 2 scans of the table.
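If a WITH clause is not an option (for example because an intermediate layer rejects it), the same window-function idea can probably be written as a derived table instead. This is only a sketch: the alias d and the column name GroupSize are names I made up, and I have not tested whether any particular third-party API accepts this form.

```sql
-- Sketch: same logic as the CTE, expressed as a derived table (no WITH).
-- "d" and "GroupSize" are illustrative names, not from the original query.
SELECT firstname,
       lastname,
       groupid,
       recnr_ingroup
FROM (SELECT new_firstname AS firstname,
             new_lastname AS lastname,
             DubGroupID AS groupid,
             RecNumberInDupGroup AS recnr_ingroup,
             -- Number of rows sharing this DubGroupID within the filtered batch
             COUNT(DubGroupID) OVER (PARTITION BY DubGroupID) AS GroupSize
      FROM SOMETABLE
      WHERE BatchCheckJobID = '59aae39d7ee949fc8c9cce2a5efc2a5e') AS d
WHERE GroupSize > 1
ORDER BY groupid,
         recnr_ingroup;
```

The count is computed per row inside the derived table, so the outer query only has to filter on it.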


5 Comments

Thank you very much... this would be even better, but unfortunately I need to do it the other way. But thank you very much!
What do you mean by "the other way", @MartinFelber? Using an IN? Why? If that is a requirement, you should explain it in your question.
I meant using the query with the IN clause. There is a technical reason why I cannot use the "WITH" version, unfortunately. @Larnu Yes, sorry, I forgot to mention it.
Why can you not use WITH, @MartinFelber? CTEs were introduced in SQL Server 2005 (if I recall correctly; it might be 2008) and are available in every supported version of SQL Server. If you can't use WITH, that suggests you're using a (very) old version of SQL Server; if so, that should be tagged in your question, as the volunteers here will assume you are using supported technology.
I know CTEs, but I have to use a third-party API call that has some limitations and does not accept CTEs. As I said, I am sorry for not mentioning this.

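You are checking DubGroupID IN (...), but the subquery selects the count rather than the group ID, so DubGroupID is being compared against group sizes. Select DubGroupID in the subquery instead, as below: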

......
AND DubGroupID IN (SELECT DubGroupID 
                 FROM [SOMETABLE]
                 GROUP BY DubGroupID
                 HAVING COUNT(DubGroupID) > 1)
.........
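For completeness, the full query with that change might look like the sketch below. One thing I am assuming here, which the question does not confirm: that a group should only count as a duplicate group if it has more than one row within the same batch. That is why the subquery repeats the BatchCheckJobID filter; drop that line if groups are meant to be counted across all batches.

```sql
-- Sketch only: table and column names are taken from the question.
SELECT new_firstname AS firstname,
       new_lastname AS lastname,
       DubGroupID AS groupid,
       RecNumberInDupGroup AS recnr_ingroup
FROM [SOMETABLE]
WHERE BatchCheckJobID = '59aae39d7ee949fc8c9cce2a5efc2a5e'
  AND DubGroupID IN (SELECT DubGroupID  -- return the group ID, not the count
                     FROM [SOMETABLE]
                     -- Assumption: count group members within this batch only
                     WHERE BatchCheckJobID = '59aae39d7ee949fc8c9cce2a5efc2a5e'
                     GROUP BY DubGroupID
                     HAVING COUNT(DubGroupID) > 1)
ORDER BY groupid,
         recnr_ingroup ASC;
```

Without the inner filter, a group with one row in this batch and further rows in other batches would still pass the HAVING check.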
