Count the number of distinct values of each row (SQL)

Question

How can I create a new column that returns the number of distinct values in each row inside my table? For instance,

ID   Description   Pay1    Pay2   Pay3    #UniquePays     
1    asdf1         10      20     10      2
2    asdf2         0       10     20      3
3    asdf3         100     100    100     1
4    asdf4                 0      10      3

The query may return more than 1 million rows, so it needs to be somewhat efficient. There are 8 'Pay' columns in total, each of which is either NULL or an integer. Also note that '0' should be counted as distinct from NULL.

The most I've been able to accomplish thus far (which I just realized isn't even accurate) is counting the total number of Pay entries in each row:

nvl(length(length(Pay1)),0)
+nvl(length(length(Pay2)),0)
+nvl(length(length(Pay3)),0) "NumPays"

The typical row only has 4 of the 8 columns populated, with the rest being null, and the max integer in the Pay column is '999' (hence the length-length conversion attempt..)

My SQL skills are primitive but any help is appreciated!

Is your current output the result of a query that does a pivot? If so it would be easier to start from the base data. Otherwise it looks like you might have denormalised data.
– Alex Poole, Commented Jan 27, 2017 at 18:05
Your table is poorly designed. You shouldn't have 8 columns for the Pay, you should insert one row for each pay.
– Vincent Savard, Commented Jan 27, 2017 at 18:05
The Pay columns represent different types/categories of payment (not historical/transactional payments being made), hence the reason for there being 8 instances of them, so they are not the same. And yes, multiple NULL values should be counted as the same.
– KevinT, Commented Jan 27, 2017 at 18:07
@AlexPoole - with what you said... the first thought, of course, was to UNPIVOT (one way or another, depending on Oracle version); but if that's how the rows are stored, then unpivoting would be quite inefficient, better to write long and ugly code that just processes each row as it comes in. Of course, if the "input" is the result of a PIVOT operation, that's another matter.
– user5683823, Commented Jan 27, 2017 at 18:08

Alex Poole · Accepted Answer · 2017-01-27 18:34:12Z

3

If you have, or can create, a user-defined table type of numbers, you could create a collection, use the set function to get rid of duplicates, and then use the cardinality function to count the remaining values:

cardinality(set(t_num(pay1, pay2, pay3))) as uniquepays

To include all eight of your columns, just add the extra column names to the list passed to the t_num() constructor.

cardinality(set(t_num(pay1, pay2, pay3, pay4, pay5, pay6, pay7, pay8))) as uniquepays

Demo with your sample table generated as a CTE:

create type t_num as table of number
/

with t (ID, Description, Pay1, Pay2, Pay3) as (
  select 1, 'asdf1', 10, 20, 10 from dual
  union all select 2, 'asdf2', 0, 10, 20 from dual
  union all select 3, 'asdf3', 100, 100, 100 from dual
  union all select 4, 'asdf4', null, 0, 10 from dual
)
select id, description, pay1, pay2, pay3,
  cardinality(set(t_num(pay1, pay2, pay3))) as uniquepays
from t
order by id;

        ID DESCR       PAY1       PAY2       PAY3 UNIQUEPAYS
---------- ----- ---------- ---------- ---------- ----------
         1 asdf1         10         20         10          2
         2 asdf2          0         10         20          3
         3 asdf3        100        100        100          1
         4 asdf4                     0         10          3

Whether that is efficient enough with millions of rows will need to be tested.
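One rough way to run that test is to time the expression against generated data. The sketch below is only an illustration under several assumptions: it needs the t_num type created above, it generates random values with dbms_random instead of using your real table, and the row count, the three-column layout and the roughly 50% NULL rate in the last column are arbitrary choices. The elapsed time also includes generating the rows, so treat it as a ballpark figure.

```sql
-- Assumes the t_num type from this answer already exists.
-- "set timing on" is a SQL*Plus / SQL Developer command that reports elapsed time.
set timing on

with t (id, pay1, pay2, pay3) as (
  select level,
         trunc(dbms_random.value(0, 1000)),          -- integers 0..999
         trunc(dbms_random.value(0, 1000)),
         case when dbms_random.value < 0.5 then null  -- about half the rows get a NULL
              else trunc(dbms_random.value(0, 1000)) end
  from dual
  connect by level <= 1000000                         -- arbitrary test size
)
select count(*) as rows_checked,
       sum(cardinality(set(t_num(pay1, pay2, pay3)))) as total_uniquepays
from t;
```

Summing the per-row result forces the expression to be evaluated for every row without fetching a million rows to the client.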

edited Jan 27, 2017 at 18:34

answered Jan 27, 2017 at 18:23


5 Comments

Matthew McPeak Over a year ago

I like how your use of CARDINALITY handles the nulls the way the OP wants. Personally, I wouldn't count a null pay value toward UNIQUEPAYS, but the question is the question.

Hogan Over a year ago

I think VALUES() is better than from dual union all

Alex Poole Over a year ago

@Hogan - as far as I'm aware only SQL Server and PostgreSQL support a construct like values(1, 'asdf1', 10, 20, 10), (2, 'asdf2', ...), .... If that's what you mean. Oracle only allows values in an insert statement, and even then only a single.. er.. set of values. Another part of the SQL standard Oracle doesn't support?

Hogan Over a year ago

I just did a google search docs.oracle.com/javadb/10.8.3.0/ref/rrefsqlj11277.html

Alex Poole Over a year ago

@Hogan - OK, then JavaDB supports it too. Oracle RDBMS still doesn't though, unfortunately, even in 12c. (Oracle's doc site covers all their products, including MySQL, which also causes much confusion.) Interestingly one of the contributors here requested it be added last year.

Bohemian · Accepted Answer · 2017-01-27 18:20:25Z

1

Split out each value into its own row (like it should have been stored in the first place), then union them up and (since union discards duplicates) just count the rows:

select id, description, count(*) unique_pays from (
    select id, description, nvl(pay1, -1) from mytable
    union select id, description, nvl(pay2, -1) from mytable
    union select id, description, nvl(pay3, -1) from mytable
    union select id, description, nvl(pay4, -1) from mytable
    union select id, description, nvl(pay5, -1) from mytable
    union select id, description, nvl(pay6, -1) from mytable
    union select id, description, nvl(pay7, -1) from mytable
    union select id, description, nvl(pay8, -1) from mytable
) x
group by id, description

I changed nulls into -1 so they would participate cleanly in the deduping.

answered Jan 27, 2017 at 18:20


15 Comments

user5683823 Over a year ago

This will read a table with millions of rows, eight times. OP was concerned about efficiency...

Bohemian Over a year ago

@mathguy then maybe OP should fix his schema. Your point has merit, but "millions of rows" is not that many these days (100's of millions is getting large). This may perform OK anyway - let's see which answer OP chooses.

KevinT Over a year ago

Per my comment on the original post, the 'Pay' columns are not historical but instead categorical. Think of them as 'Tiers' instead of 'payment history'. They're different from one another, hence the reason for there being 8 of them (and only 8).

Vincent Savard Over a year ago

@mathguy Which is an optimization concern you have after you evaluated that the benefits outweigh the disadvantages, which is clearly not what happened here. If that's the route OP wants to go, then that's his call, he can pick any answer here and go with it. But when I read "My SQL skills are primitive", my first reaction is to help him identify design flaws, not to encourage him to go down this route.

KevinT Over a year ago

Just wanted to add this method appears to work as well! Thank you!


user5683823 · Accepted Answer · 2017-01-27 18:24:23Z

1

Here is a solution that reads the base table just once, and takes advantage of the data being organized in rows already. (Unpivoting would be inefficient, since this information would be lost resulting in massive additional work.)

It assumes all NULLs are counted as the same. If instead they should be considered different from each other, change the -1 in nvl to distinct values: -1 for Pay1, -2 for Pay2, etc.

with
     inputs( ID, Description, Pay1, Pay2, Pay3 ) as (     
       select 1, 'asdf1',                   10,  20,  10 from dual union all
       select 2, 'asdf2',                    0,  10,  20 from dual union all
       select 3, 'asdf3',                  100, 100, 100 from dual union all
       select 4, 'asdf4', cast(null as number),   0,  10 from dual
     )
--  End of TEST data (not part of solution!) SQL query begins BELOW THIS LINE.
select   id, description, pay1, pay2, pay3,
           1
         + case when nvl(pay2, -1) not in (nvl(pay1, -1)) 
                then 1 else 0 end
         + case when nvl(pay3, -1) not in (nvl(pay1, -1), nvl(pay2, -1))
                then 1 else 0 end
                                       as distinct_pays
from     inputs
order by id   --  if needed
;

ID DESCRIPTION     PAY1    PAY2    PAY3 DISTINCT_PAYS
-- ------------ ------- ------- ------- -------------
 1 asdf1             10      20      10             2
 2 asdf2              0      10      20             3
 3 asdf3            100     100     100             1
 4 asdf4                      0      10             3

4 rows selected.
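For all eight columns the same pattern simply continues, with each column compared against every column before it. The following is a sketch only: the table name mytable is borrowed from another answer and is a placeholder, and the -1 sentinel assumes that no real pay value is ever -1 (pick a different sentinel if negative values can occur).

```sql
-- Sketch: mytable is a placeholder name; -1 stands in for NULL and is
-- assumed never to occur as a real pay value.
select id, description,
         1  -- pay1 always contributes one distinct value (NULL counts as a value)
       + case when nvl(pay2, -1) not in (nvl(pay1, -1))
              then 1 else 0 end
       + case when nvl(pay3, -1) not in (nvl(pay1, -1), nvl(pay2, -1))
              then 1 else 0 end
       + case when nvl(pay4, -1) not in (nvl(pay1, -1), nvl(pay2, -1), nvl(pay3, -1))
              then 1 else 0 end
       + case when nvl(pay5, -1) not in (nvl(pay1, -1), nvl(pay2, -1), nvl(pay3, -1),
                                         nvl(pay4, -1))
              then 1 else 0 end
       + case when nvl(pay6, -1) not in (nvl(pay1, -1), nvl(pay2, -1), nvl(pay3, -1),
                                         nvl(pay4, -1), nvl(pay5, -1))
              then 1 else 0 end
       + case when nvl(pay7, -1) not in (nvl(pay1, -1), nvl(pay2, -1), nvl(pay3, -1),
                                         nvl(pay4, -1), nvl(pay5, -1), nvl(pay6, -1))
              then 1 else 0 end
       + case when nvl(pay8, -1) not in (nvl(pay1, -1), nvl(pay2, -1), nvl(pay3, -1),
                                         nvl(pay4, -1), nvl(pay5, -1), nvl(pay6, -1),
                                         nvl(pay7, -1))
              then 1 else 0 end
                                     as distinct_pays
from   mytable;
```

That is 28 comparisons per row at most, a fixed cost, and the table is still read only once.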

answered Jan 27, 2017 at 18:24


12 Comments

Bohemian Over a year ago

You do realise that there are 8 pay columns. Your case statement is going to have to cater for a VERY large number of comparisons (which is why I used union btw). Let's see them!

user5683823 Over a year ago

@Bohemian - Yes, I realize that. The comparisons are trivial; the bottleneck (in most cases) will be accessing the rows repeatedly. Besides: any kind of counting does pretty much the same comparisons (how do you expect the Oracle engine to count distinct values anyway?)

Bohemian Over a year ago

oracle does it using O(n log n) efficiency, but you must use O(n^2) efficiency because that's how many comparisons must be made using case statements

user5683823 Over a year ago

@Bohemian - actually it is O(n^2) (check your math). And editorial comments like "crappy" are best left out in writing - although I will admit I am also guilty of that sometimes. And n, here, is 8, not millions. So in fact it is still O(1) (a constant number of comparisons).

Matthew McPeak Over a year ago

@mathguy I work with Oracle e-Business Suite and negative payments are quite common. They're called "credit memos" and they're issued for things like returns and refunds. Maybe I wouldn't model them as a negative value in the pay column, but neither would I create a table with eight "pay" columns in the first place. When that's your starting data model, I'd say you're in the realm of "anything is possible".


Matthew McPeak · Accepted Answer · 2017-01-27 18:47:10Z

Here is one relatively simple way:

CREATE TYPE number_list AS TABLE OF NUMBER;

with t (ID, Description, Pay1, Pay2, Pay3) as (
  select 1, 'asdf1', 10, 20, 10 from dual
  union all select 2, 'asdf2', 0, 10, 20 from dual
  union all select 3, 'asdf3', 100, 100, 100 from dual
  union all select 4, 'asdf4', null, 0, 10 from dual
)
SELECT id,
       description,
       pay1,
       pay2,
       pay3,
       (SELECT COUNT (DISTINCT NVL (TO_CHAR (COLUMN_VALUE), '#NULL#')) 
        FROM TABLE (number_list (pay1, pay2, pay3))) uniquepays
FROM   t;

        ID DESCR       PAY1       PAY2       PAY3 UNIQUEPAYS
---------- ----- ---------- ---------- ---------- ----------
         1 asdf1         10         20         10          2
         2 asdf2          0         10         20          3
         3 asdf3        100        100        100          1
         4 asdf4                     0         10          3

marcothesane · Accepted Answer · 2017-01-27 18:22:20Z

0

The solution would be:

1. Start with your initial table without the column #uniquePays.
2. Unpivot your table.

   From this:

   ID   Description   Pay1    Pay2   Pay3
   1    asdf1         10      20     10

   make this:

   ID seq Description Pay
    1   1 asdf1       10
    1   2 asdf1       20
    1   3 asdf1       10

3. From the unpivoted table, run a SELECT COUNT(DISTINCT Pay).
4. Re-pivot the table, adding the COUNT(DISTINCT Pay).

Will this do, or do you need an exemplary script? I've been posting quite a bit about pivoting and un-pivoting lately .... seems to be a popular need :-]
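As a rough illustration of the unpivot steps, here is a minimal sketch using Oracle's UNPIVOT clause (available from 11g). It is not the answerer's own script. The CTE t only reproduces the sample rows from the question, and seq is just a label for the source column. Because COUNT(DISTINCT ...) ignores NULLs, the sketch keeps NULL rows with INCLUDE NULLS and adds 1 when a row has at least one NULL pay, which is what the question asks for. It only returns the count per ID; joining it back to the original table would take the place of the re-pivot step.

```sql
-- Sample data from the question (three of the eight pay columns).
with t (id, description, pay1, pay2, pay3) as (
  select 1, 'asdf1', 10, 20, 10 from dual
  union all select 2, 'asdf2', 0, 10, 20 from dual
  union all select 3, 'asdf3', 100, 100, 100 from dual
  union all select 4, 'asdf4', null, 0, 10 from dual
)
select id, description,
       count(distinct pay)                               -- distinct non-NULL values
       + max(case when pay is null then 1 else 0 end)    -- plus 1 if any NULL is present
         as uniquepays
from   t
unpivot include nulls (pay for seq in (pay1 as 1, pay2 as 2, pay3 as 3))
group by id, description
order by id;
```

For the full table, list all eight columns in the IN clause. Whether this performs acceptably on millions of rows would need to be tested.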

Marco the Sane

answered Jan 27, 2017 at 18:22


2 Comments

Alex Poole Over a year ago

Count ignores nulls - so for ID 4, that would report 2 unique values rather than 3? This also may not be very efficient for a large table.

KevinT Over a year ago

I had a feeling pivots would be where this is headed, this makes sense to me. Example script would be helpful to get me on track, but either way I'll be brushing up on my pivoting skills! Thank you!

Aanand Kumar · Accepted Answer · 2017-01-27 18:12:55Z

-1

You can write an insert trigger or a stored procedure that counts the unique values for each inserted row and stores the result in the unique-count column.

answered Jan 27, 2017 at 18:12

