How can I create a new column that returns the number of distinct values in each row inside my table? For instance,

ID   Description   Pay1    Pay2   Pay3    #UniquePays     
1    asdf1         10      20     10      2
2    asdf2         0       10     20      3
3    asdf3         100     100    100     1
4    asdf4                 0      10      3

The query may return more than 1 million rows, so it needs to be reasonably efficient. There are 8 'Pay' columns in total, each of which is either NULL or an integer. Also note that '0' should be counted as distinct from NULL.
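To pin down the requested semantics, here is a minimal Python reference model (not SQL) that can be used to check any SQL answer's output. It assumes that 0 is distinct from NULL and, following the OP's comment further down, that all NULLs count as one value. `unique_pays` is a hypothetical helper name, and `None` plays the role of NULL.

```python
def unique_pays(*pays):
    """Reference model: number of distinct Pay values in one row.

    None stands in for SQL NULL. A Python set stores None only once and
    treats 0 and None as different values, matching the question's rules.
    """
    return len(set(pays))


# Sample rows from the question (ID -> (Pay1, Pay2, Pay3)).
sample = {
    1: (10, 20, 10),
    2: (0, 10, 20),
    3: (100, 100, 100),
    4: (None, 0, 10),
}

for row_id, pays in sample.items():
    # Expected counts for IDs 1-4: 2, 3, 1, 3 (the #UniquePays column).
    print(row_id, unique_pays(*pays))
```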

The most I've been able to accomplish thus far (which I just realized isn't even accurate) is counting the total number of Pay entries in each row:

nvl(length(length(Pay1)),0)
+nvl(length(length(Pay2)),0)
+nvl(length(length(Pay3)),0) "NumPays"

The typical row has only 4 of the 8 columns populated, with the rest NULL, and the maximum value in any Pay column is 999 (hence the length(length()) conversion attempt).

My SQL skills are primitive but any help is appreciated!

  • Is your current output the result of a query that does a pivot? If so it would be easier to start from the base data. Otherwise it looks like you might have denormalised data. Commented Jan 27, 2017 at 18:05
  • Your table is poorly designed. You shouldn't have 8 columns for the Pay, you should insert one row for each pay. Commented Jan 27, 2017 at 18:05
  • Are two NULL considered "the same"? Commented Jan 27, 2017 at 18:06
  • The Pay columns represent different types/categories of payment (not historical/transactional payments being made), hence the reason for there being 8 instances of them, so they are not the same. And yes, multiple NULL values should be counted as the same. Commented Jan 27, 2017 at 18:07
  • @AlexPoole - with what you said... the first thought, of course, was to UNPIVOT (one way or another, depending on Oracle version); but if that's how the rows are stored, then unpivoting would be quite inefficient, better to write long and ugly code that just processes each row as it comes in. Of course, if the "input" is the result of a PIVOT operation, that's another matter. Commented Jan 27, 2017 at 18:08

6 Answers

3

If you have, or can create, a user-defined table type of numbers, you could create a collection, use the SET function to remove duplicates, and then use the CARDINALITY function to count the remaining values:

cardinality(set(t_num(pay1, pay2, pay3))) as uniquepays

To include all eight of your columns, just add the extra column names to the list passed to the t_num() constructor.

cardinality(set(t_num(pay1, pay2, pay3, pay4, pay5, pay6, pay7, pay8))) as uniquepays

Demo with your sample table generated as a CTE:

create type t_num as table of number
/

with t (ID, Description, Pay1, Pay2, Pay3) as (
  select 1, 'asdf1', 10, 20, 10 from dual
  union all select 2, 'asdf2', 0, 10, 20 from dual
  union all select 3, 'asdf3', 100, 100, 100 from dual
  union all select 4, 'asdf4', null, 0, 10 from dual
)
select id, description, pay1, pay2, pay3,
  cardinality(set(t_num(pay1, pay2, pay3))) as uniquepays
from t
order by id;

        ID DESCR       PAY1       PAY2       PAY3 UNIQUEPAYS
---------- ----- ---------- ---------- ---------- ----------
         1 asdf1         10         20         10          2
         2 asdf2          0         10         20          3
         3 asdf3        100        100        100          1
         4 asdf4                     0         10          3

Whether that is efficient enough with millions of rows will need to be tested.


Comments

I like how your use of CARDINALITY handles the nulls the way the OP wants. Personally, I wouldn't count a null pay value toward UNIQUEPAYS, but the question is the question.
I think VALUES() is better than from dual union all
@Hogan - as far as I'm aware only SQL Server and PostgreSQL support a construct like values(1, 'asdf1', 10, 20, 10), (2, 'asdf2', ...), .... If that's what you mean. Oracle only allows values in an insert statement, and even then only a single.. er.. set of values. Another part of the SQL standard Oracle doesn't support?
@Hogan - OK, then JavaDB supports it too. Oracle RDBMS still doesn't though, unfortunately, even in 12c. (Oracle's doc site covers all their products, including MySQL, which also causes much confusion.) Interestingly one of the contributors here requested it be added last year.
1

Split out each value into its own row (as it should have been stored in the first place), then union them together and (since UNION discards duplicates) just count the rows:

select id, description, count(*) unique_pays from (
    select id, description, nvl(pay1, -1) from mytable
    union select id, description, nvl(pay2, -1) from mytable
    union select id, description, nvl(pay3, -1) from mytable
    union select id, description, nvl(pay4, -1) from mytable
    union select id, description, nvl(pay5, -1) from mytable
    union select id, description, nvl(pay6, -1) from mytable
    union select id, description, nvl(pay7, -1) from mytable
    union select id, description, nvl(pay8, -1) from mytable
) x
group by id, description

I changed NULLs into -1 so they would participate cleanly in the deduplication. This assumes -1 never occurs as a real pay value.
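Here is a runnable sketch of this UNION approach, assuming SQLite (through Python's built-in `sqlite3` module) as a stand-in for Oracle. `COALESCE` replaces `NVL`, the table name `mytable` is taken from the answer, and only three of the eight Pay columns are used for brevity.

```python
import sqlite3

# SQLite stands in for Oracle here: COALESCE replaces NVL, and only three of
# the eight Pay columns are shown to keep the example short.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, description TEXT,"
    " pay1 INTEGER, pay2 INTEGER, pay3 INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "asdf1", 10, 20, 10),
        (2, "asdf2", 0, 10, 20),
        (3, "asdf3", 100, 100, 100),
        (4, "asdf4", None, 0, 10),
    ],
)

# UNION (not UNION ALL) drops duplicate (id, description, pay) rows, so the
# rows left per id are exactly its distinct pays. NULL becomes -1, which
# assumes -1 is never a real pay value.
query = """
SELECT id, description, COUNT(*) AS unique_pays FROM (
    SELECT id, description, COALESCE(pay1, -1) AS pay FROM mytable
    UNION SELECT id, description, COALESCE(pay2, -1) FROM mytable
    UNION SELECT id, description, COALESCE(pay3, -1) FROM mytable
) x
GROUP BY id, description
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Extending it to all eight columns is five more `UNION SELECT` lines. Each branch scans the table again, which is the efficiency concern raised in the comments below.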

Comments

This will read a table with millions of rows, eight times. OP was concerned about efficiency...
@mathguy then maybe OP should fix his schema. Your point has merit, but "millions of rows" is not that many these days (100's of millions is getting large). This may perform OK anyway - let's see which answer OP chooses.
Per my comment on original post, the 'Pay' columns are not historical but instead categorical. Think of them as 'Tiers' instead of 'payment history'. They're different from another, hence the reason for there being 8 of them (and only 8).
@mathguy Which is an optimization concern you have after you evaluated that the benefits outweigh the disadvantages, which is clearly not what happened here. If that's the route OP wants to go, then that's his call; he can pick any answer here and go with it. But when I read "My SQL skills are primitive", my first reaction is to help him identify design flaws, not to encourage him to go down this route.
Just wanted to add this method appears to work as well! Thank you!
1

Here is a solution that reads the base table just once and takes advantage of the data already being organized in rows. (Unpivoting would be inefficient, since that row organization would be lost, resulting in massive additional work.)

It assumes all NULLs are counted as the same. If instead they should be considered different from each other, change the -1 in nvl to distinct values: -1 for Pay1, -2 for Pay2, etc.

with
     inputs( ID, Description, Pay1, Pay2, Pay3 ) as (     
       select 1, 'asdf1',                   10,  20,  10 from dual union all
       select 2, 'asdf2',                    0,  10,  20 from dual union all
       select 3, 'asdf3',                  100, 100, 100 from dual union all
       select 4, 'asdf4', cast(null as number),   0,  10 from dual
     )
--  End of TEST data (not part of solution!) SQL query begins BELOW THIS LINE.
select   id, description, pay1, pay2, pay3,
           1
         + case when nvl(pay2, -1) not in (nvl(pay1, -1)) 
                then 1 else 0 end
         + case when nvl(pay3, -1) not in (nvl(pay1, -1), nvl(pay2, -1))
                then 1 else 0 end
                                       as distinct_pays
from     inputs
order by id   --  if needed
;

ID DESCRIPTION     PAY1    PAY2    PAY3 DISTINCT_PAYS
-- ------------ ------- ------- ------- -------------
 1 asdf1             10      20      10             2
 2 asdf2              0      10      20             3
 3 asdf3            100     100     100             1
 4 asdf4                      0      10             3

4 rows selected.
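The comments ask what this CASE expression looks like for all eight columns. Writing it by hand is error-prone, so here is a sketch of a hypothetical Python helper, `distinct_pays_expr`, that generates the same pattern for any list of columns and checks it against SQLite. It uses `COALESCE` instead of `NVL` because both SQLite and Oracle support `COALESCE`, so the printed expression should be usable in either. The -1 sentinel (or -1, -2, ... when each NULL should count separately, as the answer suggests) assumes negative values never occur as real pays; the test rows are made up for illustration.

```python
import sqlite3


def distinct_pays_expr(columns, distinct_nulls=False):
    """Hypothetical helper: build the CASE-sum expression for any column list.

    Each column after the first adds 1 only if its value (with NULL mapped to a
    sentinel) is not among the earlier columns. With distinct_nulls=True every
    column gets its own sentinel (-1, -2, ...), so NULLs never match each other.
    Assumes the sentinels never occur as real pay values.
    """
    def wrapped(i):
        sentinel = -(i + 1) if distinct_nulls else -1
        return f"COALESCE({columns[i]}, {sentinel})"

    terms = ["1"]  # the first column always contributes one value
    for i in range(1, len(columns)):
        earlier = ", ".join(wrapped(j) for j in range(i))
        terms.append(f"CASE WHEN {wrapped(i)} NOT IN ({earlier}) THEN 1 ELSE 0 END")
    return "\n  + ".join(terms)


PAY_COLS = [f"pay{k}" for k in range(1, 9)]

# Made-up test rows: id followed by pay1..pay8.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, "
             + ", ".join(f"{c} INTEGER" for c in PAY_COLS) + ")")
conn.executemany(
    "INSERT INTO t VALUES (" + ", ".join(["?"] * 9) + ")",
    [
        (1, 10, 20, 10, 30, None, None, None, None),
        (2, 0, 0, 0, 0, 0, 0, 0, 0),
        (3, None, 0, 10, None, 999, 999, 10, 0),
        (4, 1, 2, 3, 4, 5, 6, 7, 8),
    ],
)


def count_distinct_pays(distinct_nulls=False):
    expr = distinct_pays_expr(PAY_COLS, distinct_nulls)
    sql = f"SELECT id, {expr} AS distinct_pays FROM t ORDER BY id"
    return conn.execute(sql).fetchall()


print(distinct_pays_expr(PAY_COLS))    # paste-ready expression for 8 columns
print(count_distinct_pays())           # all NULLs count as one value
print(count_distinct_pays(True))       # each NULL counts separately
```

With these rows the two result lists come out as `[(1, 4), (2, 1), (3, 4), (4, 8)]` and `[(1, 7), (2, 1), (3, 5), (4, 8)]`. The generated expression has seven CASE terms, one per column after the first.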

Comments

You do realise that there are 8 pay columns. Your case statement is going to have to cater for a VERY large number of comparisons (which is why I used union btw). Let's see them!
@Bohemian - Yes, I realize that. The comparisons are trivial; the bottleneck (in most cases) will be accessing the rows repeatedly. Besides: any kind of counting does pretty much the same comparisons (how do you expect the Oracle engine to count distinct values anyway?)
Oracle does it using O(n log n) efficiency, but you must use O(n^2) efficiency because that's how many comparisons must be made using case statements
@Bohemian - actually it is O(n^2) (check your math). And editorial comments like "crappy" are best left out in writing - although I will admit I am also guilty of that sometimes. And n, here, is 8, not millions. So in fact it is still O(1) (a constant number of comparisons).
@mathguy I work with Oracle e-Business Suite and negative payments are quite common. They're called "credit memos" and they're issued for things like returns and refunds. Maybe I wouldn't model them as a negative value in the pay column, but neither would I create a table with eight "pay" columns in the first place. When that's your starting data model, I'd say you're in the realm of "anything is possible".
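To make the complexity exchange concrete: in the CASE approach, term k (for k = 2, ..., n) compares column k with the k - 1 columns before it, so the number of comparisons per row is

```latex
C(n) = \sum_{k=2}^{n} (k-1) = \frac{n(n-1)}{2},
\qquad C(8) = \frac{8 \cdot 7}{2} = 28
```

With 8 Pay columns that is a fixed 28 comparisons per row, independent of the number of rows; the growth is quadratic only in the number of columns.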
1

Here is one relatively simple way:

CREATE TYPE number_list AS TABLE OF NUMBER;
/

with t (ID, Description, Pay1, Pay2, Pay3) as (
  select 1, 'asdf1', 10, 20, 10 from dual
  union all select 2, 'asdf2', 0, 10, 20 from dual
  union all select 3, 'asdf3', 100, 100, 100 from dual
  union all select 4, 'asdf4', null, 0, 10 from dual
)
SELECT id,
       description,
       pay1,
       pay2,
       pay3,
       (SELECT COUNT (DISTINCT NVL (TO_CHAR (COLUMN_VALUE), '#NULL#')) 
        FROM TABLE (number_list (pay1, pay2, pay3))) uniquepays
FROM   t;

        ID DESCR       PAY1       PAY2       PAY3 UNIQUEPAYS
---------- ----- ---------- ---------- ---------- ----------
         1 asdf1         10         20         10          2
         2 asdf2          0         10         20          3
         3 asdf3        100        100        100          1
         4 asdf4                     0         10          3
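SQLite has no user-defined collection types, so as a stand-in for `TABLE(number_list(...))` this sketch uses a correlated subquery that turns the current row's Pay columns into a small derived table with `UNION ALL`, then applies the same `COUNT(DISTINCT ... '#NULL#')` trick. It runs under Python's `sqlite3` module; on Oracle, the syntax in this answer is the one to use.

```python
import sqlite3

# SQLite has no nested-table types, so a correlated subquery that UNION ALLs
# the current row's own columns stands in for TABLE(number_list(...)).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, description TEXT,"
    " pay1 INTEGER, pay2 INTEGER, pay3 INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "asdf1", 10, 20, 10),
        (2, "asdf2", 0, 10, 20),
        (3, "asdf3", 100, 100, 100),
        (4, "asdf4", None, 0, 10),
    ],
)

# NULL is turned into the marker '#NULL#' (after casting numbers to text) so
# that COUNT(DISTINCT ...) counts it once instead of ignoring it.
query = """
SELECT id, description, pay1, pay2, pay3,
       (SELECT COUNT(DISTINCT COALESCE(CAST(v AS TEXT), '#NULL#'))
          FROM (SELECT t.pay1 AS v
                UNION ALL SELECT t.pay2
                UNION ALL SELECT t.pay3)) AS uniquepays
FROM t
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```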

0

The solution would be:

  1. Start with your initial table without the column #uniquePays.
  2. Unpivot your table.

From this

ID   Description   Pay1    Pay2   Pay3 
1    asdf1         10      20     10  

Make this:

ID seq Description Pay
 1   1 asdf1       10
 1   2 asdf1       20
 1   3 asdf1       10

  3. From the unpivoted table, run a SELECT COUNT(DISTINCT Pay) grouped by ID.
  4. Re-pivot the table, adding the COUNT(DISTINCT Pay).
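The comments on this answer ask for an example script, so here is a minimal sketch of these steps, assuming SQLite through Python's `sqlite3` module. SQLite has no UNPIVOT or PIVOT operator, so `UNION ALL` does the unpivot and a join back to the base table plays the role of the re-pivot. Because `COUNT(DISTINCT pay)` ignores NULLs, one is added whenever a row contains any NULL, which gives 3 for ID 4 as the question requires. On Oracle 11g and later, `UNPIVOT INCLUDE NULLS` could replace the `UNION ALL` branches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, description TEXT,"
    " pay1 INTEGER, pay2 INTEGER, pay3 INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "asdf1", 10, 20, 10),
        (2, "asdf2", 0, 10, 20),
        (3, "asdf3", 100, 100, 100),
        (4, "asdf4", None, 0, 10),
    ],
)

query = """
WITH unpivoted AS (       -- unpivot: one row per (id, seq, pay)
    SELECT id, 1 AS seq, pay1 AS pay FROM t
    UNION ALL SELECT id, 2, pay2 FROM t
    UNION ALL SELECT id, 3, pay3 FROM t
),
counts AS (               -- count distinct pays per id
    SELECT id,
           COUNT(DISTINCT pay)                             -- ignores NULLs,
           + MAX(CASE WHEN pay IS NULL THEN 1 ELSE 0 END)  -- so add 1 if any
             AS unique_pays
    FROM unpivoted
    GROUP BY id
)
SELECT t.id, t.description, t.pay1, t.pay2, t.pay3, c.unique_pays
FROM t
JOIN counts AS c ON c.id = t.id   -- re-attach the count to each base row
ORDER BY t.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

As noted in the comments, this reads the table once per Pay column, so it may be slower than the row-wise answers on large tables.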

Will this do, or do you need an example script? I've been posting quite a bit about pivoting and unpivoting lately... seems to be a popular need :-]

Marco the Sane

Comments

Count ignores nulls - so for ID 4, that would report 2 unique values rather than 3? This also may not be very efficient for a large table.
I had a feeling pivots would be where this is headed, this makes sense to me. Example script would be helpful to get me on track, but either way I'll be brushing up on my pivoting skills! Thank you!
-1

You can write an insert trigger or a stored procedure that counts the number of unique values for each inserted row and stores the result in the unique-count column.
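A minimal sketch of the trigger idea, assuming SQLite via Python's `sqlite3` module and a hypothetical `unique_pays` column. SQLite triggers cannot assign to `NEW`, so an `AFTER INSERT` trigger updates the stored value, and an `AFTER UPDATE OF` trigger keeps it from going stale when a Pay column changes. The count uses the CASE pattern from the row-wise answer, with -1 assumed never to be a real pay. In Oracle the equivalent would be a `BEFORE INSERT OR UPDATE ... FOR EACH ROW` trigger assigning `:NEW.unique_pays` directly.

```python
import sqlite3

# CASE pattern for three Pay columns; NULL maps to -1 (assumed never a real pay).
UNIQUE_PAYS_EXPR = """1
        + CASE WHEN COALESCE(pay2, -1) NOT IN (COALESCE(pay1, -1))
               THEN 1 ELSE 0 END
        + CASE WHEN COALESCE(pay3, -1) NOT IN (COALESCE(pay1, -1), COALESCE(pay2, -1))
               THEN 1 ELSE 0 END"""

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
CREATE TABLE t (
    id INTEGER PRIMARY KEY,
    description TEXT,
    pay1 INTEGER, pay2 INTEGER, pay3 INTEGER,
    unique_pays INTEGER
);

-- SQLite cannot assign to NEW, so update the freshly inserted row instead.
CREATE TRIGGER t_unique_pays_ins AFTER INSERT ON t
BEGIN
    UPDATE t SET unique_pays = {UNIQUE_PAYS_EXPR} WHERE id = NEW.id;
END;

-- Recompute when any Pay column changes so the stored count stays current.
CREATE TRIGGER t_unique_pays_upd AFTER UPDATE OF pay1, pay2, pay3 ON t
BEGIN
    UPDATE t SET unique_pays = {UNIQUE_PAYS_EXPR} WHERE id = NEW.id;
END;
""")

conn.executemany(
    "INSERT INTO t (id, description, pay1, pay2, pay3) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "asdf1", 10, 20, 10),
        (2, "asdf2", 0, 10, 20),
        (3, "asdf3", 100, 100, 100),
        (4, "asdf4", None, 0, 10),
    ],
)
before = conn.execute("SELECT id, unique_pays FROM t ORDER BY id").fetchall()
print(before)

conn.execute("UPDATE t SET pay3 = 5 WHERE id = 3")  # pays become 100, 100, 5
after = conn.execute("SELECT unique_pays FROM t WHERE id = 3").fetchone()
print(after)
```

Storing the count trades extra work on every write for cheap reads; whether that pays off depends on how often the Pay columns change compared with how often the count is queried.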
