3

I have an SQL table with one column of integer values and I would like to find the median of those values without any sorting. I first tried finding the numbers in the table that are greater than and less than each value:

SELECT DISTINCT d1.v,       
  (SELECT COUNT(d2.v)
   FROM Values d2  
   WHERE d2.v > d1.v
   ) AS greater_than,
   (SELECT COUNT(d2.v)
   FROM Values d2  
   WHERE d2.v < d1.v
   ) AS less_than
FROM Values d1

enter image description here

I'm not sure how to proceed. I believe I want the values in the above table where greater_than and less_than are both equal to num_entries / 2, but that would only work for a table with an even number of entries. What's a good way to get the median from what I have above?

13
  • Yes I believe you need to add some sort of case work, depending on the parity of the number of entries in the table. Commented Mar 4, 2017 at 20:03
  • 1
    MySQL or SQL-Server, which one is it? Commented Mar 4, 2017 at 20:04
  • 3
    Just wondering, why do you want to do this without sorting? Commented Mar 4, 2017 at 20:11
  • This will also not work if duplicate values are possible. Example: [1, 1, 2] Commented Mar 4, 2017 at 20:19
  • @waka it is MySQL. Sorry I have changed the tags. Commented Mar 4, 2017 at 20:28

2 Answers 2

2

You could do it like this:

SELECT (MIN(v)+MAX(v))/2
FROM   (
        SELECT  CASE WHEN
                        LEAST((SELECT COUNT(*) FROM tbl WHERE v <= d1.v),
                              (SELECT COUNT(*) FROM tbl WHERE v >= d1.v))
                           >= (SELECT COUNT(*)/2 FROM tbl) 
                     THEN v
                END as v
        FROM    tbl d1
       ) as sub

The inner query is guaranteed to return 1 or 2 distinct non-null values among potentially many null values. The non-null values may repeat, but by taking the minimum and maximum those two values can be used to calculate the median.

NB: Don't name your table Values: it is a reserved word.

Sign up to request clarification or add additional context in comments.

7 Comments

But GROUP BY will either sort the table or use an index which is already sorted :-)
Yes unfortunately as @PaulSpiegel pointed out, a valid solution can not use any sorting.
@user6269144 Note that DISTINCT will also use some kind of sorting, so your original attemt would also be "invalid".
@PaulSpiegel yes but it is only really there for clarity. Worst case, it is removed and my final answer gives multiple "same" values for the median, which is acceptable.
I updated my answer with a version that does not use group by, nor distinct.
|
0

Look for total numbers for greater than and less than self number that is equal to the total number of rows/2.

create table med(id integer);
insert into med(id) values(1);
insert into med(id) values(2);
insert into med(id) values(3);
insert into med(id) values(4);
insert into med(id) values(5);
insert into med(id) values(6);

select (MIN(count)+MAX(count))/2 from 
(select case when (select count(*) from 
med A where A.id<B.id)=(select count(*)/2 from med) OR 
(select count(*) from med A where A.id>B.id)=(select count(*)/2 
from med) then cast(B.id as float)end as count from med B) C;

 ?column? 
----------
  3.5
(1 row)

You need cast to convert to float, otherwise you would get 3 here.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.