1

I have searched all over for an answer and although people say not to use the ORDER BY RAND() clause, I think for my purposes it is ok as this is for a competition which barely has more than a few hundred records at a time PER competition.

So basically i need to retrieve 5 random records from a competition entries table. However any loyalty customers will received an additional EXTRA entry so example:

compEntryid | firstName | lastName | compID |
1           |  bob      |  smith   | 100
2           |  bob      | smith    | 100
3           |  jane     | doe      | 100
4           |  sam      | citizen  | 100

etc

So we are giving the loyalty members a better chance at winning a prize. However im a little worried that the returned result from a usual ORDER BY RAND() can include 2 entries of the SAME person ? What is an optimised method to ensure that we truly have 5 random records but at the same time giving those extra entrants a better or (weighted) chance ? Happy to use multiple queries, sub-queries or even a mix of MySQL and PHP ? Any advice is deeply appreciated thank you !

Bass

EDIT:

These 2 queries both work!

query1

  SELECT concat(firstName, " ", lastName) name,id, email 
    FROM t WHERE 
      RAND()<(SELECT ((5/COUNT(id))*10) FROM t) 
      group by email ORDER BY RAND()  limit 5;

query2

 select distinct 
    email, id, firstName, lastName from 
    (
    select id ,
    email, firstName , lastName , compID, rand()/(select count(*) from t where 
                                             email=t1.email 
                                             ) as rank
    from t t1 
    where compID = 100 
    order by rank) t2 limit 5;

http://sqlfiddle.com/#!2/73470c/2

4
  • You could make use of DISTINCT() Commented Jun 5, 2014 at 12:53
  • What would the query look like with distinct ? also in my schema there may be holes between ID's however they are sequential no matter what in the ID's ? Commented Jun 5, 2014 at 12:59
  • I do not believe this is a duplicate. The posted response giving a link is for a different (and far simpler) problem to just get random records. This question wants random records without duplicates but biasing the random records returned by the number of duplicates for an entry. Commented Jun 5, 2014 at 14:32
  • 1
    Although it is similar to that question, that particular solution works for ONE record and if you were to run that through a PHP loop there is a big possibility in your iterations that you may get the same entrant again otherwise you would have to keep a log of each random generated number and check that it isn't already in your pool of 5 random numbers, i feel this can be done with SQL alone and simply trying to negotiate an answer i hope that makes sense. Commented Jun 5, 2014 at 21:53

2 Answers 2

1

If you have a few hundred record, I think that order by rand() solution should be fine: subquery will order weighting number of entries, but duplicates remains. Parent SELECT will take the first 5 distinct rows.

SELECT DISTINCT firstName , 
                lastName , 
                compID 
FROM
( SELECT compEntryid ,firstName , lastName , compID, rand()/(select count(*) 
  FROM   t 
  WHERE  firstName=t1.firstName AND
         lastName = t1.lastName) AS rank
  FROM   t t1 
  WHERE  compID = 100 
  ORDER BY rank) t2 
LIMIT 5

Fiddle

Sign up to request clarification or add additional context in comments.

8 Comments

As you can see in linked question, ORDER BY rand() is relatively slow.
i understand it is slow, but given the table will have no more than a few hundred records it seems most trivial ? Unless i use something like select cols from table where rand() < 5/(select count(*) from table ) where compid = 100 limit 5 but even that i tried and doesn't work
@Bass I've edited my answer. I think this could be the right way if you have not performance problems.
Interesting solution. I quite like that. It doesn't bring back a compEntryId. You might be able to get that using MIN / GROUP BY, but GROUP BY implicitly adds an order which might mess up the randomness for the limit.
Thanks kiks73 ! your answer and kikstarts answer both work a treat , however which is more optimised ? does your inner query still need to create that temporary table due to use of ORDER BY RAND() ? or because its's a subquery that doesn't apply ? Also i will need a WHERE clause for the COMPID so that i dont include other competition entrants ? thanks !
|
0

I think you will need to use a sub query if you want to return a compEntryid.

SELECT t.firstName, t.lastName, t.compID, MIN(compEntryid)
FROM t
INNER JOIN
(
    SELECT DISTINCT firstName, lastName, compID
    FROM t
    ORDER by rand() 
    LIMIT 5
) t2
ON t.firstName = t2.firstName
AND t.lastName = t2.lastName
AND t.compID = t2.compID
GROUP BY t.firstName, t.lastName, t.compID;

This uses a sub query to get 5 random firstName / lastName / compID. Then joins against the table to get the MIN compEntryId.

However not certain about this. Think it will eliminate the duplicates in the sub query before performing the order / limit, which would prevent someone with more entries having more chances.

EDIT

More of a play and I think I have found a solution. Although efficiency is not one of its strong points.

SELECT MIN(compEntryid), firstName, lastName, compID
FROM
(
    SELECT firstName, lastName, compID, compEntryid, @seq:=@seq+1 AS seq
    FROM
    (
        SELECT firstName, lastName, compID, compEntryid
        FROM t
        ORDER by rand()
    ) sub0
    CROSS JOIN (SELECT @seq:=0) sub1
) sub2
GROUP BY sub2.firstName, sub2.lastName, sub2.compID
ORDER BY MIN(seq)
LIMIT 5

This has an inner sub query that gets all the records in a random order. Around that another sub query adds a sequence number to the records. The outer query groups by the name, etc, and orders by the min sequence number for that name. The compEntryId is just grabbed as the MIN for the name / competition (I am assuming you don't care too much about this).

This way if someone had 5 entries the inner sub query would mix them up in the list. the next sub query would add a sequence number. At this stage those 5 entries could be sequence numbers 1 to 5. The outer one would order by the lowest sequence number for the name and ignore the others, so of those 5 only sequence number 1 would be used and 2 to 5 ignored, with the next selected person being the one with sequence number 6.

This way the more entries they have the more likely they are to be a winner, but can't be 2 of the 5 winners.

With thanks to kiks73 for setting up some sqlfiddle data:-

http://sqlfiddle.com/#!2/cd777/1

EDIT

A solution based on that above by @kiks73. Tweaked to use a non correlated sub query for the counts, and eliminates a few uncertainties. For example with his solution I am not quite sure whether MySQL will chose to do the DISTINCT by implicitly doing a GROUP BY, which would also implicitly do an orderering of the results prior to doing the limit (it doesn't seem to, but I am not sure this behaviour is defined).

SELECT t.firstName , 
        t.lastName , 
        t.compID,
        MIN(rand() / t1.entry_count) AS rank
FROM
(
    SELECT firstName, lastName, compID, COUNT(*) AS entry_count
    FROM   t 
    GROUP BY firstName, lastName, compID
) t1
INNER JOIN t
ON  t.firstName=t1.firstName 
AND t.lastName = t1.lastName
AND t.compID = t1.compID
GROUP BY t.firstName, t.lastName, t.compID
ORDER BY rank
LIMIT 5

6 Comments

Thank you kickstart, a little combersome to understand however i tested it and it works as per above solution too. I also should mention that i have an email field which would be unique to the user so we wouldn't need to check on firstname and lastname is that correct ? And finally i need to to a WHERE on compid ?
I have tried writing a stored procedure but im having problems with restricting it to a particular compid ! ?
Restricting it to a particular compid would just need a WHERE clause for it within the inner sub query. This query carries through the first name / last name from the inner sub query. You could just get the email address, but depends if you need the other details returned. It is cumbersome and I doubt it is that fast.
none the less your query works perfectly ! seems weighted as well, although a order by rand() limit 5 and a group by yeilds similar results it doesn't seem as weighted is that right ?
Kickstart you're an absolute genious mate ! I noticed in your profile you have zero artistic abilities ! im an Art director for profession and programmer enthusiast when im not designing, if you ever need any help in that department im more than happy to assist my friend :)
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.