
I am trying to write an algorithm that measures the correlation between n-bit integers with respect to the value “1”.

Here is an example of a 5-bit integer: 0,1,0,0,1

I want to establish the percentage of correlation between this integer and a set of N other integers.

For example, integer A (0,1,0,0,1) and integer B (0,1,0,0,0) have a correlation of 0,5 for the value “1”, as only the second bit matches.

In my Firebase database, I have one n-bit integer attached to each user_ID, and I want to match it against the n-bit integer of every other user of my application to get a correlation between each pair of users. The distribution of all the correlations between users will follow a Gaussian curve, which I want to use in the future to match users with each other.

For example: I want user A to be matched with every other user, with the matches sorted in decreasing order of affinity (from high to low correlation between their n-bit integers).

Do you guys have any idea how I could compute the correlation between the N users, and then sort these correlations from high to low? Any help would be greatly appreciated.

Thank you for your time,

Maxime

  • Is that 0,5 or 0.5? I'm really curious. But I don't get this example of correlation. Can you please elaborate on your example a bit? Thanks Commented Mar 26, 2019 at 18:36
  • Hi, thanks a lot for the reply. Sorry if this wasn't clear, indeed I meant 0.5 as they have 1 match in common for the "1" value, out of two "1" values in their integers. Commented Mar 26, 2019 at 19:20
  • It sounds like you're asking for the Hamming distance with a condition on the bits. Commented Mar 26, 2019 at 19:27
  • I think there are at least two questions here: 1) Given M items, each with an N-bit value, plot the M^2 correlations between each pair of items. 2) From a single Item, find closely correlated users in the database. Commented Mar 26, 2019 at 19:31
  • How big is N? Is it less than 64? Are you just storing the keys as ints? Commented Mar 26, 2019 at 19:35

1 Answer


You can use the bitwise AND operation to get the result R.

Example:

A = 9  = 01001
B = 8  = 01000
C = 7  = 00111
D = 31 = 11111

R = A & B gives 8 = 01000. The correlation comes from counting the ones: ones(R)/ones(A) = 1/2 = 0.5.

R = A & C gives 1 = 00001, so the correlation is ones(R)/ones(A) = 1/2 = 0.5.

R = A & D gives 9 = 01001, so ones(R)/ones(A) = 2/2 = 1.

Here we have a problem: A and D are different numbers, yet the correlation is 1. You can solve this by dividing by the larger of the two counts of ones, ones(R)/max(ones(A), ones(D)), which gives 2/5 = 0.4.

I believe it is better to divide by the total bit count (here 5).

The results would be:

corr AB = 1/5 = 0.2
corr AC = 1/5 = 0.2
corr AD = 2/5 = 0.4
corr CD = 3/5 = 0.6
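
The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not code from the original post: the names `popcount`, `corr_total_bits`, `corr_max_ones`, and `rank_matches`, and the in-memory `users` dictionary, are all assumptions made for the example. It computes the shared "1" bits with a bitwise AND, normalises either by the total bit count or by the larger count of ones, and sorts one user's matches from high to low correlation.

```python
def popcount(x: int) -> int:
    """Count the 1 bits in a non-negative integer."""
    return bin(x).count("1")


def corr_total_bits(a: int, b: int, n_bits: int) -> float:
    """Shared 1 bits divided by the total bit width."""
    return popcount(a & b) / n_bits


def corr_max_ones(a: int, b: int) -> float:
    """Shared 1 bits divided by the larger of the two 1-bit counts."""
    denom = max(popcount(a), popcount(b))
    return popcount(a & b) / denom if denom else 0.0


def rank_matches(user_id, users, n_bits):
    """Return (other_id, correlation) pairs for user_id, highest first."""
    mine = users[user_id]
    scores = [
        (other_id, corr_total_bits(mine, bits, n_bits))
        for other_id, bits in users.items()
        if other_id != user_id
    ]
    # sorted() is stable, so users with equal scores keep their input order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: the four values from the example above, keyed by a user ID.
users = {"A": 0b01001, "B": 0b01000, "C": 0b00111, "D": 0b11111}
ranking = rank_matches("A", users, n_bits=5)
print(ranking)
```

For user A this puts D first (0.4), followed by B and C (0.2 each). Swapping `corr_total_bits` for `corr_max_ones` inside `rank_matches` would give the other normalisation instead.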

4 Comments

Hi Aldert, thank you for replying. I took a path very similar to the one you describe in your post, computing the correlation in basically the same way: Correlation = size(intersection(A, B)) / max(size(A), size(B)), except that the sets A and B are bit integers and R is calculated with the total bit count. I then bundled identical integers into buckets and sorted them by relevance (highest to lowest correlation between them). My struggle now lies in requesting this data from Firebase...
So if your "bit integers" are database keys, you can query for any users with exactly the same bits set. If that does not produce enough results, you can query for any of the N keys that differ by only 1 bit. Then the N^2 keys that differ by 2 bits...
I'm assuming you have one user and are trying to find a match. Are you actually trying to find the M users with the closest matches?
Hi AShelly, that is exactly what I'm implementing, I am trying to get the M users with the closest matches and I will be using the query you mentioned! You were spot on, thank you!
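
The widening lookup described in these comments could look roughly like the Python sketch below. It is an illustration under stated assumptions: `keys_at_distance` and `closest_users` are hypothetical helpers, and `users_by_key` is a plain dictionary standing in for whatever exact-match query the database offers, so no Firebase API is shown. Note that it ranks candidates by Hamming distance, which counts differing 0s and 1s alike and is therefore not identical to the shared-ones correlation from the answer.

```python
from itertools import combinations


def keys_at_distance(key: int, n_bits: int, distance: int):
    """Yield every n_bits-wide value differing from key in exactly `distance` bits."""
    for positions in combinations(range(n_bits), distance):
        mask = 0
        for p in positions:
            mask |= 1 << p
        yield key ^ mask  # flip the chosen bit positions


def closest_users(key, users_by_key, n_bits, wanted):
    """Collect up to `wanted` user IDs, widening the search one bit at a time."""
    found = []
    for distance in range(n_bits + 1):
        for candidate in keys_at_distance(key, n_bits, distance):
            # Stand-in for an exact-match database query on the key.
            found.extend(users_by_key.get(candidate, []))
        if len(found) >= wanted:
            break
    return found[:wanted]


# Hypothetical index from bit key to user IDs (the querying user is left out).
users_by_key = {
    0b01001: ["u2"],
    0b01000: ["u3"],
    0b11001: ["u4"],
    0b00111: ["u5"],
}
print(closest_users(0b01001, users_by_key, n_bits=5, wanted=3))
```

The number of candidate keys at distance 2 is N(N-1)/2, so it grows on the order of N^2 and the search gets expensive quickly for wide keys. This approach is most attractive when close matches are common.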
