2

I compute Pearson correlation (average user/item rating) many times, using my current code performance is very bad:

public double ComputeCorrelation(double[] x, double[] y, double[] meanX, double[] meanY)
        {
            if (x.Length != y.Length)
                throw new ArgumentException("values must be the same length");

            double sumNum = 0;
            double sumDenom = 0;
            double denomX = 0;
            double denomY = 0;

            for (int a = 0; a < x.Length; a++)
            {
                sumNum += (x[a] - meanX[a]) * (y[a] - meanY[a]);
                denomX += Math.Pow(x[a] - meanX[a], 2);
                denomY += Math.Pow(y[a] - meanY[a], 2);
            }

            var sqrtDenomX = Math.Sqrt(denomX);
            var sqrtDenomY = Math.Sqrt(denomY);

            if (sqrtDenomX == 0 || sqrtDenomY == 0) return 0;

            sumDenom = Math.Sqrt(denomX) * Math.Sqrt(denomY);

            var correlation = sumNum / sumDenom;

            return correlation;
        }

I am using standard Pearson correlation with MathNet.Numerics, but this is modification of standard and it's not possible to use it. Is there a way to speed it up? How can it be optimizied regarding to time complexity?

2
  • I think it's better to ask this question here codereview.stackexchange.com Commented Apr 21, 2016 at 18:17
  • We can make some assumptions by looking at the code, but what we don't know is how much those assumptions actually improve the performance. You should run this through a profiler to truly see what is causing the slowness. Commented Apr 21, 2016 at 22:40

3 Answers 3

2

Adding some on MSE answer -- changing Pow(x,2) to diff*diff is definitely something you want to do, you may also want to avoid unnecessary bound-checking in your inner-most loop. This may be done using pointers in C#.

Could be done this way:

    public unsafe double ComputeCorrelation(double[] x, double[] y, double[] meanX, double[] meanY)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("values must be the same length");

        double sumNum = 0;
        double sumDenom = 0;
        double denomX = 0;
        double denomY = 0;
        double diffX;
        double diffY;

        int len = x.Length;

        fixed (double* xptr = &x[0], yptr = &y[0], meanXptr = &meanX[0], meanYptr = &meanY[0])
        {
            for (int a = 0; a < len; a++)
            {
                diffX = (xptr[a] - meanXptr[a]);
                diffY = (yptr[a] - meanYptr[a]);
                sumNum += diffX * diffY;
                denomX += diffX * diffX;
                denomY += diffY * diffY;
            }
        }

        var sqrtDenomX = Math.Sqrt(denomX);
        var sqrtDenomY = Math.Sqrt(denomY);

        if (sqrtDenomX == 0 || sqrtDenomY == 0) return 0;

        sumDenom = sqrtDenomX * sqrtDenomY;

        var correlation = sumNum / sumDenom;

        return correlation;
    }
Sign up to request clarification or add additional context in comments.

Comments

1

The best way to solve your performance problems is probably to avoid computing as many correlations, if possible. If you are using the correlations as part of another computation, it may be possible to use math to remove the need for some of them.

You should also consider whether you will be able to use the square of the Pearson correlation instead of the Pearson correlation itself. That way, you can save your calls to Math.Sqrt(), which are usually quite expensive.

If you do need to take the square root, you should use sqrtDenomX and sqrtDenomY again, rather than recompute the square roots.

Comments

1

The only possible optimizations that I see in your code are in the following code, if you are still looking for better performance then you may want use SIMD vectorization. It will allow you to use the full computation power of the CPU

public double ComputeCorrelation(double[] x, double[] y, double[] meanX, double[] meanY)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("values must be the same length");

        double sumNum = 0;
        double sumDenom = 0;
        double denomX = 0;
        double denomY = 0;
        double diffX;
        double diffY;

        for (int a = 0; a < x.Length; a++)
        {
            diffX = (x[a] - meanX[a]);
            diffY = (y[a] - meanY[a]);
            sumNum += diffX * diffY;
            denomX += diffX * diffX;
            denomY += diffY * diffY;
        }

        var sqrtDenomX = Math.Sqrt(denomX);
        var sqrtDenomY = Math.Sqrt(denomY);

        if (sqrtDenomX == 0 || sqrtDenomY == 0) return 0;

        sumDenom = sqrtDenomX * sqrtDenomY;

        var correlation = sumNum / sumDenom;

        return correlation;
    }

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.