1

Why am I getting different results from these two functions? Is there something specific to math.h::pow() that I'm doing incorrectly? Or do I need to use a different abs() function than cmath::abs()?

library(Rcpp)
sourceCpp(...) # below C++ function

#include <Rcpp.h>
#include <cmath>
#include <math.h>
#include <iostream>
using namespace Rcpp;

// [[Rcpp::export]]
double dist_q (NumericVector& x, NumericVector& y, int& q) {
  int nx= x.size(), ny = y.size();

  if (nx != ny) {
    std::cout << "ERROR: Length of x and y differ." << std::endl;
    return -1;
  }

  double temp = 0.0;
  int m = 0;
  for (int i = 0; i < nx; i++) {
    if (!NumericVector::is_na(x[i]) && !NumericVector::is_na(y[i])) {
      ++m;
      temp += pow(abs(x[i] - y[i]), (double) q);
    }
  }
  temp = (1 / (double) m * temp);
  return pow(temp, (1 / (double) q));
}

#--------------------------------------------------
# R function
dist_qR <- function(x, y, q= 2) {
  if (!is.numeric(x) | !is.numeric(y)) stop("Both x and y must be numeric.")
  if (q < 1 | q %% 1 != 0) stop("q must be an integer >= 1")

  m <- sum(!is.na(x) & !is.na(y))

  return((1 / m * sum(abs(x - y)^q, na.rm=TRUE))^(1/q))
}


# test
set.seed(1415)
x <- rnorm(1000)
y <- rnorm(1000)
x2 <- x; x2[x2>1.5] <- NA
y2 <- y; y2[y2 > 1.5] <- NA

> dist_q(x,y,2)
[1] 1.089495
> dist_qR(x,y,2)
[1] 1.438455
> dist_q(x2,y2,2)
[1] 0.9119293
> dist_qR(x2,y2,2)
[1] 1.249269

Re: @Konrad -- R> ?"&&":

& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.

Timing

Using fabs as below:

x <- list(x= rnorm(100),
          x2= rnorm(1000),
          x3= rnorm(10000),
          x4= rnorm(100000))

y <- list(x= rnorm(100),
          x2= rnorm(1000),
          x3= rnorm(10000),
          x4= rnorm(100000))


x2 <- lapply(x, function(l) {l[l>1.5] <- NA; return(l)})
y2 <- lapply(y, function(l) {l[l>1.5] <- NA; return(l)})

library(microbenchmark)

microbenchmark(n100_r= dist_qR(x[[1]], y[[1]], 3),
               n1000_r= dist_qR(x[[2]], y[[2]], 3),
               n10000_r= dist_qR(x[[3]], y[[3]], 3),
               n100000_r= dist_qR(x[[4]], y[[4]], 3),
               n100_c= dist_q(x[[1]], y[[1]], 3),
               n1000_c= dist_q(x[[2]], y[[2]], 3),
               n10000_c= dist_q(x[[3]], y[[3]], 3),
               n100000_c= dist_q(x[[4]], y[[4]], 3), times= 50)

Unit: microseconds
      expr       min        lq        mean     median        uq       max neval  cld
    n100_r    33.431    42.521    72.43820    83.8690    95.599   120.525    50 a   
   n1000_r   125.803   133.720   163.82538   167.1510   189.731   208.792    50 a   
  n10000_r   954.812  1061.260  1190.66434  1086.6260  1131.053  3577.318    50  b  
 n100000_r 10169.212 10806.144 11702.11890 11041.3280 12827.495 15562.607    50    d
    n100_c    12.317    14.662    20.83844    20.5270    26.099    39.002    50 a   
   n1000_c    91.200    97.651   104.58960   105.1295   109.674   119.938    50 a   
  n10000_c   875.928   939.270   953.09926   951.4395   970.060  1015.807    50  b  
 n100000_c  8272.492  9227.011  9472.04164  9339.7640  9531.108 12799.929    50   c 

# missing values not shown, but I get roughly the same timing

It looks like the speed up is minimal for large vectors--likely since the mathematics functions in R are implemented in C. But the speedup is substantial for short vectors. I'm guessing this is likely due to the overhead I use for type-checking in R.

3
  • You should use || and && rather than | and & for logical conditions of length 1. Commented Nov 3, 2015 at 17:17
  • Yeah. It’s unrelated to the problem but it makes it clear that a logical condition rather than an arbitrary-length vector is expected, and performs appropriate short-circuiting. Commented Nov 3, 2015 at 17:20
  • 1
    ... If that's accurate, I think you just blew my mind for the like 10 times I've read through the above help documentation over the past few years Commented Nov 3, 2015 at 17:23

1 Answer 1

4

It looks like abs is getting confused by the numericVector class for some reason. Changing it to the C version fabs solves the problem:

    ...
    temp += pow(fabs(x[i] - y[i]), (double) q);  // fabs instead of abs
    ...

> dist_qR(x,y,2)
[1] 1.438455
> dist_q(x,y,2)
[1] 1.438455
> dist_q(x2,y2,2)
[1] 1.249269
> dist_qR(x2,y2,2)
[1] 1.249269
Sign up to request clarification or add additional context in comments.

1 Comment

Or std::abs as a compiler warning suggested on my system.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.