Rcpp vs R function. bug in my code?

Question

Why am I getting different results from these two functions? Is there something specific to math.h::pow() that I'm doing incorrectly? Or do I need to use a different abs() function than cmath::abs()?

library(Rcpp)
sourceCpp(...) # below C++ function

#include <Rcpp.h>
#include <cmath>
#include <math.h>
#include <iostream>
using namespace Rcpp;

// [[Rcpp::export]]
double dist_q (NumericVector& x, NumericVector& y, int& q) {
  int nx= x.size(), ny = y.size();

  if (nx != ny) {
    std::cout << "ERROR: Length of x and y differ." << std::endl;
    return -1;
  }

  double temp = 0.0;
  int m = 0;
  for (int i = 0; i < nx; i++) {
    if (!NumericVector::is_na(x[i]) && !NumericVector::is_na(y[i])) {
      ++m;
      temp += pow(abs(x[i] - y[i]), (double) q);
    }
  }
  temp = (1 / (double) m * temp);
  return pow(temp, (1 / (double) q));
}

#--------------------------------------------------
# R function
dist_qR <- function(x, y, q= 2) {
  if (!is.numeric(x) | !is.numeric(y)) stop("Both x and y must be numeric.")
  if (q < 1 | q %% 1 != 0) stop("q must be an integer >= 1")

  m <- sum(!is.na(x) & !is.na(y))

  return((1 / m * sum(abs(x - y)^q, na.rm=TRUE))^(1/q))
}


# test
set.seed(1415)
x <- rnorm(1000)
y <- rnorm(1000)
x2 <- x; x2[x2>1.5] <- NA
y2 <- y; y2[y2 > 1.5] <- NA

> dist_q(x,y,2)
[1] 1.089495
> dist_qR(x,y,2)
[1] 1.438455
> dist_q(x2,y2,2)
[1] 0.9119293
> dist_qR(x2,y2,2)
[1] 1.249269

Re: @Konrad -- R> ?"&&":

& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.

Timing

Using fabs as below:

x <- list(x= rnorm(100),
          x2= rnorm(1000),
          x3= rnorm(10000),
          x4= rnorm(100000))

y <- list(x= rnorm(100),
          x2= rnorm(1000),
          x3= rnorm(10000),
          x4= rnorm(100000))


x2 <- lapply(x, function(l) {l[l>1.5] <- NA; return(l)})
y2 <- lapply(y, function(l) {l[l>1.5] <- NA; return(l)})

library(microbenchmark)

microbenchmark(n100_r= dist_qR(x[[1]], y[[1]], 3),
               n1000_r= dist_qR(x[[2]], y[[2]], 3),
               n10000_r= dist_qR(x[[3]], y[[3]], 3),
               n100000_r= dist_qR(x[[4]], y[[4]], 3),
               n100_c= dist_q(x[[1]], y[[1]], 3),
               n1000_c= dist_q(x[[2]], y[[2]], 3),
               n10000_c= dist_q(x[[3]], y[[3]], 3),
               n100000_c= dist_q(x[[4]], y[[4]], 3), times= 50)

Unit: microseconds
      expr       min        lq        mean     median        uq       max neval  cld
    n100_r    33.431    42.521    72.43820    83.8690    95.599   120.525    50 a   
   n1000_r   125.803   133.720   163.82538   167.1510   189.731   208.792    50 a   
  n10000_r   954.812  1061.260  1190.66434  1086.6260  1131.053  3577.318    50  b  
 n100000_r 10169.212 10806.144 11702.11890 11041.3280 12827.495 15562.607    50    d
    n100_c    12.317    14.662    20.83844    20.5270    26.099    39.002    50 a   
   n1000_c    91.200    97.651   104.58960   105.1295   109.674   119.938    50 a   
  n10000_c   875.928   939.270   953.09926   951.4395   970.060  1015.807    50  b  
 n100000_c  8272.492  9227.011  9472.04164  9339.7640  9531.108 12799.929    50   c 

# missing values not shown, but I get roughly the same timing

It looks like the speed up is minimal for large vectors--likely since the mathematics functions in R are implemented in C. But the speedup is substantial for short vectors. I'm guessing this is likely due to the overhead I use for type-checking in R.

You should use || and && rather than | and & for logical conditions of length 1. — Konrad Rudolph
– Konrad Rudolph, Commented Nov 3, 2015 at 17:17
Yeah. It’s unrelated to the problem but it makes it clear that a logical condition rather than an arbitrary-length vector is expected, and performs appropriate short-circuiting. — Konrad Rudolph
– Konrad Rudolph, Commented Nov 3, 2015 at 17:20
... If that's accurate, I think you just blew my mind for the like 10 times I've read through the above help documentation over the past few years — alexwhitworth
– alexwhitworth, Commented Nov 3, 2015 at 17:23

Hong Ooi · Accepted Answer · 2015-11-03 17:39:14Z

4

It looks like abs is getting confused by the numericVector class for some reason. Changing it to the C version fabs solves the problem:

    ...
    temp += pow(fabs(x[i] - y[i]), (double) q);  // fabs instead of abs
    ...

> dist_qR(x,y,2)
[1] 1.438455
> dist_q(x,y,2)
[1] 1.438455
> dist_q(x2,y2,2)
[1] 1.249269
> dist_qR(x2,y2,2)
[1] 1.249269

answered Nov 3, 2015 at 17:39

Hong Ooi

57.9k13 gold badges145 silver badges193 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Roland Over a year ago

Or std::abs as a compiler warning suggested on my system.

Collectives™ on Stack Overflow

Rcpp vs R function. bug in my code?

Timing

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

Timing

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related