Best way to replace a nested for loop in R

Question

for loops in R are generally considered slow, since it is hard to avoid unintended memory reads and writes. But how should one replace a nested for loop, and which approach is best?

Please note that this is a generic question: the f function below is just an example, it could be much more complicated or return different objects. I just want to see all the different approaches that one can take in R to avoid nested for loops.

Consider this as an example:

al <- c(2,3,4)
bl <- c("foo", "bar")
f <- function(n, c) { #Just one simple example function, could be much more complicated
    data.frame(n=n, c=c, val=n*nchar(c))
}
d <- data.frame()
for (a in al) { 
    for (b in bl) {
        d <- rbind(d, f(a, b))
        #one could undoubtedly do this a lot better
        #even keeping to nested for loops
    }
}
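As the comment inside the loop hints, the nested loops themselves can be kept while avoiding the repeated rbind, which copies the growing data frame on every iteration. A minimal sketch, assuming the same al, bl and f as above (res, k and d_loop are names introduced here for illustration only):

```r
al <- c(2, 3, 4)
bl <- c("foo", "bar")
f <- function(n, c) {
  data.frame(n = n, c = c, val = n * nchar(c), stringsAsFactors = FALSE)
}

# Preallocate one slot per (a, b) pair instead of growing d with rbind
res <- vector("list", length(al) * length(bl))
k <- 0
for (a in al) {
  for (b in bl) {
    k <- k + 1
    res[[k]] <- f(a, b)  # store each piece; nothing is copied here
  }
}

# Bind all pieces in a single call at the end
d_loop <- do.call(rbind, res)
```

This keeps the loop structure readable and works for any f that returns a data frame with consistent columns.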

One could replace it in this absolutely horrible way (take this only as a crude example):

eg <- expand.grid(al, bl)
d <- do.call(rbind,
    lapply(1:dim(eg)[1],
           function(i) {f(as.numeric(eg[i,1]), as.character(eg[i, 2]))}
           )
)

or using library(purrr), which is a little bit less inelegant:

d <- map_dfr(bl, function(b) map2_dfr(al, b, f))

... there are countless different methods. Which one is the simplest, and which one the fastest?

Here is a very quick evaluation of the performance of the three previous methods on my laptop:

Parfait · Accepted Answer · 2018-02-22 17:37:15Z

2

Simply vectorize with expand.grid and nchar. No for or apply loops needed:

eg <- expand.grid(c=bl, n=al, stringsAsFactors = FALSE)
eg$val <- eg$n * nchar(eg$c)

# RE-ORDER COLUMNS
eg <- eg[c("n", "c", "val")]

Or one-line with transform:

eg <- transform(expand.grid(c=bl, n=al, stringsAsFactors = FALSE),
                val=n * nchar(c))[c("n", "c", "val")]

If you also set stringsAsFactors = FALSE in the f function (already the default since R 4.0.0):

f <- function(n, c) {
  data.frame(n=n, c=c, val=n*nchar(c), stringsAsFactors = FALSE)
}

then the output is equivalent to the data frame built by the for loop:

all.equal(d, eg)
# [1] TRUE
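When f cannot be rewritten as a single vectorized expression, one possible fallback in base R is to build the same grid and call f once per row. This is only a sketch under that assumption; d_map is a name introduced here for illustration:

```r
al <- c(2, 3, 4)
bl <- c("foo", "bar")
f <- function(n, c) {
  data.frame(n = n, c = c, val = n * nchar(c), stringsAsFactors = FALSE)
}

# c is listed first so it varies fastest, matching the inner loop over bl
eg <- expand.grid(c = bl, n = al, stringsAsFactors = FALSE)

# Map calls f once per grid row; do.call(rbind, ...) stacks the pieces
d_map <- do.call(rbind, Map(f, eg$n, eg$c))
rownames(d_map) <- NULL
```

This still calls f six times, so it is not faster than the loop by itself; the gain is that the iteration space is explicit in eg and the pieces are bound only once.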


2 Comments

AF7 Over a year ago

I should have been more clear that my question was more general, i.e. the f function is just a generic example, and I wanted to have an overview of all the possible approaches one can take in R to replace nested for loops.

Parfait Over a year ago

Vectorization is arguably the best approach one can take in R to replace nested for loops, but it really depends on the function itself. There is no one-size-fits-all answer or general rule of thumb.

Onyambu · Accepted Answer · 2018-02-22 17:14:34Z

1

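n <- rep(al, length(bl))
# each = pairs every bl value with every al value; rep(bl, length(al)) only
# covers all pairs when the two lengths happen to be coprime, as they are here
e <- rep(bl, each = length(al))
cbind.data.frame(n, c = e, val = mapply(function(x, y) x * nchar(y), n, e))
  n   c val
1 2 foo   6
2 3 foo   9
3 4 foo  12
4 2 bar   6
5 3 bar   9
6 4 bar  12

or:

n <- rep(al, length(bl))
e <- rep(bl, each = length(al))  # same ordering as c(outer(al, bl, ...))
cbind.data.frame(n, c = e, val = c(outer(al, bl, function(x, y) x * nchar(y))))
  n   c val
1 2 foo   6
2 3 foo   9
3 4 foo  12
4 2 bar   6
5 3 bar   9
6 4 bar  12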
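Note that outer passes whole vectors to its function, so it needs a function that is already vectorized. For a function that only handles one pair at a time, a possible sketch is to wrap it with Vectorize first. Here g is a hypothetical stand-in for such a function, not something from the question:

```r
al <- c(2, 3, 4)
bl <- c("foo", "bar")

# Hypothetical scalar-only function: the if () only handles a single n
g <- function(n, c) if (n > 2) n * nchar(c) else 0

# Vectorize wraps g in mapply so that outer can hand it whole vectors
m <- outer(al, bl, Vectorize(g))
dimnames(m) <- list(al, bl)
```

The result m is a length(al) by length(bl) matrix, so this fits when f returns one scalar per pair rather than a data frame.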


1 Comment

AF7 Over a year ago

I should have been more clear that my question was more general, i.e. the f function is just a generic example, and I wanted to have an overview of all the possible approaches one can take in R to replace nested for loops. Edited the question.
