
for loops in R are often considered slow, largely because it is easy to trigger unintended memory copies inside them, for example by growing an object on every iteration. But how should a nested for loop be replaced, and which approach is best?

Please note that this is a generic question: the f function below is just an example, it could be much more complicated or return different objects. I just want to see all the different approaches that one can take in R to avoid nested for loops.

Consider this as an example:

al <- c(2,3,4)
bl <- c("foo", "bar")
f <- function(n, c) { #Just one simple example function, could be much more complicated
    data.frame(n=n, c=c, val=n*nchar(c))
}
d <- data.frame()
for (a in al) { 
    for (b in bl) {
        d <- rbind(d, f(a, b))
        #one could undoubtedly do this a lot better
        #even keeping to nested for loops
    }
}
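As the comments in the loop hint, the nested loops themselves can stay while the repeated rbind goes. The following is only a minimal sketch: it assumes the number of combinations is known up front, and the names pieces, k and d_list are introduced here for illustration.

```r
al <- c(2, 3, 4)
bl <- c("foo", "bar")
f <- function(n, c) {
  data.frame(n = n, c = c, val = n * nchar(c), stringsAsFactors = FALSE)
}

# Pre-allocate one list slot per (a, b) combination
pieces <- vector("list", length(al) * length(bl))
k <- 0L
for (a in al) {
  for (b in bl) {
    k <- k + 1L
    pieces[[k]] <- f(a, b)  # store the piece instead of growing a data frame
  }
}

# Bind all pieces in a single call after the loops have finished
d_list <- do.call(rbind, pieces)
```

The loop body only fills a list slot, so the partial result is not passed through rbind again on every iteration.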

One could replace it in this absolutely horrible way (take this only as a crude example):

eg <- expand.grid(al, bl)
d <- do.call(rbind,
    lapply(1:dim(eg)[1],
           function(i) {f(as.numeric(eg[i,1]), as.character(eg[i, 2]))}
           )
)

or using library(purrr), which is a little bit less inelegant:

d <- map_dfr(bl, function(b) map2_dfr(al, b, f))

... there are countless different methods. Which one is the simplest, and which one the fastest?

Here is a very quick evaluation of the performance of the three previous methods on my laptop: (benchmark image not included)
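For anyone who wants to time such approaches themselves, a rough sketch using only base R's system.time could look like the following. The larger inputs and the helper names grow_rbind and grid_lapply are assumptions made for illustration, not the setup behind the figure above, and timings will vary by machine.

```r
f <- function(n, c) {
  data.frame(n = n, c = c, val = n * nchar(c), stringsAsFactors = FALSE)
}
al <- seq_len(30)                      # hypothetical larger inputs
bl <- c("foo", "bar", "bazz", "quux")

# Approach 1: nested loops growing a data frame with rbind
grow_rbind <- function() {
  d <- data.frame()
  for (a in al) for (b in bl) d <- rbind(d, f(a, b))
  d
}

# Approach 2: expand.grid plus lapply over row indices, one rbind at the end
grid_lapply <- function() {
  eg <- expand.grid(n = al, c = bl, stringsAsFactors = FALSE)
  do.call(rbind, lapply(seq_len(nrow(eg)), function(i) f(eg$n[i], eg$c[i])))
}

t_loop <- system.time(res_loop <- grow_rbind())
t_grid <- system.time(res_grid <- grid_lapply())

# Elapsed wall-clock seconds for each approach
print(c(loop = t_loop[["elapsed"]], grid = t_grid[["elapsed"]]))
```

A single system.time call is noisy, so repeating each measurement several times gives a more trustworthy comparison.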

2 Answers

Simply vectorize with expand.grid and nchar. No for or apply loops needed:

eg <- expand.grid(c=bl, n=al, stringsAsFactors = FALSE)
eg$val <- eg$n * nchar(eg$c)

# RE-ORDER COLUMNS
eg <- eg[c("n", "c", "val")]

Or one-line with transform:

eg <- transform(expand.grid(c=bl, n=al, stringsAsFactors = FALSE),
                val=n * nchar(c))[c("n", "c", "val")]

If you also set stringsAsFactors = FALSE in the f function (this is already the default since R 4.0.0):

f <- function(n, c) {
  data.frame(n=n, c=c, val=n*nchar(c), stringsAsFactors = FALSE)
}

The output is then equivalent to the data frame built by the nested for loops:

all.equal(d, eg)
# [1] TRUE
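When f cannot be vectorized like this, one possible fallback (a sketch, not part of the approach above) is to keep expand.grid for the combinations and call f once per row with Map. The names grid, pieces and d_map are introduced here for illustration.

```r
al <- c(2, 3, 4)
bl <- c("foo", "bar")
f <- function(n, c) {
  data.frame(n = n, c = c, val = n * nchar(c), stringsAsFactors = FALSE)
}

# All (n, c) combinations; c varies fastest, matching the nested loop order
grid <- expand.grid(c = bl, n = al, stringsAsFactors = FALSE)

# Map calls f once per row, pairing grid$n[i] with grid$c[i]
pieces <- Map(f, grid$n, grid$c)

# Combine the per-call data frames with a single rbind
d_map <- do.call(rbind, pieces)
rownames(d_map) <- NULL
```

This still makes one call to f per combination, so it replaces the loop syntax without giving the speed of true vectorization.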

2 Comments

I should have been more clear that my question was more general, i.e. the f function is just a generic example, and I wanted to have an overview of all the possible approaches one can take in R to replace nested for loops.
Vectorization is possibly the best approach one can take in R to replace nested for loops, but it really depends on the function itself. There is no one-size-fits-all answer or general rule of thumb.
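# each = on one vector so that every element of al meets every element of bl
n <- rep(al, times = length(bl)); e <- rep(bl, each = length(al))
cbind.data.frame(n, c = e, val = mapply(function(x, y) x * nchar(y), n, e))
  n   c val
1 2 foo   6
2 3 foo   9
3 4 foo  12
4 2 bar   6
5 3 bar   9
6 4 bar  12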

or:

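# n and e are ordered to match the column-major order of outer()
n <- rep(al, times = length(bl)); e <- rep(bl, each = length(al))
cbind.data.frame(n, c = e, val = c(outer(al, bl, function(x, y) x * nchar(y))))
  n   c val
1 2 foo   6
2 3 foo   9
3 4 foo  12
4 2 bar   6
5 3 bar   9
6 4 bar  12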

1 Comment

I should have been more clear that my question was more general, i.e. the f function is just a generic example, and I wanted to have an overview of all the possible approaches one can take in R to replace nested for loops. Edited the question.
