4

Suppose I have the following list of vectors:

List <- list(c(1:3), c(4:6), c(7:9))

To get the required result, I have the following Rcpp code:

totalCpp <- "
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List t_list(List r_list) {
  List results;
  for (int i = 0; i < r_list.size(); ++i) {
    NumericVector vec = as<NumericVector>(r_list[i]);
    int sum = 0;
    for (int j = 0; j < vec.size(); ++j) {
      sum += vec[j];
    }
    results.push_back(sum); // Add the sum to the results list
  }
  return results;
}
"

library(Rcpp)
sourceCpp(code = totalCpp)

which returns the following:

> t_list(List)
[[1]]
[1] 6

[[2]]
[1] 15

[[3]]
[1] 24

Is it possible to write this Rcpp code without using two for loops, or is there a more elegant way to write it in Rcpp?

5 Comments

  • Two for loops seem like the natural way to do this. Why do you want to do it a different way? Commented Jul 15 at 17:56
  • @GregorThomas I just want to know. If my approach is fine, then that is okay for me. Commented Jul 15 at 17:57
  • stackoverflow.com/a/3221813/28479453 and gallery.rcpp.org/articles/parallel-vector-sum might help. Commented Jul 15 at 18:04
  • @TimG Rcpp Sugar: stackoverflow.com/a/22115335 (see the sketch after these comments). Commented Jul 15 at 18:33
  • This isn't fast, but colSums(list2DF(List)) is handy, in case you are running into an XY problem. Commented Jul 15 at 19:00
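
For reference, here is a minimal sketch of the no-explicit-loop version that the Rcpp Sugar comment above points at; the function name sum_each is illustrative and not taken from the linked answer. std::transform stands in for the outer loop and sugar sum() stands in for the inner one.

#include <Rcpp.h>
#include <algorithm>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector sum_each(List r_list) {
    NumericVector out(r_list.size());                // pre-allocated result
    std::transform(r_list.begin(), r_list.end(), out.begin(),
                   [](SEXP el) -> double {
                       NumericVector v(el);          // no copy if the element is already numeric
                       return sum(v);                // sugar sum() replaces the inner loop
                   });
    return out;
}

Called as sum_each(List) this returns the numeric vector c(6, 15, 24) rather than a list; wrapping the result in as.list() on the R side recovers the exact output shown in the question.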

2 Answers

7

For completeness, here is an approach which (as per the discussion in the comments) improves on the other answer below by a) preallocating the result, which is a must, and b) skipping Rcpp features we do not use here (see the comments below on Rcpp/Lighter and rng=false).

Code

#include <Rcpp/Lighter>

using namespace Rcpp;

// [[Rcpp::export(rng=false)]]
List dualloop(List r_list) {
    int n = r_list.size();
    List results(n);
    for (int i = 0; i < n; ++i) {
        NumericVector vec = as<NumericVector>(r_list[i]);
        int sum = 0;
        for (int j = 0; j < vec.size(); ++j) {
            sum += vec[j];
        }
        results[i] = sum; // Add the sum to the results list
    }
    return results;
}

// [[Rcpp::export(rng=false)]]
List rcppsum(List r_list) {
    int n = r_list.size();
    List results(n);
    for (int i = 0; i < n; ++i) {
        NumericVector vec = as<NumericVector>(r_list[i]);
        double s = sum(vec);
        results[i] = s;
    }
    return results;
}

// [[Rcpp::export(rng=false)]]
NumericVector rcppsumvec(List input_list) {
    int n = input_list.size();
    NumericVector results(n);  // Pre-allocate numeric vector
    for (int i = 0; i < n; ++i) {
        NumericVector vec = as<NumericVector>(input_list[i]);
        double s = sum(vec);
        results[i] = s;
    }
    return results;
}

/*** R
set.seed(42)
large_list <- replicate(10000, sample(1:100, 50), simplify = FALSE)

res <- microbenchmark::microbenchmark(dualloop = dualloop(large_list),
                                      lapply = lapply(large_list, sum),
                                      rcppsum = rcppsum(large_list),
                                      rcppsumvec = rcppsumvec(large_list),
                                      times = 100)
print(res)
*/

If you save this to a file and run Rcpp::sourceCpp() on it, the embedded R block will run the benchmark automagically.

Results

On a standard Linux laptop

> print(res)
Unit: milliseconds
       expr     min      lq    mean  median      uq      max neval cld
   dualloop 3.37030 3.44607 4.49539 3.52045 3.91622 13.18909   100  a 
     lapply 2.79632 2.88682 3.92920 2.95303 3.08097 13.34211   100  a 
    rcppsum 2.57658 2.63277 3.09002 2.67488 2.74510  9.70052   100   b
 rcppsumvec 2.43014 2.49604 2.85826 2.53277 2.68225  9.26517   100   b
> 

Chart

[benchmark chart of the timings above]


2 Comments

Can you explain a little more about Rcpp/Lighter? I tend not to bother with it, and I'm wondering if I should. Presumably it reduces compile time but doesn't affect runtime? Also, as this function has nothing to do with random numbers, it sounds like you're saying that by default Rcpp resets the RNG state every time and that this adds runtime overhead, which seems related to this answer? So if trying to optimise runtime, should one usually add [[Rcpp::export(rng=false)]]?
I think the Rcpp docs cover this: the different header files (full, light, lighter, lightest) activate different C++ components used by Rcpp, but for most 'normal' uses (just like here) we do not need all of them, and using a lighter one speeds up compilation by a small amount. Because this only affects compilation, it has nothing to do with run time at all. The rng=false attribute, however, gives a very small gain at runtime by turning off preservation of the RNG state when we know we will not make RNG calls in the C++ code exposed this way. (A small sketch for measuring this follows below.)
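
A minimal sketch (not part of the answer above) that one could use to measure the rng=false effect described in the comment: both exports do the same work and only the RNG bookkeeping differs. The function names are made up for illustration, and microbenchmark is assumed to be available, as elsewhere on this page.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]               // default: RNG state is saved and restored on every call
double sum_with_rng(NumericVector x) { return sum(x); }

// [[Rcpp::export(rng=false)]]    // skips the RNG bookkeeping; safe only because no RNG is used
double sum_without_rng(NumericVector x) { return sum(x); }

/*** R
x <- rnorm(1e3)
microbenchmark::microbenchmark(sum_with_rng(x), sum_without_rng(x), times = 1000L)
*/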
6

{Rcpp} has a built-in sum():

library(inline)

builtin_sum <- cxxfunction(
  signature(r_list = "list"), 
  body = '
   List input_list(r_list);
   List results;
   for (int i = 0; i < input_list.size(); ++i) {
     NumericVector vec = as<NumericVector>(input_list[i]);
     double vec_sum = sum(vec);
     results.push_back(vec_sum);
   }
   return results;
 ', 
  plugin = "Rcpp")

This is all besides the fact that plain lapply() also works here:

lapply(List, sum)

Then, if we want to be more elegant and actually gain some performance, we can pre-allocate the results vector and use direct assignment instead of push_back.

improved_sum <- cxxfunction(
  signature(r_list = "list"),
  body = '
    List input_list(r_list);
    int n = input_list.size();
    NumericVector results(n);  // Pre-allocate numeric vector
                             
    for (int i = 0; i < n; ++i) {
      NumericVector vec = input_list[i];
      results[i] = sum(vec);  // Direct assignment, no push_back
    }
    return results;
    ', 
  plugin = "Rcpp")

Here's a benchmark (two_loops() refers to the OP's original double-loop function from the question):

set.seed(42)
large_list <- replicate(10000, sample(1:100, 50), simplify = FALSE)

microbenchmark::microbenchmark(
  lapply = lapply(large_list, sum),
  two_loops = two_loops(large_list),
  builtin_sum = builtin_sum(large_list),
  improved = improved_sum(large_list),
  times = 100
) -> res

res

ggplot2::autoplot(res) +
  ggplot2::theme_bw()

Unit: milliseconds
       expr      min       lq       mean    median        uq      max neval cld
     lapply   2.4638   2.7633   3.224807   3.04370   3.51925   5.6379   100   a 
  two_loops 265.7754 307.4380 327.912011 320.43895 336.63080 631.5728   100   b
builtin_sum 273.9828 309.8691 328.088739 324.40175 336.75415 608.7544   100   b
   improved   1.5470   1.7755   2.390364   1.89355   2.12300  19.0634   100   a 

8 Comments

It is pretty well established that growing Rcpp objects is not efficient and should always be avoided, so I would stress the 'improved' approach here, as both 'two_loops' and 'builtin_sum' have this issue. Also, if you use Rcpp Attributes instead of the long-obsolete inline package you can add // [[Rcpp::export(rng=false)]], after which I find the 'improved' solution to be even a little faster than lapply -- they both just loop and accumulate a sum.
I see your point, but isn't the improved solution already a little faster than lapply?
Yes, it is, but the main point is that your example, for all its other strengths, uses dynamic growth of Rcpp vectors (a very bad idea) as well as inline, when Rcpp Attributes is easier and better and allows further improvement, which I then showed in my post. But you laid the groundwork well.
Agreed. Again, your point regarding proper setup is valid and I appreciate you sharing it. Cheers.
This is nice benchmarking, +1! But I guess it depends on the size of the vectors in large_list. When I ran large_list <- replicate(10000, sample(1:1000, 500), simplify = FALSE), I saw that lapply(large_list, sum) outperforms the other candidates.
It makes sense. R's sum() is probably far more optimized than Rcpp's sugar sum(). The conversion overhead catches up as we process larger vectors. (A sketch of avoiding that conversion follows these comments.)
As I said, a properly set up Rcpp solution is slightly faster than lapply(large_list, sum) on my (standard Linux) machine.
To be clear, what Thomas is talking about is different from your point, as is evident from this: i.sstatic.net/LhyKjVgd.png. But your point about proper setup is very much valid. Thanks for posting an answer.
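
To illustrate the conversion-overhead point from the comments above: when the list holds integer vectors, as<NumericVector>() has to allocate and coerce a numeric copy of every element, and dispatching on the element type avoids that copy. This is only a sketch (the name sum_no_coerce is made up here), not code from either answer.

#include <Rcpp.h>
#include <numeric>
using namespace Rcpp;

// [[Rcpp::export(rng=false)]]
NumericVector sum_no_coerce(List input_list) {
    int n = input_list.size();
    NumericVector results(n);
    for (int i = 0; i < n; ++i) {
        SEXP el = input_list[i];
        if (TYPEOF(el) == INTSXP) {
            IntegerVector iv(el);                       // no coercion copy for integer input
            results[i] = std::accumulate(iv.begin(), iv.end(), 0.0);
        } else {
            results[i] = sum(as<NumericVector>(el));    // numeric input: sugar sum as before
        }
    }
    return results;
}

Whether this actually closes the gap to lapply(large_list, sum) on the larger inputs would need benchmarking; the point is only that the coercion is avoidable.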
