RcppParallel provides a complete toolkit for creating portable, high-performance parallel algorithms without requiring direct manipulation of operating system threads. RcppParallel includes:

* [Intel Thread Building Blocks](https://www.threadingbuildingblocks.org/), a C++ library for task parallelism with a wide variety of parallel algorithms and data structures (Windows, OS X, Linux, and Solaris x86 only).

* [TinyThread](http://tinythreadpp.bitsnbites.eu/), a C++ library for portable use of operating system threads.

* `RVector` and `RMatrix` wrapper classes for safe and convenient access to R data structures in a multi-threaded environment.

* High-level parallel functions (`parallelFor` and `parallelReduce`) that use Intel TBB as a back-end on systems that support it and TinyThread on other platforms.
Note that the Windows variation (`Makevars.win`) requires an extra `PKG_CXXFLAGS` entry that enables the use of TBB. This is because TBB is not used by default on Windows, for backward compatibility with previous versions of RcppParallel that lacked TBB support on Windows.
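For illustration, a minimal `src/Makevars.win` enabling TBB might look like the following. This is a sketch: the `-DRCPP_PARALLEL_USE_TBB=1` define is the switch described above, but check the RcppParallel documentation for the exact flags your version expects.

```makefile
# Enable TBB on Windows (off by default for backward compatibility)
PKG_CXXFLAGS += -DRCPP_PARALLEL_USE_TBB=1

# Link against the RcppParallel libraries
PKG_LIBS += $(shell "${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe" \
              -e "RcppParallel::RcppParallelLibs()")
```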
After you've added the above to the package you can simply include the main RcppParallel package header in source files that need to use it:
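For example, at the top of a C++ source file (the `using` directive is optional but convenient):

```cpp
#include <RcppParallel.h>
using namespace RcppParallel;
```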
When using RcppParallel you typically do not need to worry about explicit locking, as the mechanics of `parallelFor` and `parallelReduce` (explained below) take care of providing safe windows into input and output data that have no possibility of contention. Nevertheless, if for some reason you do need to synchronize access to shared data, you can use the TinyThread locking classes (automatically available via `RcppParallel.h`). See the [TinyThread documentation](http://tinythreadpp.bitsnbites.eu/doc/) for additional details.
The TinyThread locking primitives will work on all platforms. You can alternatively use the synchronization classes provided by TBB:
* TBB atomic operations (see <https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Atomic_Operations.htm>).
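As a sketch of using the TinyThread classes, here is a guarded shared counter (`tthread::mutex` and `tthread::lock_guard` are part of the TinyThread API made available via `RcppParallel.h`; the counter itself is purely illustrative):

```cpp
#include <RcppParallel.h>

// Hypothetical shared counter: updates from worker threads must be serialized.
static tthread::mutex counterMutex;
static long updateCount = 0;

void recordUpdate() {
   // The guard acquires the mutex and releases it automatically on scope exit.
   tthread::lock_guard<tthread::mutex> guard(counterMutex);
   ++updateCount;
}
```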
If you use TBB classes directly and want to submit your package to CRAN you should review the [Using TBB] section below and particularly the section on [Portability] to ensure that your package will still compile on platforms that don't support TBB (e.g. Solaris Sparc).
### Algorithms
RcppParallel provides two high level parallel algorithms: `parallelFor` can be used to convert the work of a standard serial "for" loop into a parallel one and `parallelReduce` can be used for accumulating aggregate or other values.
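To illustrate the division of labor behind `parallelFor` (a self-contained toy analogue using standard C++11 threads, not the RcppParallel implementation), a worker object exposes `operator()(begin, end)` and the index range is split into disjoint chunks, so no locking is needed:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Toy stand-in for parallelFor: split [begin, end) into contiguous chunks
// and hand each chunk to its own OS thread.
template <typename Body>
void toyParallelFor(std::size_t begin, std::size_t end, Body& body) {
    unsigned nThreads = std::thread::hardware_concurrency();
    if (nThreads == 0) nThreads = 2;
    std::size_t chunk = (end - begin + nThreads - 1) / nThreads;
    std::vector<std::thread> threads;
    for (std::size_t b = begin; b < end; b += chunk) {
        std::size_t e = std::min(end, b + chunk);
        threads.emplace_back([&body, b, e] { body(b, e); });
    }
    for (std::thread& t : threads) t.join();
}

// A worker in the RcppParallel style: operator()(begin, end) processes one
// chunk of the index range; chunks never overlap, so no locking is required.
struct SquareRootWorker {
    const std::vector<double>& input;
    std::vector<double>& output;
    void operator()(std::size_t begin, std::size_t end) {
        std::transform(input.begin() + begin, input.begin() + end,
                       output.begin() + begin,
                       [](double x) { return std::sqrt(x); });
    }
};
```

RcppParallel's real `parallelFor` uses TBB's scheduler (with work stealing) where available rather than one thread per chunk; the chunk-based `operator()` contract of the worker is the part that carries over.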
### Tuning
There are several settings available for tuning the behavior of parallel algorithms. These settings as well as benchmarking techniques are covered below.
### Using TBB
RcppParallel provides the `parallelFor` and `parallelReduce` functions; however, the TBB library includes a wealth of other tools for parallelization. The motivation for `parallelFor` and `parallelReduce` is portability: you can write a single algorithm that uses TBB on Windows, OS X, Linux, and Solaris x86, but falls back to a lower-performance TinyThread-based implementation on other platforms.
If, however, you are okay with targeting only the supported platforms, you can use TBB directly and bypass `parallelFor` and `parallelReduce`. Note that if you are doing this within an R package you plan to submit to CRAN, you should also provide a fallback serial implementation so the package still compiles on platforms that don't currently support TBB (e.g. Solaris Sparc). Details on doing this are in the *Portability* section below.
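As a sketch of what direct TBB usage looks like (assuming C++11 and that `RcppParallel.h` has pulled in the TBB headers; `sumOfSquares` is a hypothetical example, not part of RcppParallel):

```cpp
#include <RcppParallel.h>  // includes <tbb/tbb.h> where TBB is supported
#include <functional>
#include <vector>

#if RCPP_PARALLEL_USE_TBB

// Sum of squares computed with tbb::parallel_reduce: TBB splits the
// blocked_range across worker threads and combines the partial sums.
double sumOfSquares(const std::vector<double>& x) {
   return tbb::parallel_reduce(
      tbb::blocked_range<std::size_t>(0, x.size()),
      0.0,
      [&x](const tbb::blocked_range<std::size_t>& r, double acc) {
         for (std::size_t i = r.begin(); i != r.end(); ++i)
            acc += x[i] * x[i];
         return acc;
      },
      std::plus<double>());
}

#endif
```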
#### TBB APIs
TBB includes a wide variety of tools for parallel programming, including:

* Timing: portable fine-grained global time stamp
* Task Scheduler: direct access to control the creation and activation of tasks
See the [Intel TBB User Guide](https://software.intel.com/en-us/node/506045) for documentation on using these features.
#### Portability
When using TBB directly in a CRAN package you should check the value of the `RCPP_PARALLEL_USE_TBB` macro and conditionally compile a serial implementation of your algorithm when it is not true. Note that this macro is defined in `RcppParallel.h`, so you should include that header in all cases (it will in turn automatically include `<tbb/tbb.h>` on platforms where TBB is supported). For example, your source file might look like this:
```cpp
#include <Rcpp.h>          // for IntegerVector
#include <RcppParallel.h>  // defines RCPP_PARALLEL_USE_TBB

using namespace Rcpp;

#if RCPP_PARALLEL_USE_TBB

IntegerVector transformData(IntegerVector x) {
   // Implement by calling TBB APIs directly
}

#else

IntegerVector transformData(IntegerVector x) {
   // Implement serially
}

#endif
```
Note that the two functions have the same name (only one will be compiled and linked based on whether the target platform supports TBB).