Commit ef424e2

revise docs to reflect the fact that we've still got one CRAN platform (Solaris Sparc) that doesn't support TBB
1 parent 51273c1 commit ef424e2

2 files changed: +111 −36 lines

index.Rmd

Lines changed: 62 additions & 19 deletions
````diff
@@ -17,13 +17,13 @@ output:
 
 RcppParallel provides a complete toolkit for creating portable, high-performance parallel algorithms without requiring direct manipulation of operating system threads. RcppParallel includes:
 
-* [Intel Thread Building Blocks](https://www.threadingbuildingblocks.org/), a C++ library for task parallelism with a wide variety of parallel algorithms and data structures.
+* [Intel Thread Building Blocks](https://www.threadingbuildingblocks.org/), a C++ library for task parallelism with a wide variety of parallel algorithms and data structures (Windows, OS X, Linux, and Solaris x86 only).
 
-* `RVector` and `RMatrix` wrapper classes for safe and convenient access to R data structures in a multi-threaded environment.
+* [TinyThread](http://tinythreadpp.bitsnbites.eu/), a C++ library for portable use of operating system threads.
 
-* High level functions (`parallelFor` and `parallelReduce`) that provide a straightforward wrapper for the most common parallel algorithms.
+* `RVector` and `RMatrix` wrapper classes for safe and convenient access to R data structures in a multi-threaded environment.
 
-In many cases, RcppParallel can achieve significantly better performance than traditional use of threads or even [OpenMP](http://openmp.org/wp/). This is accomplished via dynamic task scheduling that attempts to optimize locality of reference (and therefore cache hit rates) as well as work stealing (detecting idle threads and pushing work to them).
+* High level parallel functions (`parallelFor` and `parallelReduce`) that use Intel TBB as a back-end on systems that support it and TinyThread on other platforms.
 
 ### Examples
 
````

````diff
@@ -91,6 +91,8 @@ PKG_LIBS += $(shell "${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe" \
 -e "RcppParallel::RcppParallelLibs()")
 ```
 
+Note that the Windows variation (Makevars.win) requires an extra `PKG_CXXFLAGS` entry that enables the use of TBB. This is because TBB is not used by default on Windows (for backward compatibility with a previous version of RcppParallel which lacked support for TBB on Windows).
+
 After you've added the above to the package you can simply include the main RcppParallel package header in source files that need to use it:
 
 ```cpp
````
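A hypothetical sketch of the `src/Makevars.win` this hunk is describing. The `PKG_LIBS` line comes from the diff context above; the exact `PKG_CXXFLAGS` define is an assumption on my part (the note only says an extra entry is required), so check the RcppParallel documentation for the flag your version expects:

```make
# Sketch of src/Makevars.win for a package using RcppParallel on Windows.
# ASSUMPTION: -DRCPP_PARALLEL_USE_TBB=1 is the extra PKG_CXXFLAGS entry the
# note above refers to; verify against the RcppParallel docs.
PKG_CXXFLAGS += -DRCPP_PARALLEL_USE_TBB=1

PKG_LIBS += $(shell "${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe" \
              -e "RcppParallel::RcppParallelLibs()")
```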
````diff
@@ -144,14 +146,18 @@ IntegerVector transformVector(IntegerVector x) {
 
 #### Locking
 
-When using RcppParallel you typically do not need to worry about explicit locking, as the mechanics of `parallelFor` and `parallelReduce` (explained below) take care of providing safe windows into input and output data that have no possibility of contention. Nevertheless, if for some reason you do need to synchronize access to shared data, you can use one of the following facilities provided by TBB:
+When using RcppParallel you typically do not need to worry about explicit locking, as the mechanics of `parallelFor` and `parallelReduce` (explained below) take care of providing safe windows into input and output data that have no possibility of contention. Nevertheless, if for some reason you do need to synchronize access to shared data, you can use the TinyThread locking classes (automatically available via `RcppParallel.h`). See the [TinyThread documentation](http://tinythreadpp.bitsnbites.eu/doc/) for additional details.
+
+The TinyThread locking primitives will work on all platforms. You can alternatively use the synchronization classes provided by TBB:
 
 1. TBB concurrent container classes (see: <https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Containers.htm>).
 
 2. TBB mutual exclusion classes (see: <https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Mutual_Exclusion.htm>)
 
 3. TBB atomic operations (see <https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Atomic_Operations.htm>).
 
+If you use TBB classes directly and want to submit your package to CRAN you should review the [Using TBB] section below and particularly the section on [Portability] to ensure that your package will still compile on platforms that don't support TBB (e.g. Solaris Sparc).
+
 ### Algorithms
 
 RcppParallel provides two high level parallel algorithms: `parallelFor` can be used to convert the work of a standard serial "for" loop into a parallel one and `parallelReduce` can be used for accumulating aggregate or other values.
````
````diff
@@ -270,20 +276,6 @@ double parallelVectorSum(NumericVector x) {
 }
 ```
 
-#### Using TBB Directly
-
-RcppParallel provides the `parallelFor` and `parallelReduce` functions however the TBB library includes a wealth of other tools for parallelization, including:
-
-* Advanced algorithms: `parallel_scan`, `parallel_while`, `parallel_do`, `parallel_pipeline`, `parallel_sort`
-* Containers: `concurrent_queue`, `concurrent_priority_queue`, `concurrent_vector`, `concurrent_hash_map`
-* Mutual exclusion: `mutex`, `spin_mutex`, `queuing_mutex`, `spin_rw_mutex`, `queuing_rw_mutex`, `recursive_mutex`
-* Atomic operations: `fetch_and_add`, `fetch_and_increment`, `fetch_and_decrement`, `compare_and_swap`, `fetch_and_store`
-* Timing: portable fine grained global time stamp
-* Task Scheduler: direct access to control the creation and activation of tasks
-
-See the [Intel TBB User Guide](https://software.intel.com/en-us/node/506045) for documentation on using these features.
-
 ### Tuning
 
 There are several settings available for tuning the behavior of parallel algorithms. These settings as well as benchmarking techniques are covered below.
````
````diff
@@ -343,6 +335,57 @@ res[,1:4]
 ```
 
 
+### Using TBB
+
+RcppParallel provides the `parallelFor` and `parallelReduce` functions however the TBB library includes a wealth of other tools for parallelization. The motivation for `parallelFor` and `parallelReduce` is portability: you can write a single algorithm that uses TBB on Windows, OS X, Linux, and Solaris x86 but falls back to a lower-performance implementation based on TinyThread on other platforms.
+
+If however you are okay with targeting only the supported platforms you can use TBB directly and bypass `parallelFor` and `parallelReduce`. Note that if you are doing this within an R package you plan on submitting to CRAN you should also provide a fallback serial implementation so the package still compiles on platforms that don't currently support TBB (e.g. Solaris Sparc). Details on doing this are in the *Portability* section below.
+
+#### TBB APIs
+
+TBB includes a wide variety of tools for parallel programming, including:
+
+* Advanced algorithms: `parallel_scan`, `parallel_while`, `parallel_do`, `parallel_pipeline`, `parallel_sort`
+* Containers: `concurrent_queue`, `concurrent_priority_queue`, `concurrent_vector`, `concurrent_hash_map`
+* Mutual exclusion: `mutex`, `spin_mutex`, `queuing_mutex`, `spin_rw_mutex`, `queuing_rw_mutex`, `recursive_mutex`
+* Atomic operations: `fetch_and_add`, `fetch_and_increment`, `fetch_and_decrement`, `compare_and_swap`, `fetch_and_store`
+* Timing: portable fine grained global time stamp
+* Task Scheduler: direct access to control the creation and activation of tasks
+
+See the [Intel TBB User Guide](https://software.intel.com/en-us/node/506045) for documentation on using these features.
+
+#### Portability
+
+When using TBB directly in a CRAN package you should check the value of the `RCPP_PARALLEL_USE_TBB` macro and conditionally include a serial implementation of your algorithm if it's not `TRUE`. Note that this macro is defined in `RcppParallel.h` so you should include this in all cases (it will in turn automatically include `<tbb/tbb.h>` on platforms where it's supported). For example, your source file might look like this:
+
+```cpp
+#include <RcppParallel.h>
+
+#if RCPP_PARALLEL_USE_TBB
+
+IntegerVector transformData(IntegerVector x) {
+
+  // Implement by calling TBB APIs directly
+
+}
+
+#else
+
+IntegerVector transformData(IntegerVector x) {
+
+  // Implement serially
+
+}
+
+#endif
+```
+
+Note that the two functions have the same name (only one will be compiled and linked based on whether the target platform supports TBB).
+
 
 
````
index.html

Lines changed: 49 additions & 17 deletions
````diff
@@ -162,11 +162,11 @@ <h3>Overview</h3>
 <p><img src="images/RcppParallelLogo.png" width="643" height="90"></img></p>
 <p>RcppParallel provides a complete toolkit for creating portable, high-performance parallel algorithms without requiring direct manipulation of operating system threads. RcppParallel includes:</p>
 <ul>
-<li><p><a href="https://www.threadingbuildingblocks.org/">Intel Thread Building Blocks</a>, a C++ library for task parallelism with a wide variety of parallel algorithms and data structures.</p></li>
+<li><p><a href="https://www.threadingbuildingblocks.org/">Intel Thread Building Blocks</a>, a C++ library for task parallelism with a wide variety of parallel algorithms and data structures (Windows, OS X, Linux, and Solaris x86 only).</p></li>
+<li><p><a href="http://tinythreadpp.bitsnbites.eu/">TinyThread</a>, a C++ library for portable use of operating system threads.</p></li>
 <li><p><code>RVector</code> and <code>RMatrix</code> wrapper classes for safe and convenient access to R data structures in a multi-threaded environment.</p></li>
-<li><p>High level functions (<code>parallelFor</code> and <code>parallelReduce</code>) that provide a straightforward wrapper for the most common parallel algorithms.</p></li>
+<li><p>High level parallel functions (<code>parallelFor</code> and <code>parallelReduce</code>) that use Intel TBB as a back-end on systems that support it and TinyThread on other platforms.</p></li>
 </ul>
-<p>In many cases, RcppParallel can achieve significantly better performance than traditional use of threads or even <a href="http://openmp.org/wp/">OpenMP</a>. This is accomplished via dynamic task scheduling that attempts to optimize locality of reference (and therefore cache hit rates) as well as work stealing (detecting idle threads and pushing work to them).</p>
 </div>
 <div id="examples" class="section level3">
 <h3>Examples</h3>
````
````diff
@@ -204,6 +204,7 @@ <h4>R Packages</h4>
 
 PKG_LIBS += $(shell &quot;${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe&quot; \
 -e &quot;RcppParallel::RcppParallelLibs()&quot;)</code></pre>
+<p>Note that the Windows variation (Makevars.win) requires an extra <code>PKG_CXXFLAGS</code> entry that enables the use of TBB. This is because TBB is not used by default on Windows (for backward compatibility with a previous version of RcppParallel which lacked support for TBB on Windows).</p>
 <p>After you’ve added the above to the package you can simply include the main RcppParallel package header in source files that need to use it:</p>
 <pre class="cpp"><code>#include &lt;RcppParallel.h&gt;</code></pre>
 </div>
````
````diff
@@ -246,12 +247,14 @@ <h4>Safe Accessors</h4>
 </div>
 <div id="locking" class="section level4">
 <h4>Locking</h4>
-<p>When using RcppParallel you typically do not need to worry about explicit locking, as the mechanics of <code>parallelFor</code> and <code>parallelReduce</code> (explained below) take care of providing safe windows into input and output data that have no possibility of contention. Nevertheless, if for some reason you do need to synchronize access to shared data, you can use one of the following facilities provided by TBB:</p>
+<p>When using RcppParallel you typically do not need to worry about explicit locking, as the mechanics of <code>parallelFor</code> and <code>parallelReduce</code> (explained below) take care of providing safe windows into input and output data that have no possibility of contention. Nevertheless, if for some reason you do need to synchronize access to shared data, you can use the TinyThread locking classes (automatically available via <code>RcppParallel.h</code>). See the <a href="http://tinythreadpp.bitsnbites.eu/doc/">TinyThread documentation</a> for additional details.</p>
+<p>The TinyThread locking primitives will work on all platforms. You can alternatively use the synchronization classes provided by TBB:</p>
 <ol style="list-style-type: decimal">
 <li><p>TBB concurrent container classes (see: <a href="https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Containers.htm" class="uri">https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Containers.htm</a>).</p></li>
 <li><p>TBB mutual exclusion classes (see: <a href="https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Mutual_Exclusion.htm" class="uri">https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Mutual_Exclusion.htm</a>)</p></li>
 <li><p>TBB atomic operations (see <a href="https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Atomic_Operations.htm" class="uri">https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Atomic_Operations.htm</a>).</p></li>
 </ol>
+<p>If you use TBB classes directly and want to submit your package to CRAN you should review the <a href="#using-tbb">Using TBB</a> section below and particularly the section on <a href="#portability">Portability</a> to ensure that your package will still compile on platforms that don’t support TBB (e.g. Solaris Sparc).</p>
 </div>
 </div>
 <div id="algorithms" class="section level3">
````
````diff
@@ -353,19 +356,6 @@ <h4>parallelReduce</h4>
 return sum.value;
 }</code></pre>
 </div>
-<div id="using-tbb-directly" class="section level4">
-<h4>Using TBB Directly</h4>
-<p>RcppParallel provides the <code>parallelFor</code> and <code>parallelReduce</code> functions however the TBB library includes a wealth of other tools for parallelization, including:</p>
-<ul>
-<li>Advanced algorithms: <code>parallel_scan</code>, <code>parallel_while</code>, <code>parallel_do</code>, <code>parallel_pipeline</code>, <code>parallel_sort</code></li>
-<li>Containers: <code>concurrent_queue</code>, <code>concurrent_priority_queue</code>, <code>concurrent_vector</code>, <code>concurrent_hash_map</code></li>
-<li>Mutual exclusion: <code>mutex</code>, <code>spin_mutex</code>, <code>queuing_mutex</code>, <code>spin_rw_mutex</code>, <code>queuing_rw_mutex</code>, <code>recursive_mutex</code></li>
-<li>Atomic operations: <code>fetch_and_add</code>, <code>fetch_and_increment</code>, <code>fetch_and_decrement</code>, <code>compare_and_swap</code>, <code>fetch_and_store</code></li>
-<li>Timing: portable fine grained global time stamp</li>
-<li>Task Scheduler: direct access to control the creation and activation of tasks</li>
-</ul>
-<p>See the <a href="https://software.intel.com/en-us/node/506045">Intel TBB User Guide</a> for documentation on using these features.</p>
-</div>
 </div>
 <h3>Tuning</h3>
````
````diff
@@ -406,6 +396,48 @@ <h4>Benchmarking</h4>
 1 matrixSqrt(m) 100 0.755 2.568</code></pre>
 </div>
 </div>
+<div id="using-tbb" class="section level3">
+<h3>Using TBB</h3>
+<p>RcppParallel provides the <code>parallelFor</code> and <code>parallelReduce</code> functions however the TBB library includes a wealth of other tools for parallelization. The motivation for <code>parallelFor</code> and <code>parallelReduce</code> is portability: you can write a single algorithm that uses TBB on Windows, OS X, Linux, and Solaris x86 but falls back to a lower-performance implementation based on TinyThread on other platforms.</p>
+<p>If however you are okay with targeting only the supported platforms you can use TBB directly and bypass <code>parallelFor</code> and <code>parallelReduce</code>. Note that if you are doing this within an R package you plan on submitting to CRAN you should also provide a fallback serial implementation so the package still compiles on platforms that don’t currently support TBB (e.g. Solaris Sparc). Details on doing this are in the <em>Portability</em> section below.</p>
+<div id="tbb-apis" class="section level4">
+<h4>TBB APIs</h4>
+<p>TBB includes a wide variety of tools for parallel programming, including:</p>
+<ul>
+<li>Advanced algorithms: <code>parallel_scan</code>, <code>parallel_while</code>, <code>parallel_do</code>, <code>parallel_pipeline</code>, <code>parallel_sort</code></li>
+<li>Containers: <code>concurrent_queue</code>, <code>concurrent_priority_queue</code>, <code>concurrent_vector</code>, <code>concurrent_hash_map</code></li>
+<li>Mutual exclusion: <code>mutex</code>, <code>spin_mutex</code>, <code>queuing_mutex</code>, <code>spin_rw_mutex</code>, <code>queuing_rw_mutex</code>, <code>recursive_mutex</code></li>
+<li>Atomic operations: <code>fetch_and_add</code>, <code>fetch_and_increment</code>, <code>fetch_and_decrement</code>, <code>compare_and_swap</code>, <code>fetch_and_store</code></li>
+<li>Timing: portable fine grained global time stamp</li>
+<li>Task Scheduler: direct access to control the creation and activation of tasks</li>
+</ul>
+<p>See the <a href="https://software.intel.com/en-us/node/506045">Intel TBB User Guide</a> for documentation on using these features.</p>
+</div>
+<div id="portability" class="section level4">
+<h4>Portability</h4>
+<p>When using TBB directly in a CRAN package you should check the value of the <code>RCPP_PARALLEL_USE_TBB</code> macro and conditionally include a serial implementation of your algorithm if it’s not <code>TRUE</code>. Note that this macro is defined in <code>RcppParallel.h</code> so you should include this in all cases (it will in turn automatically include <code>&lt;tbb/tbb.h&gt;</code> on platforms where it’s supported). For example, your source file might look like this:</p>
+<pre class="cpp"><code>#include &lt;RcppParallel.h&gt;
+
+#if RCPP_PARALLEL_USE_TBB
+
+IntegerVector transformData(IntegerVector x) {
+
+// Implement by calling TBB APIs directly
+
+}
+
+#else
+
+IntegerVector transformData(IntegerVector x) {
+
+// Implement serially
+
+}
+
+#endif</code></pre>
+<p>Note that the two functions have the same name (only one will be compiled and linked based on whether the target platform supports TBB).</p>
+</div>
+</div>
 
 
 </div> <!--span-9-->
````
